Fast copying of files over the network

Wed 08 May 2013

Filed under Blog

Tags unix tips meteo

A common requirement when processing large datasets across several networked machines is a local staging space: you copy the data to the local disk to speed up access and to avoid bogging down an NFS server when many processes start reading different files from it all at once. In most cases this insight only arrives after the fact, so you end up having to copy all the files across to your local staging directories, which itself pretty much brings the NFS servers and their traffic to a halt.

If your network allows it, netcat can be a good, fast option for copying the files across. On a trusted network there is no need to encrypt traffic, so netcat is appropriate (it's just like cat, but over the network). First, on the target machine, go to the directory where you want the files to end up and set up a listener:

nc -l 7000 | gunzip -v | tar -xvf -

In the example above, the listener is on port 7000 (other ports could be used; some netcat implementations, such as the traditional one, want the port given as -l -p 7000). Note that the output is piped through gunzip into tar, because we expect the data to arrive compressed. On the server (the machine that holds the data), we issue the command that packs the files we want to transfer and sends them using netcat:

tar -cf - my_directory/ | gzip --fast -v | nc target 7000

Here we tar and gzip the directory (using the fastest compression level to minimise CPU overhead) and pipe the result into netcat, which sends it to a computer named target on port 7000, where our listener is waiting.
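Before pointing this at real data, it can be reassuring to rehearse both halves of the pipeline on a single machine, with a plain pipe standing in for netcat. The following is only a sketch: the scratch directory, the two sample files and their contents are made up for illustration, and only my_directory comes from the commands above.

```shell
#!/bin/sh
# Rehearse tar | gzip | ... | gunzip | tar locally, without netcat.
set -eu

# Hypothetical scratch area with a tiny source tree and an empty destination.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/my_directory" "$workdir/dst"
printf 'hello\n' > "$workdir/src/my_directory/a.txt"
printf 'world\n' > "$workdir/src/my_directory/b.txt"

# Sender half on the left, receiver half on the right.
# Over the network, "nc target 7000" and "nc -l 7000" would sit at the pipe.
(cd "$workdir/src" && tar -cf - my_directory/ | gzip --fast) \
  | (cd "$workdir/dst" && gunzip | tar -xf -)

# diff -r exits non-zero if the two trees differ, which stops the script.
diff -r "$workdir/src/my_directory" "$workdir/dst/my_directory"
echo "trees match"
# The scratch area is left in place for inspection; remove "$workdir" when done.
```

The same diff -r check is not available across two machines, but comparing file counts or checksums on each side after a real transfer serves the same purpose.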




EO & DA Ramblings © J Gomez-Dans