Many would be surprised to hear that several common GNU/Linux programs don't handle symlinks properly. By that, of course, I mean that they don't handle them the way I would want them to, but close enough. For instance, if you want to copy a directory from one server to another, the command

scp -r source-dir target-dir

looks very attractive. Unfortunately, scp follows symlinks, meaning that instead of copying a link to some other part of the file system, it copies that other part of the file system. For a heavily symlinked directory this can be disastrous.
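If you want to see this for yourself, here is a minimal sketch (my-server stands in for any host you can ssh to, and the file names are made up):

mkdir -p demo && echo "some data" > demo/data
ln -s data demo/link
scp -r demo my-server:
ssh my-server "ls -l demo"

On the remote side, demo/link shows up as a regular file holding a full copy of data, not as a symlink.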
The correct and foolproof way to grab a portion of a file system from a server is to use tar. Don't worry, this doesn't mean you have to actually create a tar file; you can use tar to pipe the output over ssh and untar it on the other side.
tar -c some-files some-dirs \
    | ssh -C my-server "tar -C path/to/extract/root -x"
If you want to download from a server…
ssh -C my-server \
    "tar -C path/to/archive/root -c some-files some-dirs" | tar -x
The -C switch to tar tells it to change directories prior to performing the operation. The -C switch to ssh tells it to compress the traffic with gzip-like compression. You can even use better compression, if you have a slower connection to the server or a pay-by-the-bit plan, by including p7zip in the pipe, or just by passing a -j switch to both tar commands. By the way, p7zip also treats symlinks badly, so you need to protect any hierarchy inside a tar archive.
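For concreteness, here is roughly what those two options look like. This is a sketch, with xz standing in as one example of a stronger external compressor; swap in the compressor of your choice, though exact invocations differ:

tar -cj some-files some-dirs \
    | ssh my-server "tar -C path/to/extract/root -xj"

tar -c some-files some-dirs | xz \
    | ssh my-server "xz -d | tar -C path/to/extract/root -x"

The first uses tar's built-in bzip2 support via -j; the second compresses the archive in the pipe before it hits the network. If you compress in the pipe, drop ssh's -C so the traffic isn't compressed twice.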
In case you are wondering why scp defaults to this bad behavior: well, not all file systems are created equal. Since you are copying to a server, and who knows what file system it has (for instance, it could be FAT), you might not be able to create symlinks there. So it is an alright decision to only copy files and not links to files. If you only deal with, hmmm, how to put it, modern file systems, this sure seems like incorrect behavior. Maybe someday this will change, but in the meantime, the tar method works great and has been the method of choice for as long as tar, pipes, and networks have existed.
But wait, there's more. Even if you don't have symlinks, piping a tar archive over ssh might be a good idea. Since scp operates on individual files, it incurs an overhead on each one. If you have many small files to transfer, small enough that the actual transfer time is almost insignificant, this overhead can become quite costly. In these cases the tar method will be faster.
smithzv@ciabatta:~$ ssh scandal "ls -R kappa-slices-3d | wc"
   3993    3958  117715
smithzv@ciabatta:~$ ssh scandal "du -sh kappa-slices-3d"
36M     kappa-slices-3d
smithzv@ciabatta:~$ time scp -qr scandal:./kappa-slices-3d dat/

real    0m8.004s
user    0m1.152s
sys     0m1.184s
smithzv@ciabatta:~$ time ssh scandal "tar -c kappa-slices-3d" \
    | tar -x -C ~/dat/

real    0m2.442s
user    0m0.824s
sys     0m0.728s
This directory on our scandal cluster has 4000 small files in it which total up to 36 MB. The piped tar method takes about a third the time of the recursive scp copy. Also, I should point out that the scp process will, as far as I know, at best be as fast as the tarring procedure. Of course, note that we didn't use compression here, as this is a transfer of already compressed files over a fast connection and compression just slows both commands down. If you ever need to back up your computer over your home LAN so you can reinstall an OS or something, this is a lifesaver (or at least a time saver).
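As a sketch of that kind of backup (backup-box, my-user, and /mnt/backup are hypothetical placeholders):

tar -c -C /home my-user \
    | ssh backup-box "tar -x -C /mnt/backup"

This streams your whole home directory over as one continuous archive, symlinks and all.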
So, piping a tar archive over ssh is a great tool. That being said, there is a program that does so much more and might be a better choice as long as it is installed on both systems; it's called rsync. rsync mishandles symlinks by default, just like scp (and for the same reasons), but it has a switch, -a for archive mode, that allows it to perform the symlink-preserving behavior seen above.
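A minimal sketch of an archive-mode copy (my-server is again a placeholder):

rsync -az source-dir my-server:path/to/target/

Here -z compresses in transit, much like ssh -C above. One gotcha worth knowing up front: a trailing slash on source-dir makes rsync copy the directory's contents rather than the directory itself.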
rsync has other benefits over a plain scp copy (like incremental updates, i.e. only transmitting data that has changed) and really should be preferred in most cases if it is an option, but you have to read the man page first or it will bite you, especially if you have heavily internalized the way