Rsync tutorial

What is Rsync?

Rsync is a very useful alternative to rcp written by Andrew Tridgell and Paul Mackerras. This tool lets you copy files and directories between a local host and a remote host (source and destination can also be local if you need.) The main advantage of using Rsync instead of rcp, is that rsync can use SSH as a secure channel, send/receive only the bytes inside files that changed since the last replication, and remove files on the destination host if those files were deleted on the source host to keep both hosts in sync. In addition to using rcp/ssh for transport, you can also use Rsync itself, in which case you will connect to TCP port 873.

Whether you rely on SSH or use Rsync explicitely, Rsync still needs to be installed on both hosts. A Win32 port is available if you need, so you can have either one of the host or both be NT hosts. Rsync's web site has some good infos and links. There is also an HOWTO.

Configuring /etc/rsyncd.conf

Being co-written by Andrew Tridgell, author of Samba, it's no surprise that Rsync's configuration file looks just like Samba (and Windows' :-), and that Rsync lets you create projects that look like shared directories under Samba. Accessing remote resources through this indirect channel offers more independence, as it lets you move files on the source Rsync server without changing anything on the destination host.

Any parameters listed before any [module] section are global, default parameters.

Each module is a symbolic name for a directory on the local host. Here's an example:

#/etc/rsyncd.conf
secrets file = /etc/rsyncd.secrets
motd file = /etc/rsyncd.motd #Below are actually defaults, but to be on the safe side...
read only = yes
list = yes
uid = nobody
gid = nobody

[out]
comment = Great stuff from remote.acme.com
path = /home/rsync/out

[confidential]
comment = For your eyes only
path = /home/rsync/secret-out
auth users = joe,jane
hosts allow = *.acme.com
hosts deny = *
list = false

Note: Rsync will not grant access to a protected share if the password file (/etc/rsyncd.secrets, here) is world-readable.

Running RSYNCd

Per the manual page:

The rsync daemon is launched by specifying the --daemon option to rsync. You can launch it either via inetd or as a stand-alone daemon. When run via inetd you should add a line like this to /etc/services:

rsync 873/tcp

... and a single line something like this to /etc/inetd.conf:

rsync stream tcp nowait root /usr/bin/rsync rsyncd --daemon

You will then need to send inetd a HUP signal to tell it to reread its config file. Note that you should not send the rsync server a HUP signal to force it to reread the /etc/rsyncd.conf. The file is re-read on each client connection.

Per the HOWTO:

The rsync daemon is robust, so it is safe to launch it as a stand-alone server. The code that loops waiting for requests is only a few lines long then it forks a new copy. If the forked process dies then it doesn't harm the main daemon.
The big advantage of running as a daemon will come when the planned directory cache system is implemented. The caching system will probably only be enable when running as a daemon. For this reason, busy sites is recommended to run rsync as a daemon. Also, the daemon mode makes it easy to limit the number of concurrent connections.

Since it's not included in the 2.4.3 RPM package, here's the init script to be copied as /etc/rc.d/init.d/rsyncd with symlinks to /etc/rc.d/rc3.d:

#!/bin/sh
# Rsyncd This shell script takes care of starting and stopping the rsync daemon
# description: Rsync is an awesome replication tool.

# Source function library.
. /etc/rc.d/init.d/functions

[ -f /usr/bin/rsync ] || exit 0

case "$1" in
start)
action "Starting rsyncd: " /usr/bin/rsync --daemon
;;
stop)
action "Stopping rsyncd: " killall rsync
;;
*)
echo "Usage: rsyncd {start|stop}"
exit 1
esac
exit 0

Here's an example under Linux on how to set up a replication through SSH:

rsync -avz -e ssh rsync@remote.acme.com:/home/rsync/out/ /home/rsync/from_remote

An important thing here, is that the presence or absence of a trailing "/" in the source directory determines whether the directory itself is copied, or simply the contents of this source directory.

In other words, the above means that the local host must have a directory available (here, /home/rsync/from_remote to receive the contents of /home/rsync/out sitting on the remote host, otherwise Rsync will happily download all files into the path given as destination without asking for confirmation, and you could end up with a big mess.

On the other hand, rsync -avz -e ssh rsync@remote.acme.com:/home/rsync/out /home/rsync/from_remote means that the an "out" sub-directory is first created under /home/rsync/from_remote on the destination host, and will be populated with the contents of the remote directory ./out. In this case, files will be save on the local host in /home/rsync/from_remote/out, so the former commands looks like a better choice.

Here's how to replicate an Rsync share from a remote host:

rsync -avz rsync@remote.acme.com::out /home/rsync/in

Notice that we do not use a path to give the source resource, but instead just a name ("out"), and that we use :: to separate the server's name and the resource it offers. In the Rsync configuration that we'll see just below, this is shown as a [out] section. This way, admins on remote.acme.com can move files on their server; As long as they remember to update the actual path in the [out] section (eg. PATH=/home/rsync/out to PATH=/home/outgoing), remote Rsync users are not affected.

An Rsync server displays the list of available anonymous shares through rsync remote.acme.com::. Note the ::. For added security, it is possible to prompt for a password when listing private shares, so that only authorized remote users know about the Rsync shares available from your server.

Any NT version?

The NT port only requires the latest and greatest RSYNCx.EXE and Cygnus' CYGWIN1.DLL. The easiest is to keep both in the same directory, but the DLL can be located in any directory found in your PATH environment variable.

Robert Scholte's excellent tutorial on using the NT port of Rsync can be found here.

Instructions on how to install rsync as an NT service are here.

Here's an example based on the sample above:

C:\Rsync>rsync243 -avz joe@linux.acme.com::confidential ./confidential
Password:
receiving file list ... done
./
./
wrote 109 bytes read 123 bytes 66.29 bytes/sec
total size is 0 speedup is 0.00

Useful command-line switches

-v, --verbose increase verbosity
-q, --quiet decrease verbosity
-c, --checksum always checksum
-a, --archive archive mode. It is a quick way of saying you want recursion and want to preserve everything.
-r, --recursive recurse into directories
-R, --relative use relative path names
-u, --update update only (don't overwrite newer files)
-t, --times preserve times
-n, --dry-run show what would have been transferred
-W, --whole-file copy whole files, no incremental checks
-I, --ignore-times Normally rsync will skip any files that are already the same length and have the same time-stamp. This option turns off this behavior.
--existing only update files that already exist
--delete delete files that don't exist on the sending side
--delete-after delete after transferring, not before
--force force deletion of directories even if not empty
-c, --checksum always checksum
--size-only only use file size when determining if a file should be transferred
--progress show progress during transfer
-z, --compress compress file data
--exclude=PATTERN exclude files matching PATTERN
--daemon run as a rsync daemon
--password-file=FILE get password from FILE

Resources