Conclusion:
rsync does a fantastic job of syncing content from one system to another and keeping it in sync. It is perfect for backups; use rsnapshot to get even more out of it.
But rsync is not good at bi-directional syncing when content gets deleted, unless you delete the file by hand in every content directory on every host, which is not an option for the everyday user.
Keep in mind that you need the same timezone and synchronized clocks on all hosts when using the --update option.
If you need to keep things in sync, deletions included, look at a tool such as “unison”.
Unison: http://www.cis.upenn.edu/~bcpierce/unison/
Here is how I got to this conclusion:
Rsync is a very good program to sync data from one host (Alice) to another host (Bob).
The advantage of rsync over ordinary copying lies in its algorithm, which transmits only the changes after the initial upload.
For the example I use two local directories (it works identically over the network) with the following initial layout and content:
content_on_alice:
total 0
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:25 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 0B 28 Okt 12:15 test_file_1.txt
-rw-r--r-- 1 brandy staff 0B 28 Okt 12:25 test_file_2.txt

content_on_bob:
total 0
drwxr-xr-x 2 brandy staff 68B 28 Okt 12:14 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
So it is easy to do something like:
rsync -azv --progress content_on_alice/ content_on_bob/
sending incremental file list
./
test_file_1.txt
0 100% 0.00kB/s 0:00:00 (xfer#1, to-check=1/3)
test_file_2.txt
0 100% 0.00kB/s 0:00:00 (xfer#2, to-check=0/3)

sent 159 bytes received 53 bytes 424.00 bytes/sec
total size is 0 speedup is 0.00
The content changes to:
content_on_alice:
total 0
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:25 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 0B 28 Okt 12:15 test_file_1.txt
-rw-r--r-- 1 brandy staff 0B 28 Okt 12:25 test_file_2.txt

content_on_bob:
total 0
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:25 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 0B 28 Okt 12:15 test_file_1.txt
-rw-r--r-- 1 brandy staff 0B 28 Okt 12:25 test_file_2.txt
The initial copy from Alice to Bob is done.
Now I change something on Alice and rsync it again:
echo "1" >content_on_alice/test_file_1.txt
rsync -azv --progress content_on_alice/ content_on_bob/
sending incremental file list
test_file_1.txt
2 100% 0.00kB/s 0:00:00 (xfer#1, to-check=1/3)

sent 126 bytes received 31 bytes 314.00 bytes/sec
total size is 2 speedup is 0.01
The content changes to:
content_on_alice:
total 8
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:25 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 0B 28 Okt 12:25 test_file_2.txt

content_on_bob:
total 8
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:25 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 0B 28 Okt 12:25 test_file_2.txt
So we now have Bob as a backup of Alice and a method to propagate changes on Alice to Bob.
Really? No, one case is missing.
What happens if something is deleted on Alice? Does it get deleted on Bob too?
rm content_on_alice/test_file_2.txt
rsync -azv --progress content_on_alice/ content_on_bob/
sending incremental file list
./

sent 72 bytes received 15 bytes 174.00 bytes/sec
total size is 2 speedup is 0.02
The content changed to:
content_on_alice:
total 8
drwxr-xr-x 3 brandy staff 102B 28 Okt 12:34 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt

content_on_bob:
total 8
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:34 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 0B 28 Okt 12:25 test_file_2.txt
Hmm, it is still there on Bob, not deleted.
But rsync has the --delete option, which should be handled with care. The manual reads:
--delete
This tells rsync to delete extraneous files from the receiving side (ones that aren't on the sending side), but only for the directories that are being synchronized. You must have asked rsync to send the whole directory (e.g. "dir" or "dir/") without using a wildcard for the directory's contents (e.g. "dir/*") since the wildcard is expanded by the shell and rsync thus gets a request to transfer individual files, not the files' parent directory. Files that are excluded from the transfer are also excluded from being deleted unless you use the --delete-excluded option or mark the rules as only matching on the sending side (see the include/exclude modifiers in the FILTER RULES section).

Prior to rsync 2.6.7, this option would have no effect unless --recursive was enabled. Beginning with 2.6.7, deletions will also occur when --dirs (-d) is enabled, but only for directories whose contents are being copied.

This option can be dangerous if used incorrectly! It is a very good idea to first try a run using the --dry-run option (-n) to see what files are going to be deleted.

If the sending side detects any I/O errors, then the deletion of any files at the destination will be automatically disabled. This is to prevent temporary filesystem failures (such as NFS errors) on the sending side from causing a massive deletion of files on the destination. You can override this with the --ignore-errors option.

The --delete option may be combined with one of the --delete-WHEN options without conflict, as well as --delete-excluded. However, if none of the --delete-WHEN options are specified, rsync will choose the --delete-during algorithm when talking to rsync 3.0.0 or newer, and the --delete-before algorithm when talking to an older rsync. See also --delete-delay and --delete-after.
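The manual's advice to try --dry-run first can be followed roughly like this. It is a minimal sketch, assuming throwaway directories below a temporary path; the names alice, bob and only_on_bob.txt are made up for illustration and are not part of the test setup in this post.

```shell
#!/bin/sh
# Sketch: preview what --delete would remove, then do it for real.
# All paths are throwaway examples below a temporary directory.
command -v rsync >/dev/null 2>&1 || { echo "rsync not found, skipping" >&2; exit 0; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/alice" "$work/bob"
echo "1" > "$work/alice/test_file_1.txt"
echo "1" > "$work/bob/test_file_1.txt"
echo "2" > "$work/bob/only_on_bob.txt"

# -n (--dry-run) only reports what would happen; Bob is left untouched.
preview=$(rsync -avn --delete "$work/alice/" "$work/bob/")
echo "$preview"

# Remember whether the extra file survived the dry run.
if [ -e "$work/bob/only_on_bob.txt" ]; then
    survived_dry_run=yes
else
    survived_dry_run=no
fi
echo "extra file still present after dry run: $survived_dry_run"

# Without -n the extraneous file on Bob is really deleted.
rsync -av --delete "$work/alice/" "$work/bob/"
```

Reading the "deleting ..." lines of the dry run before the real run is a cheap way to catch a swapped source and destination.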
So now we can propagate deletions on Alice to Bob too:
rsync -azv --progress --delete content_on_alice/ content_on_bob/
sending incremental file list
deleting test_file_2.txt

sent 69 bytes received 12 bytes 162.00 bytes/sec
total size is 2 speedup is 0.02
And the content changed to:
content_on_alice:
total 8
drwxr-xr-x 3 brandy staff 102B 28 Okt 12:34 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt

content_on_bob:
total 8
drwxr-xr-x 3 brandy staff 102B 28 Okt 12:34 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
We are now able to keep a 1:1 clone of Alice's content on Bob.
Perfect for backups!
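To check that the clone really is 1:1, the two trees can be compared after the sync. The following is only a sketch under assumptions: the directories are temporary stand-ins and leftover.txt is a made-up extraneous file. Note that diff -r compares names and contents, not owners, permissions or timestamps.

```shell
#!/bin/sh
# Sketch: verify that Bob is a 1:1 clone of Alice after rsync --delete.
command -v rsync >/dev/null 2>&1 || { echo "rsync not found, skipping" >&2; exit 0; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/alice" "$work/bob"
echo "1" > "$work/alice/test_file_1.txt"
echo "old" > "$work/bob/leftover.txt"    # extraneous file that --delete should remove

rsync -a --delete "$work/alice/" "$work/bob/"

# diff -r walks both trees and reports files that differ or exist on one side only.
if diff -r "$work/alice" "$work/bob" >/dev/null; then
    clone_status=identical
else
    clone_status=different
fi
echo "trees are $clone_status"
```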
What happens if changes are made on Bob? How are they affected?
Let’s try:
echo "2" >content_on_bob/new_file_on_bob.txt
echo "33" >content_on_bob/test_file_1.txt
Now the content looks like:
content_on_alice:
total 8
drwxr-xr-x 3 brandy staff 102B 28 Okt 12:34 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt

content_on_bob:
total 16
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:53 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:53 new_file_on_bob.txt
-rw-r--r-- 1 brandy staff 3B 28 Okt 12:53 test_file_1.txt
Now we rsync from Alice to Bob:
rsync -azv --progress --delete content_on_alice/ content_on_bob/
sending incremental file list
./
deleting new_file_on_bob.txt
test_file_1.txt
2 100% 0.00kB/s 0:00:00 (xfer#1, to-check=0/2)

sent 114 bytes received 34 bytes 296.00 bytes/sec
total size is 2 speedup is 0.01
The content changed to:
content_on_alice:
total 8
drwxr-xr-x 3 brandy staff 102B 28 Okt 12:34 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt

content_on_bob:
total 8
drwxr-xr-x 3 brandy staff 102B 28 Okt 12:34 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
It does exactly what we want: it makes a 1:1 copy of Alice on Bob, so the new file on Bob was deleted and the content of file 1 was overwritten.
rsync also has the --update option. The manual reads:
-u, --update
This forces rsync to skip any files which exist on the destination and have a modified time that is newer than the source file. (If an existing destination file has a modification time equal to the source file's, it will be updated if the sizes are different.)

Note that this does not affect the copying of symlinks or other special files. Also, a difference of file format between the sender and receiver is always considered to be important enough for an update, no matter what date is on the objects. In other words, if the source has a directory where the destination has a file, the transfer would occur regardless of the timestamps.

This option is a transfer rule, not an exclude, so it doesn't affect the data that goes into the file-lists, and thus it doesn't affect deletions. It just limits the files that the receiver requests to be transferred.
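The core of that rule can be modelled in a few lines of shell. This is a simplified sketch, not rsync's actual code: would_skip_with_update is a made-up helper, the file names are hypothetical, and the equal-timestamp and special-file cases from the manual are left out.

```shell
#!/bin/sh
# Sketch of the core --update rule: skip when the destination exists and is
# newer than the source. A simplified model, not rsync's implementation.
# Note: the -nt test is not in POSIX, but bash, dash, ksh and busybox support it.
would_skip_with_update() {
    # $1 = source file, $2 = destination file
    [ -e "$2" ] && [ "$2" -nt "$1" ]
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/alice" "$work/bob"

# Case 1: Bob's copy was modified after Alice's copy.
touch -t 202001010000 "$work/alice/changed_on_bob.txt"
touch -t 202001020000 "$work/bob/changed_on_bob.txt"
# Case 2: Alice's copy is the newer one.
touch -t 202001020000 "$work/alice/changed_on_alice.txt"
touch -t 202001010000 "$work/bob/changed_on_alice.txt"
# Case 3: the file exists on Alice only.
touch "$work/alice/new_on_alice.txt"

for name in changed_on_bob.txt changed_on_alice.txt new_on_alice.txt; do
    if would_skip_with_update "$work/alice/$name" "$work/bob/$name"; then
        echo "$name: skipped (destination is newer)"
    else
        echo "$name: transferred"
    fi
done
```

The model only decides about transfers. As the manual says, --update is a transfer rule, so it has no say in what --delete removes.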
Let’s test whether this prevents deletion and/or overwriting on Bob.
We make the same changes as in the previous test.
rsync -azvu --progress --delete content_on_alice/ content_on_bob/
sending incremental file list
./
deleting new_file_on_bob.txt

sent 72 bytes received 15 bytes 174.00 bytes/sec
total size is 2 speedup is 0.02
The content changed to:
content_on_alice:
total 8
drwxr-xr-x 3 brandy staff 102B 28 Okt 12:34 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt

content_on_bob:
total 8
drwxr-xr-x 3 brandy staff 102B 28 Okt 12:34 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 3B 28 Okt 12:56 test_file_1.txt
Hmm, it deleted the new file, but did not overwrite the changed file.
Again, it does exactly what we told rsync to do.
This does not look good for bi-directional sync.
What is that anyway?
Bi-directional syncing is the idea that a user can add, change or delete anything in the content without caring where the change is made.
The “syncing” mechanism will take care of propagating the “latest” changes to all other copies of the content.
In principle rsync does that, with one caveat.
But first things first, let me show how far we can get with rsync.
We start with the following content:
content_on_alice:
total 16
drwxr-xr-x 4 brandy staff 136B 28 Okt 13:43 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 2B 28 Okt 13:43 test_file_2.txt

content_on_bob:
total 16
drwxr-xr-x 4 brandy staff 136B 28 Okt 13:44 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 3B 28 Okt 12:56 test_file_1.txt
-rw-r--r-- 1 brandy staff 2B 28 Okt 13:44 test_file_3.txt
First step, sync everything from Alice to Bob:
rsync -av content_on_alice/ content_on_bob/
sending incremental file list
./
test_file_1.txt
test_file_2.txt

sent 177 bytes received 53 bytes 460.00 bytes/sec
total size is 4 speedup is 0.02
Second step, sync everything on Bob to Alice:
rsync -av content_on_bob/ content_on_alice/
sending incremental file list
test_file_3.txt

sent 144 bytes received 31 bytes 350.00 bytes/sec
total size is 6 speedup is 0.03
The content changed to:
content_on_alice:
total 24
drwxr-xr-x 5 brandy staff 170B 28 Okt 13:43 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 2B 28 Okt 13:43 test_file_2.txt
-rw-r--r-- 1 brandy staff 2B 28 Okt 13:44 test_file_3.txt

content_on_bob:
total 24
drwxr-xr-x 5 brandy staff 170B 28 Okt 13:43 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 2B 28 Okt 13:43 test_file_2.txt
-rw-r--r-- 1 brandy staff 2B 28 Okt 13:44 test_file_3.txt
Well, a perfect sync. Now we test changes (on both sides):
echo "22" >content_on_alice/test_file_2.txt
echo "333" >content_on_bob/test_file_3.txt
So the content looks like:
content_on_alice:
total 24
drwxr-xr-x 5 brandy staff 170B 28 Okt 13:43 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 3B 28 Okt 13:50 test_file_2.txt
-rw-r--r-- 1 brandy staff 2B 28 Okt 13:44 test_file_3.txt

content_on_bob:
total 24
drwxr-xr-x 5 brandy staff 170B 28 Okt 13:43 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 2B 28 Okt 13:43 test_file_2.txt
-rw-r--r-- 1 brandy staff 4B 28 Okt 13:50 test_file_3.txt
Now we run both steps again:
rsync -av content_on_alice/ content_on_bob/
sending incremental file list
test_file_2.txt
test_file_3.txt

sent 190 bytes received 50 bytes 480.00 bytes/sec
total size is 7 speedup is 0.03

rsync -av content_on_bob/ content_on_alice/
sending incremental file list

sent 99 bytes received 12 bytes 222.00 bytes/sec
total size is 7 speedup is 0.06
The content changed to:
content_on_alice:
total 24
drwxr-xr-x 5 brandy staff 170B 28 Okt 13:43 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 3B 28 Okt 13:50 test_file_2.txt
-rw-r--r-- 1 brandy staff 2B 28 Okt 13:44 test_file_3.txt

content_on_bob:
total 24
drwxr-xr-x 5 brandy staff 170B 28 Okt 13:43 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 3B 28 Okt 13:50 test_file_2.txt
-rw-r--r-- 1 brandy staff 2B 28 Okt 13:44 test_file_3.txt
What went wrong? The change on Bob was overwritten by Alice's older copy in the first step, so the second step had nothing left to sync back to Alice.
Hmm, remember the “--update” option?
Let’s try it with that one. I make the change on Bob again, so the content is:
content_on_alice:
total 24
drwxr-xr-x 5 brandy staff 170B 28 Okt 13:43 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 3B 28 Okt 13:50 test_file_2.txt
-rw-r--r-- 1 brandy staff 2B 28 Okt 13:44 test_file_3.txt

content_on_bob:
total 24
drwxr-xr-x 5 brandy staff 170B 28 Okt 13:43 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 3B 28 Okt 13:50 test_file_2.txt
-rw-r--r-- 1 brandy staff 4B 28 Okt 13:55 test_file_3.txt
We now run both steps with --update, aka -u:
rsync -avu content_on_alice/ content_on_bob/
sending incremental file list

sent 99 bytes received 12 bytes 222.00 bytes/sec
total size is 7 speedup is 0.06

rsync -avu content_on_bob/ content_on_alice/
sending incremental file list
test_file_3.txt

sent 146 bytes received 31 bytes 354.00 bytes/sec
total size is 9 speedup is 0.05
The content changed to:
content_on_alice:
total 24
drwxr-xr-x 5 brandy staff 170B 28 Okt 13:43 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 3B 28 Okt 13:50 test_file_2.txt
-rw-r--r-- 1 brandy staff 4B 28 Okt 13:55 test_file_3.txt

content_on_bob:
total 24
drwxr-xr-x 5 brandy staff 170B 28 Okt 13:43 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 3B 28 Okt 13:50 test_file_2.txt
-rw-r--r-- 1 brandy staff 4B 28 Okt 13:55 test_file_3.txt
Wooh! It did exactly what we want.
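The two -u passes can be wrapped in a small helper. This is a sketch under assumptions: sync_both_ways is a made-up name, the directories are temporary stand-ins, and the timestamps are set explicitly with touch -t so the "newer" comparison is unambiguous. It covers additions and changes only, not deletions.

```shell
#!/bin/sh
# Sketch: both -u passes in one helper (additions and changes only).
command -v rsync >/dev/null 2>&1 || { echo "rsync not found, skipping" >&2; exit 0; }

sync_both_ways() {
    # $1 and $2 are the two content directories
    rsync -au "$1/" "$2/" && rsync -au "$2/" "$1/"
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/alice" "$work/bob"

# Both sides start with the same, older versions of file 2 and file 3.
for side in alice bob; do
    echo "2" > "$work/$side/test_file_2.txt"
    echo "3" > "$work/$side/test_file_3.txt"
    touch -t 202001010000 "$work/$side/test_file_2.txt" "$work/$side/test_file_3.txt"
done

# A later change on each side, with explicit newer timestamps.
echo "22" > "$work/alice/test_file_2.txt"
touch -t 202001020000 "$work/alice/test_file_2.txt"
echo "333" > "$work/bob/test_file_3.txt"
touch -t 202001020000 "$work/bob/test_file_3.txt"

sync_both_ways "$work/alice" "$work/bob"

# Show what each side received from the other.
cat "$work/alice/test_file_3.txt" "$work/bob/test_file_2.txt"
```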
But we need to test deletion too, so I delete file 2 on Alice and file 3 on Bob:
rm content_on_alice/test_file_2.txt
rm content_on_bob/test_file_3.txt
The content is:
content_on_alice:
total 16
drwxr-xr-x 4 brandy staff 136B 28 Okt 13:59 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 4B 28 Okt 13:55 test_file_3.txt

content_on_bob:
total 16
drwxr-xr-x 4 brandy staff 136B 28 Okt 13:59 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 3B 28 Okt 13:50 test_file_2.txt
Now we run our sync with -u:
rsync -avu content_on_alice/ content_on_bob/
sending incremental file list
./
test_file_3.txt

sent 134 bytes received 34 bytes 336.00 bytes/sec
total size is 6 speedup is 0.04

rsync -avu content_on_bob/ content_on_alice/
sending incremental file list
test_file_2.txt

sent 145 bytes received 31 bytes 352.00 bytes/sec
total size is 9 speedup is 0.05
The content changed to:
content_on_alice:
total 24
drwxr-xr-x 5 brandy staff 170B 28 Okt 13:59 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 3B 28 Okt 13:50 test_file_2.txt
-rw-r--r-- 1 brandy staff 4B 28 Okt 13:55 test_file_3.txt

content_on_bob:
total 24
drwxr-xr-x 5 brandy staff 170B 28 Okt 13:59 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 3B 28 Okt 13:50 test_file_2.txt
-rw-r--r-- 1 brandy staff 4B 28 Okt 13:55 test_file_3.txt
Hmm, not what we want. It prevented deletion altogether, which is perfect for backup and restore.
But how can we actually delete stuff and prevent it from coming back?
Remember the --delete option? Let’s try it: we make the same changes as in the previous test and run rsync with --delete in both directions:
rm content_on_alice/test_file_2.txt
rm content_on_bob/test_file_3.txt
rsync -avu --delete content_on_alice/ content_on_bob/
sending incremental file list
./
deleting test_file_2.txt
test_file_3.txt

sent 134 bytes received 34 bytes 336.00 bytes/sec
total size is 6 speedup is 0.04

rsync -avu --delete content_on_bob/ content_on_alice/
sending incremental file list

sent 84 bytes received 12 bytes 192.00 bytes/sec
total size is 6 speedup is 0.06
The content changed to:
content_on_alice:
total 16
drwxr-xr-x 4 brandy staff 136B 28 Okt 14:08 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 4B 28 Okt 13:55 test_file_3.txt

content_on_bob:
total 16
drwxr-xr-x 4 brandy staff 136B 28 Okt 14:08 ./
drwxr-xr-x 4 brandy staff 136B 28 Okt 12:26 ../
-rw-r--r-- 1 brandy staff 2B 28 Okt 12:31 test_file_1.txt
-rw-r--r-- 1 brandy staff 4B 28 Okt 13:55 test_file_3.txt
Hmm, the deletion on Alice worked, but not the one on Bob, because the first step copied file 3 from Alice back to Bob.
We could run the commands the other way round, but then file 2 would survive. Using cron to start both syncs at the same time would run into a race condition: one of the hosts will be faster, and the result is still not what we want.
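The underlying problem is that rsync only sees the two current trees and cannot tell "deleted on one side" from "newly created on the other". Telling them apart needs a record of the last synced state. The sketch below illustrates that idea in plain shell. It is an assumption-laden illustration, not how unison or any other tool is implemented: the manifest file is hypothetical, the directory is flat, file names must not contain newlines, and conflicts (changed on one side, deleted on the other) are ignored.

```shell
#!/bin/sh
# Sketch: propagate deletions using a manifest of the last synced state.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/alice" "$work/bob"

# State right after the last sync: both sides hold files 1, 2 and 3.
for name in test_file_1.txt test_file_2.txt test_file_3.txt; do
    echo "x" > "$work/alice/$name"
    echo "x" > "$work/bob/$name"
done
ls "$work/alice" > "$work/manifest"    # hypothetical record of that state

# Since then: file 2 was deleted on Alice, file 3 was deleted on Bob.
rm "$work/alice/test_file_2.txt" "$work/bob/test_file_3.txt"

# A name in the manifest that is missing on one side was deleted there,
# so the deletion is carried over to the other side.
while IFS= read -r name; do
    if [ ! -e "$work/alice/$name" ] && [ -e "$work/bob/$name" ]; then
        echo "deleted on Alice, removing on Bob: $name"
        rm -- "$work/bob/$name"
    elif [ ! -e "$work/bob/$name" ] && [ -e "$work/alice/$name" ]; then
        echo "deleted on Bob, removing on Alice: $name"
        rm -- "$work/alice/$name"
    fi
done < "$work/manifest"

ls "$work/alice" "$work/bob"
```

After such a step, the two -u passes would no longer resurrect the deleted files, because neither side still has them.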
There is another flaw in “--update” that I need to mention. Both systems need to be in the same timezone and have synchronized clocks, or the test for which change is “newer” will fail bitterly. (This was almost a show stopper in a big migration once.)
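A quick sanity check before relying on --update is to compare the clocks of both hosts. The following is a sketch only: clock_skew_ok is a made-up helper, the 5 second tolerance is an arbitrary example, and the remote time is faked so the script runs on one machine. On real hosts the second value would come from something like `ssh bob date +%s`, where "bob" is a placeholder host name.

```shell
#!/bin/sh
# Sketch: refuse to sync with --update when the clocks differ too much.
clock_skew_ok() {
    # $1 = local epoch seconds, $2 = remote epoch seconds, $3 = tolerance in seconds
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    [ "$diff" -le "$3" ]
}

local_now=$(date +%s)
# Faked remote clock that runs 90 seconds ahead (see the note above).
remote_now=$(( local_now + 90 ))

if clock_skew_ok "$local_now" "$remote_now" 5; then
    echo "clocks agree, --update comparisons are meaningful"
else
    echo "clock skew too large, do not trust --update" >&2
fi
```

Since `date +%s` counts seconds since the epoch in UTC, this comparison itself does not depend on the local timezone setting of either host.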
Conclusion:
rsync does a fantastic job of syncing content from one system to another and keeping it in sync. It is perfect for backups; use rsnapshot to get even more out of it.
But rsync is not good at bi-directional syncing when content gets deleted, unless you delete the file by hand in every content directory on every host, which is not an option for the everyday user.
Keep in mind that you need the same timezone and synchronized clocks on all hosts when using the --update option.
If you need to keep things in sync, deletions included, look at a tool such as “unison”.
http://www.cis.upenn.edu/~bcpierce/unison/
Excellent point and demonstrations. I’ve run headlong into the shortcomings of rsync when files are moved (not deleted) between folders on the two drives. Depending on your choices of --update and --delete, you end up with multiple copies of the moved files in both the initial and moved locations, or you lose data.
Thank you for writing this; I found it useful as we look into bi-directional rsync.