Effortlessly Sync Your Data: Mastering the Power of rsync

Last Updated Apr 12, 2023

Rsync is a powerful command-line tool for efficiently synchronizing and transferring files between different computers or directories on the same computer.

Consider a scenario where you have project config files on your local hard drive as well as on a remote server. If you modify a file on your local drive, you want the version on the remote server to reflect the same changes. This is where rsync comes in: it identifies the differences between the two files and transfers only the modified parts rather than the entire file. In this way, rsync minimizes the time and bandwidth required to keep the files in sync.

The magic part of rsync is that it can detect changes without having both versions of the file available on the same machine. So how does it work?

How does rsync work?

Rsync first decides which files need attention by comparing their attributes. When you run rsync between a source and a destination, it runs a quick check on each file's size and modification time. If both match, the file is assumed to be unchanged and is skipped.

If they differ, rsync compares the file's contents using a rolling checksum algorithm. Here is how it works under the hood:

Let's say there are two computers. Alice has an old version of a file, and Bob has a newer version.

First, Alice breaks her file into small fixed-size blocks and computes checksums for each block (a cheap rolling checksum plus a stronger hash). Then she sends the list of checksums to Bob.

Bob then scans his own file, computing the checksum of the block that starts at every byte offset, and compares each one with Alice's list. The rolling checksum makes this cheap, because sliding the window forward by one byte only needs a small update instead of a full recomputation. If a checksum matches, Bob knows Alice already has that block, so he doesn't need to send it again. He skips over that block and moves on to the next one. He keeps doing this until he has found all the blocks that match. The blocks that match are the parts of the file that are the same in the old and new versions.

The parts of the file that don't match are the changes, either new or modified data. Bob sends instructions back to Alice about how to create a new version of the file.

Alice receives this information and uses it to construct the new version of the file by combining the parts that haven't changed with the new data that was sent over. And that's basically how it works!
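To make the Alice and Bob exchange concrete, here is a minimal Python sketch of the block-matching idea. It is not rsync's actual implementation: the tiny block size, the function names (signature, delta, patch), the instruction format, and the choice of SHA-256 as the strong hash are all assumptions made for illustration. Only full blocks of the old file are indexed, so a trailing partial block is simply sent as data.

```python
import hashlib

BLOCK = 4        # tiny block size so the demo is easy to follow (assumption)
M = 1 << 16      # modulus for the weak checksum


def weak(data):
    """Weak checksum of one block, returned as the pair (a, b)."""
    n = len(data)
    a = sum(data) % M
    b = sum((n - i) * x for i, x in enumerate(data)) % M
    return a, b


def roll(a, b, out, new, n):
    """Slide an n-byte window forward one byte: drop `out`, append `new`."""
    a = (a - out + new) % M
    b = (b - n * out + a) % M
    return a, b


def strong(data):
    """Strong hash used to confirm a weak-checksum match (SHA-256 here)."""
    return hashlib.sha256(data).digest()


def signature(old):
    """Alice: map the checksums of each full block to its block index."""
    sig = {}
    for idx in range(len(old) // BLOCK):
        block = old[idx * BLOCK:(idx + 1) * BLOCK]
        sig.setdefault(weak(block), {})[strong(block)] = idx
    return sig


def delta(new, sig):
    """Bob: produce ("copy", block_index) and ("data", bytes) instructions."""
    ops, literal = [], bytearray()
    i, w = 0, None
    while i + BLOCK <= len(new):
        if w is None:                      # start of a fresh window
            w = weak(new[i:i + BLOCK])
        candidates = sig.get(w)
        idx = candidates.get(strong(new[i:i + BLOCK])) if candidates else None
        if idx is not None:                # Alice already has this block
            if literal:
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", idx))
            i += BLOCK
            w = None
        else:                              # no match: emit one byte, roll on
            literal.append(new[i])
            if i + BLOCK < len(new):
                w = roll(w[0], w[1], new[i], new[i + BLOCK], BLOCK)
            i += 1
    literal.extend(new[i:])                # tail shorter than one block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old, ops):
    """Alice: rebuild the new file from her old blocks plus Bob's data."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * BLOCK:(value + 1) * BLOCK]
        else:
            out += value
    return bytes(out)


old = b"the quick brown fox jumps"    # Alice's old version
new = b"the quick red fox jumps!"     # Bob's new version
ops = delta(new, signature(old))
assert patch(old, ops) == new
print(ops)
```

In this run, only the bytes that changed travel as data; everything else is expressed as references to blocks Alice already holds.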

No deep shit, right? Now that you know how rsync works, it's time to surprise your coworkers!

Anyway, understanding how rsync works under the hood is good, but to be honest it won't make much real difference in your career. Learning real use cases will.

use case 1: Copy the contents of dir1 into dir2

rsync -av dir1/ dir2

Notice the slash at the end of dir1. It tells rsync that we really mean the contents of dir1, not dir1 itself. In other words: do not create dir1 inside dir2.

About the -av flags: -a means archive mode. It copies recursively and preserves owners, groups, symbolic links, file permissions, and timestamps. I find it easy to remember this flag as ALL. -v means verbose.

use case 2: Copy files from local to remote

rsync -avz path/on/local/dir1/ nodeepshit:/path/on/remote/dir2

nodeepshit is an SSH alias that you can create by adding a Host section to your ~/.ssh/config, like this:

Host nodeepshit
  Hostname 194.233.76.xxx
  User root

The -z flag means compression. This is useful when sending large amounts of data across the network. You don't need -z if your files are already compressed, for example jpg, mp3, or video files. -z works best with SQL dumps, CSV files, or any other text files.
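A quick way to see why compression pays off for text but not for already-compressed files is to compress both kinds of data yourself. The sketch below uses Python's zlib module as a stand-in (which compression algorithm rsync actually uses depends on its version and options), and the sample data is invented for illustration: repeated CSV-like rows versus random bytes, which behave much like already-compressed media.

```python
import os
import zlib


def ratio(data):
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


# Hypothetical sample data: repetitive text vs. incompressible noise.
text = b"id,name,email\n" + b"42,alice,alice@example.com\n" * 2000
noise = os.urandom(len(text))

print(f"csv-like text: {ratio(text):.2%} of original size")
print(f"random bytes:  {ratio(noise):.2%} of original size")
```

The text shrinks to a small fraction of its size, while the random bytes stay about the same, so compressing them only costs CPU time.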

use case 3: Same as 2, but excluding a local dir

For example, you want to upload/sync everything inside the /web dir to the remote while excluding the folder /web/uploads. This is easily done with --exclude. A leading slash in the pattern anchors it to the root of the transfer (the source dir), not to the filesystem root:

rsync -avz path/on/local/dir1/ nodeepshit:/path/on/remote/dir2 \
	--exclude="/uploads"

use case 4: resume a failed transfer

When transferring files over the internet, many things can go wrong, and rsync is not immune to these problems. If a transfer fails, how can we resume or recover it?

Answer: Use --partial-dir=.nodeepshit

.nodeepshit is the dir name for partial files. You can use whatever name you want; I just use this name to find out if someone else steals my post ;)

rsync -avz --partial-dir=.nodeepshit nodeepshit:/path/on/remote/dir2 path/on/local/dir1

During a transfer, each file is temporarily saved as a hidden file with a cryptic name in the target folder (e.g., .HereIsDeepShitFile.7up0do). If the transfer fails, that hidden file is deleted. However, if --partial-dir is set, the partial file is kept in that dir instead. On the next transfer, rsync uses the file it finds there to speed up the resumption, then deletes it once it has served its purpose.

So to resume the transfer, just rerun the command above.

rsync has many more helpful flags. You can read about all of them here: https://explainshell.com/explain/1/rsync

And that's all folks, see you in the next article.

Other articles about developer tooling

If you want to learn more about other developer tooling, here are some articles:

  1. Resume interrupted download with curl
