# Migrate files between NAS
## Goal

Migrate a large number of files from one NAS onto another NAS, so that I could migrate the filesystem to btrfs. This means:

- a large number of files need to be transferred (perhaps in the millions, e.g. project files), and large in total size (>3TB)
- file attributes and ownership need to be kept
- the transfer needs to be safe but fast
- in this case there are no spare disks, therefore `dd` cannot be used
- specifically in this case, the two Synology NAS hosts are both relatively slow, and certain packages such as `pv` are missing from their OS
## First attempt: rsync over SSH and Samba
```shell
# On source NAS
sudo rsync -avh <source> <nas>:/<destination>
```
`rsync` is handy here because it preserves the attributes and ownership, and it can resume if the transfer is interrupted. Note that modern versions of rsync also support `--info=progress2` for aggregated progress information; however, this is not supported on my NAS. I also ran this with root privileges because some files were owned by other users.
When I tested this, it was unfortunately slow (so slow that I didn’t even have to benchmark it). Suspecting this was due to SSH overhead, I switched over to Samba: first mount the destination NAS onto the source NAS at `[destination]`, then run the same command with a small change in path: `sudo rsync -avh [source] [destination]`. This resulted in a much better throughput.
## Can we do better?
- How can we get an overall idea of the progress? 3TB takes at least 6.8 hours to complete, even theoretically.
- Technically we are doing a one-time dump, not “synchronizing” anything, so the incremental benefit of `rsync` is not relevant here.
- Can we also compress to achieve an even better throughput? Compression over `rsync` is not ideal because it operates on a per-file basis and we have a huge number of files.
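The hours figure in the first bullet can be sanity-checked with a quick back-of-the-envelope calculation; assuming a gigabit link at its theoretical maximum of 125 MB/s (the link speed is my assumption, not stated above):

```shell
# Lower bound for moving 3 TB over gigabit Ethernet (125 MB/s theoretical max).
# Real throughput is lower, so the actual transfer takes longer than this.
awk 'BEGIN { printf "%.1f hours\n", 3e12 / 125e6 / 3600 }'
```

This prints roughly 6.7 hours, in the same ballpark as the estimate above, before accounting for any protocol overhead.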
## Second attempt: tar
```shell
# On source NAS
sudo tar -vczf - -C [source] . > [destination]
```
Here `tar` places all the files into one archive and compresses it with `gzip`. However, I very quickly realized that the weak CPU on the source NAS now becomes the bottleneck. The verbose mode of tar is also not very helpful, as it only lists the files it is touching; ideally we want to see how much data has been processed so far.
## Final solution
```shell
# On source NAS, relaying through another PC
sudo tar cf - -C [source] . | ssh [pc] "pv --force | pigz" > [destination]
```
Here we work around the NAS CPU bottleneck by relaying through a third PC, which also enables us to use `pv` to show progress and `pigz` to utilize all cores for compression. Note that the destination NAS was mounted at `[destination]`. Also, `--force` is required here because, as the pv manual notes, “pv will not output any visual display if standard error is not a terminal”.