Saturday 27 February 2021

How to show file/directory copy progress on Linux

Recently I had to copy a large zip archive (2.9GB) on an NFS filesystem mounted over a 16Mbps WAN connection.
The copy progressed very slowly and took more than 3 hours to finish. The problem is that the standard Linux cp command does not let you see the copy progress.
So I was curious how to copy files and still see some progress statistics. It turns out there are multiple ways to do it, but most of them require installing additional tools on your system.
Let's take a look at the options.
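
Since most of these tools are not part of a default installation, it can help to check first which ones are already available. The following is a minimal sketch; the function name list_copy_tools is made up for this example, and it only checks whether a command with that name is on the PATH, not which version it is.

```shell
# list_copy_tools: report which of the tools discussed in this post are on PATH.
# (Hypothetical helper name, for illustration only.)
list_copy_tools() {
    for tool in gcp progress pv rsync; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: installed"
        else
            echo "$tool: not installed"
        fi
    done
}
```

Calling list_copy_tools prints one line per tool, so you can see at a glance which of the sections below apply to your machine.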


Using gcp
gcp is an enhanced file copier that offers more options than cp. Among other things, it detects whether the files being copied already exist and prints a warning.
gcp prints the progress and the copy speed in MB/s, the size of the file copied and the time taken to copy.
sherif@Luthien:~/Downloads$ gcp -f datafari.deb datafari.deb_
Copying 640.06 MiB 100% |################################################| 213.51 MB/s Time: 0:00:03
sherif@Luthien:~/Downloads$ 
The only problem with gcp is that you need to install it on your system; it doesn't come with the standard Linux installation.

Using progress
progress is a little utility that can monitor the progress and throughput of programs doing I/O, like cp or a web browser.
Like gcp, progress needs to be installed, as it is not part of the standard installation.
One catch with progress is that it uses an interactive terminal to show the progress and throughput information. To keep a record of the progress while copying multiple files, we can use the tee command.
tee passes the standard output of progress through to the terminal and saves a copy in a file.
In the command below, $! expands to the PID of the cp process that was just started in the background, -p tells progress to monitor only that PID, and -m keeps it monitoring for as long as the process is running.
To view the saved file, we need to make the non-printable characters visible, using cat -e.
sherif@Luthien:~$ cp -r ./Downloads ./Download_  & progress -mp $! |tee test
[1] 1506
[1]+  Done                    cp -r ./Downloads ./Download_
sherif@Luthien:~$
sherif@Luthien:~$ cat -e test
^[[?1049h^[[22;0;0t^[[1;37r^[(B^[[m^[[4l^[[?7h^[[H^[[2JNo PID(s) currently monitored^M$
^[[H^[[2JNo PID(s) currently monitored^M$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/solr-8.3.0.zip^[[2;9H53.9% (96.2 MiB / 178.6 MiB) 79.4 MiB/s remaining 0:00:01^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/solr-8.3.0.zip^[[2;9H91.8% (164 MiB / 178.6 MiB) 73.5 MiB/s^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/solr-8.3.0.zip^[[2;9H91.8% (164 MiB / 178.6 MiB)^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/enterprise-search-0.1.0-beta3.tar.gz^[[2;9H12.2% (20.2 MiB / 165.8 MiB) 53.3 MiB/s remaining 0:00:02^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/enterprise-search-0.1.0-beta3.tar.gz^[[2;9H74.0% (122.8 MiB / 165.8 MiB) 65.6 MiB/s^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/enterprise-search-0.1.0-beta3.tar.gz^[[2;9H74.0% (122.8 MiB / 165.8 MiB)^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/datafari.deb^[[2;9H68.4% (437.5 MiB / 640.1 MiB) 103.8 MiB/s remaining 0:00:01^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/datafari.deb^[[2;9H68.4% (437.5 MiB / 640.1 MiB)^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/keycloak-8.0.2.zip^[[2;9H57.5% (131.2 MiB / 228.4 MiB)^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/wso2is-5.9.0.zip^[[2;9H50.9% (189 MiB / 371.2 MiB)^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/OpenJDK8U-jdk_x64_linux_hotspot_8u232b09.tar.gz^[[2;9H67.5% (67.2 MiB / 99.7 MiB)^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/OpenJDK11U-jdk_x64_linux_hotspot_11.0.5_10.tar.gz^[[2;9H99.2% (186.2 MiB / 187.8 MiB)^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/Anaconda3-2020.07-Linux-x86_64.sh^[[2;9H15.9% (87.5 MiB / 550.1 MiB) 100.9 MiB/s remaining 0:00:04^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/Anaconda3-2020.07-Linux-x86_64.sh^[[2;9H31.2% (171.8 MiB / 550.1 MiB) 98.5 MiB/s remaining 0:00:03^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/Anaconda3-2020.07-Linux-x86_64.sh^[[2;9H41.1% (226.2 MiB / 550.1 MiB) 93.0 MiB/s remaining 0:00:03^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/Anaconda3-2020.07-Linux-x86_64.sh^[[2;9H48.0% (264.2 MiB / 550.1 MiB) 86.9 MiB/s remaining 0:00:03^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/Anaconda3-2020.07-Linux-x86_64.sh^[[2;9H85.2% (468.5 MiB / 550.1 MiB) 98.6 MiB/s^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/Anaconda3-2020.07-Linux-x86_64.sh^[[2;9H85.2% (468.8 MiB / 550.1 MiB)^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/eclipse-installer/plugins/org.eclipse.justj.openjdk.hotspot.jre.minimal.stripped_14.0.2.v20200815-0932.jar^[[2;9H0.0% (0 / 18.9 KiB)^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/eclipse-installer/configuration/org.eclipse.osgi/154/0/.cp/libswt-cairo-gtk-4936r26.so^[[2;9H0.0% (0 / 42.9 KiB)^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/chrome-linux.zip^[[2;9H65.5% (73.8 MiB / 112.6 MiB)^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/redash-setup/redash-master/client/app/assets/less/inc/growl.less^[[2;9H0.0% (0 / 476 B)^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/redash-setup/redash-master/redash/query_runner/query_results.py^[[2;9H0.0% (0 / 5.3 KiB)^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/fess-13.4.2.zip^[[2;9H40.9% (58 MiB / 141.8 MiB) 94.5 MiB/s^M$
$
^[[H^[[2J[ 1506] cp /home/sherif/Downloads/fess-13.4.2.zip^[[2;9H40.9% (58 MiB / 141.8 MiB)^M$
$
^[[H^[[2JNo such pid: 1506, or wrong permissions.^M$
^[[37;1H^[[?1049l^[[23;0;0t^M^[[?1l^[>sherif@Luthien:~$
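
The saved file is hard to read because it contains the terminal control sequences that progress uses to redraw the screen. As a rough sketch, the escape sequences seen in the output above can be filtered out with sed. The function name strip_progress_log is hypothetical, and the patterns only cover the kinds of sequences that appear in this particular capture, so other terminals may need more.

```shell
# strip_progress_log [FILE...]: print a captured `progress | tee` log without
# terminal control sequences. Reads stdin when no file is given.
# (Hypothetical helper; patterns are based only on the capture shown above.)
strip_progress_log() {
    spl_esc=$(printf '\033')   # the ESC character, shown as ^[ by cat -e
    spl_cr=$(printf '\r')      # carriage return, shown as ^M by cat -e
    sed -e "s/${spl_esc}\[[0-9;?]*[A-Za-z]/ /g" \
        -e "s/${spl_esc}([A-Za-z0-9]//g" \
        -e "s/${spl_esc}[>=]//g" \
        -e "s/${spl_cr}//g" \
        -e 's/  */ /g' \
        -e 's/^ //' \
        -e 's/ $//' \
        -e '/^$/d' "$@"
}
```

Running strip_progress_log test on the file saved above should leave one plain line per screen refresh, with the file being copied followed by its percentage and throughput.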


Using pv
pv is a filter tool similar to cat that monitors the progress of data through a pipe.
pv can be handy to print progress and rate information for any data transfer that goes through a pipe, e.g. if the data is being sent using netcat or the like.
Like the above tools, pv needs to be installed by root, as it is not part of the standard installation.
sherif@Luthien:~/Downloads$ pv datafari.deb >datafari.deb_
 640MiB 0:00:00 [ 946MiB/s] [=======================================================>] 100%            
sherif@Luthien:~/Downloads$
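
pv can only show a percentage when it knows how much data to expect. For a single file it reads the size from the file itself, but in the middle of a pipe the size has to be passed with the -s option. The sketch below copies a directory tree through tar and pv. The function name copy_tree_with_pv is made up for this example, du -sb assumes GNU du, and since the tar stream is somewhat larger than the raw file sizes, the percentage is only approximate.

```shell
# copy_tree_with_pv SRC_DIR DEST_DIR
# Copy a directory tree through a pipe so that pv can report progress.
# (Hypothetical helper; assumes GNU du and an installed pv.)
copy_tree_with_pv() {
    pv_src=$1
    pv_dest=$2
    mkdir -p "$pv_dest" || return 1
    # Apparent size of the tree in bytes; pv uses it for the percentage and ETA.
    pv_size=$(du -sb "$pv_src" | cut -f1)
    # tar writes the tree to stdout, pv reports on the stream, tar unpacks it.
    tar -C "$pv_src" -cf - . | pv -s "$pv_size" | tar -C "$pv_dest" -xf -
}
```

For example, copy_tree_with_pv ./Downloads ./Download_ would show a single progress bar for the whole directory rather than one per file.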


Using rsync
rsync is the Swiss army knife when it comes to copying files between two hosts.
It can also do local copies, and it offers a huge number of options, especially for backups, some of which can speed up the process.
rsync is installed by default on most Linux systems, so it is probably the easiest way to get progress information when tools like progress can't be installed on the system.
I would recommend reading through the rsync options and keeping them in mind when doing large file copies or backups.
sherif@Luthien:~/Downloads$ rsync --progress datafari.deb datafari.deb_
datafari.deb
    671,153,350 100%  113.72MB/s    0:00:05 (xfr#1, to-chk=0/1)
sherif@Luthien:~/Downloads$ man rsync
sherif@Luthien:~/Downloads$ rsync --progress -v datafari.deb datafari.deb_
datafari.deb
    671,153,350 100%  166.59MB/s    0:00:03 (xfr#1, to-chk=0/1)

sent 671,317,285 bytes  received 35 bytes  149,181,626.67 bytes/sec
total size is 671,153,350  speedup is 1.00
sherif@Luthien:~/Downloads$ rsync --progress -vv datafari.deb datafari.deb_
delta-transmission disabled for local transfer or --whole-file
datafari.deb
    671,153,350 100%  255.09MB/s    0:00:02 (xfr#1, to-chk=0/1)
total: matches=0  hash_hits=0  false_alarms=0 data=671153350

sent 671,317,285 bytes  received 102 bytes  268,526,954.80 bytes/sec
total size is 671,153,350  speedup is 1.00
sherif@Luthien:~/Downloads$ 
As you can see, passing multiple v options increases the verbosity of the output offering more information about the copy.
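
For a whole directory, per-file progress can get noisy. Recent versions of rsync (3.1.0 and later, as far as I know) accept --info=progress2, which reports one overall progress line for the entire transfer instead. A minimal sketch, with a hypothetical function name:

```shell
# copy_tree_with_rsync SRC_DIR DEST_DIR
# Copy the contents of SRC_DIR into DEST_DIR with one overall progress line.
# (Hypothetical helper; assumes an rsync that supports --info=progress2.)
copy_tree_with_rsync() {
    # -a recurses into directories and keeps permissions, timestamps and symlinks.
    # The trailing slash on the source means "the contents of this directory".
    rsync -a --info=progress2 "${1%/}/" "${2%/}/"
}
```

For example, copy_tree_with_rsync ./Downloads ./Download_ copies the same tree as the cp -r example earlier. The overall percentage may jump around a little at the start, while rsync is still discovering files.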