late morning
17 April 2009
Here’s how to dump and reload a PostgreSQL database over a network without having to save a dump file, copy it from one machine to another, and then reload the file.
You’ll need the nc (netcat) utility on both machines.
If it’s not already installed:
yum install ncconary update ncLet’s say you want to dump the foo database from a production database machine to a test box.
As the postgres user on the test box (target), run the following command:
dropdb foo
createdb foo
nc -l 16666 | gunzip -c | psql -f - foo
psql via gunzip.
On the production (source) database, run the following command as the postgres user:
pg_dump -o foo | gzip -c -9 | nc <target_host_name> 16666
foo database with OIDs to standard output, piping it through gzip with maximum compression and on to netcat, which will send it to the netcat listener on port 16666 at target_host_name.
I just used this method to dump an 8.7 GB database from production hardware to a server that is slated to replace the old hardware. The current production machine is running on bare metal, while the new database is running on virtualized hardware (VMware ESX with databases stored on iSCSI). A few weeks ago, using a standard database dump file generated with pg_dump with INSERT instead of COPY took over fifteen hours to reload on the virtualized hardware.
Using this method took about 20 minutes, give or take a few. Most of the speedups probably came from using COPY instead of INSERT, but not having to save a database dump and move it from one machine to another certainly sped things up as well.
Bandwidth used between the two machines was roughly about 4-7Mbps. The sending system was done sending before the target had flushed everything to disk.
gzip and gunzip on both sides for even better performance. I reran the test and took the compression out of the picture and the sending side completed in 12½ minutes while keeping the CPU usage down. Network utilization was about 35-40Mbps.ssh.Happy dumping!
(Hat tip: Michael K. Johnson, who introduced me to netcat.)
UPDATE: Added some notes to reflect mkj’s comments. Good stuff.
UPDATE 2: Added a note about security and ssh.
Comments
Michael K Johnson
17 April, 11:36 AM
I suspect that practically all the difference in this case was from using COPY instead of INSERT.
It’s rarely worth using gzip with netcat — netcat really shines when you are trying to avoid wasting time encoding data in any way. Especially, avoiding expensive (-9) compression. At that point you might as well just use ssh to transfer the data, given the speeds of today’s CPUs. I find that netcat is particularly useful when I want to transfer very large files over very fast networks without burdening the CPU — say, ISO images over gigabit networks (perhaps while you are busy compiling the kernel on one end (or both ends…) of the connection).
Note also that you can ask pg_dump to compress by itself, though again for this instance that’s a waste of perfectly good CPU power and (potentially) of time.
In general, you will need to disable the firewall only on the listening end (nc -l) of the connection, unless you have a firewall that is so restrictive that it disables outgoing connections.
HTH