Here’s how to dump and reload a PostgreSQL database over a network without having to save a dump file, copy it from one machine to another, and then reload the file.
What You’ll Need
You’ll need the nc (netcat) utility on both machines.
If it’s not already installed:
- On CentOS 5.x, run
yum install nc - If you are fortunate enough to be using a Conary-based distribution, run
conary update nc
An Example
Let’s say you want to dump the foo database from a production database machine to a test box.
As the postgres user on the test box (target), run the following command:
dropdb foo
createdb foo
nc -l 16666 | gunzip -c | psql -f – foo
This will set up a TCP socket listener on port 16666 which will receive data on that port and send it to standard output, which will then be piped to
psql via gunzip.
On the production (source) database, run the following command as the postgres user:
pg_dump -o foo | gzip -c -9 | nc16666
This will dump the
foo database with OIDs to standard output, piping it through gzip with maximum compression and on to netcat, which will send it to the netcat listener on port 16666 at target_host_name.
Results
I just used this method to dump an 8.7 GB database from production hardware to a server that is slated to replace the old hardware. The current production machine is running on bare metal, while the new database is running on virtualized hardware (VMware ESX with databases stored on iSCSI). A few weeks ago, using a standard database dump file generated with pg_dump with INSERT instead of COPY took over fifteen hours to reload on the virtualized hardware.
Using this method took about 20 minutes, give or take a few. Most of the speedups probably came from using COPY instead of INSERT, but not having to save a database dump and move it from one machine to another certainly sped things up as well.
Bandwidth used between the two machines was roughly about 4-7Mbps. The sending system was done sending before the target had flushed everything to disk.
Caveats and Considerations
- Doing this over a low bandwidth connection isn’t recommended; you’ll need at least 4 Mbps to make it reasonable.
- However, if you are doing this on a fast LAN, you can nix
gzipandgunzipon both sides for even better performance. I reran the test and took the compression out of the picture and the sending side completed in 12½ minutes while keeping the CPU usage down. Network utilization was about 35-40Mbps. - If
either machinethe target machine is running a firewall, you’ll have to open a port or disable the firewall for this to work. - Elliot notes that if using netcat will send all data over the network in the clear. Therefore, if you have sensitive data that needs to be dumped over a public network, you’ll be better off using
ssh.
Happy dumping!
(Hat tip: Michael K. Johnson, who introduced me to netcat.)
UPDATE: Added some notes to reflect mkj’s comments. Good stuff.
UPDATE 2: Added a note about security and ssh.