HOWTO: Dump and Reload a PostgreSQL Database Using netcat

Here’s how to dump and reload a PostgreSQL database over a network without having to save a dump file, copy it from one machine to another, and then reload the file.

What You’ll Need

You’ll need the nc (netcat) utility on both machines.

If it’s not already installed:

  • On CentOS 5.x, run yum install nc
  • If you are fortunate enough to be using a Conary-based distribution, run conary update nc

An Example

Let’s say you want to dump the foo database from a production database machine to a test box.

As the postgres user on the test box (target), run the following command:


dropdb foo
createdb foo
nc -l 16666 | gunzip -c | psql -f – foo

This will set up a TCP socket listener on port 16666 which will receive data on that port and send it to standard output, which will then be piped to psql via gunzip.

On the production (source) database, run the following command as the postgres user:


pg_dump -o foo | gzip -c -9 | nc 16666

This will dump the foo database with OIDs to standard output, piping it through gzip with maximum compression and on to netcat, which will send it to the netcat listener on port 16666 at target_host_name.

Results

I just used this method to dump an 8.7 GB database from production hardware to a server that is slated to replace the old hardware. The current production machine is running on bare metal, while the new database is running on virtualized hardware (VMware ESX with databases stored on iSCSI). A few weeks ago, using a standard database dump file generated with pg_dump with INSERT instead of COPY took over fifteen hours to reload on the virtualized hardware.

Using this method took about 20 minutes, give or take a few. Most of the speedups probably came from using COPY instead of INSERT, but not having to save a database dump and move it from one machine to another certainly sped things up as well.

Bandwidth used between the two machines was roughly about 4-7Mbps. The sending system was done sending before the target had flushed everything to disk.

Caveats and Considerations

  • Doing this over a low bandwidth connection isn’t recommended; you’ll need at least 4 Mbps to make it reasonable.
  • However, if you are doing this on a fast LAN, you can nix gzip and gunzip on both sides for even better performance. I reran the test and took the compression out of the picture and the sending side completed in 12½ minutes while keeping the CPU usage down. Network utilization was about 35-40Mbps.
  • If either machine the target machine is running a firewall, you’ll have to open a port or disable the firewall for this to work.
  • Elliot notes that if using netcat will send all data over the network in the clear. Therefore, if you have sensitive data that needs to be dumped over a public network, you’ll be better off using ssh.

Happy dumping!

(Hat tip: Michael K. Johnson, who introduced me to netcat.)

UPDATE: Added some notes to reflect mkj’s comments. Good stuff.

UPDATE 2: Added a note about security and ssh.

2009.04.17 · permalink