
pg_snapclone
------------
Simon Riggs		simon@2ndquadrant.com


A set of tools intended to let tasks work in parallel against a large
database. The toolset comprises three main components:

* pg_snapclone_tools - a set of PostgreSQL C functions that publish a
snapshot from a master session and reuse it within child sessions. This
allows two or more sessions to share a common snapshot; in PostgreSQL this
means that the sessions have exactly the same view of the data in the
database. This is the enabling feature that allows tasks to be split so they
can operate in parallel within the database. Applications may use these
functions directly to create parallel tasks.

* pg_snapclone_master - a program that acts as a master session, using the
functions from pg_snapclone_tools to publish a snapshot. Other client
sessions can then clone the snapshot directly from the master session.

* pg_dump patches - enhancements to pg_dump to reuse snapshots published by
pg_snapclone_master, allowing large pg_dumps to operate in parallel. These
need to be patched onto a PostgreSQL source tree to allow the modified
pg_dump binary to be compiled.
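
The function names provided by pg_snapclone_tools are not listed here, so the
following is only an analogue and not this toolset's API. It is a minimal
sketch of the same publish-and-reuse idea using the built-in snapshot export
of stock PostgreSQL 9.2 and later, where a master session calls
pg_export_snapshot() and a child session adopts the result with SET
TRANSACTION SNAPSHOT. The helper child_sql and the snapshot identifier are
made up for illustration.

```shell
#!/bin/sh
# Print the SQL a child session would run to adopt an exported snapshot.
# Assumes stock PostgreSQL 9.2+, not the pg_snapclone_tools functions.
# $1 is the identifier returned by "SELECT pg_export_snapshot();" in the
# master session, whose transaction must stay open while children import it.
child_sql() {
    cat <<EOF
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '$1';
SELECT count(*) FROM table1;
COMMIT;
EOF
}

# Made-up identifier; a real one comes from the master session.
sql=$(child_sql "00000003-0000001B-1")
printf '%s\n' "$sql"
```

The printed statements could be piped to psql in each child session; every
child that imports the same identifier sees the same view of table1.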

Usage Example
-------------

To dump a database more quickly, we can run a script like the following,
which executes multiple pg_dumps in parallel, all seeing exactly the same
view of the database.

pg_dump -t table1 -t table2 --schema-pre-load postgres > pre_load.dmp &

snap="pg_snapclone.pid"
pg_dump -t table1 --data-only --snapshot "$snap" postgres > table1.dmp &
wait1=$!
pg_dump -t table2 --data-only --snapshot "$snap" postgres > table2.dmp &
wait2=$!
pg_snapclone_master &
wait $wait1 $wait2
rm "$snap"

pg_dump -t table1 -t table2 --schema-post-load postgres > post_load.dmp &

Note that the pg_dump of the schema is split into two parts, pre-load and
post-load, each written to its own file.
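
The script above does not check whether each pg_dump succeeded, and wait
called with several PIDs returns only the status of the last one. A minimal
sketch of collecting every exit status, assuming a POSIX shell; dump_one is a
hypothetical stand-in for the patched pg_dump call and is not part of
pg_snapclone.

```shell
#!/bin/sh
# Hypothetical stand-in for a patched pg_dump call. Replace the body with
# the real command, e.g.
#   pg_dump -t "$1" --data-only --snapshot "$snap" postgres > "$1.dmp"
dump_one() {
    # Simulate a failure for a table named "bad" so the check is visible.
    [ "$1" != "bad" ]
}

# Start one background job per table and remember "pid:table" pairs.
jobs_started=""
for tbl in table1 table2 bad; do
    dump_one "$tbl" &
    jobs_started="$jobs_started $!:$tbl"
done

# Wait for each PID separately so that every exit status is seen.
failed=""
for entry in $jobs_started; do
    pid=${entry%%:*}
    tbl=${entry#*:}
    if ! wait "$pid"; then
        failed="$failed $tbl"
    fi
done

if [ -n "$failed" ]; then
    echo "dump failed for:$failed"
fi
```

With the simulated failure this prints "dump failed for: bad"; with real
pg_dump calls the list names any table whose dump exited with a non-zero
status.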




