Monday, November 14, 2011

New Replication and Recovery Features in PostgreSQL 9.1

1. Streaming Replication and Continuous Archiving Features
a. Synchronous Replication
PostgreSQL 9.1 comes with Synchronous Replication. In Synchronous Replication, every committed transaction is transferred to the standby synchronously: while the primary is in synchronous replication, each committing transaction waits until it has been transferred to the slave/replication server. This minimizes the loss of any committed transaction and gives a high degree of durability.
For setting up synchronous replication, you can look at my earlier blog post on the topic.

2. New parameter: replication_timeout
 This is a new parameter which has been added in 9.1. Using this parameter, the server terminates a replication connection that is inactive for longer than replication_timeout. This is a useful parameter for the primary server to detect a standby crash or a network outage.
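As a minimal sketch, the parameter goes in the primary's postgresql.conf (the value here is illustrative, not a recommendation):

```ini
# postgresql.conf on the primary (illustrative value)
replication_timeout = 60s   # terminate replication connections inactive longer than this
```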

3. New role/permission for Streaming Replication
 REPLICATION is a new role attribute which has been added in PostgreSQL 9.1 for Streaming Replication. If a user has the REPLICATION attribute granted, then that user is able to initiate Streaming Replication. The REPLICATION privilege can also be used for online backups of PostgreSQL.
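As a sketch (the role name is hypothetical), the attribute can be granted at creation time or added to an existing user:

```sql
-- Create a new user with the REPLICATION attribute (role name is illustrative)
CREATE ROLE repuser WITH REPLICATION LOGIN PASSWORD 'secret';

-- Or grant the attribute to an existing user
ALTER ROLE repuser WITH REPLICATION;
```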

4. New tool for base backups: pg_basebackup
pg_basebackup is a new backup tool introduced in PostgreSQL 9.1. Using this tool, a user can take a base backup of a running PostgreSQL database cluster. To use pg_basebackup, the user has to make the following changes in the PostgreSQL cluster:

 a. Make the following change in the pg_hba.conf file:
      host    replication     postgres        [IPv4 address of client]/32  trust
b. Make the following changes in the postgresql.conf file of the PostgreSQL cluster:
    archive_command = 'cp -i %p /Users/postgres/archive/%f'
    archive_mode = on        # requires restart
    max_wal_senders = 3      # maximum number of WAL sender processes
    wal_keep_segments = 32   # how many WAL segments (= files) to keep on the server (value is illustrative)
c. After making the above changes, the user can restart the PostgreSQL cluster and use pg_basebackup to take the backup, as given below:
     pg_basebackup -D /Users/vibhor/testbackup -v -Fp -l Testbackup -h localhost -U postgres
Note: the user running pg_basebackup must have either the SUPERUSER or REPLICATION privilege.

5. Functions to control Streaming Replication replay:
 The new version of PostgreSQL has new Streaming Replication control functions. Information on them is given below:

  a. pg_xlog_replay_pause(): 
Using this function, a user can pause recovery on the standby and take a consistent backup of the standby data directory.
Example is given below:
postgres=# select pg_xlog_replay_pause();
 pg_xlog_replay_pause 
----------------------
 
(1 row)
b. pg_is_xlog_replay_paused():
 Using this function, a user can check whether replay on the standby/streaming replication is paused or not. Example is given below:
 postgres=# select pg_is_xlog_replay_paused();
 pg_is_xlog_replay_paused 
--------------------------
 t
(1 row)
c. pg_xlog_replay_resume():
 Using this function, a user can resume replay on the standby/streaming replication if its recovery/replay is paused. Example is given below:
postgres=# select pg_xlog_replay_resume();
 pg_xlog_replay_resume 
-----------------------
 
(1 row)

postgres=# select pg_is_xlog_replay_paused();
 pg_is_xlog_replay_paused 
--------------------------
 f
(1 row)
6. New in Replication Monitoring:

a. View: pg_stat_replication
This view displays information on WAL sender processes. The view contains one row for each WAL sender process, showing the process ID, user (OID), user name, application name, client host name (if available) and port number, the time at which the server process began execution, and the current WAL sender state and transaction log locations.
Example is given below:
postgres=# select * from pg_stat_replication;
 procpid | usesysid | usename  | application_name | client_addr | client_hostname | client_port |          backend_start           |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state 
---------+----------+----------+------------------+-------------+-----------------+-------------+----------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
    2319 |       10 | postgres | sync_replication | ::1         |                 |       50224 | 2011-11-10 21:39:45.424503+05:30 | streaming | 0/33002798    | 0/33002798     | 0/33002798     | 0/33002798      |             0 | async
(1 row)
Note: the pg_stat_replication view gives information on the master. Executing a query against pg_stat_replication on the standby will return zero rows.

  b. Function: pg_last_xact_replay_timestamp():
This function gives the timestamp of the last transaction replayed during recovery, i.e. the time at which the commit or abort WAL record for that transaction was generated on the primary. If no transactions have been replayed during recovery, this function returns NULL. Otherwise, while recovery is still in progress the value increases monotonically; once recovery has completed, the value remains static at the last transaction applied during that recovery. When the server has been started normally without recovery, the function returns NULL. Example is given below:
postgres=# select * from pg_last_xact_replay_timestamp();
  pg_last_xact_replay_timestamp   
----------------------------------
 2011-11-10 22:17:26.431321+05:30
(1 row)
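Building on this function, a common sketch (run on the standby) estimates the replication delay in seconds:

```sql
-- Approximate replay delay on the standby, in seconds
SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()) AS replay_delay;
```

Note that if the primary is idle, this delay grows even though no data is pending, so it is only a rough indicator.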
7. New in Hot Standby:

  a. New parameter: hot_standby_feedback
 This is a new parameter which has been added to the postgresql.conf file for the standby server. Using this parameter, a user can now avoid cancellation of queries: it makes Hot Standby postpone cleanup of old row versions while a SELECT query is running on the standby, by having the standby send feedback to the primary (once per wal_receiver_status_interval) about the queries currently executing on the standby. Note: setting this parameter may result in bloat on the primary.
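A minimal sketch of the standby's postgresql.conf (the interval value is illustrative; 10s is the usual default):

```ini
# postgresql.conf on the standby
hot_standby_feedback = on
# frequency of the feedback messages is governed by:
wal_receiver_status_interval = 10s
```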

  b. New column (conflicts) in pg_stat_database
In PostgreSQL 9.1, a new column, conflicts, has been added to pg_stat_database. This column gives the total number of queries canceled due to conflict with recovery on the standby. Example is given below:
postgres=# select datname, conflicts from pg_stat_database;
  datname  | conflicts 
-----------+-----------
 template1 |         0
 template0 |         0
 postgres  |         0
 korean    |         0
(4 rows)

c. New view: pg_stat_database_conflicts
 In 9.1, the pg_stat_database_conflicts view has been added for monitoring and finding queries canceled due to dropped tablespaces, lock timeouts, old snapshots, pinned buffers, or deadlocks. This view contains one row per database, giving the database OID, the database name, and the number of queries that have been canceled in this database for each of those reasons. Example is given below:
postgres=# select * from pg_stat_database_conflicts;
 datid |  datname  | confl_tablespace | confl_lock | confl_snapshot | confl_bufferpin | confl_deadlock 
-------+-----------+------------------+------------+----------------+-----------------+----------------
     1 | template1 |                0 |          0 |              0 |               0 |              0
 12172 | template0 |                0 |          0 |              0 |               0 |              0
 12180 | postgres  |                0 |          0 |              0 |               0 |              0
 25354 | korean    |                0 |          0 |              0 |               0 |              0
(4 rows)
Note: run queries against this view on the standby, since conflicts occur on the standby.

  d. Increased maximum values for max_standby_archive_delay and max_standby_streaming_delay
In PostgreSQL 9.0, the maximum value for max_standby_archive_delay and max_standby_streaming_delay was 35 minutes. Now, in PostgreSQL 9.1, a user can specify a much larger value. These parameters determine the maximum total time allowed to apply any WAL segment/WAL data. For more detail about these parameters, please refer to the documentation.
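A sketch of the standby's postgresql.conf with the larger values (both values are illustrative):

```ini
# postgresql.conf on the standby (illustrative values)
max_standby_archive_delay = 12h     # values above 35 minutes are now accepted
max_standby_streaming_delay = 12h   # -1 means wait forever before canceling queries
```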

  e. New error code for recovery conflicts caused by dropped databases:
  Error Code    Condition Name
      57P04     database_dropped
Till 9.0, PostgreSQL used ERRCODE_ADMIN_SHUTDOWN for a recovery conflict (on the standby) caused by a database drop on the master. Now, in 9.1, ERRCODE_T_R_DATABASE_DROPPED is used for the same situation. This change was made so that poolers, like pgpool, can correctly handle the situation where a database on the master no longer exists.

  8. New in Recovery Control:

  a. pg_create_restore_point(text) function and recovery_target_name parameter:
 PostgreSQL 9.1 comes with the special function pg_create_restore_point(text). Using this function, an admin/DBA can now create a named recovery/restore point. pg_create_restore_point returns the transaction log location up to which a user can restore their hot backup. To support named restore points, PostgreSQL 9.1 has a new parameter, recovery_target_name, for the recovery.conf file. Let's see these two in action.
 i. Set the following parameter in the postgresql.conf file of the cluster:
    archive_command = 'cp -i %p /Users/postgres/archive/%f'
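Archiving only takes effect when wal_level and archive_mode are also set; a minimal sketch of the relevant postgresql.conf lines (the path matches the one above):

```ini
# postgresql.conf (illustrative)
wal_level = archive       # or hot_standby
archive_mode = on         # requires restart
archive_command = 'cp -i %p /Users/postgres/archive/%f'
```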
ii. Now, let's take a base backup using pg_basebackup as given below:
edbs-MacBook-Pro:standby postgres$ pg_basebackup -D /Users/postgres/test_backup -v -Fp -l "Standby Backup" -h localhost
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
pg_basebackup: base backup completed
iii. Now let's connect to the database and create a named restore point using pg_create_restore_point:
postgres=# select pg_create_restore_point('myrecoverylocation');
(1 row)
To verify the behavior of the recovery_target_name parameter, let's create a table and perform some activity:
postgres=# create table test_id as select id from generate_series(1,1000000) as t(id);
SELECT 1000000
iv. Now, create a recovery.conf file, as given below, in the backup directory:
restore_command = 'cp -i /Users/postgres/archive/%f %p'  # e.g. 'cp /mnt/server/archivedir/%f %p'
recovery_target_name = 'myrecoverylocation'
and change the port number in the restored postgresql.conf:
vi /Users/postgres/test_backup/postgresql.conf
v. Now, start the restored PostgreSQL cluster as given below:
pg_ctl -D /Users/postgres/test_backup start
After executing the above command, PostgreSQL will perform recovery up to the recovery_target_name location. To verify this, the user can check the PostgreSQL logfile, which will display messages like the following:
2011-11-14 17:15:18 IST LOG:  database system was interrupted; last known up at 2011-11-14 17:10:51 IST
2011-11-14 17:15:18 IST LOG:  creating missing WAL directory "pg_xlog/archive_status"
2011-11-14 17:15:18 IST LOG:  starting point-in-time recovery to "myrecoverylocation"
With this, after connecting to the restored database, a user can see that the test_id table, which was created after pg_create_restore_point('myrecoverylocation'), is not part of the restored database, as shown below:
-- Restored Database:
edbs-MacBook-Pro:test_backup postgres$ psql -p 5433
psql (9.1.1)
Type "help" for help.

postgres=# \dt
             List of relations
 Schema |     Name      | Type  |  Owner   
--------+---------------+-------+----------
 public | dept          | table | postgres
 public | emp           | table | postgres
 public | employees     | table | postgres
 public | entities      | table | postgres
 public | jobhist       | table | postgres
 public | like_op_table | table | postgres
 public | person_job    | table | postgres
 public | test          | table | postgres
 public | test_default  | table | postgres
(9 rows)

-- Primary Database:
postgres=# \dt
             List of relations
 Schema |     Name      | Type  |  Owner   
--------+---------------+-------+----------
 public | dept          | table | postgres
 public | emp           | table | postgres
 public | employees     | table | postgres
 public | entities      | table | postgres
 public | jobhist       | table | postgres
 public | like_op_table | table | postgres
 public | person_job    | table | postgres
 public | test          | table | postgres
 public | test_default  | table | postgres
 public | test_id       | table | postgres
(10 rows)

b. Standby recovery switches to a new timeline automatically
Till 9.0, when a user specified recovery_target_timeline='latest', PostgreSQL scanned for the latest timeline at the beginning of recovery and picked that as the target. If new timelines appeared during recovery, PostgreSQL stuck with the target chosen at the beginning; the new timelines were ignored. To make PostgreSQL notice the new timeline, the user had to restart the standby. Now, in 9.1, standby servers periodically scan the archive directory for new timelines and switch to a new timeline that appears during recovery.

c. New parameter: restart_after_crash = on/off
By default, till 9.0, whenever there was a backend crash, PostgreSQL automatically restarted, and there was no control over this restarting (a problem for clusterware solutions, where the clusterware itself tries to restart PostgreSQL). Adding this parameter gives control over the restart of PostgreSQL. If restart_after_crash is on, then PostgreSQL will restart automatically after a backend crash. If it is off, then PostgreSQL will not restart after a backend crash and will shut down.
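For the clusterware case described above, the sketch is a single line in postgresql.conf:

```ini
# postgresql.conf (illustrative): let external clusterware handle restarts
restart_after_crash = off
```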

Thursday, November 3, 2011

New Features in Slony 2.1

Slony 2.1 was released on 19th Oct. 2011. So, I thought I would look at some important improvements made in this new release, which can make users' lives easier.

Before discussing the changes, let's set up Slony replication. The following code can be used for setting up Slony replication.
####   Preamble Script:

cluster name=slonytest;
NODE 1 ADMIN CONNINFO = 'dbname=postgres host=localhost user=postgres port=5432 password=postgres';
NODE 2 ADMIN CONNINFO = 'dbname=repdb host=localhost user=postgres port=5432 password=postgres';
Adding Node script:
### create_nodes.slonik

include ;
init cluster (id=1, comment='slonytest node 1');
store node (id=2, comment='slonytest subscriber node 2', event node=1);
Storing Path:
### store_paths.slonik

include ;
STORE PATH (SERVER=1, CLIENT=2, CONNINFO='dbname=postgres host=localhost user=postgres port=5432 password=Empid#042');
STORE PATH (SERVER=2, CLIENT=1, CONNINFO='dbname=repdb host=localhost user=postgres port=5432 password=Empid#042');
Master Slon Process conf file:
## Master Slon Conf:

cleanup_interval="10 minutes"
conn_info='dbname=postgres host=localhost user=postgres port=5432 password=postgres'
sql_on_connection="SET log_min_duration_statement TO '1000';"
Slave Slon process conf file:
## Slave Slon Conf:
cleanup_interval="10 minutes"
conn_info='dbname=repdb host=localhost user=postgres port=5432 password=Empid#042'
sql_on_connection="SET log_min_duration_statement TO '1000';"
After creating the above configuration, a user can set up Slony replication. Now, let's look at the features added in Slony 2.1.

  1. Support for adding tables in bulk
 In previous versions of Slony, if a user had to add all tables of a particular schema, then he had to write a slonik command as given below:
include ;
create set (id=1, origin=1, comment='slonytest Tables and Sequences');
set add table (id=1, set id=1, origin=1, fully qualified name='public.dept', comment='slonytest table public.dept');
set add table (id=2, set id=1, origin=1, fully qualified name='public.emp', comment='slonytest table public.emp');
set add table (id=3, set id=1, origin=1, fully qualified name='public.employees', comment='slonytest table public.employees');
This was a bit of work: the user had to list all the tables in the slonik command, or write a script to generate the command.
 Now, in Slony 2.1, a user can write a single command to add all tables of a schema to Slony replication, as given below:
include ;
create set (id=1, origin=1, comment='slonytest Tables and Sequences');
set add table( set id=1, tables='public.*');
Snapshot of activity is given below:
edbs-MacBook-Pro:slony vibhor$ cat create_set.slonik
include ;
create set (id=1, origin=1, comment='slonytest Tables and Sequences');
set add table( set id=1, tables='public.*');
edbs-MacBook-Pro:slony vibhor$ 
edbs-MacBook-Pro:slony vibhor$ slonik < create_set.slonik
A user can verify the tables added in bulk using the following command:
postgres=# select * from _slonytest.sl_table;
 tab_id | tab_reloid | tab_relname | tab_nspname | tab_set |   tab_idxname   | tab_altered |   tab_comment    
--------+------------+-------------+-------------+---------+-----------------+-------------+------------------
      1 |      16438 | dept        | public      |       1 | dept_pk         | f           | replicated table
      2 |      16445 | emp         | public      |       1 | emp_pk          | f           | replicated table
      3 |      16995 | employees   | public      |       1 | employees_pkey  | f           | replicated table
      4 |      16987 | entities    | public      |       1 | entities_pkey   | f           | replicated table
      5 |      16456 | jobhist     | public      |       1 | jobhist_pk      | f           | replicated table
      6 |      16967 | person_job  | public      |       1 | person_job_pkey | f           | replicated table
(6 rows)
If a user wants to add all sequences of a particular schema to Slony replication, then he can try something like the following:
include ;
set add sequences(set id=1, tables='public.*seq'); ## Adding all sequences of Public schema
A user can also use patterns while adding tables, as given below:
include ;
set add table(set id=1, tables='public.slonytest_[1234]', add sequences=true); ## adding tables slonytest_1 through slonytest_4
2. Implicit WAIT FOR
 In previous versions of Slony, a user had to use the WAIT FOR clause before executing commands like MERGE SET/MOVE SET:
     MERGE SET ( ID = 1, ADD ID = 999, ORIGIN = 1 );
Now, in 2.1, if a user forgets the WAIT FOR clause, slonik will issue an implicit WAIT FOR whenever it is required. However, the user has to keep the following in mind:
 a. Do not run multiple slonik scripts in parallel, since running parallel slonik scripts may confuse Slony's implicit WAIT FOR behavior.
 b. Slonik will use WAIT FOR if the event node changes.
 c. Slonik will wait before executing SUBSCRIBE SET/DROP NODE/CLONE NODE commands, till the primary has caught up with the other slave nodes.
 d. The slonik command CREATE SET will wait until all outstanding DROP SET commands are confirmed by the master and slaves in a Slony replication cluster.
 e. The WAIT FOR command will not work in a TRY block (an incompatibility with previous versions).
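For reference, the explicit form that pre-2.1 scripts needed looked roughly like this sketch (node and set IDs are illustrative; in 2.1 the WAIT FOR line can be omitted):

```text
# slonik sketch: explicit wait before merging a freshly subscribed set
SUBSCRIBE SET ( ID = 999, PROVIDER = 1, RECEIVER = 2 );
WAIT FOR EVENT ( ORIGIN = 1, CONFIRMED = ALL, WAIT ON = 1 );
MERGE SET ( ID = 1, ADD ID = 999, ORIGIN = 1 );
```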

  3. Support for replicating TRUNCATE

 Slony 2.1 introduces replication of the TRUNCATE command to slaves. TRUNCATE replication is possible for PostgreSQL versions >= 8.4.
 When a user subscribes a table using Slony 2.1, Slony creates a _slonytest_truncatetrigger trigger on the master to replicate those TRUNCATE commands.
 Let's see it at work:
edbs-MacBook-Pro:~ vibhor$ psql -c "truncate table person_job" postgres
edbs-MacBook-Pro:~ vibhor$ psql -c "select * from person_job" repdb
 pid | job 
(0 rows)
4. Health checks at startup
 Slony 2.1 now checks the status of each node during startup and gives a proper message to fix any issue. The health check verifies the configuration information and returns OK if there is no problem.
The PL/pgSQL function which has been added for the health check is given below:
CREATE OR REPLACE FUNCTION _slonytest.slon_node_health_check()
 RETURNS boolean
 LANGUAGE plpgsql
AS $function$
declare
  prec record;
  all_ok boolean;
begin
  all_ok := 't'::boolean;
  -- validate that all tables in sl_table have:
  --      sl_table agreeing with pg_class
  for prec in select tab_id, tab_relname, tab_nspname from
  "_slonytest".sl_table t where not exists (select 1 from pg_catalog.pg_class c, pg_catalog.pg_namespace n
    where c.oid = t.tab_reloid and c.relname = t.tab_relname and c.relnamespace = n.oid and n.nspname = t.tab_nspname) loop
    all_ok := 'f'::boolean;
    raise warning 'table [id,nsp,name]=[%,%,%] - sl_table does not match pg_class/pg_namespace', prec.tab_id, prec.tab_relname, prec.tab_nspname;
  end loop;
  if not all_ok then
     raise warning 'Mismatch found between sl_table and pg_class.  Slonik command REPAIR CONFIG may be useful to rectify this.';
  end if;
  return all_ok;
end
$function$;
5. Performance improvement in cases of large backlog
 Slony 2.1 comes with a performance improvement for cases of large replication backlog. In previous versions, users experienced a performance impact when the sl_log_* tables grew and replicating data to the subscriber took so long that Slony needed a huge amount of time to catch up. The main cause of this performance issue was that Slony used a sequential scan while pulling data from the sl_log_* tables. Now, in 2.1, the queries which pull data from the sl_log_* tables have been modified to use an index scan.

  6. Monitoring thread to provide better monitoring data
 2.1 also comes with a monitoring table, sl_components. This table is very useful for monitoring the Slony threads. A description of sl_components can be found in the Slony documentation.
A snapshot of the information displayed by sl_components is given below:
postgres=# select * from sl_components;
       co_actor       | co_pid | co_node | co_connection_pid |   co_activity    |       co_starttime        |  co_event  | co_eventtype 
----------------------+--------+---------+-------------------+------------------+---------------------------+------------+--------------
 local_listen         |  26688 |       1 |             26692 | thread main loop | 2011-11-03 12:07:56+05:30 |            | n/a
 local_sync           |  26688 |       0 |             26697 | thread main loop | 2011-11-03 12:08:00+05:30 |            | n/a
 remote listener      |  26688 |       2 |             26695 | thread main loop | 2011-11-03 12:08:00+05:30 |            | n/a
 remoteWorkerThread_2 |  26688 |       2 |             26694 | SYNC             | 2011-11-03 12:08:00+05:30 | 5000004917 | SYNC
 local_monitor        |  26688 |       0 |             26698 | thread main loop | 2011-11-03 00:32:19+05:30 |            | n/a
 local_cleanup        |  26688 |       0 |             26696 | cleanupEvent     | 2011-11-03 12:00:15+05:30 |            | n/a
(6 rows)