Queries Improvement in PostgreSQL 9.1

1. True serializable isolation level:
Before version 9.1, PostgreSQL did not support a full serializable isolation level. A request for serializable transaction isolation actually provided snapshot isolation. This has well known anomalies which can allow inconsistent views of the data during concurrent transactions; although these anomalies only occur when certain patterns of read-write dependencies exist within a set of concurrent transactions. Where these patterns exist, the anomalies can be prevented by introducing conflicts through explicitly programmed locks or otherwise unnecessary writes to the database. Snapshot isolation is popular because performance is better than serializable isolation and the integrity guarantees which it does provide allow anomalies to be avoided or managed with reasonable effort in many environments.

Following is a link on more details:

http://wiki.postgresql.org/wiki/SSI

2. INSERT/UPDATE/DELETE in WITH CLAUSE.

Now in 9.1, User would be able include INSERT/UPDATE/DELETE in WITH Clause and can pass data to the containing query.
Following are some example:

-- INSERT ... RETURNING
WITH t AS (
    INSERT INTO y
    VALUES
        (11),
        (12),
        (13),
        (14),
        (15),
        (16),
        (17),
        (18),
        (19),
        (20)
    RETURNING *
)
SELECT * FROM t;

-- UPDATE ... RETURNING
WITH t AS (
    UPDATE y
    SET a=a+1
    RETURNING *
)
SELECT * FROM t;

-- DELETE ... RETURNING
WITH t AS (
    DELETE FROM y
    WHERE a <= 10
    RETURNING *
)
SELECT * FROM t;

Also, user can attache WITH CLAUSE to INSERT/UPDATE and DELETE Statements as given below:

-- data-modifying WITH in a modifying statement
WITH t AS (
    DELETE FROM y
    WHERE a <= 10
    RETURNING *
)
INSERT INTO y SELECT -a FROM t RETURNING *;

3. GROUP BY enhancement for missing columns
Previous version of PostgreSQL used to throw error message, if user forgets to specify any columns of target list in GROUP BY Clause, even if primary key is specified, as given below:

select version();
                                                     version                                                      
------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.0.5 on i686-pc-linux-gnu, compiled by GCC gcc-4.5.real (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2, 32-bit

SELECT count(*), entity_name, entity_address as address
FROM entities JOIN employees using (entity_name)
GROUP BY entity_name;
ERROR:  column "entities.entity_address" must appear in the GROUP BY clause or be used in an aggregate function

Now in PostgreSQL 9.1, GROUP BY Clause can gues about the missing Column as given below:

SELECT count(*), entity_name, entity_address as address
FROM entities JOIN employees using (entity_name)
GROUP BY entity_name;
 count | entity_name | address  
-------+-------------+----------
     2 | HR          | address1
     2 | SALES       | address2
(2 rows)

4. Per Column Collation.
In Previous version of PostgreSQL, collation (the sort ordering of text string) was supported only at database level. Now, in 9.1, user can set collation per column, index, or expression via COLLATE Clause. Some example is given below:

postgres=#  CREATE TABLE french_messages (message TEXT COLLATE "fr_FR");
CREATE TABLE
postgres=# select * from french_messages order by 1;
 message 
---------
 Élève
 élever
 élevé
 élève
(4 rows)

postgres=# SELECT a, b, a < b as lt FROM
postgres-#   (VALUES ('a', 'B'), ('A', 'b' COLLATE "C")) v(a,b);
 a | b | lt 
---+---+----
 a | B | f
 A | b | t
(2 rows)

5. Planner Improvements a. Inheritance Table.
Now in 9.1, Some planner improvements has been done, like MIN/MAX for Inheritance tables. This improvement will boost up performance queries for user, if they are using paritioning. lets see those improvements:
i) Create partition table as given below:

      CREATE TABLE main(id integer, val text);
      CREATE TABLE parition_1(CHECK(id >=1 and id <=20)) INHERITS(main);
      CREATE TABLE parition_2(CHECK(id >=21 and id <=40)) INHERITS(main);
      CREATE TABLE parition_3(CHECK(id >=41 and id <=60)) INHERITS(main);
      CREATE TABLE parition_other(CHECK(id >=61)) INHERITS(main);
      CREATE INDEX parition_1_idx on parition_1(id);
      CREATE INDEX parition_2_idx on parition_2(id);
      CREATE INDEX parition_3_idx on parition_3(id);
      CREATE INDEX parition_other_idx on parition_other(id);

ii) Create trigger as given below:

       CREATE OR REPLACE FUNCTION main_insert_direct( )
       RETURNS  trigger
       LANGUAGE plpgsql
       AS $function$
            BEGIN
                   IF NEW.id >=1 AND NEW.id <=20 THEN
                       INSERT INTO parition_1 values(NEW.*);
                   ELSIF NEW.id >=21 AND NEW.ID <=40 THEN
   INSERT INTO parition_2 values(NEW.*);
                   ELSIF NEW.id >=41 AND NEW.ID <=60 THEN
   INSERT INTO parition_3 values(NEW.*);
                   ELSE 
                         INSERT INTO parition_other VALUES(NEW.*);
                   END IF;
                   RETURN NULL;
           END;
       $function$;

       CREATE TRIGGER insert_on_main
         BEFORE INSERT ON main
         FOR EACH ROW EXECUTE PROCEDURE main_insert_direct();

iii). INSERT some values as given below:

        INSERT INTO main SELECT * FROM generate_series(1,1000000000);

Now lets see plan. On previous version of PostgreSQL:

postgres=# explain analyze select id from main order by id desc limit 10

                                                                  QUERY PLAN                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1148.68..1148.70 rows=10 width=4) (actual time=147.799..147.812 rows=10 loops=1)
   ->  Sort  (cost=1148.68..1220.94 rows=28905 width=4) (actual time=147.796..147.800 rows=10 loops=1)
         Sort Key: public.main.id
         Sort Method:  top-N heapsort  Memory: 17kB
         ->  Result  (cost=0.00..524.05 rows=28905 width=4) (actual time=0.018..117.908 rows=50000 loops=1)
               ->  Append  (cost=0.00..524.05 rows=28905 width=4) (actual time=0.017..74.136 rows=50000 loops=1)
                     ->  Seq Scan on main  (cost=0.00..22.30 rows=1230 width=4) (actual time=0.001..0.001 rows=0 loops=1)
                     ->  Seq Scan on parition_1 main  (cost=0.00..22.30 rows=1230 width=4) (actual time=0.014..0.073 rows=100 loops=1)
                     ->  Seq Scan on parition_2 main  (cost=0.00..22.30 rows=1230 width=4) (actual time=0.011..0.070 rows=100 loops=1)
                     ->  Seq Scan on parition_3 main  (cost=0.00..22.30 rows=1230 width=4) (actual time=0.009..0.057 rows=100 loops=1)
                     ->  Seq Scan on parition_other main  (cost=0.00..434.85 rows=23985 width=4) (actual time=0.021..32.197 rows=49700 loops=1)
 Total runtime: 147.921 ms
(12 rows)

However, in 9.1:

    postgres=# explain analyze select id from main order by id desc limit 10                                                          ;
                                                                                 QUERY PLAN                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.07..6.44 rows=10 width=4) (actual time=6.828..6.849 rows=10 loops=1)
   ->  Result  (cost=1.07..10790.41 rows=20072 width=4) (actual time=6.825..6.844 rows=10 loops=1)
         ->  Merge Append  (cost=1.07..10790.41 rows=20072 width=4) (actual time=6.824..6.840 rows=10 loops=1)
               Sort Key: public.main.id
               ->  Sort  (cost=1.01..1.01 rows=1 width=4) (actual time=0.026..0.026 rows=0 loops=1)
                     Sort Key: public.main.id
                     Sort Method: quicksort  Memory: 25kB
                     ->  Seq Scan on main  (cost=0.00..1.00 rows=1 width=4) (actual time=0.020..0.020 rows=0 loops=1)
               ->  Index Scan Backward using parition_1_idx on parition_1 main  (cost=0.00..13.17 rows=61 width=4) (actual time=0.028..0.028 rows=1 loops=1)
               ->  Index Scan Backward using parition_2_idx on parition_2 main  (cost=0.00..13.15 rows=60 width=4) (actual time=0.024..0.024 rows=1 loops=1)
               ->  Index Scan Backward using parition_3_idx on parition_3 main  (cost=0.00..13.15 rows=60 width=4) (actual time=0.024..0.024 rows=1 loops=1)
               ->  Index Scan Backward using parition_other_idx on parition_other main  (cost=0.00..10233.63 rows=19890 width=4) (actual time=6.716..6.727 rows=10 loops=1)
 Total runtime: 6.932 ms
(13 rows)

Looking at above Plans, in PostgreSQL 9.0, Executor will take all from all child table will do sorting before returning 10 records. However, in PostgreSQL 9.1, Executor will take records from sorted child table and will use indexes (if available) to merge them with the sorted one to return 10 records. Looking at Total runtime, 9.1 is faster.
b. FULL OUTER JOIN improvement:
In 9.1, FULL OUTER JOIN can now use HASH Algorithms, which is faster than old method FULL OUTER JOIN (i.e. 2 Sorts) as given below:

i) Create tables and insert some values, as given below:

     CREATE TABLE test1 (a int);
     CREATE TABLE test2 (a int);
     INSERT INTO test1 SELECT generate_series(1,100000);
     INSERT INTO test2 SELECT generate_series(100,1000);

ii). EXPLAIN ANALYZE: In Previous version of PostgreSQL:

        postgres=#  EXPLAIN ANALYZE SELECT * FROM test1 FULL OUTER JOIN test2 USING (a);
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Merge Full Join  (cost=10952.05..11465.56 rows=100000 width=8) (actual time=214.420..370.898 rows=100000 loops=1)
   Merge Cond: (test1.a = test2.a)
   ->  Sort  (cost=10894.82..11144.82 rows=100000 width=4) (actual time=213.512..269.596 rows=100000 loops=1)
         Sort Key: test1.a
         Sort Method:  external sort  Disk: 1368kB
         ->  Seq Scan on test1  (cost=0.00..1393.00 rows=100000 width=4) (actual time=0.010..50.044 rows=100000 loops=1)
   ->  Sort  (cost=57.23..59.48 rows=901 width=4) (actual time=0.894..1.309 rows=901 loops=1)
         Sort Key: test2.a
         Sort Method:  quicksort  Memory: 38kB
         ->  Seq Scan on test2  (cost=0.00..13.01 rows=901 width=4) (actual time=0.012..0.436 rows=901 loops=1)
 Total runtime: 412.315 ms
(11 rows)

In PostgreSQL 9.1:

postgres=#  EXPLAIN ANALYZE SELECT * FROM test1 FULL OUTER JOIN test2 USING (a);
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Hash Full Join  (cost=24.27..1851.28 rows=100000 width=8) (actual time=0.588..74.434 rows=100000 loops=1)
   Hash Cond: (test1.a = test2.a)
   ->  Seq Scan on test1  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.011..21.683 rows=100000 loops=1)
   ->  Hash  (cost=13.01..13.01 rows=901 width=4) (actual time=0.563..0.563 rows=901 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 32kB
         ->  Seq Scan on test2  (cost=0.00..13.01 rows=901 width=4) (actual time=0.014..0.217 rows=901 loops=1)
 Total runtime: 82.555 ms
(7 rows)

Looking at above EXPLAIN ANALYZE, PostgreSQL 9.1 needs to create a HASH on the smallest table. However, in Previous Version, PostgreSQL required 2 sorts (one on smallest table test2 and one on test1) which is costly, which shows that 9.1 optimizer is smarter and giving better plan.

Search This Blog

Database Technologies

Queries Improvement in PostgreSQL 9.1

Comments

Post a Comment

Popular posts from this blog

Does UPDATE Change ROWID in Oracle?

PostgreSQL Database Link to Oracle Database on Linux

Fix of "ORA-29275: partial multibyte character"