8.3
A Story of Many Patches




                            December 2007
                                  FOSS.in
        Josh Berkus, PostgreSQL Core Team
PostgreSQL India?
PostgreSQL 8.3 In Beta
Many, Many Patches
E.1. Release 8.3
Release date: 2007-12-??

Release date: CURRENT AS OF 2007-10-24

E.1.1. Overview
This release represents a major leap forward for PostgreSQL by adding significant new functionality and
performance enhancements. This was made possible by a growing community that has dramatically accelerated the
pace of development. This release adds the following major capabilities:



Full text search now fully integrated into the core database system

Support the SQL/XML standard, including new operators and an XML data type

Support for enumerated data types (ENUM)

Add Universally Unique Identifier (UUID) data type

Support arrays of composite types

Add control over whether NULLs sort first or last

Support updatable cursors
PostgreSQL 8.3 Features
●   Developer
    –   SQL/XML
    –   Integrated TSearch2
    –   UUID, ENUM
    –   PL/pgSQL debugging
●   Admin
    –   CSV Logging
    –   Better Stats
    –   pgStandby
●   Consistency
    –   HOT
    –   Load Distributed Checkpoint
●   Performance
    –   Synchronized Scan
    –   Asynch Commit
●   Accessories
    –   pgBouncer
    –   pgSNMP
Many Developers
Tom Lane, USA                  Teodor Sigaev, Russia          Steve Marshall
Peter Eisentraut, Germany      Alvaro Herrera, Chile          Paul Bayer
Bruce Momjian, USA             Mark Kirkwood, New Zealand     Doug Knight
Dave Page, England             Joachim Wieland                Greg Sabino Mullane, USA
Pavan Deolasee, India          Henry Hotz, USA                Chad Wagner
Itagaki Takahiro, Japan        Magnus Hagander, Sweden        Brendan Jurd
Greg Smith, USA                Tatsuo Ishii, Japan            Euler Taveira de Oliveira, Brazil
David Fetter, USA              Victor Wagner                  Joe Conway, USA
Pavel Stehule, Czech           Bill Moran, USA                Simon Riggs, England
Greg Stark, England            Andrew Dunstan, USA            Guillaume Smet, France
Jan Wieck, USA                 Arul Shaji, Australia          Hiroshi Saito, Japan
Oleg Bartunov, Russia          Nikolay Samokhvalov, Russia    Chris Marcellino, Italy
Florian Pflug                  Neil Conway, Canada            Dave Cramer, Canada
Jeff Davis, USA                Marc Fournier, Canada          Devrim Gunduz, Turkey
Trevor Hardcastle              Jaime Casanova, Ecuador        Gavin Sherry, Australia
Nikhil S, India                Albert Cervera                 Jeremy Drake
Holger Schurig                 Bernd Helmle, Germany          Marko Kreen, Estonia
D'Arcy Cain, Canada            Glen Parker                    Kris Jurka, USA
Gevik Babakhani, Netherlands   Heikki Linnakangas, Finland    Tom Dunstan, USA
Why contribute?
●   PostgreSQL is a community project
    –   owned by the community, run by the community
    –   if you contribute, you are a full participant
         ●   unlike some other databases
●   Tinker with cool database stuff
    –   we are hard-core database geeks
    –   learn a lot from top database hackers
●   Improve your employment prospects
    –   database engineers are always in demand
SQL/XML
XMLROOT (
   XMLELEMENT (
      NAME gazonk,
      XMLATTRIBUTES (
          'val' AS name,
          1 + 1 AS num),
      XMLELEMENT (
         NAME qux,
         'foo')
   ),
   VERSION '1.0',
   STANDALONE YES )

<?xml version='1.0' standalone='yes' ?>
<gazonk name='val' num='2'>
   <qux>foo</qux>
</gazonk>

table_to_xml(tbl regclass,
   nulls boolean,
   tableforest boolean,
   targetns text)

SELECT *
FROM table1
WHERE (xpath('//person/name/text()',
   xdata))[1]::text = 'John Smith';
SQL/XML Prior Work: 2002
●   TorchBox contributes /contrib/xml2
    –   Some XML functionality:
         ●   XPath functions
         ●   XSLT functions
    –   BUT
         ●   Non-standard, PostgreSQL-specific syntax
         ●   No real data type
         ●   Many features missing
              –   Charset support
              –   DTD support
SQL/XML Prior Work: 2004
●   Peter Eisentraut writes XML export
    –   Export table to XML
    –   BUT
         ●   prototype only
         ●   not useful without other XML functionality
         ●   syntax requires changing PostgreSQL parser
SQL/XML Prior Work: 2005
●   Pavel Stehule writes SQL/XML syntax demo
    –   First standard syntax example
    –   BUT
         ●   depends on PL/Perl
         ●   prototype only
         ●   does not integrate with /contrib/xml2
Nikolay Samokhvalov
         ●   Graduate Student at
             University of Moscow
         ●   Met major contributor
             Oleg Bartunov in 2005
             –   ported MoiKrug.ru to
                 PostgreSQL
         ●   Masters Thesis:
             updatable XML views
             in RDBMS
●   Google funds 700 students to work on Open
    Source
    –   PostgreSQL gets 7
●   Nikolay proposes project for SQL/XML
    –   Proposal accepted
    –   Peter will mentor
SoC Proposal
[SoC Proposal] Initial support of XMLType for PostgreSQL
Summary
Primary goal is introduction of special type support for storing XML data in ORDBMS PostgreSQL, querying this data and
modifying it. This project is intended to develop manipulation abilities rather than special storage engine (VARCHAR as
initial storage implicit type).

At the moment there is no good general vision of most suitable storage for XMLType. Moreover, from my point of view,
DBMS should have support of different index types for XMLType - every for its special purpose. And which is more
important is an open question. That's why I propose to work on 'external' things rather than 'internals' (data structure for
index) and strictly follow standards. But anyway, I've included path index (#7 in the list of Deliverables), because now I
suppose that it is most expectative type of index (this item is optional, because here community's feedback is highly
needed).
Deliverables
1. Ability to define any column as of XMLType. Initially, this means that only well-formed XML documents could be stored
   in such a column.
2. Automatic validation of XML documents being inserted/modified against XML schema, if definition of column contains it
   (reference to it). DTD and/or XML Schema could be used for this.
3. Subset of SQL/XML standard [1] for mixing relational and XML data in queries. This includes at least following:
   XMLELEMENT, XMLAGG, XMLFOREST, XMLCONCAT expressions; implementation of mapping rules for basic
   types. (See Project Details for more details).
4. XML domains support: possibility to define domain based on XMLType, using XML schema (DTD / XML Schema).
5. Basic XPath support (existing experience - contrib/xml2 module - should be taken into account).
6. Basic XSLT support (existing experience - contrib/xml2 module - should be taken into account).
7. Path indexes for fast retrieval of XML documents (queries with XPath expressions in WHERE clause). [OPTIONAL]
8. Documentation (definition rules for XMLType, SQL/XML expressions, etc).
Mentor: Peter Eisentraut
●   From Aachen, Germany
●   Core Team member since 2004
●   In charge of PostgreSQL Documentation
●   Prior XML work
Specification Research
●   ANSI SQL 2006 -- SQL/XML
XML Publishing Functions
xmlelement()       Creates an XML element, allowing the name to be specified.
xmlattributes()    Creates XML attributes from columns, using the name of each
                   column as the name of the corresponding attribute.
xmlroot()          Creates the root node of an XML document.
xmlcomment()       Creates an XML comment.
xmlpi()            Creates an XML processing instruction.
xmlparse()         Parses a string as XML and returns the resulting XML structure.
xmlforest()        Creates XML elements from columns, using the name of each
                   column as the name of the corresponding element.
xmlconcat()        Combines a list of individual XML values to create a single value
                   containing an XML forest.
xmlagg()           Combines a collection of rows, each containing a single XML value,
                   to create a single value containing an XML forest.
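As a rough illustration of the last three entries in the table, the sketch below mimics in Python how xmlforest, xmlconcat and xmlagg compose. The names `forest`, `concat` and `agg` are hypothetical stand-ins rather than PostgreSQL APIs, and the sketch leaves out NULL handling and element-name escaping.

```python
from xml.sax.saxutils import escape


def forest(columns):
    """Like xmlforest: one element per column, named after the column."""
    return "".join(
        f"<{name}>{escape(str(value))}</{name}>"
        for name, value in columns.items()
    )


def concat(*values):
    """Like xmlconcat: join individual XML values into one forest."""
    return "".join(values)


def agg(rows):
    """Like xmlagg: combine one XML value per row into a single forest."""
    return concat(*rows)


# Two hypothetical table rows (illustrative data only).
rows = [{"name": "John", "age": 25}, {"name": "Toto", "age": 19}]
result = agg(forest(row) for row in rows)
print(result)
```

Each row becomes a small forest of column elements, and the aggregate is simply the row forests placed one after another.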
Specification Proposal
 Re: Google SoC--Idea Request

 From: "Nikolay Samokhvalov" <samokhvalov ( at ) gmail ( dot ) com>
 Subject: Re: Google SoC--Idea Request
 Date: Tue, 2 May 2006 12:34:43 +0400

 Proposal: XMLType for PostgreSQL.

 *** Minimum: ***
 to have special type support for storing XML data and working with it.
 This means following:
 - ability to define any column of a table as of XMLType; internally,
 all data is stored as VARCHAR;
 - auto validation of documents against XML schema, if it was
 specified in column
 definition or in XML data sheets themselves (DTD, XSD or at least one
 of them) /*contrib/xml2 has such feature, but it uses libxml, what
 means DOM interface. Maybe it's better to use some SAX parser to solve
 this task*/;
 - XPath indexes for queries with path expressions in WHERE clause /*I
 suppose this kind of indexes would be most frequently used. I propose
 using good labeling schema and GIST and/or Gin here*/;
 - some subset of SQL/XML. Actually, part 14 of SQL:200n (SQL/XML) has
 more than 400 pages now and contains some established constructions,
 that are using in other DBMSes. There is the some patch already
 written by Pavel Stehule:
 http://www.pgsql.ru/db/mw/msg.html?mid=2096818. (BTW, what is with it?
 it was kept for 8.2, so what is the result?) I've tested it several
 months ago, basic SQL/XML functions worked fine. It changes grammar,
Specification Revisions
XML export function signatures

From: Peter Eisentraut <peter_e ( at ) gmx ( dot ) net>
To: pgsql-hackers ( at ) postgresql ( dot ) org
Subject: XML export function signatures
Date: Mon, 12 Feb 2007 20:18:59 +0100

Here are the proposed signatures for the XML export functions.

While I have seen the output formats in use elsewhere, I could not find
any useful information on how to invoke these mappings, so the
following is purely my own invention.

table_to_xml(tbl regclass, nulls boolean, tableforest boolean, targetns text) RETURNS xml
query_to_xml(query text, nulls boolean, tableforest boolean, targetns text) RETURNS xml
table_to_xmlschema(tbl regclass, nulls boolean, tableforest boolean, targetns text) RETURNS xml
query_to_xmlschema(query text, nulls boolean, tableforest boolean, targetns text) RETURNS xml
table_to_xml_and_xmlschema(tbl regclass, nulls boolean, tableforest boolean, targetns text) RETURNS xml
query_to_xml_and_xmlschema(query text, nulls boolean, tableforest boolean, targetns text) RETURNS xml
cursor_get_xml(cursor refcursor, count int, nulls boolean, tableforest boolean, targetns text) RETURNS xml
cursor_to_xmlschema(cursor refcursor, nulls boolean, tableforest boolean, targetns text) RETURNS xml
Specification Revisions
 Re: XML export function signatures

 From: Peter Eisentraut <peter_e ( at ) gmx ( dot ) net>
 To: Andrew Dunstan <andrew ( at ) dunslane ( dot ) net>
 Subject: Re: XML export function signatures
 Date: Mon, 12 Feb 2007 23:57:49 +0100

 Andrew Dunstan wrote:
 > . table_to_xml_and_xmlschema   seems like a mouthful - can we shorten
 > it a bit?

 Well, it gives you back a mouthful of data, too. :)

 > . what are the two ways of representing data that tableforest
 > distinguishes?

 tableforest = false gives you something like

 <tablename>
  <row> <!-- where "row" is constant -->
   <col1name>data</col1name>
   <col2name>data</col2name>
  </row>
  <row>
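The quoted reply is cut off on the slide. As a hedged sketch of the two layouts that the tableforest flag selects between, the Python below renders the same rows both ways. The tableforest = true layout (each row wrapped in an element named after the table, with no enclosing root) follows the behaviour documented for the released feature, not the truncated mail. The function `sketch_table_to_xml` and the sample data are hypothetical, and the nulls and targetns parameters are left out.

```python
from xml.sax.saxutils import escape


def sketch_table_to_xml(table, columns, rows, tableforest=False):
    """Render rows in the tableforest = false or tableforest = true layout."""

    def fields(row, indent):
        # One element per column, named after the column.
        return "".join(
            f"{indent}<{col}>{escape(str(val))}</{col}>\n"
            for col, val in zip(columns, row)
        )

    if tableforest:
        # Forest: every row is its own <tablename> element, no single root.
        return "".join(
            f"<{table}>\n{fields(row, '  ')}</{table}>\n" for row in rows
        )
    # Document: one <tablename> root containing constant <row> elements.
    body = "".join(f"  <row>\n{fields(row, '    ')}  </row>\n" for row in rows)
    return f"<{table}>\n{body}</{table}>\n"


print(sketch_table_to_xml("t", ["a", "b"], [(1, "x"), (2, "y")]))
print(sketch_table_to_xml("t", ["a", "b"], [(1, "x"), (2, "y")], tableforest=True))
```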
Approved Specification
●   XML Data Type
    –   type-safe XML storage
    –   supports XML operators, functions
●   XML Functions
    –   Generation
    –   Manipulation (XSLT)
    –   Export
    –   XPath Query
●   XML Expressions
    –   IS DOCUMENT, etc.
Code Modifications
Initial Versions of Patch
  Updated XML patch

  From: Peter Eisentraut <peter_e ( at ) gmx ( dot ) net>
  To: pgsql-patches ( at ) postgresql ( dot ) org
  Subject: Updated XML patch
  Date: Thu, 14 Dec 2006 23:02:05 +0100

  Attached is an updated patch for XML functionality, which subsumes all
  earlier patches on the subject. This includes a data type with format
  checking, and functions to mangle values. For the moment, I have cut
  out the inessential stuff such as xpath. The included regression test
  file xml.sql shows some of the things that work.

  This patch already covers most of the parser work. What is left
  hereafter is adjusting all the corner cases, the escaping rules, and
  the various XML parser behaviors.

  Use configure --with-libxml to build.

  --
  Peter Eisentraut
  http://developer.postgresql.org/~petere/
  Attachment: current-xml-patch.bz2
  Description: BZip2 compressed data
Initial Patch
 static void
+_outXmlExpr(StringInfo str, XmlExpr *node)
+{
+    WRITE_NODE_TYPE("XMLEXPR");
+
+    WRITE_ENUM_FIELD(op, XmlExprOp);
+    WRITE_STRING_FIELD(name);
+    WRITE_NODE_FIELD(named_args);
+    WRITE_NODE_FIELD(args);
+}
+
+static void
 _outCoerceToDomain(StringInfo str, CoerceToDomain *node)
 {
     WRITE_NODE_TYPE("COERCETODOMAIN");
@@ -2019,6 +2030,9 @@
              case T_BooleanTest:
                  _outBooleanTest(str, obj);
                  break;
+             case T_XmlExpr:
+                 _outXmlExpr(str, obj);
+                 break;
              case T_CoerceToDomain:
                  _outCoerceToDomain(str, obj);
                  break;
diff -Nru -x configure ../cvs-pgsql/src/backend/nodes/readfuncs.c ./src/backend/nodes/readfuncs.c
--- ../cvs-pgsql/src/backend/nodes/readfuncs.c 2006-12-12 16:31:46.000000000 +0100
+++ ./src/backend/nodes/readfuncs.c     2006-12-14 21:20:08.000000000 +0100
@@ -765,6 +765,22 @@
 }
Patch Revisions
Re: xml type and encodings

From: "Andrew Dunstan" <andrew ( at ) dunslane ( dot ) net>
To: "Peter Eisentraut" <peter_e ( at ) gmx ( dot ) net>
Subject: Re: xml type and encodings
Date: Mon, 15 Jan 2007 16:35:13 -0600 (CST)

Peter Eisentraut wrote:
> Florian G. Pflug wrote:
>> Couldn't the server change the encoding declaration inside the xml to
>> the correct
>> one (the same as client_encoding) before returning the result?
>
> The data type output function doesn't know what the client encoding is
> or whether the data will be shipped to the client at all. But what I'm
> thinking is that we should remove the encoding declaration if possible.
> At least that would be less confusing, albeit still potentially
> incorrect if the client continues to process the document without care.

The XML Spec says:

"In the absence of information provided by an external transport protocol
(e.g. HTTP or MIME), it is a fatal error for an entity including an
encoding declaration to be presented to the XML processor in an encoding
other than that named in the declaration, or for an entity which begins
with neither a Byte Order Mark nor an encoding declaration to use an
encoding other than UTF-8. Note that since ASCII is a subset of UTF-8,
ordinary ASCII entities do not strictly need an encoding declaration."
More Patch Revisions
xpath_array with namespaces support

From: "Nikolay Samokhvalov" <samokhvalov ( at ) gmail ( dot ) com>
To: PGSQL-Patches <pgsql-patches ( at ) postgresql ( dot ) org>
Subject: xpath_array with namespaces support
Date: Wed, 21 Feb 2007 02:46:33 +0300

As a result of discussion with Peter, I provide modified patch for
xpath_array() with namespaces support.

The signature is:
 _xml xpath_array(text xpathQuery, xml xmlValue[, _text namespacesBindings])

The third argument is 2-dimensional array defining bindings for
namespaces. Simple examples:

xmltest=# SELECT xpath_array('//text()', '<local:data
xmlns:local="http://127.0.0.1"><local:piece id="1">number
one</local:piece><local:piece id="2" /></local:data>');
 xpath_array
----------------
{"number one"}
(1 row)
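To illustrate what a list of namespace bindings does for an XPath query, here is a minimal Python sketch using the standard library's ElementTree, not the patch's implementation. The helper `piece_texts` and the alias `my` are hypothetical. The bindings are given as pairs of alias and URI, loosely mirroring the 2-dimensional array described in the mail.

```python
import xml.etree.ElementTree as ET

DOC = (
    '<local:data xmlns:local="http://127.0.0.1">'
    '<local:piece id="1">number one</local:piece>'
    '<local:piece id="2" /></local:data>'
)


def piece_texts(xml_text, bindings):
    """Return the text of every 'piece' element found via the given bindings.

    bindings is a list of [alias, namespace URI] pairs.
    """
    root = ET.fromstring(xml_text)
    namespaces = dict(bindings)
    # The alias used in the path only has to map to the right URI; it does
    # not have to match the prefix used inside the document.
    matches = root.findall(".//my:piece", namespaces)
    return [el.text for el in matches if el.text]


print(piece_texts(DOC, [["my", "http://127.0.0.1"]]))
```

The query resolves elements by namespace URI, so the alias `my` finds elements that the document itself writes with the prefix `local`.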
Yet More Revisions
correct format for date, time, timestamp for XML functionality

From: "Pavel Stehule" <pavel ( dot ) stehule ( at ) hotmail ( dot ) com>
To: pgsql-patches ( at ) postgresql ( dot ) org
Subject: correct format for date, time, timestamp for XML functionality
Date: Tue, 20 Feb 2007 13:27:42 +0100

Hello,

this patch ensures independence of datetime fields from the current datestyle setting. It adds a new internal
datestyle, USE_XSD_DATESTYLE. It's almost the same as USE_ISO_DATESTYLE. The differences are for timestamp:
ISO: yyyy-mm-dd hh24:mi:ss
XSD: yyyy-mm-ddThh24:mi:ss

I found one link about this topic:
http://forums.oracle.com/forums/thread.jspa?threadID=467278&tstart=0
Regards
Pavel Stehule
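The difference between the two timestamp styles in the mail comes down to the separator between date and time. A small Python sketch, using only the standard library, shows both renderings of the same value. The function names are hypothetical and time zones are ignored.

```python
from datetime import datetime


def iso_style(ts):
    """ISO datestyle as described in the mail: space between date and time."""
    return ts.isoformat(sep=" ")


def xsd_style(ts):
    """XSD style as described in the mail: 'T' between date and time."""
    return ts.isoformat(sep="T")


ts = datetime(2007, 2, 20, 13, 27, 42)
print(iso_style(ts))
print(xsd_style(ts))
```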
Patch Accepted

From: Bruce Momjian <bruce ( at ) momjian ( dot ) us>
To: "Nikolay Samokhvalov" <samokhvalov ( at ) gmail ( dot ) com>
Subject: Re: [HACKERS] xml2 contrib patch supporting default XML namespaces
Date: Thu, 22 Mar 2007 16:16:16 -0400 (EDT)

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.
Write Documentation
9.14. XML Functions
The functions and function-like expressions described in this section operate on values of type xml. Check Section 8.13 for
information about the xml type. The function-like expressions xmlparse and xmlserialize for converting to and from type
xml are not repeated here. Use of many of these functions requires the installation to have been built with
configure --with-libxml.

9.14.1. Producing XML Content
A set of functions and function-like expressions are available for producing XML content from SQL data. As such, they are
particularly suitable for formatting query results into XML documents for processing in client applications.

9.14.1.1. xmlcomment
xmlcomment(text)

The function xmlcomment creates an XML value containing an XML comment with the specified text as content. The text
cannot contain -- or end with a - so that the resulting construct is a valid XML comment. If the argument is null, the result
is null.

Example:
SELECT xmlcomment('hello');

  xmlcomment
--------------
<!--hello-->
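The rules stated in the documentation excerpt (no "--" inside the text, no trailing "-", null in gives null out) can be sketched in a few lines of Python. This is an illustrative analogue with Python's None standing in for SQL null, not the server's implementation.

```python
from typing import Optional


def xmlcomment(text: Optional[str]) -> Optional[str]:
    """Python analogue of the documented xmlcomment rules."""
    if text is None:
        return None  # null argument gives a null result
    if "--" in text or text.endswith("-"):
        # Either case would produce an invalid XML comment.
        raise ValueError("invalid XML comment")
    return "<!--" + text + "-->"


print(xmlcomment("hello"))
```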
XML in 8.3 Beta
E.1. Release 8.3
Release date: CURRENT AS OF 2007-10-24

Support the SQL/XML standard, including new operators and an XML data type
SQL/XML Feature Set
●   XML Parsing
●   XML Functions
●   XML Export
●   XPath B-tree Index
Future XML Projects
●   Use of HSTORE for advanced XML indexing
●   Automated XML decomposition
    –   XML-to-Table
    –   XML-to-Schema
●   PL/XSLT
    –   XHTML query
●   XQuery support
HOT
Fastest OSDB
[Charts: J2EE Throughput; Acquisition Cost Comparison (cost in US dollars). Both compare MySQL, PostgreSQL, and Proprietary.]
Most Scalable
The Consistency Problem
VACUUM
What's MVCC?
●   Multi-Version Concurrency Control
    –   Each user gets their own “version” of the data
    –   Allows parallelization of updates/reads
    –   Without it, scalability is not possible
         ●   You have to lock everything
         ●   Or violate ACID transactions
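A toy model can make the "own version of the data" idea concrete. The Python sketch below is a deliberately simplified assumption, not PostgreSQL's actual visibility rules: each row version records the transaction that created it (xmin) and the one that replaced it (xmax), and a reader's snapshot is just the set of transaction ids it considers committed.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class RowVersion:
    value: str
    xmin: int                   # transaction that created this version
    xmax: Optional[int] = None  # transaction that replaced it, if any


def visible(version: RowVersion, committed: Set[int]) -> bool:
    """A version is visible if its creator committed and its replacer did not."""
    created = version.xmin in committed
    replaced = version.xmax is not None and version.xmax in committed
    return created and not replaced


def read(versions: List[RowVersion], committed: Set[int]) -> List[str]:
    return [v.value for v in versions if visible(v, committed)]


# Transaction 2 updated the row that transaction 1 created.
versions = [
    RowVersion("Row Version 1", xmin=1, xmax=2),
    RowVersion("Row Version 2", xmin=2),
]
print(read(versions, committed={1}))     # reader that does not yet see txn 2
print(read(versions, committed={1, 2}))  # reader that started after txn 2 committed
```

In this model a rollback costs nothing: if transaction 2 never enters any reader's committed set, every reader simply keeps seeing the first version.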
MVCC
Row Version 1
MVCC
Row Version 1   Row Version 1
                Row Version 2
                Row Version 3

SELECT ...      SELECT ...
                BEGIN UPDATE
                BEGIN UPDATE
MVCC
Row Version 1   Row Version 1
                Row Version 2   Row Version 2
                Row Version 3   ROLLBACK

SELECT ...      SELECT ...      SELECT ...
                BEGIN UPDATE    COMMIT
                BEGIN UPDATE    COMMIT
MVCC Them & Us
The Overwriting Model
InnoDB & Oracle

           Base Relation               Rollback Segment

UPDATE     Overwrite Row   --Copy-->   Old Row Version
MVCC Them & Us
The Overwriting Model
InnoDB & Oracle

 ●   Advantages
     –   Low table/index maintenance requirements
     –   Latest row version fast access
 ●   Disadvantages
     –   Transaction isolation can break
     –   Long-running transactions expensive
     –   Rollbacks very expensive
     –   Rollback segment bottleneck
MVCC Them & Us
The Non-overwriting Model
PostgreSQL & Firebird

           Base Relation

UPDATE     Old Row Version
              Copy
           New Row Version
MVCC Them & Us
The Non-overwriting Model
PostgreSQL & Firebird

 ●   Advantages
     –   Transaction isolation effortless
     –   Rollbacks free
     –   Long-running transactions not a problem
 ●   Disadvantages
     –   High table/index maintenance
     –   “Frequently updated table” problem
Frequently Updated Tables
                    Tuplestore

                  Row C: Version 1
                    small update
Indexes Updated   Row C: Version 2
                    small update
Indexes Updated   Row C: Version 3
                    small update
Indexes Updated   Row C: Version 4
                    large update
Indexes Updated   Row C: Version 5
Frequently Updated Tables
       Tuplestore

     Row C: Version 5
Poor Performance
Pavan Deolasee
     ●   Graduated IIT Bombay
         –   focus on databases
     ●   Worked for VERITAS
     ●   Lead Engineer at
         EnterpriseDB
         –   PostgreSQL vendor
         –   Contributes performance
             patches to community
     ●   Lives in Pune
Team Effort




●   Simon Riggs
    –   original proposal
    –   prototypes
    –   specification
●   Heikki Linnakangas, Tom Lane and others
    –   revisions
    –   code review
    –   bug fixes
Meeting at EnterpriseDB
Initial proposal
●   Update-in-Place with HOT file

             Base Relation              HOT File
             Row C: Version 1
Initial proposal
●   Update-in-Place with HOT file

             Base Relation              HOT File
    UPDATE   Row C: Version 2           Row C: Version 1
                  copy old version
Initial proposal
●   Update-in-Place with HOT file

             Base Relation              HOT File
    UPDATE   Row C: Version 3           Row C: Version 1
                  copy old version        tuple chain
                                        Row C: Version 2
Initial proposal
●   Update-in-Place with HOT file

             Base Relation              HOT File
    UPDATE   Row C: Version 4           Row C: Version 1
                  copy old version        tuple chain
                                        Row C: Version 2
                                          tuple chain
                                        Row C: Version 3
Initial proposal
●   Update-in-Place with HOT file

             Base Relation
             Row C: Version 4
First proposal to pgsql-hackers
Frequent Update Project: Design Overview of HOT Updates

From: "Simon Riggs" <simon ( at ) 2ndquadrant ( dot ) com>
To: <pgsql-hackers ( at ) postgresql ( dot ) org>
Subject: Frequent Update Project: Design Overview of HOT Updates
Date: Thu, 09 Nov 2006 17:13:16 +0000

Design Overview of HOT Updates
------------------------------

The objective is to increase the speed of the UPDATE case, while
minimizing the overall negative effects of the UPDATE. We refer to the
general requirement as *Frequent Update Optimization*, though this
design proposal is for Heap Overflow Tuple (HOT) Updates. It is similar
in some ways to the design for SITC already proposed, though has a
number of additional features drawn from other designs to make it a
practical and effective implementation.

EnterpriseDB have a working, performant prototype of this design. There
are still a number of issues to resolve and the intention is to follow
an open community process to find the best way forward. All required
detail will be provided for the work conducted so far.

Current PGSQL behaviour is for UPDATEs to create a new tuple version
within the heap, so acts from many perspectives as if it were an INSERT.
All of the tuple versions are chained together, so that whichever of the
tuples is visible to your Snapshot, you can walk the chain to find the
most recent tuple version to update.
Revisions: Reverse Order
            Normal Tuples              HOT Relation File

UPDATE      Row C: Version 1
                               in-page update
                                       Row C: Version 2
                                        in-page update
                                       Row C: Version 3
                                        in-page update
                                       Row C: Version 4
Revisions: Chains, not files
            Normal Tuples              HOT Tuple Chain

UPDATE      Row C: Version 1
                               in-page update
                                       Row C: Version 2
                                        in-page update
                                       Row C: Version 3
                                        in-page update
                                       Row C: Version 4
Add microvacuum
                       Normal Tuples               HOT Tuple Chain

UPDATE                 Row C: Version 1
                                           in-page update
                                                    Row C: Version 2
                                                     in-page update
                           microvacuum              Row C: Version 3
                                                     in-page update
                                                    Row C: Version 4
                                       new page / index update
Indexes Updated        Row C: Version 5
Submit patch draft v.1
HOT WIP Patch - version 1

From: "Pavan Deolasee" <pavan ( dot ) deolasee ( at ) gmail ( dot ) com>
To: PostgreSQL-development <pgsql-hackers ( at ) postgresql ( dot ) org>
Subject: HOT WIP Patch - version 1
Date: Wed, 14 Feb 2007 15:34:46 +0530


This is a WIP patch based on the recent posting by Simon and discussions
thereafter. We are trying to do one piece at a time and intention is to post
the work ASAP so that we could get early and continuous feedback from
the community. We could then incorporate those suggestions in the next
WIP patch.

To start with, this patch implements HOT-update for a simple case
when there is enough free space in the same block so that it can
accommodate the new version of the tuple. A necessary condition for
doing HOT-update is that none of the index columns is changed.
The old version is marked as HEAP_UPDATE_ROOT and the new
version is marked as HEAP_ONLY_TUPLE. If a tuple is HOT-updated,
no new index entry is added.
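The two conditions named in the mail (enough free space in the same block, and no indexed column changed) can be expressed as a small predicate. The Python below is a hedged sketch: the function `can_hot_update`, its parameters, and the byte counts are hypothetical, and the real patch works on heap pages and tuple headers rather than dictionaries.

```python
def can_hot_update(old_row, new_row, indexed_columns, page_free_bytes, new_tuple_bytes):
    """Return True if the update qualifies for the simple HOT case."""
    # Condition 1: none of the index columns is changed.
    index_keys_unchanged = all(
        old_row[col] == new_row[col] for col in indexed_columns
    )
    # Condition 2: the new version fits into the same block.
    fits_in_block = new_tuple_bytes <= page_free_bytes
    return index_keys_unchanged and fits_in_block


old = {"id": 7, "counter": 41}
new = {"id": 7, "counter": 42}
print(can_hot_update(old, new, ["id"], page_free_bytes=200, new_tuple_bytes=40))
```

When the predicate holds, the new version can be chained from the old one without adding an index entry; otherwise the update takes the ordinary path and every index gets a new entry.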
Feature Freeze
Submit another version
 HOT WIP Patch - version 2

 From: "Pavan Deolasee" <pavan ( dot ) deolasee ( at ) gmail ( dot ) com>
 To: PostgreSQL-development <pgsql-hackers ( at ) postgresql ( dot ) org>,
      pgsql-patches ( at ) postgresql ( dot ) org
 Subject: HOT WIP Patch - version 2
 Date: Tue, 20 Feb 2007 12:08:14 +0530


 Reposting - looks like the message did not get through in the first
 attempt. My apologies if multiple copies are received.


 This is the next version of the HOT WIP patch. Since the last patch that
 I sent out, I have implemented the HOT-update chain pruning mechanism.

 When following a HOT-update chain from the index fetch, if we notice that
 the root tuple is dead and it is HOT-updated, we try to prune the chain to
 the smallest possible length. To do that, the share lock is upgraded to an
 exclusive lock and the tuple chain is followed till we find a live/recently-dead
 tuple. At that point, the root t_ctid is made to point to that tuple. In order to
 preserve the xmax/xmin chain, the xmax of the root tuple is also updated
 to xmin of the found tuple. Since this xmax is also < RecentGlobalXmin
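The pruning step described in this email can be sketched roughly as follows. This is a simplified Python model under stated assumptions: the `HeapTuple` class, the `status` strings and the `prune_chain` helper are hypothetical, locking (the share-to-exclusive upgrade) is not modeled, and the case where no live or recently-dead tuple is found is simply left untouched because the excerpt does not cover it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HeapTuple:
    xmin: int                 # transaction that created this version
    xmax: Optional[int]       # transaction that replaced it, if any
    ctid: Optional[int]       # slot of the next version, None at chain end
    status: str               # "dead", "recently_dead" or "live" (hypothetical)
    hot_updated: bool = False


def prune_chain(page: List[HeapTuple], root_slot: int) -> List[int]:
    """Shorten a HOT chain; return the slots that were skipped over."""
    root = page[root_slot]
    # Pruning is only attempted when the root is dead and HOT-updated.
    if not (root.status == "dead" and root.hot_updated):
        return []

    skipped = []
    slot = root.ctid
    # Follow the chain until a live or recently-dead tuple is found.
    while slot is not None and page[slot].status == "dead":
        skipped.append(slot)
        slot = page[slot].ctid

    if slot is None:
        # Not described in the excerpt; this sketch leaves the chain alone.
        return []

    found = page[slot]
    root.ctid = slot          # root now points directly at the found tuple
    root.xmax = found.xmin    # preserve the xmax/xmin chain
    return skipped
```

After pruning, a reader that follows the chain from the root reaches the first live or recently-dead version in a single hop.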
Submit another version
  HOT WIP Patch - version 3.2

  From: "Pavan Deolasee" <pavan ( dot ) deolasee ( at ) gmail ( dot ) com>
  To: PostgreSQL-development <pgsql-hackers ( at ) postgresql ( dot ) org>,
       pgsql-patches ( at ) postgresql ( dot ) org
  Subject: HOT WIP Patch - version 3.2
  Date: Sun, 25 Feb 2007 00:06:04 +0530


  Please see the attached WIP HOT patch - version 3.2. It now
  implements the logic for reusing heap-only dead tuples. When a
  HOT-update chain is pruned, the heap-only tuples are marked
  LP_DELETE. The lp_offset and lp_len fields in the line pointer are
  maintained.

  When a backend runs out of free space in a page when doing an
  UPDATE, it searches the line pointers to find a slot
  which is marked LP_DELETEd and has enough space to accommodate
  the new tuple. If such a slot is found, it's reused. We might
  waste some space if the slot is larger than the tuple, but
  that gets reclaimed at VACUUM time.
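The slot search described in this email could look roughly like the sketch below. It is a Python illustration only: the `LinePointer` class, the `IN_USE` label and both helper functions are hypothetical, and taking the first slot that fits is an assumption, since the email does not say which slot is chosen when several qualify.

```python
from dataclasses import dataclass

LP_DELETE = "LP_DELETE"  # flag name from the email
IN_USE = "IN_USE"        # hypothetical label for an occupied slot


@dataclass
class LinePointer:
    lp_offset: int  # where the tuple space starts in the page
    lp_len: int     # how many bytes the slot covers
    flag: str


def find_reusable_slot(line_pointers, tuple_len):
    """Return the first LP_DELETE slot large enough, or None."""
    for slot, lp in enumerate(line_pointers):
        if lp.flag == LP_DELETE and lp.lp_len >= tuple_len:
            return slot
    return None


def place_tuple(line_pointers, tuple_len):
    """Reuse a dead slot; return (slot, wasted_bytes) or None."""
    slot = find_reusable_slot(line_pointers, tuple_len)
    if slot is None:
        return None
    lp = line_pointers[slot]
    # Any excess space stays wasted until VACUUM reclaims it.
    wasted = lp.lp_len - tuple_len
    lp.flag = IN_USE
    return slot, wasted
```

The returned `wasted` value corresponds to the space that, per the email, is only reclaimed at VACUUM time.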
Yet another version
HOT WIP Patch - version 6.3

From: "Pavan Deolasee" <pavan ( dot ) deolasee ( at ) gmail ( dot ) com>
To: PostgreSQL-development <pgsql-hackers ( at ) postgresql ( dot ) org>
Subject: HOT WIP Patch - version 6.3
Date: Mon, 2 Apr 2007 17:51:13 +0530

Please see the HOT version 6.3 patch posted on pgsql-patches.
I've implemented support for CREATE INDEX and CREATE INDEX
CONCURRENTLY based on the recent discussions. The implementation
is not yet complete and needs some more testing/work/discussion
before we can start considering it for review.

One of the regression test cases fails because CIC now works in
three phases. In the first phase, we just create the catalog entry
for the index and commit the transaction. If the index_build fails
because of any error (say, unique key constraint) the index creation
fails, but the catalog entry remains.
Many issues resolved
●   CREATE INDEX
    –   including CONCURRENTLY
●   Re-using dead tuples
●   Interaction with Cluster
●   Plan invalidation
●   Utilities & tools
But still not reviewed
Tom Lane says:
     “break it up, please!”
●   Too big a patch for reviewers
    –   almost 12,000 lines
●   Broken up into 5 parts
    –   1. The basic HOT implementation
    –   2. Retain vacuum, chain pruning and other tricks
    –   3. Fix the broken VACUUM and VACUUM FULL code
    –   4. Fix the broken CREATE INDEX
    –   5. pg_stats and other misc. utilities
Code Reviewed
PostgreSQL Beta
HOT Performance
SKYLINE OF

SKYLINE OF [DISTINCT] d1 [MIN | MAX |
DIFF],  .., dm [MIN | MAX | DIFF]


SELECT *
FROM books
SKYLINE OF rating MAX, price MIN;
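As a rough illustration of what such a query returns, the sketch below computes a skyline in plain Python, assuming the usual definition of the skyline operator: a row is kept unless some other row is at least as good on every listed dimension and strictly better on at least one. The `dominates` and `skyline` helpers and the sample `books` data are hypothetical, and the DIFF option is not modeled.

```python
def dominates(a, b, criteria):
    """True if row a is no worse than b everywhere and better somewhere."""
    strictly_better = False
    for column, direction in criteria:
        x, y = a[column], b[column]
        if direction == "MIN":
            # Flip the sign so that "larger is better" on every dimension.
            x, y = -x, -y
        if x < y:
            return False  # a is worse on this dimension
        if x > y:
            strictly_better = True
    return strictly_better


def skyline(rows, criteria):
    """Keep only the rows that no other row dominates."""
    return [
        row for row in rows
        if not any(dominates(other, row, criteria) for other in rows)
    ]


# Hypothetical sample data for: SKYLINE OF rating MAX, price MIN
books = [
    {"title": "A", "rating": 5, "price": 30},
    {"title": "B", "rating": 4, "price": 10},
    {"title": "C", "rating": 3, "price": 20},
    {"title": "D", "rating": 5, "price": 40},
]
best = skyline(books, [("rating", "MAX"), ("price", "MIN")])
```

With this sample data, `best` contains books A and B: C is beaten by B on both rating and price, and D matches A on rating but costs more. This pairwise comparison is quadratic in the number of rows and is meant only to show the semantics.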
CDE @ IIIT, Hyderabad
Feature proposed 3/3
Extension to SQL syntax

  SKYLINE OF [DISTINCT] d1
  [MIN | MAX | DIFF],  .., dm
  [MIN | MAX | DIFF]
Approximate Queries
SELECT *
FROM Books
SKYLINE OF rating MAX, price MIN;
Lots of discussion
Problems with the Patch
 Re: PostgreSQL - 'SKYLINE OF' clause added!

 From: Tom Lane <tgl ( at ) sss ( dot ) pgh ( dot ) pa ( dot ) us>
 To: Shane Ambler <pgsql ( at ) Sheeky ( dot ) Biz>
 Subject: Re: PostgreSQL - 'SKYLINE OF' clause added!
 Date: Thu, 08 Mar 2007 01:12:22 -0500

 Shane Ambler <pgsql ( at ) Sheeky ( dot ) Biz> writes:
 > Tom Lane wrote:
 >> Well, whether it's horrible or not is in the eye of the beholder, but
 >> this is certainly a non-standard syntax extension.

 > Being non-standard should not be the only reason to reject a worthwhile
 > feature.

 No, but being non-standard is certainly an indicator that the feature
 may not be of widespread interest --- if it were, the SQL committee
 would've gotten around to including it; seems they've managed to include
 everything but the kitchen sink already. Add to that the complete lack
 of any previous demand for the feature, and you have to wonder where the
 market is.

 > The fact that several
 > different groups have been mentioned to be working on this feature would
 > indicate that it is worth considering.
Problems with the Patch
●   Not part of the ANSI SQL standard
    –   possibly low general applicability
    –   might get added to standard with different syntax
    –   might never get standardized at all
●   Requires changes to PostgreSQL parser
    –   new keywords break applications
    –   possible side effects
●   Not coded to PostgreSQL standards
    –   would need refactoring
Rejected!
Re: PostgreSQL - 'SKYLINE OF' clause rejected

From: Tom Lane <tgl ( at ) sss ( dot ) pgh ( dot ) pa ( dot ) us>
To: Shane Ambler <pgsql ( at ) Sheeky ( dot ) Biz>
Subject: Re: PostgreSQL - 'SKYLINE OF' clause added!
Date: Sun, 11 Mar 2007 23:44:41 -0400

Shane Ambler <pgsql ( at ) Sheeky ( dot ) Biz> writes:
> If we consider this thoroughly and compile a suitable syntax that covers
> all bases it could be used as the basis of the standard definition or be
> close to what ends up in the standard.

I'll bet you a very good dinner that the word SKYLINE will never be seen
in the standard.

To me, the proposed feature seems an extremely narrow, special-purpose
thing. The SQL committee have never been into that very much, and seem
even less interested in the last couple of revisions. They like
mechanisms that can be used to solve a wide variety of problems, and
are not afraid to introduce conceptual complexity to get there.
Two examples for you: outer joins and recursive queries. Oracle's
(+) syntax is more compact than what got into the spec, but less
precise and less functional. For recursive queries, CONNECT BY is
way simpler than what got into the spec, but again doesn't cover as
much ground. The SKYLINE clause seems to me to be right about on
par with CONNECT BY ... it does something useful, but only one thing.
Solution: pgFoundry
Contributor Resources
Mailing Lists
●   Hackers list
    –   pgsql-hackers
    –   main list for development discussion
●   Patch list
    –   pgsql-patches
    –   submit your patch here after discussion on -hackers
●   Specific feature lists
    –   pgsql-jdbc, pgsql-performance, pgsql-sql, etc.
    –   subscribe at www.postgresql.org/community/lists
Web Sites
●   www.postgresql.org
    –   main site
●   www.pgfoundry.org
    –   add-ins, drivers, tools
●   developer.postgresql.org
    –   developer wiki, including TODO lists
●   archives.postgresql.org
    –   mailing list archives -- search for your idea here
Documentation
●   www.postgresql.org/docs
    –   main documentation
    –   internals: /docs/current/static/internals.html
    –   code conventions: /docs/current/static/source.html
●   doxygen.postgresql.org
    –   annotated source code
●   www.postgresql.org/docs/faqs.FAQ_DEV.html
    –   developer FAQ
The PostgreSQL Year
RC and Branch                          December 2007
    Development Period
    Patch Commit Fest                  February 1, 2008
    Development Period
    Patch Commit Fest                  April 1, 2008
    Development Period
    Patch Commit Fest                  June 1, 2008
    Development Period
Feature Freeze                         August 1, 2008
    Integration & Review (1 month)
Beta                                   September 2008
    Beta Testing (1-2 months)
RC and Branch                          October 2008
Other tips on submitting
●   Don't get discouraged.
    –   Be prepared to argue.
    –   One hacker rejecting your idea doesn't mean
        everyone does.
    –   Committers (esp. Tom Lane) are often more
        concerned about maintainability than cool stuff.
●   Be flexible: you will have to make changes.
    –   Corporate and academic coding standards are
        generally lower than the project's.
Other tips on submitting
●   Don't use the wrong arguments
    –   “MySQL/Oracle does it this way.”
    –   “Based on this hot academic trend.”
●   Some things make a patch harder to accept
    –   New syntax
    –   Backwards compatibility issues
    –   High code counts
●   Don't get discouraged.
Now, go write some code.
or contribute in some easier way
Contact Information
●   Josh Berkus
    –   josh@postgresql.org
    –   blogs.ittoolbox.com/database/soup
    –   www.sun.com/postgresql
●   Pavan Deolasee
    –   pavan.deolasee@enterprisedb.com
    –   www.enterprisedb.com
●   PostgreSQL India
    –   in@postgresql.org




This talk is copyright 2007 Josh Berkus, and is licensed under the Creative Commons Attribution License.

Development of 8.3 In India

  • 1. 8.3 A Story of Many Patches December 2007 FOSS.in Josh Berkus, PostgreSQL Core Team
  • 4. Many, Many Patches E.1. Release 8.3 Release date: 2007-12-?? Release date: CURRENT AS OF 2007-10-24 E.1.1. Overview This release represents a major leap forward for PostgreSQL by adding significant new functionality and performance enhancements. This was made possible by a growing community that has dramatically accelerated the pace of development. This release adds the follow major capabilities: Full text search now fully integrated into the core database system Support the SQL/XML standard, including new operators and an XML data type Support for enumerated data types (ENUM) Add Universally Unique Identifier (UUID) data type Support arrays of composite types Add control over whether NULLs sort first or last Support updatable cursors
  • 5. PostgreSQL 8.3 Features ● Developer ● Consistency – SQL/XML – HOT – Integrated TSearch2 – Load Distributed – UUID, ENUM Checkpoint – PL/pgSQL debugging ● Performance ● Admin – Synchronized Scan – CSV Logging – Asynch Commit – Better Stats ● Accessories – pgStandby – pgBouncer – pgSNMP
  • 6. Many Developers Tom Lane, USA Teodor Sigaev, Russia Steve Marshall Peter Eisentraut, Germany Alvaro Herrera, Chile Paul Bayer Bruce Momjian, USA Mark Kirkwood, New Zealand Doug Knight Dave Page, England Joachim Wieland Greg Sabino Mullane, USA Pavan Deolasee, India Henry Hotz, USA Chad Wagner Itagaki Takahiro, Japan Magnus Haglander, Sweden Brendan Jurd Greg Smith, USA Tatsuo Ishii, Japan Euler Taviera de Oliveira, Braz David Fetter, USA Victor Wagner Joe Conway, USA Pavel Stehule, Czech Bill Moran, USA Simon Riggs, England Greg Stark, England Andrew Dunstan, USA Guillaume Smet, France Jan Wieck, USA Arul Shaji, Australia Hiroshi Saito, Japan Oleg Bartunov, Russia Nickolay Samokhvalov, Russia Chris Marcellino, Italy Florian Pflug Neil Conway, Canada Dave Cramer, Canada Jeff Davis, USA Marc Fournier, Canada Devrim Gunduz, Turkey Trevor Hardcastle Jaime Casanova, Ecuador Gavin Sherry, Australia Nikhil S, India Albert Cervera Jeremy Drake Holdger Schurig Bernd Helmle, Germany Marko Kreen, Estonia D'Arcy Cain, Canada Glen Parker Kris Jurka, USA Gevik Babakhani, Netherlands Heikki Linnakangas, Finland Tom Dunstan, USA
  • 7. Many Developers Tom Lane, USA Teodor Sigaev, Russia Steve Marshall Peter Eisentraut, Germany Alvaro Herrera, Chile Paul Bayer Bruce Momjian, USA Mark Kirkwood, New Zealand Doug Knight Dave Page, England Joachim Wieland Greg Sabino Mullane, USA Pavan Deolasee, India Henry Hotz, USA Chad Wagner Itagaki Takahiro, Japan Magnus Haglander, Sweden Brendan Jurd Greg Smith, USA Tatsuo Ishii, Japan Euler Taviera de Oliveira, Braz David Fetter, USA Victor Wagner Joe Conway, USA Pavel Stehule, Czech Bill Moran, USA Simon Riggs, England Greg Stark, England Andrew Dunstan, USA Guillaume Smet, France Jan Wieck, USA Arul Shaji, Australia Hiroshi Saito, Japan Oleg Bartunov, Russia Nickolay Samokhvalov, RussiaChris Marcellino, Italy Florian Pflug Neil Conway, Canada Dave Cramer, Canada Jeff Davis, USA Marc Fournier, Canada Devrim Gunduz, Turkey Trevor Hardcastle Jaime Casanova, Ecuador Gavin Sherry, Australia Nikhil S, India Albert Cervera Jeremy Drake Holdger Schurig Bernd Helmle, Germany Marko Kreen, Estonia D'Arcy Cain, Canada Glen Parker Kris Jurka, USA Gevik Babakhani, Netherlands Heikki Linnakangas, Finland Tom Dunstan, USA
  • 8. PostgreSQL 8.3 Features ● Developer ● Consistency – SQL/XML – HOT – Integrated TSearch2 – Load Distributed – UUID, ENUM Checkpoint – PL/pgSQL debugging ● Performance ● Admin – Synchronized Scan – CSV Logging – Asynch Commit – Better Stats ● Accessories – pgStandby – pgBouncer – pgSNMP
  • 9. Why contribute? ● PostgreSQL is a community project – owned by the community, run by the community – if you contribute, you are a full participant ● unlike some other databases ● Tinker with cool database stuff – we are hard-core database geeks – learn a lot from top database hackers ● Improve your employment prospects – database engineers are always in demand
  • 10. SQL/XML XMLROOT ( <?xml version=’1.0’ XMLELEMENT ( standalone=’yes’ ?> NAME ’gazonk’, <gazonk name=’val’ num=’2’> XMLATTRIBUTES ( <qux>foo</qux> ’val’ AS ’name’, </gazonk> 1 + 1 AS ’num’), XMLELEMENT ( NAME ’qux’, ’foo’) table_to_xml(tbl regclass, ), nulls boolean, VERSION ’1.0’, tableforest boolean, STANDALONE YES ) targetns text) SELECT * FROM table1 WHERE (xpath(’//person/name/text()’, xdata))[1]::text = ’John Smith’;
  • 11. SQL/XML Prior Work 2002 ● TorchBox contributes /contrib/xml2 – Some XML functionality: ● Xpath functions ● XSLT functions – BUT ● Non-standard, completely PostgreSQL syntax ● No real data type ● Many features missing – Charset support – DTD support
  • 12. SQL/XML Prior Work 2004 ● Peter Eisentraut writes XML export – Export table to XML – BUT ● prototype only ● not useful without other XML functionality ● syntax requires changing PostgreSQL parser
  • 13. SQL/XML Prior Work 2005 ● Pavel Stuehle writes SQL/XML syntax demo – First standard syntax example – BUT ● depends on PL/perl ● prototype only ● does not integrate with /contrib/xml2
  • 14. Nikolay Samokhvalov ● Graduate Student at University of Moscow ● Met major contributor Oleg Bartunov in 2005 – ported MoiKrug.ru to PostgreSQL ● Masters Thesis: updatable XML views in RDBMS
  • 15. Google funds 700 students to work on Open Source – PostgreSQL gets 7 ● Nickolay proposes project for SQL/XML – Proposal accepted – Peter will mentor
  • 16. SoC Proposal [SoC Proposal] Initial support of XMLType for PostgreSQL Summary Primary goal is introduction of special type support for storing XML data in ORDBMS PostgreSQL, querying this data and modifying it. This project is intended to develop manipulation abilities rather than special storage engine (VARCHAR as initial storage implicit type). At the moment there is no good general vision of most suitable storage for XMLType. Moreover, from my point of view, DBMS should have support of different index types for XMLType - every for its special purpose. And which is more important is an open question. That's why I propose to work on 'external' things rather than 'internals' (data structure for index) and strictly follow standards. But anyway, I've included path index (#7 in the list of Deliverables), because now I suppose that it is most expectative type of index (this item is optional, because here community's feedback is highly needed). Deliverables Ability to define any column as of XMLType. Initially, this means that only well-formed XML documents could be stored in such a column. Automatic validation of XML documents being inserted/modified against XML schema, if definition of column contains it (reference to it). DTD and/or XML Schema could be used for this. Subset of SQL/XML standard [1] for mixing relational and XML data in queries. This includes at least following: XMLELEMENT, XMLAGG, XMLFOREST, XMLCONCAT expressions; implementation of mapping rules for basic types. (See Project Details for more details). XML domains support: possibility to define domain based on XMLType, using XML schema (DTD / XML Schema). Basic XPath support (existing experience - contrib/xml2 module - should be taken into account). Basic XSLT support (existing experience - contrib/xml2 module - should be taken into account). Path indexes for fast retrieval of XML documents (queries with XPath expressions in WHERE clause). 
[OPTIONAL] Documentation (definition rules for XMLType, SQL/XML expressions, etc).
  • 17. Mentor: Peter Eisentraut ● From Aachen, Germany ● Core Team member since 2004 ● In charge of PostgreSQL Documentation ● Prior XML work
  • 18. Specification Research ● ANSI SQL 2006 -- SQL/XML XML Publishing Functions xmlelement() Creates an XML element, allowing the name to be specified. xmlattributes() Creates XML attributes from columns, using the name of each column as the name of the corresponding attribute. xmlroot() Creates the root node of an XML document. xmlcomment() Creates an XML comment. xmlpi() Creates an XML processing instruction. xmlparse() Parses a string as XML and returns the resulting XML structure. xmlforest() Creates XML elements from columns, using the name of each column as the name of the corresponding element. xmlconcat() Combines a list of individual XML values to create a single value containing an XML forest. xmlagg() Combines a collection of rows, each containing a single XML value, to create a single value containing an XML forest.
  • 19. Specification Research ● ANSI SQL 2006 -- SQL/XML
  • 20. Specification Proposal Re: Google SoC--Idea Request From: "Nikolay Samokhvalov" <samokhvalov ( at ) gmail ( dot ) com> Subject: Re: Google SoC--Idea Request Date: Tue, 2 May 2006 12:34:43 +0400 Proposal: XMLType for PostgreSQL. *** Minimum: *** to have special type support for storing XML data and working with it. This means following: - ability to define any column of a table as of XMLType; internally, all data is stored as VARCHAR; - auto validation of documents against XML schema, if it was specified in column definition or in XML data sheets themselves (DTD, XSD or at least one of them) /*contrib/xml2 has such feature, but it uses libxml, what means DOM interface. Maybe it's better to use some SAX parser to solve this task*/; - XPath indexes for queries with path expressions in WHERE clause /*I suppose this kind of indexes would be most frequently used. I propose using good labeling schema and GIST and/or Gin here*/; - some subset of SQL/XML. Actually, part 14 of SQL:200n (SQL/XML) has more than 400 pages now and contains some established constructions, that are using in other DBMSes. There is the some patch already written by Pavel Stehule: http://www.pgsql.ru/db/mw/msg.html?mid=2096818. (BTW, what is with it? it was kept for 8.2, so what is the result?) I've tested it several months ago, basic SQL/XML functions worked fine. It changes grammar,
  • 21. Specification Revisions XML export function signatures From: Peter Eisentraut <peter_e ( at ) gmx ( dot ) net> To: pgsql-hackers ( at ) postgresql ( dot ) org Subject: XML export function signatures Date: Mon, 12 Feb 2007 20:18:59 +0100 Here are the proposed signatures for the XML export functions. While I have seen the output formats in use elsewhere, I could not find any useful information on how to invoke these mappings, so the following is purely my own invention. table_to_xml(tbl regclass, nulls boolean, tableforest boolean, targetns text) RETURNS xml query_to_xml(query text, nulls boolean, tableforest boolean, targetns text) RETURNS xml table_to_xmlschema(tbl regclass, nulls boolean, tableforest boolean, targetns text) RETURNS xml query_to_xmlschema(query text, nulls boolean, tableforest boolean, targetns text) RETURNS xml table_to_xml_and_xmlschema(tbl regclass, nulls boolean, tableforest boolean, targetns text) RETURNS xml query_to_xml_and_xmlschema(query text, nulls boolean, tableforest boolean, targetns text) RETURNS xml cursor_get_xml(cursor refcursor, count int, nulls boolean, tableforest boolean, targetns text) RETURNS xml cursor_to_xmlschema(cursor refcursor, nulls boolean, tableforest boolean, targetns text) RETURNS xml
  • 22. Specification Revisions Re: XML export function signatures From: Peter Eisentraut <peter_e ( at ) gmx ( dot ) net> To: Andrew Dunstan <andrew ( at ) dunslane ( dot ) net> Subject: Re: XML export function signatures Date: Mon, 12 Feb 2007 23:57:49 +0100 Andrew Dunstan wrote: > . table_to_xml_and_xmlschema seems like a mouthful - can we shorten > it a bit? Well, it gives you back a mouthful of data, too. :) > . what are the two ways of representing data that tableforest > distinguishes? tableforest = false gives you something like <tablename> <row> <!-- where "row" is constant --> <col1name>data</col1name> <col2name>data</col2name> </row> <row>
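The two shapes that tableforest selects between can be sketched in Python. This is an illustration only, not PostgreSQL's implementation: the function name table_to_xml_sketch and the sample rows are hypothetical, and the tableforest = true shape (one element per row, named after the table) is taken from the behavior documented for table_to_xml, since the quoted message is cut off before showing it.

```python
import xml.etree.ElementTree as ET


def table_to_xml_sketch(table_name, rows, tableforest):
    """Return a list of XML strings imitating the two export shapes.

    rows is a list of dicts mapping column name to value (hypothetical input).
    """
    def fill(parent, row):
        # One child element per column, named after the column.
        for col, value in row.items():
            ET.SubElement(parent, col).text = str(value)

    if not tableforest:
        # Single document: <tablename><row>...</row><row>...</row></tablename>
        root = ET.Element(table_name)
        for row in rows:
            fill(ET.SubElement(root, "row"), row)
        return [ET.tostring(root, encoding="unicode")]

    # Forest: one <tablename>...</tablename> element per row.
    forest = []
    for row in rows:
        elem = ET.Element(table_name)
        fill(elem, row)
        forest.append(ET.tostring(elem, encoding="unicode"))
    return forest
```

With tableforest false the caller gets one well-formed document; with tableforest true it gets a sequence of sibling elements, which is content rather than a single document.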
  • 23. Approved Specification ● XML Data Type – type-safe XML storage – supports XML operators, functions ● XML Functions – Generation – Manipulation (XSLT) – Export – XPath Query ● XML Expressions – IS DOCUMENT, etc.
  • 25. Initial Versions of Patch Updated XML patch From: Peter Eisentraut <peter_e ( at ) gmx ( dot ) net> To: pgsql-patches ( at ) postgresql ( dot ) org Subject: Updated XML patch Date: Thu, 14 Dec 2006 23:02:05 +0100 Attached is an updated patch for XML functionality, which subsumes all earlier patches on the subject. This includes a data type with format checking, and functions to mangle values. For the moment, I have cut out the inessential stuff such as xpath. The included regression test file xml.sql shows some of the things that work. This patch already covers most of the parser work. What is left hereafter is adjusting all the corner cases, the escaping rules, and the various XML parser behaviors. Use configure --with-libxml to build. -- Peter Eisentraut http://developer.postgresql.org/~petere/ Attachment: current-xml-patch.bz2 Description: BZip2 compressed data
  • 26. Initial Patch static void +_outXmlExpr(StringInfo str, XmlExpr *node) +{ + WRITE_NODE_TYPE("XMLEXPR"); + + WRITE_ENUM_FIELD(op, XmlExprOp); + WRITE_STRING_FIELD(name); + WRITE_NODE_FIELD(named_args); + WRITE_NODE_FIELD(args); +} + +static void _outCoerceToDomain(StringInfo str, CoerceToDomain *node) { WRITE_NODE_TYPE("COERCETODOMAIN"); @@ -2019,6 +2030,9 @@ case T_BooleanTest: _outBooleanTest(str, obj); break; + case T_XmlExpr: + _outXmlExpr(str, obj); + break; case T_CoerceToDomain: _outCoerceToDomain(str, obj); break; diff -Nru -x configure ../cvs-pgsql/src/backend/nodes/readfuncs.c ./src/backend/nodes/readfuncs.c --- ../cvs-pgsql/src/backend/nodes/readfuncs.c 2006-12-12 16:31:46.000000000 +0100 +++ ./src/backend/nodes/readfuncs.c 2006-12-14 21:20:08.000000000 +0100 @@ -765,6 +765,22 @@ }
  • 27. Patch Revisions Re: xml type and encodings From: "Andrew Dunstan" <andrew ( at ) dunslane ( dot ) net> To: "Peter Eisentraut" <peter_e ( at ) gmx ( dot ) net> Subject: Re: xml type and encodings Date: Mon, 15 Jan 2007 16:35:13 -0600 (CST) Peter Eisentraut wrote: > Florian G. Pflug wrote: >> Couldn't the server change the encoding declaration inside the xml to >> the correct >> one (the same as client_encoding) before returning the result? > > The data type output function doesn't know what the client encoding is > or whether the data will be shipped to the client at all. But what I'm > thinking is that we should remove the encoding declaration if possible. > At least that would be less confusing, albeit still potentially > incorrect if the client continues to process the document without care. The XML SPec says: "In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is a fatal error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration, or for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8. Note that since ASCII is a subset of UTF-8, ordinary ASCII entities do not strictly need an encoding declaration."
  • 28. More Patch Revisions xpath_array with namespaces support From: "Nikolay Samokhvalov" <samokhvalov ( at ) gmail ( dot ) com> To: PGSQL-Patches <pgsql-patches ( at ) postgresql ( dot ) org> Subject: xpath_array with namespaces support Date: Wed, 21 Feb 2007 02:46:33 +0300 As a result of discussion with Peter, I provide modified patch for xpath_array() with namespaces support. The signature is: _xml xpath_array(text xpathQuery, xml xmlValue[, _text namespacesBindings]) The third argument is 2-dimensional array defining bindings for namespaces. Simple examples: xmltest=# SELECT xpath_array('//text()', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>'); xpath_array ---------------- {"number one"} (1 row)
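The idea of passing namespace bindings as prefix/URI pairs can be illustrated outside the database. The sketch below is an analogy in Python, not the patch's code: xpath_sketch is a hypothetical name, and because the standard library's ElementTree supports only a subset of XPath, it selects elements by a prefixed path instead of evaluating //text().

```python
import xml.etree.ElementTree as ET


def xpath_sketch(path, xml_value, bindings=()):
    """Return the text of each element matched by a namespace-prefixed path.

    bindings is a sequence of (prefix, uri) pairs, loosely mirroring the
    two-dimensional array argument described in the message above.
    """
    namespaces = {prefix: uri for prefix, uri in bindings}
    root = ET.fromstring(xml_value)
    # Empty elements have no text; report them as empty strings.
    return [elem.text or "" for elem in root.findall(path, namespaces)]


doc = (
    '<local:data xmlns:local="http://127.0.0.1">'
    '<local:piece id="1">number one</local:piece>'
    '<local:piece id="2" />'
    '</local:data>'
)
pieces = xpath_sketch(".//loc:piece", doc, [("loc", "http://127.0.0.1")])
```

The prefix used in the query (loc) need not match the prefix used in the document (local); only the bound URI has to agree, which is why the bindings are passed separately.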
  • 29. Yet More Revisions correct format for date, time, timestamp for XML functionality From: "Pavel Stehule" <pavel ( dot ) stehule ( at ) hotmail ( dot ) com> To: pgsql-patches ( at ) postgresql ( dot ) org Subject: correct format for date, time, timestamp for XML functionality Date: Tue, 20 Feb 2007 13:27:42 +0100 Hello, this patch ensures independency datetime fields on current datestyle setting. Add new internal datestyle USE_XSD_DATESTYLE. It's almoust same to USE_ISO_DATESTYLE. Differences are for timestamp: ISO: yyyy-mm-dd hh24:mi:ss XSD: yyyy-mm-ddThh24:mi:ss I found one link about this topic: http://forums.oracle.com/forums/thread.jspa?threadID=467278&tstart=0 Regards Pavel Stehule
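The difference between the two formats described in the message is only the separator between date and time. A minimal Python sketch follows; format_timestamp is a hypothetical helper, not part of the patch.

```python
from datetime import datetime


def format_timestamp(ts, style):
    """Format a timestamp in the ISO or XSD layout described above."""
    if style == "ISO":
        # ISO datestyle: space between date and time.
        return ts.strftime("%Y-%m-%d %H:%M:%S")
    if style == "XSD":
        # XSD (xsd:dateTime) layout: literal 'T' between date and time.
        return ts.strftime("%Y-%m-%dT%H:%M:%S")
    raise ValueError("unknown style: " + style)


ts = datetime(2007, 2, 20, 13, 27, 42)
iso_text = format_timestamp(ts, "ISO")
xsd_text = format_timestamp(ts, "XSD")
```

Fixing the output format inside the XML functions, instead of following the session's datestyle, keeps exported documents valid regardless of client settings.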
  • 30. Patch Accepted From: Bruce Momjian <bruce ( at ) momjian ( dot ) us> To: "Nikolay Samokhvalov" <samokhvalov ( at ) gmail ( dot ) com> Subject: Re: [HACKERS] xml2 contrib patch supporting default XML namespaces Date: Thu, 22 Mar 2007 16:16:16 -0400 (EDT) Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches It will be applied as soon as one of the PostgreSQL committers reviews and approves it.
  • 31. Write Documentation 9.14. XML Functions The functions and function-like expressions described in this section operate on values of type xml. Check Section 8.13 for information about the xml type. The function-like expressions xmlparse and xmlserialize for converting to and from type xml are not repeated here. Use of many of these functions requires the installation to have been built with configure --with-libxml. 9.14.1. Producing XML Content A set of functions and function-like expressions are available for producing XML content from SQL data. As such, they are particularly suitable for formatting query results into XML documents for processing in client applications. 9.14.1.1. xmlcomment xmlcomment(text) The function xmlcomment creates an XML value containing an XML comment with the specified text as content. The text cannot contain -- or end with a - so that the resulting construct is a valid XML comment. If the argument is null, the result is null. Example: SELECT xmlcomment('hello'); xmlcomment -------------- <!--hello-->
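The rules stated for xmlcomment (no "--" inside, no trailing "-", null in gives null out) are easy to restate as code. The Python function below is a sketch of those documented rules only; it is not how the server implements them, and it reuses the name xmlcomment purely for readability.

```python
def xmlcomment(text):
    """Build an XML comment following the rules quoted above."""
    if text is None:
        # A null argument yields a null result.
        return None
    if "--" in text or text.endswith("-"):
        # Either case would produce an invalid XML comment.
        raise ValueError("invalid XML comment")
    return "<!--" + text + "-->"
```

Calling xmlcomment('hello') returns the same string as the documentation example.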
  • 32. XML in 8.3 Beta E.1. Release 8.3 Release date: 2007-12-?? Release date: CURRENT AS OF 2007-10-24 E.1.1. Overview This release represents a major leap forward for PostgreSQL by adding significant new functionality and performance enhancements. This was made possible by a growing community that has dramatically accelerated the pace of development. This release adds the follow major capabilities: Full text search now fully integrated into the core database system Support the SQL/XML standard, including new operators and an XML data type Support for enumerated data types (ENUM) Add Universally Unique Identifier (UUID) data type
  • 33. SQL/XML Feature Set ● XML Parsing ● XML Functions ● XML Export ● XPath B-tree Index
  • 34. Future XML Projects ● Use of HSTORE for advanced XML indexing ● Automated XML decomposition – XML-to-Table – XML-to-Schema ● PL/XSLT – XHTML query ● XQuery support
  • 35. HOT
  • 36. Fastest OSDB [Two bar charts comparing MySQL, PostgreSQL, and Proprietary: J2EE Throughput; Acquisition Cost Comparison (cost in US dollars)]
  • 38. The Consistency Problem
  • 40. What's MVCC? ● Multi-Version Concurrency Control – Each user gets their own “version” of the data – Allows parallelization of updates/reads – Without it, scalability is not possible ● You have to lock everything ● Or violate ACID transactions
  • 42. MVCC Row Version 1 Row Version 1 Row Version 2 Row Version 3 SELECT ... SELECT ... BEGIN UPDATE BEGIN UPDATE
  • 43. MVCC Row Version 1 Row Version 1 Row Version 2 Row Version 2 Row Version 3 ROLLBACK SELECT ... SELECT ... SELECT ... BEGIN UPDATE COMMIT BEGIN UPDATE COMMIT
  • 44. MVCC Them & Us The Overwriting Model InnoDB & Oracle Base Relation Rollback Segment Old Row Version Copy UPDATE Overwrite Row
  • 45. MVCC Them & Us The Overwriting Model InnoDB & Oracle ● Advantages – Low table/index maintenance requirements – Latest row version fast access ● Disadvantages – Transaction isolation can break – Long-running transactions expensive – Rollbacks very expensive – Rollback segment bottleneck
  • 46. MVCC Them & Us The Non-overwriting Model PostgreSQL & Firebird Base Relation UPDATE Old Row Version Copy New Row Version
  • 47. MVCC Them & Us The Non-overwriting Model PostgreSQL & Firebird ● Advantages – Transaction isolation effortless – Rollbacks free – Long-running transactions not a problem ● Disadvantages – High table/index maintenance – “Frequently updated table” problem
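The non-overwriting model can be sketched in a few lines of Python. This is a deliberately simplified picture, not PostgreSQL's visibility code: it ignores in-progress transactions, command IDs, and hint bits, and the names RowVersion, visible, and read are hypothetical. Each version records the transaction that created it (xmin) and the one that superseded it (xmax); a reader sees a version only if its creator committed and its superseder did not.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class RowVersion:
    value: str
    xmin: int                    # transaction that created this version
    xmax: Optional[int] = None   # transaction that superseded it, if any


def visible(version: RowVersion, committed: Set[int]) -> bool:
    """A version is visible if its creator committed and no committed
    transaction has superseded it."""
    created = version.xmin in committed
    superseded = version.xmax is not None and version.xmax in committed
    return created and not superseded


def read(versions: List[RowVersion], committed: Set[int]) -> List[str]:
    """Return the values a reader with this set of committed XIDs can see."""
    return [v.value for v in versions if visible(v, committed)]


# Three versions of one row, written by transactions 1, 2 and 3.
history = [
    RowVersion("v1", xmin=1, xmax=2),
    RowVersion("v2", xmin=2, xmax=3),
    RowVersion("v3", xmin=3),
]
```

If transaction 3 rolls back, nothing has to be undone: it simply never enters the committed set, so readers keep seeing v2. That is the sense in which rollbacks are free, and the leftover v3 is the maintenance cost listed under disadvantages.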
  • 48. Frequently Updated Tables Tuplestore Row C: Version 1 small update Indexes Updated Row C: Version 2 small update Indexes Updated Row C: Version 3 small update Indexes Updated Row C: Version 4 large update Indexes Updated Row C: Version 5
  • 50. Frequently Updated Tables Tuplestore Row C: Version 5
  • 54. Pavan Deolasee ● Graduated IIT Bombay – focus on databases ● Worked for VERITAS ● Lead Engineer at EnterpriseDB – PostgreSQL vendor – Contributes performance patches to community ● Lives in Pune
  • 55. Team Effort ● Simon Riggs – original proposal – prototypes – specification ● Heikki Linnakangas, Tom Lane and others – revisions – code review – bug fixes
  • 57. Initial proposal ● Update-in-Place with HOT file Base Relation HOT File Row C: Version 1
  • 58. Initial proposal ● Update-in-Place with HOT file Base Relation HOT File Row C: Version 2 UPDATE copy old version Row C: Version 1
  • 59. Initial proposal ● Update-in-Place with HOT file Base Relation HOT File Row C: Version 3 UPDATE copy old version Row C: Version 1 tuple chain Row C: Version 2
  • 60. Initial proposal ● Update-in-Place with HOT file Base Relation HOT File Row C: Version 4 UPDATE copy old version Row C: Version 1 tuple chain Row C: Version 2 tuple chain Row C: Version 3
  • 61. Initial proposal ● Update-in-Place with HOT file Base Relation Row C: Version 4
  • 62. First proposal to pgsql-hackers Frequent Update Project: Design Overview of HOT Updates From: "Simon Riggs" <simon ( at ) 2ndquadrant ( dot ) com> To: <pgsql-hackers ( at ) postgresql ( dot ) org> Subject: Frequent Update Project: Design Overview of HOT Updates Date: Thu, 09 Nov 2006 17:13:16 +0000 Design Overview of HOT Updates ------------------------------ The objective is to increase the speed of the UPDATE case, while minimizing the overall negative effects of the UPDATE. We refer to the general requirement as *Frequent Update Optimization*, though this design proposal is for Heap Overflow Tuple (HOT) Updates. It is similar in some ways to the design for SITC already proposed, though has a number of additional features drawn from other designs to make it a practical and effective implementation. EnterpriseDB have a working, performant prototype of this design. There are still a number of issues to resolve and the intention is to follow an open community process to find the best way forward. All required detail will be provided for the work conducted so far. Current PGSQL behaviour is for UPDATEs to create a new tuple version within the heap, so acts from many perspectives as if it were an INSERT. All of the tuple versions are chained together, so that whichever of the tuples is visible to your Snapshot, you can walk the chain to find the most recent tuple version to update.
  • 63. Revisions: Reverse Order Normal Tuples HOT Relation File Row C: Version 1 in-page update UPDATE Row C: Version 2 in-page update Row C: Version 3 in-page update Row C: Version 4
  • 64. Revisions: Chains, not files Normal Tuples HOT Tuple Chain Row C: Version 1 in-page update UPDATE Row C: Version 2 in-page update Row C: Version 3 in-page update Row C: Version 4
  • 65. Add microvacuum Normal Tuples HOT Tuple Chain Row C: Version 1 in-page update UPDATE Row C: Version 2 in-page update microvacuum Row C: Version 3 in-page update Row C: Version 4 new page / index update Indexes Updated Row C: Version 5
  • 66. Submit patch draft v.1 HOT WIP Patch - version 1 From: "Pavan Deolasee" <pavan ( dot ) deolasee ( at ) gmail ( dot ) com> To: PostgreSQL-development <pgsql-hackers ( at ) postgresql ( dot ) org> Subject: HOT WIP Patch - version 1 Date: Wed, 14 Feb 2007 15:34:46 +0530 This is a WIP patch based on the recent posting by Simon and discussions thereafter. We are trying to do one piece at a time and intention is to post the work ASAP so that we could get early and continuous feedback from the community. We could then incorporate those suggestions in the next WIP patch. To start with, this patch implements HOT-update for a simple case when there is enough free space in the same block so that it can accommodate the new version of the tuple. A necessary condition for doing HOT-update is that none of the index columns is changed. The old version is marked as HEAP_UPDATE_ROOT and the new version is marked as HEAP_ONLY_TUPLE. If a tuple is HOT-updated, no new index entry is added.
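The two conditions the message gives for a HOT update (enough free space in the same block, and no indexed column changed) can be written as a small predicate. This Python sketch only restates those conditions; can_hot_update and its parameters are hypothetical names, and sizes are plain byte counts supplied by the caller.

```python
def can_hot_update(old_row, new_row, indexed_columns, free_space, new_tuple_size):
    """Return True when an update qualifies as HOT under the two
    conditions described in the message above.

    old_row / new_row: dicts of column name -> value (hypothetical shape).
    indexed_columns: names of columns that appear in any index.
    free_space: bytes still free in the page holding the old version.
    new_tuple_size: bytes needed by the new version.
    """
    # Condition 1: no indexed column may change, so no new index entry is needed.
    index_unchanged = all(old_row[c] == new_row[c] for c in indexed_columns)
    # Condition 2: the new version must fit in the same block.
    fits_in_block = new_tuple_size <= free_space
    return index_unchanged and fits_in_block


old = {"id": 7, "status": "new", "note": "a"}
```

When the predicate is false, the update falls back to the ordinary path: a new tuple elsewhere plus new index entries.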
  • 68. Submit another version HOT WIP Patch - version 2 From: "Pavan Deolasee" <pavan ( dot ) deolasee ( at ) gmail ( dot ) com> To: PostgreSQL-development <pgsql-hackers ( at ) postgresql ( dot ) org>, pgsql-patches ( at ) postgresql ( dot ) org Subject: HOT WIP Patch - version 2 Date: Tue, 20 Feb 2007 12:08:14 +0530 Reposting - looks like the message did not get through in the first attempt. My apologies if multiple copies are received. This is the next version of the HOT WIP patch. Since the last patch that I sent out, I have implemented the HOT-update chain pruning mechanism. When following a HOT-update chain from the index fetch, if we notice that the root tuple is dead and it is HOT-updated, we try to prune the chain to the smallest possible length. To do that, the share lock is upgraded to an exclusive lock and the tuple chain is followed till we find a live/recently-dead tuple. At that point, the root t_ctid is made point to that tuple. In order to preserve the xmax/xmin chain, the xmax of the root tuple is also updated to xmin of the found tuple. Since this xmax is also < RecentGlobalXmin
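The pruning step described in the message can be sketched as a walk along the chain. The Python model below is a simplification under stated assumptions: a page is a dict from line-pointer offset to tuple, next_offset stands in for t_ctid within the page, dead means dead to every transaction, and locking is omitted entirely. HeapTuple and prune_chain are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HeapTuple:
    xmin: int
    xmax: Optional[int]
    next_offset: Optional[int]   # stand-in for t_ctid within the same page
    dead: bool                   # True if no transaction can still see it


def prune_chain(page: Dict[int, HeapTuple], root_offset: int) -> List[int]:
    """Shorten a HOT chain by pointing the dead root at the first
    tuple that is not dead. Returns the offsets that were skipped."""
    root = page[root_offset]
    if not root.dead or root.next_offset is None:
        return []  # nothing to prune: root is live or was never updated

    skipped = []
    current = root.next_offset
    # Follow the chain past dead intermediate versions; stop at the first
    # tuple that is not dead, or at the end of the chain.
    while page[current].dead and page[current].next_offset is not None:
        skipped.append(current)
        current = page[current].next_offset

    root.next_offset = current
    # Keep the xmax/xmin chain consistent, as the message describes.
    root.xmax = page[current].xmin
    return skipped


page = {
    1: HeapTuple(xmin=10, xmax=11, next_offset=2, dead=True),
    2: HeapTuple(xmin=11, xmax=12, next_offset=3, dead=True),
    3: HeapTuple(xmin=12, xmax=13, next_offset=4, dead=True),
    4: HeapTuple(xmin=13, xmax=None, next_offset=None, dead=False),
}
skipped = prune_chain(page, 1)
```

The root stays in place because index entries point at it; only its forward pointer changes, so an index fetch reaches the live version in one hop.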
  • 69. Submit another version HOT WIP Patch - version 3.2 From: "Pavan Deolasee" <pavan ( dot ) deolasee ( at ) gmail ( dot ) com> To: PostgreSQL-development <pgsql-hackers ( at ) postgresql ( dot ) org>, pgsql-patches ( at ) postgresql ( dot ) org Subject: HOT WIP Patch - version 3.2 Date: Sun, 25 Feb 2007 00:06:04 +0530 Please see the attached WIP HOT patch - version 3.2. It now implements the logic for reusing heap-only dead tuples. When a HOT-update chain is pruned, the heap-only tuples are marked LP_DELETE. The lp_offset and lp_len fields in the line pointer are maintained. When a backend runs out of free space in a page when doing an UPDATE, it searches the line pointers to find a slot which is marked LP_DELETEd and has enough space to accommodate the new tuple. If such a slot is found, its reused. We might waste some space if the slot is larger than the tuple, but that gets reclaimed at VACUUM time.
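The slot search described in the message amounts to scanning the line pointers for a deleted entry that is large enough. In the Python sketch below, the dict keys deleted and length are hypothetical stand-ins for the LP_DELETE flag and lp_len, and taking the first slot that fits is an assumption; the message does not say which matching slot is chosen.

```python
def find_reusable_slot(line_pointers, needed):
    """Return the index of the first deleted slot with at least
    `needed` bytes, or None if no slot qualifies."""
    for index, lp in enumerate(line_pointers):
        if lp["deleted"] and lp["length"] >= needed:
            return index
    return None


pointers = [
    {"deleted": False, "length": 120},  # live tuple, not reusable
    {"deleted": True, "length": 40},    # deleted but too small for 64 bytes
    {"deleted": True, "length": 96},    # deleted and large enough
]
```

Reusing the 96-byte slot for a 64-byte tuple leaves 32 bytes unused, which is the wasted space the message says is reclaimed at VACUUM time.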
  • 70. Yet another version HOT WIP Patch - version 6.3 From: "Pavan Deolasee" <pavan ( dot ) deolasee ( at ) gmail ( dot ) com> To: PostgreSQL-development <pgsql-hackers ( at ) postgresql ( dot ) org> Subject: HOT WIP Patch - version 6.3 Date: Mon, 2 Apr 2007 17:51:13 +0530 Please see the HOT version 6.3 patch posted on pgsql-patches. I've implemented support for CREATE INDEX and CREATE INDEX CONCURRENTLY based on the recent discussions. The implementation is not yet complete and needs some more testing/work/discussion before we can start considering it for review. One of the regression test case fails because CIC now works in three phases. In the first phase, we just create the catalog entry for the index and commit the transaction. If the index_build fails because of any error (say, unique key constraint) the index creation fails, but the catalog entry remains.
  • 71. Many issues resolved ● CREATE INDEX – including CONCURRENTLY ● Re-using dead tuples ● Interaction with Cluster ● Plan invalidation ● Utilities & tools
  • 72. But still not reviewed
  • 73. Tom Lane says: “break it up, please!” ● Too big a patch for reviewers – almost 12,000 lines ● Broken up into 5 parts – 1. The basic HOT implementation – 2. Retain vacuum, chain pruning and other tricks – 3. Fix the broken VACUUM and VACUUM FULL code – 4. Fix the broken CREATE INDEX – 5. pg_stats and other misc. utilities
  • 77. SKYLINE OF SKYLINE OF [DISTINCT] d1 [MIN | MAX | DIFF], ..., dm [MIN | MAX | DIFF] SELECT * FROM books SKYLINE OF rating MAX, price MIN;
  • 78. CDE @ IIIT, Hyderabad
  • 80. Extension to SQL syntax SKYLINE OF [DISTINCT] d1 [MIN | MAX | DIFF], ..., dm [MIN | MAX | DIFF]
  • 82. Approximate Queries SELECT * FROM Books SKYLINE OF rating MAX, price MIN;
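What the example query asks for can be shown with a brute-force Python sketch. It assumes the usual meaning of a skyline: a row is kept unless some other row is at least as good in every dimension and strictly better in one. The sample books are invented for illustration, and this quadratic scan is not how the proposed patch computed the result.

```python
def dominates(a, b):
    """True if book a is at least as good as b on both criteria
    (rating MAX, price MIN) and strictly better on at least one."""
    at_least_as_good = a["rating"] >= b["rating"] and a["price"] <= b["price"]
    strictly_better = a["rating"] > b["rating"] or a["price"] < b["price"]
    return at_least_as_good and strictly_better


def skyline(books):
    """Return the books that no other book dominates."""
    return [b for b in books if not any(dominates(o, b) for o in books)]


books = [
    {"title": "A", "rating": 5, "price": 30},
    {"title": "B", "rating": 4, "price": 10},
    {"title": "C", "rating": 3, "price": 20},  # dominated by B
    {"title": "D", "rating": 5, "price": 40},  # dominated by A
]
best = [b["title"] for b in skyline(books)]
```

A and B both survive because neither beats the other on both criteria: A is rated higher, B is cheaper.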
  • 84. Problems with the Patch Re: PostgreSQL - 'SKYLINE OF' clause added! From: Tom Lane <tgl ( at ) sss ( dot ) pgh ( dot ) pa ( dot ) us> To: Shane Ambler <pgsql ( at ) Sheeky ( dot ) Biz> Subject: Re: PostgreSQL - 'SKYLINE OF' clause added! Date: Thu, 08 Mar 2007 01:12:22 -0500 Shane Ambler <pgsql ( at ) Sheeky ( dot ) Biz> writes: > Tom Lane wrote: >> Well, whether it's horrible or not is in the eye of the beholder, but >> this is certainly a non-standard syntax extension. > Being non-standard should not be the only reason to reject a worthwhile > feature. No, but being non-standard is certainly an indicator that the feature may not be of widespread interest --- if it were, the SQL committee would've gotten around to including it; seems they've managed to include everything but the kitchen sink already. Add to that the complete lack of any previous demand for the feature, and you have to wonder where the market is. > The fact that several > different groups have been mentioned to be working on this feature would > indicate that it is worth considering.
  • 85. Problems with the Patch ● Not part of the ANSI SQL standard – possibly low general applicability – might get added to standard with different syntax – might never get standardized at all ● Requires changes to PostgreSQL parser – new keywords break applications – possible side effects ● Not coded to PostgreSQL standards – would need refactoring
  • 86. Rejected! Re: PostgreSQL - 'SKYLINE OF' clause rejected From: Tom Lane <tgl ( at ) sss ( dot ) pgh ( dot ) pa ( dot ) us> To: Shane Ambler <pgsql ( at ) Sheeky ( dot ) Biz> Subject: Re: PostgreSQL - 'SKYLINE OF' clause added! Date: Sun, 11 Mar 2007 23:44:41 -0400 Shane Ambler <pgsql ( at ) Sheeky ( dot ) Biz> writes: > If we consider this thoroughly and compile a suitable syntax that covers > all bases it could be used as the basis of the standard definition or be > close to what ends up in the standard. I'll bet you a very good dinner that the word SKYLINE will never be seen in the standard. To me, the proposed feature seems an extremely narrow, special-purpose thing. The SQL committee have never been into that very much, and seem even less interested in the last couple of revisions. They like mechanisms that can be used to solve a wide variety of problems, and are not afraid to introduce conceptual complexity to get there. Two examples for you: outer joins and recursive queries. Oracle's (+) syntax is more compact than what got into the spec, but less precise and less functional. For recursive queries, CONNECT BY is way simpler than what got into the spec, but again doesn't cover as much ground. The SKYLINE clause seems to me to be right about on par with CONNECT BY ... it does something useful, but only one thing.
  • 89. Mailing Lists ● Hackers list – pgsql-hackers – main list for development discussion ● Patch list – pgsql-patches – submit your patch here after discussion on -hackers ● Specific feature lists – pgsql-jdbc, pgsql-performance, pgsql-sql, etc. – subscribe at www.postgresql.org/community/lists
  • 90. Web Sites ● www.postgresql.org – main site ● www.pgfoundry.org – add-ins, drivers, tools ● developer.postgresql.org – developer wiki, including TODO lists ● archives.postgresql.org – mailing list archives -- search for your idea here
  • 91. Documentation ● www.postgresql.org/docs – main documentation – internals: /docs/current/static/internals.html – code conventions: /docs/current/static/source.html ● doxygen.postgresql.org – annotated source code ● www.postgresql.org/docs/faqs.FAQ_DEV.html – developer FAQ
  • 92. The PostgreSQL Year ● RC and Branch – December 2007 ● Development Period ● Patch Commit Fest – February 1, 2008 ● Development Period ● Patch Commit Fest – April 1, 2008 ● Development Period ● Patch Commit Fest – June 1, 2008 ● Development Period ● Feature Freeze – August 1, 2008 ● Integration & Review (1 month) ● Beta – September, 2008 ● Beta Testing (1-2 months) ● RC and Branch – October, 2008
  • 93. Other tips on submitting ● Don't get discouraged. – Be prepared to argue. – One hacker rejecting your idea doesn't mean everyone does. – Committers (esp. Tom Lane) are often more concerned about maintainability than cool stuff. ● Be flexible: you will have to make changes. – Corporate and academic coding standards are generally lower than the project's.
  • 94. Other tips on submitting ● Don't use the wrong arguments – “MySQL/Oracle does it this way.” – “Based on this hot academic trend.” ● Some things make a patch harder to accept – New syntax – Backwards compatibility issues – High code counts ● Don't get discouraged.
  • 95. Now, go write some code. or contribute in some easier way
  • 96. Contact Information ● Josh Berkus – josh@postgresql.org – blogs.ittoolbox.com/database/soup – www.sun.com/postgresql ● Pavan Deolasee – pavan.deolasee@enterprisedb.com – www.enterprisedb.com ● PostgreSQL India – in@postgresql.org This talk is copyright 2007 Josh Berkus, and is licensed under the Creative Commons Attribution license