10 Features Developers Are Missing When Stuck with Proprietary Databases!
Sameer Kumar (@sameerkasi200x)
DB Solution Architect, Ashnik (@AshnikBiz)
15th March 2015
About Me!
- A random guy who started his career as an Oracle and DB2 DBA (and yeah, a bit of SQL Server too)
- Then moved to ‘Ashnik’ and started working with Postgres
- We work on Open Source consulting and solutions
- And now I love Open Source!
- Twitter - @sameerkasi200x
- Apart from technology I love cycling and photography
Why I Love PostgreSQL?
- Claims to be “Most Advanced Open Source Database”
- A vibrant and active community
- Fully ACID compliant
- Multi Version Concurrency Control
- NoSQL capability
- Developer Friendly
- Built to be extended ‘easily’
Supported on a wide range of platforms
- Portable across a wide range of operating systems – Unix, Linux, Windows etc.
- Supported on various architectures – RISC, ARM, x86
10 Features you would love as a developer!
1. New JSONB datatype introduced in v9.4, plus JSON functions & operators
2. Vast set of datatypes supported – money, time, range, boolean, interval and many more
3. Rich support for Foreign Data Wrappers – Build a logical data warehouse!
4. User Defined Operators – It’s really cool!
5. User Defined Extensions – you have out-of-the-box extensions plus you can write your own!
6. Filter Based Indexes or Partial Indexes – Index only what you need to!
7. Granular control of parameters at User, Database, Connection or Transaction Level – sort memory, logging parameters, reliability parameters and many more
8. Use of indexes to get statistics on the fly
9. JDBC API for COPY Command – Do bulk loads right from your Java program
10. Full Text Search – There is a lot more to it than you think
JSON Features
Store Unstructured Data – Store rows with different attributes
create table item_catalog (
    item_id varchar(50) primary key,
    item_description varchar(250),
    attributes jsonb
);
- “category” is an array
- “features” is an array of sub-documents
- “features” has a different set of members
- New fields which suit the details of this specific type of product
Benefits to the Developers
- Allows you to store records which might have different attributes
- Store data in a JSONB field until your schema has matured and firmed up, then move it to relational attributes (columns)
- Use JSON functions & operators to fetch and return data to the application via APIs
- This keeps the application independent of the underlying structure
- The binary storage format of JSONB allows efficient parsing
- You can index JSONB fields for faster search!
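The points above can be sketched roughly as follows. This is a minimal illustration that assumes PostgreSQL 9.4 or later; the sample row, the attribute names inside the document and the index name are made up for the example.

```sql
-- Same table as on the earlier slide (jsonb needs PostgreSQL 9.4+).
create table if not exists item_catalog (
    item_id          varchar(50) primary key,
    item_description varchar(250),
    attributes       jsonb
);

-- Hypothetical sample row: "category" is an array,
-- "features" is an array of sub-documents.
insert into item_catalog (item_id, item_description, attributes)
values ('ITM-001', 'Sample phone',
        '{"category": ["electronics", "mobile"],
          "features": [{"name": "screen", "size_inch": 5.5},
                       {"name": "camera", "megapixels": 13}]}');

-- "->" returns jsonb, "->>" returns text.
select item_id,
       attributes -> 'category'                 as categories,
       attributes -> 'features' -> 0 ->> 'name' as first_feature
from item_catalog;

-- Containment: rows whose "category" array contains "mobile".
select item_id
from item_catalog
where attributes @> '{"category": ["mobile"]}';

-- A GIN index can support containment (@>) and key-existence (?) searches.
create index idx_item_attributes on item_catalog using gin (attributes);
```

Returning `attributes` (or parts of it) straight to the application is what keeps the application code unaware of which fields a given row happens to have.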
Datatype Support
Data Types Supported
Data Type   Usage
Money       Store currency data
Interval    Store time intervals, e.g. ‘2 days’, ‘1 hour’ etc
Time        Store the time, e.g. 2:00PM, 6:00AM etc
Range       Store ranges for integer, date or timestamp
Boolean     Store true or false values
And store many more common data types e.g. varchar and char for
string, numeric, float, integer, serial etc.
Create user defined datatypes to store data as per your convenience and
define GiST indexes for your data-types
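A small sketch of these datatypes in use. The room_booking table, its columns and the sample values are hypothetical, chosen only to show one column per type from the table above.

```sql
-- Hypothetical table: one column for each datatype discussed above.
create table room_booking (
    booking_id  serial primary key,
    room_no     integer,
    stay        daterange,             -- range of dates
    notice      interval,              -- e.g. '2 days'
    checkin_at  time,                  -- time of day
    deposit     money,                 -- currency amount
    confirmed   boolean default false
);

insert into room_booking (room_no, stay, notice, checkin_at, deposit, confirmed)
values (101, '[2015-03-15,2015-03-18)', '2 days', '2:00PM', 100.00::money, true);

-- "&&" tests whether two ranges overlap.
select booking_id
from room_booking
where stay && daterange('2015-03-17', '2015-03-20');

-- Interval arithmetic: when to send a reminder before the stay starts.
select booking_id, lower(stay) - notice as notify_on
from room_booking;

-- Range types come with GiST support, so overlap searches can be indexed.
create index idx_booking_stay on room_booking using gist (stay);
```

Because the values are stored in their natural types, the application does not have to parse strings or convert flags on the way in or out.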
Benefits to the Developers
- Store the data from the application or user input in more intuitive datatypes in the database
- Avoid conversion or translation of values retrieved from the database
- Define your own data types to match the structures or objects defined in programs
- Define your own operators and index access for user
defined operators
Foreign Data Wrapper
Access Remote Databases
- As the name suggests, it allows you to access foreign tables in remote databases
- Allows you to read from and write to these foreign tables
Benefits to the Developers
- Access data from legacy systems for run-time processing
- Avoid connecting to multiple databases in the application
- Read/write from NoSQL or filesystem based stores as if they were relational tables
- Postgres can push operations, e.g. filter clauses, down to the foreign database for better execution
- Useful for migration or data integration
- Foreign Data Wrappers are available for a wide range of databases and data stores
- Hadoop, MongoDB, Oracle, MySQL, MariaDB, file system and many more
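As a rough sketch of what this looks like in practice, here is a setup using postgres_fdw, the wrapper for remote PostgreSQL databases (available from 9.3). The server name, host, credentials and the remote table are all placeholders invented for the example; other wrappers follow the same pattern with their own options.

```sql
create extension if not exists postgres_fdw;

-- Hypothetical remote server; host and dbname are placeholders.
create server legacy_srv
    foreign data wrapper postgres_fdw
    options (host 'legacy-db.example.com', port '5432', dbname 'legacy');

-- Map the current local user to a (hypothetical) remote login.
create user mapping for current_user
    server legacy_srv
    options (user 'report_reader', password 'change-me');

-- Local definition of a remote table; the column list is assumed.
create foreign table legacy_orders (
    order_id   integer,
    customer   varchar(100),
    order_date date
)
server legacy_srv
options (schema_name 'public', table_name 'orders');

-- Queried like a local table; the application keeps a single connection.
select order_id, customer
from legacy_orders
where order_date >= date '2015-01-01';
```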
User Defined Operators
Define your own operators
- Postgres allows you to create your own operators
- You can override the existing ones for specific cases, or you can give a new meaning to an operator for special cases
Benefits to the Developers
- Define your own operators to define how user defined data types are handled
- Define your own operators to override the default handling of data-types, e.g. perform a case-insensitive search on varchar columns
- Create new operators to handle specific tasks, e.g. use + for concatenation of strings
- Makes the data processing easier for developers
- Makes the migration process easier, e.g. a migration from SQL Server to PostgreSQL benefits from a + operator for string concatenation
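The string concatenation case could be sketched like this. The function name text_plus is made up for the example, and the operator is created in whatever schema comes first in your search_path.

```sql
-- Helper function (hypothetical name) that the operator will call.
create function text_plus(text, text) returns text
    as 'select $1 || $2'
    language sql immutable;

-- Give "+" a meaning for text operands.
-- "procedure" is the spelling accepted by older and newer releases alike.
create operator + (
    leftarg   = text,
    rightarg  = text,
    procedure = text_plus
);

-- Explicit casts keep operator resolution unambiguous for literals.
select 'Postgre'::text + 'SQL'::text as joined;
```

Numeric addition is untouched, since the new operator only applies when both operands are text.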
User Defined Extensions
Extend PostgreSQL capabilities with Extensions
- These are like add-on modules which you can compile and add to PostgreSQL
- Once added, the features offered by extensions work as native features
- Allows you to extend PostgreSQL capabilities
- There are out-of-the-box extensions available and you can write your own
Some Popular Extensions
- pg_prewarm – load your data into buffer cache to avoid ‘cold reboot’ issues
- pgcrypto – cryptographic functions to encrypt the data
- pg_shard – Create a Sharded Cluster with PostgreSQL
- PostGIS – Add full spatial capabilities to PostgreSQL
- pg_buffercache, pgrowlocks, pgstattuple and pg_freespacemap – take a peek into buffers, locks and data pages
- hstore – to use PostgreSQL as a key-value pair store
- fuzzystrmatch and pg_trgm – more enhanced and powerful search on textual data
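Trying out an extension usually takes one statement. A short sketch, assuming the contrib packages are installed on the server and that you have the privilege to create extensions; the literal values are illustrative only.

```sql
-- See which extensions this installation ships with.
select name, default_version, installed_version
from pg_available_extensions
order by name;

create extension if not exists pgcrypto;
create extension if not exists hstore;

-- pgcrypto: SHA-256 digest of a string, shown as hex.
select encode(digest('secret', 'sha256'), 'hex') as sha256_hex;

-- hstore: look up one key in a key-value value.
select ('colour=>red, size=>L'::hstore) -> 'colour' as colour;
```

After `create extension`, the functions and types are called like any built-in ones.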
Partial Indexes
When only a portion of data is relevant
- Often we have some columns which have low cardinality, i.e. few distinct values
- An index on these columns is not very helpful
- Mostly we have queries which require only one of the available values
- e.g. soft delete
- Application always queries data where “deleted = false”
- e.g. using a column named “closed” in the “ACCOUNTS” table in a bank
Benefits of creating a partial index
- You can index only the data which is relevant and queried:
• create index idx_active_acc_paymentdt on
ACCOUNTS(acc_int_payment_dt) where closed=false;
- This keeps the index smaller, so it performs faster
- You can create separate indexes to cover different sets of
data e.g.
• create index idx_current_acc_paymentdt on
ACCOUNTS(acc_int_payment_dt) where acc_type='current';
• create index idx_savings_acc_paymentdt on
ACCOUNTS(acc_int_payment_dt) where acc_type='savings';
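To try this out end to end, a minimal sketch follows. The column list of the accounts table is assumed (the slides only name the table and three columns). The point to notice is that the query has to repeat the index condition for the planner to consider the partial index.

```sql
-- Hypothetical definition; only the column names come from the slides.
create table accounts (
    acc_id             integer primary key,
    acc_type           varchar(20),
    acc_int_payment_dt date,
    closed             boolean not null default false
);

-- Index only the rows the application actually queries.
create index idx_active_acc_paymentdt
    on accounts (acc_int_payment_dt)
    where closed = false;

-- The planner can consider the partial index only when the query's
-- where clause implies the index predicate (closed = false here).
explain
select acc_id
from accounts
where closed = false
  and acc_int_payment_dt = date '2015-03-31';
```

Whether the index is actually chosen still depends on table size and statistics, so check the plan on realistic data.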
Granular Parameter Control
You can control Parameters at several levels
- Instance Level – in the parameter file or in the startup command
- Database Level
• alter database reporting_db set work_mem=10240;
- User Level
• alter user batch_user set maintenance_work_mem='1024MB';
- Transaction Level
• select set_config('work_mem','20480',true);
- Connection/Session Level
• set synchronous_commit=off;
• select set_config('synchronous_commit','off',false);
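A short sketch of how the transaction and session levels behave, using only built-in commands. The values are examples, not recommendations.

```sql
-- Transaction level: "set local" lasts until commit or rollback.
-- It is equivalent to set_config('work_mem', '20MB', true).
begin;
set local work_mem = '20MB';
show work_mem;          -- value in effect inside this transaction
commit;
show work_mem;          -- back to the session/database/user default

-- Session level: lasts until the connection closes or the value is reset.
set synchronous_commit = off;
select current_setting('synchronous_commit');
reset synchronous_commit;

-- Database-level and user-level overrides are stored in this catalog.
select setdatabase, setrole, setconfig
from pg_db_role_setting;
```

Note that a transaction-level setting made outside an explicit transaction block only lasts for that single statement.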
Benefits to the Developer
- A developer can set the parameters as per the requirements
in the program
- Set higher maintenance and sorting memory for batch jobs
- Set higher sorting memory for a reporting user
- Set synchronous_commit to off during batches to enhance
performance for bulk loads
- Set different logging for specific users
PostgreSQL Planner can Get the Statistics on the fly
Benefits to the Developer
- Often as a developer you have to code batch jobs
- Bulk uploads and bulk deletion of data from tables
- After these operations you may be querying the same table
- Due to the huge change in data volume, there is a chance that the
optimizer will pick a wrong plan.
So should you gather stats after each bulk load operation?
- Not really!
- The PostgreSQL optimizer is smart enough to quickly gauge the
statistics from the indexes on the fly
- Developers don’t need to make their code heavy with
ANALYZE, especially if response time is an important factor
JDBC API for COPY
JDBC Copy
- The COPY command in PostgreSQL allows you to do bulk loads
- The PostgreSQL JDBC driver also provides a COPY API
- Using JDBC Copy you can programmatically load data from
STDIN or files
- Allows programmers to do faster bulk loads
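For reference, these are the two shapes of COPY statement involved. The accounts table, its column list and the file path are assumptions for illustration. The driver's copy API (the CopyManager class in the PostgreSQL JDBC driver) is handed a `copy ... from stdin` statement together with a Java reader or input stream that supplies the rows.

```sql
-- Server-side file: the path is read by the database server process,
-- so it must exist on the server and be readable by it.
copy accounts (acc_id, acc_type, acc_int_payment_dt, closed)
from '/tmp/accounts.csv'
with (format csv, header true);

-- Client-side stream: the rows travel over the connection.
-- This is the form a program passes to the driver's copy API;
-- run on its own in a script it will wait for data on standard input.
copy accounts (acc_id, acc_type, acc_int_payment_dt, closed)
from stdin
with (format csv, header true);
```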
Full Text Search
Yes! You can do full text search on PostgreSQL
- You can store your data in PostgreSQL and use it for complex
pattern matches and textual search
- With GIN indexes your text searches and pattern matches
can be made faster
- With additional extensions you can also do trigram based
searches or phonetic (soundex/metaphone) matches
- Makes the developer’s life easier while doing searches on
textual data
- GIN and GiST indexes help get better performance
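A minimal sketch of these search options, reusing the item_catalog table from the JSON slides. The search terms and index names are made up, and the 'english' configuration is an assumption about the language of the text.

```sql
-- Full text search: match rows whose description contains both words
-- (after stemming), in any order.
select item_id, item_description
from item_catalog
where to_tsvector('english', item_description)
      @@ to_tsquery('english', 'phone & camera');

-- Expression GIN index so the search above does not scan every row.
create index idx_item_desc_fts
    on item_catalog
    using gin (to_tsvector('english', item_description));

-- Trigram search (pg_trgm): lets an index serve LIKE '%...%' patterns.
create extension if not exists pg_trgm;
create index idx_item_desc_trgm
    on item_catalog
    using gin (item_description gin_trgm_ops);

select item_id
from item_catalog
where item_description like '%phon%';

-- Phonetic matching (fuzzystrmatch): compare how two names sound.
create extension if not exists fuzzystrmatch;
select soundex('Robert') = soundex('Rupert') as sounds_alike;
```

The query expression has to match the indexed expression, including the configuration name, for the full text index to be usable.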
For Further Reference
- www.postgresql.org
- www.planetpostgresql.org
- Various community user group discussions
- Various blogs
- Josh Berkus
- Magnus Hagander
- Bruce Momjian
- Simon Riggs
- Many more
- Ashnik Blog Archives
- Ashnik YouTube Channel
Questions?

FOSSASIA 2015 - 10 Features your developers are missing when stuck with Proprietary Databases
