FOSSASIA 2015 - 10 Features your developers are missing when stuck with Proprietary Databases

10 Features Developers are Missing when Stuck with Proprietary Databases!
Sameer Kumar (@sameerkasi200x)
DB Solution Architect, Ashnik (@AshnikBiz)
15th March 2015

About Me!
- A random guy who started my career as an Oracle and DB2
DBA (and yeah, a bit of SQL Server too)
- Then moved to ‘Ashnik’ and started working with Postgres
- We work on Open Source consulting and solutions
- And now I love Open Source!
- Twitter - @sameerkasi200x
- Apart from technology I love cycling and photography

Why I Love PostgreSQL?
- Claims to be “Most Advanced Open Source Database”
- A vibrant and active community
- Fully ACID compliant
- Multi Version Concurrency Control
- NoSQL capability
- Developer Friendly
- Built to be extended ‘easily’

Supported on a wide range of platforms
- Portable across a wide range of Operating Systems – Unix, Linux,
Windows etc.
- Supported on various Architectures – RISC, ARM, x86

10 Features you would love as a developer!
1. New JSONB datatype introduced in v9.4, plus JSON functions &
operators
2. Wide set of datatypes supported – money, time, range, boolean,
interval and many more
3. Rich support for Foreign Data Wrappers – build a Logical Data
Warehouse!
4. User Defined Operators – It’s really cool!
5. User Defined Extensions – you have out-of-the-box extensions plus you
can write your own!

10 Features you would love as a developer! (continued)
6. Filter Based Indexes or Partial Indexes – Index only what you need
to!
7. Granular control of parameters at User, Database, Connection or
Transaction Level – sort memory, logging parameters, reliability
parameters and many more
8. Use of indexes to get statistics on the fly
9. JDBC API for COPY Command – Do bulk loads right from your Java
program
10. Full Text Search – There is a lot more than what you think

Store Unstructured Data – Store rows with different Attributes
create table item_catalog (
    item_id varchar(50) primary key,
    item_description varchar(250),
    attributes jsonb );
- “category” is an array
- “features” is an array of sub-documents
- “features” has a different set of members
- New fields which suit the details of this specific type of product

Benefits to the Developers
- Allows you to store records which might have different
attributes
- Store data in a JSONB field until your schema has matured and
firmed up, and then move it to relational attributes (columns)
- Use JSON functions & operators to fetch and return data to the
application via APIs
- This makes the application independent of the
underlying structure
- The binary storage format of JSONB allows efficient parsing
- You can index JSON fields for faster search!
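
A minimal sketch of how these points might look against the item_catalog table shown earlier. The sample row, its attribute names and the index name are assumptions made up for illustration, not taken from the original slides.

```sql
-- Hypothetical sample row; the keys inside "attributes" are illustrative only.
insert into item_catalog (item_id, item_description, attributes)
values ('SKU-1', 'Sample phone',
        '{"category": ["electronics", "mobile"],
          "features": [{"name": "screen", "value": "5 inch"}]}');

-- ->> returns a field as text; @> tests whether the document contains the given JSON.
select item_id, attributes ->> 'category' as category
from item_catalog
where attributes @> '{"category": ["mobile"]}';

-- A GIN index on the jsonb column can serve containment searches like the one above.
create index idx_item_attributes on item_catalog using gin (attributes);
```

Because the application only sees the query result, the same query can later be pointed at regular columns once the schema has firmed up.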

Data Types Supported
Data Type   Usage
Money       Store currency data
Interval    Store time intervals e.g. ‘2 days’, ‘1 hour’ etc.
Time        Store the time of day e.g. 2:00PM, 6:00AM etc.
Range       Store ranges of integer, date or timestamp
Boolean     Store true or false values
- And store many more common data types e.g. varchar and char for
strings, numeric, float, integer, serial etc.
- Create user defined datatypes to store data as per your convenience and
define GiST indexes for your datatypes

- Store the data from application or user input in more
intuitive datatypes in database
- Avoid conversion or translation of values retrieved from
database
- Define your own data types to match the structure or
objects defined in programs
- Define your own operators and index access for user
defined operators
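
A short sketch of what using these datatypes could look like. The room_booking table, its columns and the address type are hypothetical names chosen for illustration.

```sql
-- Hypothetical table; all names here are assumptions for illustration.
create table room_booking (
    room_no   integer,
    booked    tsrange,     -- a range of timestamps
    rate      money,
    notice    interval,
    confirmed boolean
);

insert into room_booking
values (101,
        tsrange('2015-03-15 09:00', '2015-03-15 11:00'),
        cast(120.50 as money),
        interval '2 days',
        true);

-- && tests whether two ranges overlap, so no start/end comparison logic is needed.
select room_no
from room_booking
where booked && tsrange('2015-03-15 10:00', '2015-03-15 12:00');

-- Range columns can be indexed with GiST.
create index idx_room_booking_booked on room_booking using gist (booked);

-- A user defined composite type that mirrors an object in the program.
create type address as (street text, city text);
```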

Access Remote Databases
- Foreign Data Wrappers, as the name suggests, allow you to access
foreign tables in remote databases
- They allow you to read from and write to these foreign tables

- Access data from legacy systems for run-time processing
- Avoid connecting to multiple databases in application
- Read/write from NoSQL or filesystem based stores as if they were
relational tables
- Postgres can push operations, e.g. filter clauses, down to the foreign
database for better execution
- Useful for migration or data integration
- Foreign Data Wrappers are available for a wide range of databases and
data stores
- Hadoop, MongoDB, Oracle, MySQL, MariaDB, file system and many more
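
As a rough sketch, this is how a remote PostgreSQL table could be exposed with the postgres_fdw extension that ships with PostgreSQL. The server name, host, credentials and table layout are all placeholders invented for this example; wrappers for other data stores take different options.

```sql
create extension if not exists postgres_fdw;

-- Hypothetical remote server; host and dbname are placeholders.
create server legacy_srv
    foreign data wrapper postgres_fdw
    options (host 'legacy.example.com', port '5432', dbname 'legacy_db');

-- Hypothetical remote credentials for the current local user.
create user mapping for current_user
    server legacy_srv
    options (user 'report_reader', password 'secret');

-- Local definition of a table that lives on the remote server.
create foreign table legacy_orders (
    order_id   integer,
    order_date date,
    amount     numeric
) server legacy_srv
  options (schema_name 'public', table_name 'orders');

-- Queried like a local table; a simple filter like this one is a candidate
-- for being sent to the remote side.
select order_id, amount
from legacy_orders
where order_date >= date '2015-01-01';
```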

Define your own operators
- Postgres allows you to create your own operators
- You can override the existing ones for specific cases or you
can give a new meaning to an operator for special cases

- Define your own operators to define how user defined data
types are handled
- Define your own operators to override the default handling of
datatypes e.g. perform a case-insensitive search on
varchar columns
- Create new operators to handle specific tasks e.g. use + for
concatenation of strings
- Makes data processing easier for developers
- Makes migration easier e.g. code migrated from SQL Server to
PostgreSQL can keep using + for string concatenation
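
A minimal sketch of the string concatenation case, assuming you want + to behave like || for text values. The function name text_plus is made up for this example.

```sql
-- Hypothetical helper function that does the actual work.
create function text_plus(text, text) returns text
    as 'select $1 || $2'
    language sql immutable;

-- Give + a meaning for (text, text) operands.
create operator + (
    leftarg   = text,
    rightarg  = text,
    procedure = text_plus
);

-- The casts make sure the new (text, text) operator is the one chosen.
select 'Postgre'::text + 'SQL'::text;
```

Untyped string literals on both sides can be ambiguous for the operator resolver, which is why the example casts them explicitly.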

Extend PostgreSQL capabilities with Extensions
- These are like add-on modules which you can compile and
add to PostgreSQL
- Once added, the features offered by extensions work as
native features
- Allows you to extend PostgreSQL capabilities
- There are out-of-the-box extensions available and you can write
your own

Some Popular Extensions
- pg_prewarm – load your data into buffer cache to avoid ‘cold
reboot’ issues
- pgcrypto – cryptographic functions to encrypt the data
- pg_shard – Create a Sharded Cluster with PostgreSQL
- PostGIS – Add full spatial capabilities to PostgreSQL
- pg_buffercache, pgrowlocks, pgstattuple and pg_freespacemap –
take a peek into buffers, locks and data pages
- hstore – to use PostgreSQL as a key-value pair store
- fuzzystrmatch and pg_trgm – more enhanced and powerful
search on textual data
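
A small sketch of enabling and using two of the bundled extensions. It assumes the contrib packages are installed on the server; the sample values are made up.

```sql
-- See which extensions the server can install and which are already installed.
select name, default_version, installed_version
from pg_available_extensions
order by name;

-- pgcrypto: hash a password with a generated Blowfish salt.
create extension if not exists pgcrypto;
select crypt('s3cret', gen_salt('bf'));

-- hstore: key-value pairs in a single value; -> looks up one key.
create extension if not exists hstore;
select 'color=>red, size=>M'::hstore -> 'color';
```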

When only a portion of data is relevant
- Often we have some columns which have low cardinality or
few distinct values
- An index on these columns is not very helpful
- Mostly we have queries which require only one of the
values from all available values
- e.g. soft delete
- Application always queries data where “deleted = false”
- e.g. using a column named “closed” in the “ACCOUNTS” table in
a bank

Benefits of creating a partial index
- You can index only the data which is relevant and queried:
• create index idx_active_acc_paymentdt on
ACCOUNTS(acc_int_payment_dt) where closed=false;
- This keeps the index smaller, which makes it faster to scan and maintain
- You can create separate indexes to cover different sets of
data e.g.
• create index idx_current_acc_paymentdt on
ACCOUNTS(acc_int_payment_dt) where acc_type='current';
• create index idx_savings_acc_paymentdt on
ACCOUNTS(acc_int_payment_dt) where acc_type='savings';
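
To see when a partial index can be considered, a self-contained sketch like the following may help. The table definition is an assumption that simply matches the column names used in the statements above, and the index name is made up.

```sql
-- Hypothetical table matching the column names used in the slides.
create table accounts (
    acc_id             integer primary key,
    acc_type           varchar(20),
    closed             boolean not null default false,
    acc_int_payment_dt date
);

-- Index only the rows for open accounts.
create index idx_open_acc_paymentdt
    on accounts (acc_int_payment_dt)
    where closed = false;

-- The query's WHERE clause implies the index predicate,
-- so the partial index is a candidate for this query.
explain
select acc_id
from accounts
where closed = false
  and acc_int_payment_dt = date '2015-03-31';

-- Without "closed = false" the planner cannot use the partial index,
-- because the index does not cover closed accounts.
explain
select acc_id
from accounts
where acc_int_payment_dt = date '2015-03-31';
```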

You can control Parameters at several levels
- Instance Level – in parameter file or in startup command
- Database
- alter database reporting_db set work_mem=10240;
- User Level
- alter user batch_user set maintenance_work_mem='1024MB';
- Transaction Level
- Select set_config('work_mem','20480',true);
- Connection/Session Level
- Set synchronous_commit=off;
- Select set_config('synchronous_commit','off',false);
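
A quick sketch to observe the difference between transaction and session scope. SET LOCAL is used here as the statement form of set_config(..., true); the values are arbitrary examples.

```sql
begin;
-- Applies only until the end of this transaction.
set local work_mem = '64MB';
show work_mem;
commit;

-- Back to whatever the session, user, database or instance level provides.
show work_mem;

-- Applies for the rest of the session.
set synchronous_commit = off;

-- The "source" column shows which level the current value came from.
select name, setting, unit, source
from pg_settings
where name in ('work_mem', 'synchronous_commit');
```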

Benefits to the Developer
- A developer can set the parameters as per the requirements
in the program
- Set higher maintenance and sorting memory for batch jobs
- Set higher sorting memory for reporting user
- Set synchronous_commit off during batches to enhance
performance for bulk loads
- Set different logging for specific users

PostgreSQL Planner can Get the
Statistics on the fly

Benefits to the Developer
- Often as a developer you have to code batch jobs
- Bulk uploads and bulk deletion of data from tables
- After these operations you may be querying the same table
- Due to the huge change in data volume, there is a good chance that
the optimizer will pick a wrong plan.

So should you gather stats after each bulk load
operation?
- Not really!
- The PostgreSQL optimizer is smart enough to quickly gauge the
statistics from the indexes on the fly
- Developers don’t need to make their code heavy with
ANALYZE, especially if response time is an important factor
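
One way to check this on your own data is to compare the planner's estimated row count with the actual row count right after a bulk load. The sketch below uses a throwaway table named batch_demo, which is an assumption for illustration; how close the estimates are will depend on the data and the PostgreSQL version.

```sql
-- Hypothetical table, bulk loaded in one statement.
create table batch_demo (
    id      integer primary key,
    payload text
);

insert into batch_demo
select g, 'row ' || g
from generate_series(1, 100000) as g;

-- EXPLAIN ANALYZE prints the estimated rows next to the actual rows,
-- so you can see how far off the plan is without a manual ANALYZE.
explain analyze
select * from batch_demo where id > 90000;

-- Still available if the estimates turn out to be too far off.
analyze batch_demo;
```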

JDBC Copy
- COPY command in PostgreSQL allows you to do bulk loads
- The PostgreSQL JDBC driver also provides a COPY API
- Using JDBC Copy you can programmatically load data from
STDIN or files
- Allows programmers to do faster bulk loads
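
For reference, these are the two forms of the COPY statement involved. The accounts_staging table and the file path are hypothetical. In the PostgreSQL JDBC driver the "from stdin" form is the statement you would hand to the driver's CopyManager (its copyIn method), with the rows coming from a stream in your program.

```sql
-- Hypothetical staging table.
create table accounts_staging (
    acc_id   integer,
    acc_type varchar(20),
    closed   boolean
);

-- Server-side load: the file is read by the database server process,
-- so the path (a placeholder here) must exist on the server.
copy accounts_staging from '/tmp/accounts.csv' with (format csv, header true);

-- Client-side load: the rows are streamed by the client over the connection.
-- This is the form a driver-level COPY API sends on your behalf.
copy accounts_staging from stdin with (format csv);
```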

Yes! You can do full text search on PostgreSQL
- You can store your data in PostgreSQL and use it for complex
pattern matches and textual search
- With GIN indexes your text searches and pattern matches
can be made faster
- With additional extensions you can also do trigram based
searches or metaphone/soundex matches
- Makes the developer’s life easier while doing searches on
textual data
- GIN and GiST indexes help get better performance
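
A compact sketch of the three styles of search mentioned above. The articles table and index name are made up for this example, and the extensions are assumed to be available on the server.

```sql
-- Hypothetical table holding the text to search.
create table articles (
    id   serial primary key,
    body text
);

-- GIN index on the text search vector of the body.
create index idx_articles_fts
    on articles using gin (to_tsvector('english', body));

-- Full text search: rows whose body matches both terms (after stemming).
select id
from articles
where to_tsvector('english', body) @@ to_tsquery('english', 'postgres & index');

-- Trigram similarity from pg_trgm, useful for typo-tolerant matching.
create extension if not exists pg_trgm;
select similarity('postgres', 'postgers');

-- Phonetic matching from fuzzystrmatch.
create extension if not exists fuzzystrmatch;
select soundex('Robert') = soundex('Rupert');
```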

For Further Reference
- www.postgresql.org
- www.planetpostgresql.org
- Various community user group discussions
- Various blogs
- Josh Berkus
- Magnus Hagander
- Bruce Momjian
- Simon Riggs
- Many more
- Ashnik Blog Archives
- Ashnik YouTube Channel