SlideShare a Scribd company logo
1 of 20
© 2ndQuadrant 2016
Big Data & PostgreSQL
Using TABLESAMPLE to Analyze Very Large Datasets
By Umair Shahid
© 2ndQuadrant 2016
Who am I?
Director, Products @ 2ndQuadrant
Got “pushed” into PostgreSQL in
2004, ended up falling in love
with it
Not a hardcore techie, yet
passionate about open source
software
Interested in Big Data, especially the
newer PostgreSQL features
supporting it
2011
2015
© 2ndQuadrant 2016
What is Big Data?
Volume
Size: Text files to HD videos
Sources: Spreadsheets to sensors
From lakes to oceans
Velocity
More sources imply more speed
Faster connectivity implies more speed
High-paced world requires faster turnaround
Variety
© 2ndQuadrant 2016
What is the problem?
Number of Rows Size on Disk (MB) Time Taken (ms)
1k 0.23 219.706
100k 24 1,302.135
1M 195 7,696.386
5M 951 40,691.603
10M 1,923 60,012.457
100M 19,456 801,493.319
© 2ndQuadrant 2016
Why is this significant?
Data mining has typically been a painful process
Major contributor to the pain has been the time it
takes for queries to return
Many false steps before the required data is
identified
Waiting time is wasted time
Sampling, count based or time based, reduces the
wasted time significantly
© 2ndQuadrant 2016
What is TABLESAMPLE?
Ability to read a random sample of data
in a table
Defined in SQL:2003 (5th revision of
SQL)
Implemented in PostgreSQL 9.5
© 2ndQuadrant 2016
Syntax
SELECT select_expression
FROM table_name
TABLESAMPLE sampling_method ( argument [, ...] )
[ REPEATABLE ( seed ) ]
...
© 2ndQuadrant 2016
sampling_method
argument is percentage of rows
SYSTEM
Block level sampling
Very fast
Non-independent rows
BERNOULLI
Row level sampling
Slower than SYSTEM
Independent rows (uniformly random)
© 2ndQuadrant 2016
© 2ndQuadrant 2016
Demo sampling methods
© 2ndQuadrant 2016
REPEATABLE results
(Reminder: [ REPEATABLE ( seed ) ])
Optional argument
Used if random, yet repeatable results are
required
seed and argument need to be the same to
produce repeatable results
Any changes made to the table will result in a
different data set
© 2ndQuadrant 2016
Now it gets interesting …
TABLESAMPLE allows for additional sampling methods
via extensions
tsm_system_time specifies max number of milliseconds
to spend reading a table
Implements the syntax:
SELECT select_expression
FROM table_name
TABLESAMPLE SYSTEM_TIME (argument)
© 2ndQuadrant 2016
Demo tsm_system_time
© 2ndQuadrant 2016
Enter Orange ...
Funded by AXLE
(http://axleproject.eu)
Same project funded
TABLESAMPLE
Available integrated with
PostgreSQL in 2UDA
(http://2ndquadrant.com/2uda)
Uses TABLESAMPLE to
very quickly create
visualizations for data
Can quickly create
predictive models
© 2ndQuadrant 2016
Demo Orange
You can find a very helpful tutorial at
http://2ndquadrant.com/2uda
© 2ndQuadrant 2016
Other Big Data features in PostgreSQL
● HSTORE
XML
JSON & JSONB
BRIN INDEXES
Parallel sequential scan
Parallel aggregates
FDWs
© 2ndQuadrant 2016
Features from the latest release
9.6 Beta3 announced last night
Added support to parallel query for
TABLESAMPLE
© 2ndQuadrant 2016
Moving Forward …
Next meetup: Tentatively August 19, 2016
Please come forward and share your
PostgreSQL stories
Today’s refreshments are sponsored by
2ndQuadrant - THANK YOU!
Need more sponsors
OR
Need to start charging for these sessions
Special Thanks!!
© 2ndQuadrant 2016
Umair Shahid
Email: umair.shahid@2ndQuadrant.com
Twitter: @pg_umair
2ndQuadrant is hiring - All geographies!
Thank you for your time!

More Related Content

Viewers also liked

Keith Paskett - Postgres on ZFS @ Postgres Open
Keith Paskett - Postgres on ZFS @ Postgres OpenKeith Paskett - Postgres on ZFS @ Postgres Open
Keith Paskett - Postgres on ZFS @ Postgres OpenPostgresOpen
 
Selena Deckelmann - Sane Schema Management with Alembic and SQLAlchemy @ Pos...
Selena Deckelmann - Sane Schema Management with  Alembic and SQLAlchemy @ Pos...Selena Deckelmann - Sane Schema Management with  Alembic and SQLAlchemy @ Pos...
Selena Deckelmann - Sane Schema Management with Alembic and SQLAlchemy @ Pos...PostgresOpen
 
Islamabad PUG - 7th Meetup - performance tuning
Islamabad PUG - 7th Meetup - performance tuningIslamabad PUG - 7th Meetup - performance tuning
Islamabad PUG - 7th Meetup - performance tuningUmair Shahid
 
Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)Denish Patel
 
Islamabad PUG - 7th meetup - performance tuning
Islamabad PUG - 7th meetup - performance tuningIslamabad PUG - 7th meetup - performance tuning
Islamabad PUG - 7th meetup - performance tuningUmair Shahid
 
Robert Haas Query Planning Gone Wrong Presentation @ Postgres Open
Robert Haas Query Planning Gone Wrong Presentation @ Postgres OpenRobert Haas Query Planning Gone Wrong Presentation @ Postgres Open
Robert Haas Query Planning Gone Wrong Presentation @ Postgres OpenPostgresOpen
 
Steve Singer - Managing PostgreSQL with Puppet @ Postgres Open
Steve Singer - Managing PostgreSQL with Puppet @ Postgres OpenSteve Singer - Managing PostgreSQL with Puppet @ Postgres Open
Steve Singer - Managing PostgreSQL with Puppet @ Postgres OpenPostgresOpen
 
Michael Bayer Introduction to SQLAlchemy @ Postgres Open
Michael Bayer Introduction to SQLAlchemy @ Postgres OpenMichael Bayer Introduction to SQLAlchemy @ Postgres Open
Michael Bayer Introduction to SQLAlchemy @ Postgres OpenPostgresOpen
 
PoPostgreSQL Web Projects: From Start to FinishStart To Finish
PoPostgreSQL Web Projects: From Start to FinishStart To FinishPoPostgreSQL Web Projects: From Start to FinishStart To Finish
PoPostgreSQL Web Projects: From Start to FinishStart To Finishelliando dias
 
Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres Open
Koichi Suzuki - Postgres-XC Dynamic Cluster  Management @ Postgres OpenKoichi Suzuki - Postgres-XC Dynamic Cluster  Management @ Postgres Open
Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres OpenPostgresOpen
 
Gbroccolo pgconfeu2016 pgnfs
Gbroccolo pgconfeu2016 pgnfsGbroccolo pgconfeu2016 pgnfs
Gbroccolo pgconfeu2016 pgnfsGiuseppe Broccolo
 
PostgreSQL HA
PostgreSQL   HAPostgreSQL   HA
PostgreSQL HAharoonm
 
Michael Paquier - Taking advantage of custom bgworkers @ Postgres Open
Michael Paquier - Taking advantage of custom bgworkers @ Postgres OpenMichael Paquier - Taking advantage of custom bgworkers @ Postgres Open
Michael Paquier - Taking advantage of custom bgworkers @ Postgres OpenPostgresOpen
 
Logical replication with pglogical
Logical replication with pglogicalLogical replication with pglogical
Logical replication with pglogicalUmair Shahid
 
PostgreSQL replication from setup to advanced features.
 PostgreSQL replication from setup to advanced features. PostgreSQL replication from setup to advanced features.
PostgreSQL replication from setup to advanced features.Pivorak MeetUp
 
Big Data and PostgreSQL
Big Data and PostgreSQLBig Data and PostgreSQL
Big Data and PostgreSQLPGConf APAC
 
On The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterOn The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterSrihari Sriraman
 
Postgresql on NFS - J.Battiato, pgday2016
Postgresql on NFS - J.Battiato, pgday2016Postgresql on NFS - J.Battiato, pgday2016
Postgresql on NFS - J.Battiato, pgday2016Jonathan Battiato
 

Viewers also liked (20)

Keith Paskett - Postgres on ZFS @ Postgres Open
Keith Paskett - Postgres on ZFS @ Postgres OpenKeith Paskett - Postgres on ZFS @ Postgres Open
Keith Paskett - Postgres on ZFS @ Postgres Open
 
Selena Deckelmann - Sane Schema Management with Alembic and SQLAlchemy @ Pos...
Selena Deckelmann - Sane Schema Management with  Alembic and SQLAlchemy @ Pos...Selena Deckelmann - Sane Schema Management with  Alembic and SQLAlchemy @ Pos...
Selena Deckelmann - Sane Schema Management with Alembic and SQLAlchemy @ Pos...
 
Islamabad PUG - 7th Meetup - performance tuning
Islamabad PUG - 7th Meetup - performance tuningIslamabad PUG - 7th Meetup - performance tuning
Islamabad PUG - 7th Meetup - performance tuning
 
Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)
 
Islamabad PUG - 7th meetup - performance tuning
Islamabad PUG - 7th meetup - performance tuningIslamabad PUG - 7th meetup - performance tuning
Islamabad PUG - 7th meetup - performance tuning
 
Robert Haas Query Planning Gone Wrong Presentation @ Postgres Open
Robert Haas Query Planning Gone Wrong Presentation @ Postgres OpenRobert Haas Query Planning Gone Wrong Presentation @ Postgres Open
Robert Haas Query Planning Gone Wrong Presentation @ Postgres Open
 
Steve Singer - Managing PostgreSQL with Puppet @ Postgres Open
Steve Singer - Managing PostgreSQL with Puppet @ Postgres OpenSteve Singer - Managing PostgreSQL with Puppet @ Postgres Open
Steve Singer - Managing PostgreSQL with Puppet @ Postgres Open
 
Michael Bayer Introduction to SQLAlchemy @ Postgres Open
Michael Bayer Introduction to SQLAlchemy @ Postgres OpenMichael Bayer Introduction to SQLAlchemy @ Postgres Open
Michael Bayer Introduction to SQLAlchemy @ Postgres Open
 
PoPostgreSQL Web Projects: From Start to FinishStart To Finish
PoPostgreSQL Web Projects: From Start to FinishStart To FinishPoPostgreSQL Web Projects: From Start to FinishStart To Finish
PoPostgreSQL Web Projects: From Start to FinishStart To Finish
 
Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres Open
Koichi Suzuki - Postgres-XC Dynamic Cluster  Management @ Postgres OpenKoichi Suzuki - Postgres-XC Dynamic Cluster  Management @ Postgres Open
Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres Open
 
Gbroccolo pgconfeu2016 pgnfs
Gbroccolo pgconfeu2016 pgnfsGbroccolo pgconfeu2016 pgnfs
Gbroccolo pgconfeu2016 pgnfs
 
PostgreSQL HA
PostgreSQL   HAPostgreSQL   HA
PostgreSQL HA
 
Michael Paquier - Taking advantage of custom bgworkers @ Postgres Open
Michael Paquier - Taking advantage of custom bgworkers @ Postgres OpenMichael Paquier - Taking advantage of custom bgworkers @ Postgres Open
Michael Paquier - Taking advantage of custom bgworkers @ Postgres Open
 
Geometria Projetiva
Geometria ProjetivaGeometria Projetiva
Geometria Projetiva
 
Logical replication with pglogical
Logical replication with pglogicalLogical replication with pglogical
Logical replication with pglogical
 
PostgreSQL replication from setup to advanced features.
 PostgreSQL replication from setup to advanced features. PostgreSQL replication from setup to advanced features.
PostgreSQL replication from setup to advanced features.
 
Big Data and PostgreSQL
Big Data and PostgreSQLBig Data and PostgreSQL
Big Data and PostgreSQL
 
On The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterOn The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL Cluster
 
Postgresql on NFS - J.Battiato, pgday2016
Postgresql on NFS - J.Battiato, pgday2016Postgresql on NFS - J.Battiato, pgday2016
Postgresql on NFS - J.Battiato, pgday2016
 
Types dbms
Types dbmsTypes dbms
Types dbms
 

Recently uploaded

20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 

Recently uploaded (20)

20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 

Islamabad PUG - Big Data & PostgreSQL; Using TABLESAMPLE to Analyze Very Large Datasets

  • 1. © 2ndQuadrant 2016 Big Data & PostgreSQL Using TABLESAMPLE to Analyze Very Large Datasets By Umair Shahid
  • 2. © 2ndQuadrant 2016 Who am I? Director, Products @ 2ndQuadrant Got “pushed” into PostgreSQL in 2004, ended up falling in love with it Not a hardcore techie, yet passionate about open source software Interested in Big Data, especially the newer PostgreSQL features supporting it 2011 2015
  • 3. © 2ndQuadrant 2016 What is Big Data? Volume Size: Text files to HD videos Sources: Spreadsheets to sensors From lakes to oceans Velocity More sources imply more speed Faster connectivity implies more speed High-paced world requires faster turnaround Variety
  • 4. © 2ndQuadrant 2016 What is the problem? Number of Rows Size on Disk (MB) Time Taken (ms) 1k 0.23 219.706 100k 24 1,302.135 1M 195 7,696.386 5M 951 40,691.603 10M 1,923 60,012.457 100M 19,456 801,493.319
  • 5. © 2ndQuadrant 2016 Why is this significant? Data mining has typically been a painful process Major contributor to the pain has been the time it takes for queries to return Many false steps before the required data is identified Waiting time is wasted time Sampling, count based or time based, reduces the wasted time significantly
  • 6. © 2ndQuadrant 2016 What is TABLESAMPLE? Ability to read a random sample of data in a table Defined in SQL:2003 (5th revision of SQL) Implemented in PostgreSQL 9.5
  • 7. © 2ndQuadrant 2016 Syntax SELECT select_expression FROM table_name TABLESAMPLE sampling_method ( argument [, ...] ) [ REPEATABLE ( seed ) ] ...
  • 8. © 2ndQuadrant 2016 sampling_method argument is percentage of rows SYSTEM Block level sampling Very fast Non-independent rows BERNOULLI Row level sampling Slower than SYSTEM Independent rows (uniformly random)
  • 10. © 2ndQuadrant 2016 Demo sampling methods
  • 11. © 2ndQuadrant 2016 REPEATABLE results (Reminder: [ REPEATABLE ( seed ) ]) Optional argument Used if random, yet repeatable results are required seed and argument need to be the same to produce repeatable results Any changes made to the table will result in a different data set
  • 12. © 2ndQuadrant 2016 Now it gets interesting … TABLESAMPLE allows for additional sampling methods via extensions tsm_system_time specifies max number of milliseconds to spend reading a table Implements the syntax: SELECT select_expression FROM table_name TABLESAMPLE SYSTEM_TIME (argument)
  • 13. © 2ndQuadrant 2016 Demo tsm_system_time
  • 14. © 2ndQuadrant 2016 Enter Orange ... Funded by AXLE (http://axleproject.eu) Same project funded TABLESAMPLE Available integrated with PostgreSQL in 2UDA (http://2ndquadrant.com/2uda) Uses TABLESAMPLE to very quickly create visualizations for data Can quickly create predictive models
  • 15. © 2ndQuadrant 2016 Demo Orange You can find a very helpful tutorial at http://2ndquadrant.com/2uda
  • 16. © 2ndQuadrant 2016 Other Big Data features in PostgreSQL ● HSTORE XML JSON & JSONB BRIN INDEXES Parallel sequential scan Parallel aggregates FDWs
  • 17. © 2ndQuadrant 2016 Features from the latest release 9.6 Beta3 announced last night Added support to parallel query for TABLESAMPLE
  • 18. © 2ndQuadrant 2016 Moving Forward … Next meetup: Tentatively August 19, 2016 Please come forward and share your PostgreSQL stories Today’s refreshments are sponsored by 2ndQuadrant - THANK YOU! Need more sponsors OR Need to start charging for these sessions
  • 20. © 2ndQuadrant 2016 Umair Shahid Email: umair.shahid@2ndQuadrant.com Twitter: @pg_umair 2ndQuadrant is hiring - All geographies! Thank you for your time!