Billion Tables Project (BTP)
Álvaro Hernández Tortosa <aht@8Kdata.com>
Lightning talks – NoSQL Matters 2014
Cologne

Billion tables project - Lightning talks - NoSQL Matters - 2014 - Cologne


The Billion Tables Project explores how to have large PostgreSQL database installations and explores the metadata required to maintain such an installation. Even while looking at this extreme example, there is a lot of practicality to understanding how PostgreSQL manages its meta-information for your day-to-day operations.


  1. Billion Tables Project (BTP)
     Álvaro Hernández Tortosa <aht@8Kdata.com>
     Lightning talks – NoSQL Matters 2014, Cologne
  2. Who I am
     ● Álvaro Hernández Tortosa <aht@8Kdata.com>
     ● What we do @8Kdata:
       ✔ Database innovation
       ✔ Training, consulting and development in PostgreSQL (and Java)
     ● Twitter: @ahachete
     ● LinkedIn: linkd.in/1jhvzQ3
  3. What is a “large” database?
     ● Single-node databases of up to TBs / dozens of TBs. Billions / trillions of records
     ● Multi-node databases, virtually unlimited. Reportedly hundreds of TBs, PBs
     ● This talk is not about Big Data. It's just about Big Data
     ● Indeed, we're talking here about Big MetaData (and the world's worst data/metadata ratio ever)
  4. Where it all started... (II)
     http://it.toolbox.com/blogs/database-soup/one-billion-tables-or-bust-46270
     May 21st, 2011
  5. So... why do it?
     Official reasons / In reality...
     ● To prove PostgreSQL has (no?) limits on the # of tables
     ● To stress PostgreSQL in an unusual way
     ● To test a new server before going to production
     ● To beat Josh Berkus, creating tables faster than him ;)
     ● “Mine is bigger than yours” (database)
     ● Because we can
  6. First attempts (2011)
     ● Josh Berkus (http://it.toolbox.com/blogs/database-soup/one-billion-tables-or-bust-46270):
       3M tables, 83 tps. Server crashed (out of disk). Serial + text
     ● Jan Urbanski (http://it.toolbox.com/blogs/database-soup/one-billion-tables-part-2-46349):
       4.6M tables, 1K tps. Server crashed (inodes). Int + text
     ● $SELF (http://it.toolbox.com/blogs/database-soup/one-billion-tables-part-2-46349):
       10M tables, 2K2 tps. Stopped. Single int column
       100M tables, 1K5 tps. Stopped. Single int column
  7. 1B tables. How to get there?
     Tune postgresql.conf:
       fsync = off
       synchronous_commit = off
       full_page_writes = off
       wal_buffers = 256MB
       autovacuum = off
       max_locks_per_transaction = 10000
       shared_buffers = 16384MB
       checkpoint_segments = 128
  8. 1B tables. So, did it work?
       $ time ./btp-main.sh 1000000000 16 50000 1000
       real 2022m19.961s
       user 240m7.044s
       sys 165m25.336s
     (aka 33h 42m 20s)
     ● Avg: 8242 tables per second
       btp=# SELECT txid_current();
        txid_current
       --------------
         1000001685
  9. 1B tables. Performance
     [Chart: tables per second (tps, axis from 0 to 12000) against millions of tables created (M tables, from 20 to 1000). Peak: 10Ktps]
  10. The road to 2B tables
     ● Faster pizza
       ➔ 2x latest-gen quad-core Xeon up to 3.3 GHz
       ➔ Faster than 16-core 2.0 GHz (8K5 tps @100Mt):
         ✔ With 16 processes: 11K9 tps @100Mt
         ✔ With 8 processes: 12K3 tps @100Mt
       $ time ./btp-main.sh 1000000000 8 50000 1000
       real 1400m6.680s
       user 165m32.565s
       sys 119m46.493s
     (aka 23h 20m 07s)
     Avg: 11903 tps
  11. The road to 2B tables (II)
     ● Storage requirements:
       ➔ Base: ~6TB
       ➔ Tablespaces: ~300GB
     ● The autovacuum problem:
       ➔ The maximum value of autovacuum_freeze_max_age is only 2E9
       ➔ Autovacuum will run. Killed workers are restarted
       ➔ We're doing 1 tx per table creation
       ➔ If we group table creations in txs, performance degrades significantly (counter-intuitive!)
  12. Which catalog tables are written by CREATE TABLE?
     ● For a 1-column table with no indexes:
       ➔ 1 entry in pg_class
       ➔ 2 entries in pg_type (table type, table[] type)
       ➔ 3 entries in pg_depend (2 for the types, 1 for the schema the table belongs to)
       ➔ 7 entries in pg_attribute (xmin, xmax, cmin, cmax, ctid, tableoid and the column itself)
     ● So basically creating tables at 11K9 tps means:
       ➔ 154K insertions per second
       ➔ Plus 11K9 file creations per second @1MB/s of metainformation
  13. Billion Tables Project (BTP)
     Álvaro Hernández Tortosa <aht@Nosys.es>
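
The run times and catalog figures quoted in slides 8, 10 and 12 can be re-derived with a little arithmetic. The following is a minimal sketch in Python, not part of the BTP scripts: the helper names (`wall_seconds`, `as_hms`, `catalog_inserts_per_second`) are made up for illustration, and the inputs are only the numbers printed on the slides.

```python
# Back-of-the-envelope check of the figures quoted on the slides.
# All names below are illustrative; none of them come from the BTP scripts.

def wall_seconds(minutes, seconds):
    """Convert the 'real' value printed by `time` (e.g. 2022m19.961s) to seconds."""
    return minutes * 60 + seconds

def as_hms(total_seconds):
    """Split a duration into (hours, minutes, seconds), rounded to the nearest second."""
    total = round(total_seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return hours, minutes, secs

# Catalog rows written by CREATE TABLE for a 1-column table with no indexes,
# as listed on slide 12.
CATALOG_ROWS = {"pg_class": 1, "pg_type": 2, "pg_depend": 3, "pg_attribute": 7}
ROWS_PER_TABLE = sum(CATALOG_ROWS.values())

def catalog_inserts_per_second(tables_per_second):
    """Catalog row insertions implied by a given table creation rate."""
    return tables_per_second * ROWS_PER_TABLE

if __name__ == "__main__":
    tables = 1_000_000_000
    runs = (("slide 8 run", wall_seconds(2022, 19.961)),
            ("slide 10 run", wall_seconds(1400, 6.680)))
    for label, real in runs:
        h, m, s = as_hms(real)
        print(f"{label}: {h}h {m:02d}m {s:02d}s, about {tables / real:.0f} tables/s")
    print(f"catalog rows per table: {ROWS_PER_TABLE}")
    print(f"catalog inserts/s at 11.9K tables/s: {catalog_inserts_per_second(11_900):,}")
```

The durations come out as 33h 42m 20s and 23h 20m 07s, matching the slides, and the computed averages land within about one table per second of the quoted 8242 and 11903. With 13 catalog rows per table, 11.9K tables per second gives 154,700 catalog insertions per second, which is the "154K" on slide 12.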
