Vertica

Published in: Technology
Transcript

  • 1. Vertica
  • 2. Why? ● Postgres benchmarks (2014-01-13) ● Remember, these queries are expected to occur within a web request/response cycle! ● After 60 seconds connections time out ● We are used to web pages loading in 1-2 seconds
  • 3. Count ● SELECT count(*) FROM transactions ● (229527.0ms) ● => [{"count"=>78144197}] ● SELECT count(*) FROM transactions WHERE client_id = 131 ● (85451.0ms) ● => [{"count"=>34406416}]
  • 4. Yikes.
  • 5. Don't panic! (and carry a towel)
  • 6. We have a few tricks. ● What if we had a table that recorded 1 row per client that tracked all the counts of transactions for each client?
      id  client_id  count_transactions
      1   131        34406416
      2   132        10587625
      3   133        85095
    What if we wired this table up to a SQL parser?
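
A minimal sketch of the aggregate-table trick, assuming an in-memory SQLite database and toy rows. The table name `agg_client_transactions` and the helper functions are hypothetical, chosen only for illustration; the slides do not show the real schema.

```python
import sqlite3

# In-memory database with a tiny stand-in for the transactions fact table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, client_id INTEGER, user_id INTEGER)"
)
rows = [(131, 1), (131, 2), (131, 1), (132, 3), (133, 4), (133, 4)]  # invented data
conn.executemany("INSERT INTO transactions (client_id, user_id) VALUES (?, ?)", rows)

# Hypothetical aggregate table: one row per client with a precomputed count.
conn.execute(
    """
    CREATE TABLE agg_client_transactions AS
    SELECT client_id, COUNT(*) AS count_transactions
    FROM transactions
    GROUP BY client_id
    """
)


def count_for_client(client_id):
    """Answer a per-client count from the aggregate table, not the fact table."""
    row = conn.execute(
        "SELECT count_transactions FROM agg_client_transactions WHERE client_id = ?",
        (client_id,),
    ).fetchone()
    return row[0] if row else 0


def count_all():
    """Counts are additive, so the grand total is the sum of the per-client rows."""
    return conn.execute(
        "SELECT SUM(count_transactions) FROM agg_client_transactions"
    ).fetchone()[0]
```

Both lookups touch one row per client rather than one row per transaction, which is where the speedup comes from. The aggregate table has to be refreshed whenever the fact table changes.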
  • 7. Mondrian! ● Robust aggregate table interface ● Auto recognizes aggregate tables via naming convention ● Queries are directed to the correct table ● If aggregate tables are missed, fall back to fact table ● Can define multiple aggregate tables / fact table ● Also has an intelligent segment cache
  • 8. But there's a problem. ● SELECT count(distinct(user_id)) FROM transactions ● Aggregate tables rely on properties of addition operations ● count(distinct(set_1)) + count(distinct(set_2)) != count(distinct(set_1 + set_2)) ● We have no choice but to query our fact table.
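
A small illustration of why distinct counts cannot be summed across aggregate rows. The user ids below are invented; the point is only that a user who appears under two clients gets counted twice.

```python
# Toy data: user_ids seen in each client's transactions (invented values).
users_by_client = {
    131: [1, 2, 2, 3],
    132: [3, 4],
}


def distinct_count(values):
    """Equivalent of count(distinct(...)) over a list of values."""
    return len(set(values))


# Per-client distinct counts, as an aggregate table would store them.
per_client = {client: distinct_count(users) for client, users in users_by_client.items()}

# Adding the stored counts double-counts user 3, who appears under both clients.
summed = sum(per_client.values())

# The correct answer requires looking at the underlying rows together.
all_users = [u for users in users_by_client.values() for u in users]
true_total = distinct_count(all_users)
```

Here `summed` is 5 while `true_total` is 4, so the per-client rows alone cannot answer the query.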
  • 9. Ok, now we can panic.
  • 10. Options? ● NoSQL (map reduce) – HBase/Hadoop, Mongo, etc ● Columnar – LucidDB, ParAccel, Vertica ● Bleeding Edge – Google BigQuery, Apache Drill
  • 11. Much Cluster, Many Computer ● All of these solutions are using distributed systems to query lots of data quickly ● Querying 100 million rows on a single computer is not fast on current hardware ● And we are projecting to have a lot more than 100 million rows this year
  • 12. Vertica ● Columnar ● Distributed ● Speaks SQL ● Compatible with Mondrian ● It's fast! ● “drop in” replacement for Postgres
  • 13. Row based database
      id  name     favorite_color
      1   brian    blue
      2   dennis   red
      3   nelson   green
      4   spencer  green
    Stored as: (1,brian,blue)(2,dennis,red)(3,nelson,green)(4,spencer,green)
  • 14. Columnar database
      id  name     favorite_color
      1   brian    blue
      2   dennis   red
      3   nelson   green
      4   spencer  green
    Stored as: (1,2,3,4)(brian,dennis,nelson,spencer)(blue,red,green,green)
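
The two layouts from slides 13 and 14 can be sketched in memory as follows. This is a toy model, not how any real engine stores data on disk; the variable and function names are made up for the example.

```python
from collections import Counter

columns = ("id", "name", "favorite_color")
table = [
    (1, "brian", "blue"),
    (2, "dennis", "red"),
    (3, "nelson", "green"),
    (4, "spencer", "green"),
]

# Row layout: all values of row 1, then all values of row 2, and so on.
row_store = [value for row in table for value in row]

# Column layout: all ids together, then all names, then all colors.
column_store = {name: [row[i] for row in table] for i, name in enumerate(columns)}


def color_counts_row_store():
    """Pick every third value out of the interleaved rows.

    A disk-backed row store would typically have to read the whole rows
    to get at this one field.
    """
    width = len(columns)
    offset = columns.index("favorite_color")
    return Counter(row_store[i] for i in range(offset, len(row_store), width))


def color_counts_column_store():
    """The needed values are already contiguous, so only one column is read."""
    return Counter(column_store["favorite_color"])
```

Both functions return the same counts. The difference is how much unrelated data sits between the values the aggregate actually needs.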
  • 15. Do you even index, bro?
  • 16. Nope! ● Vertica has no indexes ● Vertica has “projections” which are similar to a materialized view ● Projections are transparent to the query (like an index) ● Projections are used to optimize JOIN, GROUP BY, and other sorts of queries ● Provides a tool to autobuild projections based on query analysis
  • 17. Tradeoffs
      Columnar                  Row Based
      ● Slow single row read    ● Fast single row read
      ● Slow single row write   ● Fast single row write
      ● Fast aggregates         ● Slow aggregates
      ● Compression (5-10x)     ● No compression
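
One reason a column compresses well is that its values share a type and often repeat. Run-length encoding on a sorted column is a simple example of this; the slides do not say which encodings were actually in use, so treat the sketch below as an illustration of the idea only.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal adjacent values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, length in pairs for _ in range(length)]


# A favorite_color column with invented values, sorted so equal values are adjacent.
colors = sorted(["blue", "red", "green", "green", "green", "blue", "red", "green"])
encoded = rle_encode(colors)
```

Eight stored values shrink to three pairs here. Interleaved row storage rarely has long runs like this, since neighboring values belong to different columns.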
  • 18. Distributed ● Data split among servers ● Horizontal scaling ● Data is compressed, so it's stored in memory ● Node failure is tolerated ● Network IO is important
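
A rough sketch of "data split among servers", assuming rows are assigned to nodes by a hash of `user_id`. The node count, the modulo hash, and the data are all hypothetical; this is not a description of how Vertica segments data internally.

```python
NUM_NODES = 3  # hypothetical cluster size


def node_for(user_id):
    """Toy hash: route a row to a node based on its user_id."""
    return user_id % NUM_NODES


# (client_id, user_id) pairs, invented for the example.
transactions = [(131, 1), (131, 2), (132, 2), (132, 5), (131, 7), (133, 7), (133, 9)]

# Split the rows among the nodes.
nodes = [[] for _ in range(NUM_NODES)]
for client_id, user_id in transactions:
    nodes[node_for(user_id)].append((client_id, user_id))

# Each node counts its own rows; the coordinator only adds small partial results.
partial_counts = [len(rows) for rows in nodes]
total = sum(partial_counts)

# Because rows are split by user_id, each user lives on exactly one node,
# so per-node distinct counts can be added without double counting.
partial_distinct = [len({user_id for _, user_id in rows}) for rows in nodes]
distinct_users = sum(partial_distinct)
```

Only the partial counts cross the network, which is why the work scales with the number of nodes. The distinct-count shortcut holds only because of the chosen split key; with a different key the nodes would need to exchange the user ids themselves.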
  • 19. Count All Transactions: Postgres – 230s, Vertica – 2.10s
        Distinct User Count All Transactions: Postgres – 187s, Vertica – 0.63s
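
The speedups implied by the timings on slide 19 can be worked out directly. The dictionary keys are just labels for this example.

```python
# Timings in seconds, copied from slide 19.
timings = {
    "count_all": {"postgres": 230.0, "vertica": 2.10},
    "distinct_users": {"postgres": 187.0, "vertica": 0.63},
}

# Speedup = Postgres time divided by Vertica time.
speedups = {
    name: t["postgres"] / t["vertica"] for name, t in timings.items()
}

for name, factor in speedups.items():
    print(f"{name}: about {factor:.0f}x faster")
```

That works out to roughly 110x for the plain count and roughly 297x for the distinct user count.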
  • 20. So you just drop it in, right? ● 6 or 7 gems needed updates ● Had to roll an ActiveRecord driver ● Arel saved us from a lot of pain ● Still some SQL problems (database drop, multirow insert) ● Lots of DevOps help needed ● Currently deployed to sand and qa, hitting production soon!
  • 21. Thank You
