The Future of Big Data is Relational (or why you can't escape SQL)
Usage Rights

© All Rights Reserved

    Presentation Transcript

    • The Future of Relational (or Why You Can’t Escape SQL) tobrien@discursive.com Twitter: @tobrien Thursday, February 28, 13
    • In this session...
        • Ouroboros
        • Copernican Revolution
        • Ptolemaic Entrenchment
        • Janus
        • A two-minute summary of the last 15 years
        • Google Magic
        • The Future of SQL
    • Tim O’Brien
        • I’m a developer who also writes
        • tobrien@discursive.com
        • Twitter: @tobrien
    • Revolution
    • Remember all that Big Data Stuff?
    • Remember when we all thought it was time to give up schemas? Man, wasn’t that a lot of work.
    • What if the relational database “catches up”? What then?
    • How we market Big Data: Big Data == Paradigm Shift. “singularity” > “disruptor”
    • “Big Data” is to “Traditional Databases” as... Copernicus is to Ptolemy
    • Out with the “old”. In with the “new”.
    • Claudius Ptolemy, ~150 AD. Copernicus’ model, 1543 AD.
    • Google’s BigTable Paper - 2006, Hadoop - 2007. Edgar F. Codd, “A Relational Model of Data for Large Shared Data Banks”, 1970.
    • Google’s BigTable Paper - 2006, Hadoop - 2007 + Codd = Google F1, Spanner, Translattice, Impala, Drawn-to-Scale, NuoDB, Akiban, many more NewSQL products
    • Youth: Looking Forward. Age: Looking Backward.
    • “Haven’t you heard? Databases don’t scale.” “Whatever. Let’s create a schema. Ok?”
    • And, both are right...
    • 2000: In the beginning... Proprietary app servers. Big Oracle database.
    • 2001: More traffic? Specialized application servers. Throw hardware at the database.
    • 2002-2005: More traffic? Specialized application servers. Throw hardware at the database.
    • 2005 Event: More traffic? Sharding... ugh. Everything else was scaling horizontally except the database.
    • 2006 - New Reality of Big Data (Google’s BigTable Paper - 2006, Hadoop - 2007). Q: What would Google do? A: Not use an RDBMS.
    • 2006: Big Data (for a few) vs. RDBMS (for most)
    • 2007
        • The rise of Database “Luddites”
            – Who needs Foreign Keys? Transactions? Just Simplify
    • 2007
        • The rise of Database “Luddites”
            – Rails hacked away at database “orthodoxy”
            – Opened the door to alternative approaches
    • Although, Basecamp is still a single RDBMS…
    • 2007 - present == Alternatives
        • Documents
            – MongoDB – Started in 2007, OSS in 2009
            – CouchDB – Started in 2005
        • Graphs
            – Neo4j
        • Key-Value Stores
            – Cassandra
            – Riak
            – Tokyo Cabinet
        • Memory
            – Memcached / Redis
        • Tabular
            – HBase
    • 2012: Q: What database do you use? A: All of them. Oracle, Mongo, MySQL, Impala, Riak, some memcache, and some Hadoop thrown in for fun.
    • Big Data a Necessity at Largest Scale. “A certain kind of developer at a certain kind of company”. Most development still RDBMS.
    • There’s this company that sells advertising
        – ~96% of revenue came from advertising in 2011
        – ~75% of the US Search Advert Market in 2011
        – ~44% share of overall online ad market
        • One of the most important applications at Google ran on MySQL
            – AdWords missed the NoSQL revolution
    • Digging into the evolution of Storage at Google
        • Google’s BigTable – 2006
            – Tabular
            – Sparse, distributed, multi-dimensional sorted map
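To make "sparse, multi-dimensional sorted map" concrete, here is a minimal single-process sketch in Python. It is not BigTable's API: the class name `SparseSortedMap`, its methods, and the sample row and column names are all hypothetical, and the "distributed" part is left out entirely. The three dimensions are assumed to be (row, column, timestamp); keys are kept in sorted order so that row-range scans are cheap, and empty cells simply do not exist.

```python
import bisect


class SparseSortedMap:
    """Toy model of a (row, column, timestamp) -> value map (hypothetical API)."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # key -> value; absent cells take no space (sparse)

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)  # negated so newer versions sort first
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column):
        """Return the newest value stored for a cell, or None if it is empty."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1


table = SparseSortedMap()
table.put("com.cnn.www", "contents:", 1, "v1")
table.put("com.cnn.www", "contents:", 2, "v2")
table.put("com.example", "anchor:x", 1, "a")
```

Because rows sort lexicographically, related rows (here, rows sharing a key prefix) end up next to each other, which is what makes a range scan a natural access pattern for this kind of store.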
    • Digging into the evolution of Storage at Google
        • Google’s BigTable – 2006
            – “New users [...] uncertain of how to best use the BigTable interface, particularly if they are accustomed to using relational databases that support general-purpose transactions.”
    • Digging into the evolution of Storage at Google
        • Google’s Megastore – 2010
            – Hierarchical “schemas”
            – Positioned as a NoSQL store
            – ACID within partitions
    • Digging into the evolution of Storage at Google
        • Google’s Megastore – 2010
            – “Supports two-phase commit for atomic updates [...] these transactions have much higher latency and increase the risk of contention, we generally discourage applications from using the feature”
    • Digging into the evolution of Storage at Google
        • Google’s Spanner & F1 – 2012
        • Paper published in 2012
            – Hierarchical, Semi-relational Schemas
            – ACID across continents possible - 14ms transaction overhead in a data-center with clock uncertainty of 1ms.
            – SQL
            – Focus on Performance
                • Gated by Clock Uncertainty
                • Consensus: Paxos
    • What Differentiates Google Spanner?
        • Transactions are only possible because of Paxos
        • Forget NTP, Google has “Reified Clock Uncertainty”
        • Epsilon, clock uncertainty, is the gating factor for gaining consensus on transaction timestamps.
        • It’s all about Time
        • “as the underlying system enforces tighter bounds on clock uncertainty, the overhead of the stronger semantics decreases. As a community, we should no longer depend on loosely synchronized clocks and weak time APIs in designing distributed algorithms.”
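Why epsilon gates transaction overhead can be sketched in a few lines of Python. This is a toy under stated assumptions, not Spanner's implementation: `UncertainClock` and `commit` are hypothetical names, the clock is simulated from the local monotonic timer, and replication and Paxos are omitted. The idea shown is a "commit wait": the clock reports an interval rather than an instant, the commit takes the latest possible reading as its timestamp, and it is acknowledged only once that timestamp is certainly in the past.

```python
import time


class UncertainClock:
    """Hypothetical clock that reports an interval instead of a single instant."""

    def __init__(self, epsilon_s, source=time.monotonic):
        self.epsilon_s = epsilon_s  # assumed bound on clock error, in seconds
        self._source = source

    def now(self):
        t = self._source()
        return (t - self.epsilon_s, t + self.epsilon_s)  # (earliest, latest)


def commit(clock, apply_writes, sleep=time.sleep):
    """Pick a commit timestamp, then wait out the uncertainty before returning."""
    _, latest = clock.now()
    commit_ts = latest  # no clock within the error bound reads later than this
    apply_writes(commit_ts)
    while True:
        earliest, _ = clock.now()
        if earliest > commit_ts:  # the timestamp has passed on every clock
            return commit_ts
        sleep(max(commit_ts - earliest, 1e-4))


applied = []
clock = UncertainClock(epsilon_s=0.005)
started = time.monotonic()
ts = commit(clock, applied.append)
elapsed = time.monotonic() - started
```

In this sketch the wait is roughly twice epsilon, so halving the clock uncertainty halves the added latency, which is the point of the quote above about tighter bounds lowering the overhead of stronger semantics.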
    • Let me reiterate: Google has Mastered Time
    • What Differentiates Google Spanner?
        • Hierarchical, Schematized Tables
        • Similar to Akiban’s approach.
        • Leads to some interesting possibilities.
        • Nested Subqueries and Tree Results
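One way to picture hierarchical tables and tree results is to store child rows under their parent's key prefix. The Python sketch below is an illustration only, not Spanner's or Akiban's storage format: the customer and order tables, their columns, and the `customer_tree` function are all hypothetical. Sorting the keys places each customer's orders directly after the customer row, so a parent with its children can be read as one contiguous range and returned as a nested result.

```python
from bisect import bisect_left

# Hypothetical tables: parent key is (customer_id,), child key is
# (customer_id, order_id), so children share their parent's key prefix.
rows = {
    (1,): {"name": "Ada"},
    (1, 100): {"total": 25},
    (1, 101): {"total": 40},
    (2,): {"name": "Grace"},
    (2, 200): {"total": 15},
}
ordered_keys = sorted(rows)  # children sort immediately after their parent


def customer_tree(customer_id):
    """Return one customer with nested orders from a single contiguous range."""
    start = bisect_left(ordered_keys, (customer_id,))
    if start == len(ordered_keys) or ordered_keys[start] != (customer_id,):
        return None
    tree = dict(rows[(customer_id,)], orders=[])
    for key in ordered_keys[start + 1:]:
        if key[0] != customer_id:
            break  # reached the next parent row
        tree["orders"].append(dict(rows[key], order_id=key[1]))
    return tree


result = customer_tree(1)
```

A flat relational layout would answer the same question with a join across two tables; here the join is effectively precomputed by the key layout.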
    • What Differentiates Google Spanner? To reiterate:
        • hierarchical, schematized tables
        • distributed “compute fabric” for data
        • Google has mastered Time
        • Google built a warp reactor
    • As goes Google, so goes the world...
        • Translattice
        • Drawn-to-Scale
        • Akiban
        • Impala
        • Several NewSQL companies quickly jumped on this train:
            – NuoDB
            – VoltDB
        • Yes, we’ve had Hive for a while, but these new initiatives resemble a more robust effort.
    • Translattice: a distributed, cloud-based database
        • Translattice identifies itself as a database that resembles F1
        • It is a hosted database service which provides distributed transactions.
        • Translattice uses Paxos
        • They’ve extended PostgreSQL and emphasize customer control over data.
    • Akiban
        • Akiban’s approach to storage almost *exactly* matches the strategy Google uses in Spanner.
        • Akiban lacks the distributed transaction capability of Spanner and F1, but they are working on developing the capability.
        • Akiban has implemented a query parser, optimizer, and execution engine atop a hierarchical approach to storage.
    • Drawn-to-Scale
        • Reports: the most similar to F1 in the market. Fault-tolerant in distributed environments
        • Created a Query Parser + Optimizer + Execution Engine atop a distributed “compute fabric”
        • No Paxos or Transactions... yet. To be released shortly. Stay tuned.
        • Drawn to Scale aims to be an “installable” database. Not going the hosted route.
        • Data stored in HDFS/HBase.
    • So there. Big Data is turning into a Big Relational Database