The Art of Big Data

Slides for my talk at the Naval Postgraduate School PhD Seminar

Transcript

  • 1. "The road lies plain before me; 'tis a theme Single and of determined bounds; …" - Wordsworth, The Prelude. Krishna Sankar, http://doubleclix.wordpress.com. Guest Seminar, Naval Postgraduate School PhD Seminar, Nov 29, 2011.
  • 2. What is Big Data? Big Data to smart data. Agenda: cover the broad picture of the Big Data domain and understand the waypoints (Pipeline; Storage: NOSQL; Processing: Hadoop; Analytics/Modeling: R; Analytic Algorithms; Visualization), then drill down into one area (NOSQL); can do the others later …
  • 3. Thanks to … the giants whose shoulders I am standing on. Special thanks to: Peter Ateshian, NPS; Prof. Murali Tummala, NPS; Shirley Bailes, O'Reilly; Ed Dumbill, O'Reilly; Jeff Barr, AWS; Jenny Kohr Chynoweth, AWS.
  • 4. "When I think of my own native land, In a moment I seem to be there; But, alas! recollection at hand Soon hurries me back to despair." - Cowper, The Solitude of Alexander Selkirk
  • 5. What is Big Data? "Big data" is data that becomes large enough that it cannot be processed using conventional methods. "Big data" is less about size, more about flow & velocity: persisting petabytes per year is easier than processing terabytes per hour. @twitter. Ref: http://radar.oreilly.com/2010/09/the-smaq-stack-for-big-data.html
  • 6. What is Big Data? Vinod Khosla's Cool Dozen. Consumers: "Widespread innovation in technologies that reduce data overload for users" ~ Data Reduction. Businesses: "Simple solutions to handle the deluge of data generated from various sources …" ~ Big Data Analytics. TV 2.0, Education, Social NEXT, tools for sharing interests, publishing, … Refs: http://www.ciol.com/News/News/News-Reports/Vinod-Khosla%E2%80%99s-cool-dozen-tech-innovations/156307/0/ ; http://yourstory.in/2011/11/vinod-khoslas-keynote-at-nasscom-product-conclave-reject-punditry-believe-in-an-idea-take-risk-and-succeed/
  • 7-11. EBC322: the dimensions of Big Data, built up over five slides. Volume: scale. Velocity: data change rate vs. decision window. Variety: different sources & formats; structured vs. unstructured. Variability: breadth of interpretation & depth of analytics. Contextual: dynamic variability; recommendation. Connectedness. Refs: http://doubleclix.wordpress.com/2011/09/13/when-is-big-data-really-big-data/ ; http://www.hpts.ws/posters/Poster2011_13_Bulkowski.pdf
  • 12. I. Two main types, based on collection: (i) Big Data streams: data in "motion" (the Twitter fire hose, Facebook, G+); (ii) Big Data logs: data "at rest" (logs, DW, external market data, POS, …). II. Typically, Big Data has a non-deterministic angle as well: creative discovery; iterative, model-based analytics; exploring which questions to ask. III. Smart Data = Big Data + context + embedded/interactive (inference, reasoning) models: model-driven, declaratively interactive. Refs: http://www.slideshare.net/leonsp/hadoop-slides-11-what-is-big-data ; http://www.slideshare.net/Dataversity/wed-1550-bacvanskivladimircolor
  • 13. AWS: 600 billion objects! Twitter: 200 million tweets/day, peaking at 10,000/second; how would you handle the fire hose for social-network analytics? Zynga: "an analytics company, not a gaming company!"; harvests 15 TB of data/day to test new features and target advertising; 230 million players/month. Storage: a 4U box = 40 TB, so 1 PB = 25 boxes! http://goo.gl/dcBsQ
  • 14. 6 billion messages per day; 2 PB online (with compression), 6 PB with replication; growing 250 TB/month; HBase infrastructure.
  • 15. 50 TB/day; a 240-node, 84 PB Teradata installation; path analysis; A/B testing; very systematic, and the diagram speaks volumes! Ref: http://www.hpts.ws/sessions/2011HPTS-TomFastner.pdf
  • 16. "… they didn't need a genius, … but to build the world's most impressive dilettante … battling the efficient human mind with spectacular, flamboyant inefficiency" - Final Jeopardy by Stephen Baker. 15 TB of memory across 90 IBM 760 servers in 10 racks; a 1 TB dataset; 200 million pages processed by Hadoop. This is a good example of connected data: contextual with variability; breadth of interpretation; depth of analytics. Refs: http://doubleclix.wordpress.com/2011/03/01/the-education-of-a-machine-%E2%80%93-review-of-book-%E2%80%9Cfinal-jeopardy%E2%80%9D-by-stephen-baker/ ; http://doubleclix.wordpress.com/2011/02/17/watson-at-jeopardy-a-race-of-machines/
  • 17. [Diagram: the Big Data technology landscape] Warehouse-style applications; distributed storage (block store, object store); NOSQL; analytics; parallelism (Map/Reduce); web analytics; HPC; cloud architecture; social media & log analytics; social graph; inference and recommendation/inference engines; machine learning (Mahout); knowledge graph; search & indexing; classification & clustering.
  • 18. "A towel is about the most massively useful thing an interstellar hitchhiker can have … any man who can hitch the length and breadth of the Galaxy, rough it … win through, and still know where his towel is, is clearly a man to be reckoned with." - from The Hitchhiker's Guide to the Galaxy by Douglas Adams (Harmony Books, 1979). Big Data to Smart Data.
  • 19. Big data to smart data: 1. Don't throw away any data! 2. Be ready for different ways of organizing the data. Summary: http://goo.gl/fGw7r
  • 20. Big Data Pipeline. "If a problem has no solution, it is not a problem, but a fact, not to be solved but to be coped with, over time …" - Peres's Law
  • 21. Big Data Pipeline. Stages: Collect; Store; Transform & Analyze; Model & Reason; Predict, Recommend & Visualize. Different systems have different characteristics: infrastructure optimization based on correlating application/hardware attributes (short term: Hadoop, Splunk, internal dashboards); application performance trends (medium term: analytics, modeling, …); product metrics (feature set vs. usage, what is important to users, stratification; modeling using R, visualization layers like Tableau).
  • 22. Big Data Pipeline (Ref: http://goo.gl/Mm83k). [Diagram] Data characteristics climb from Volume to Velocity, Variability, Variety, Connectedness, Context and, finally, Infer-ability, while the tooling moves from logs, XML & other files, through Scribe/Flume, NOSQL/HDFS/Hadoop, SQL/Pig/Hive/Dryad/.NET, BI tools & Hadoop tools, hand-coded programs/R/Mahout/SQL, up to internal dashboards, Tableau and models. Decomplexify! Contextualize! Network! Reason! Infer!
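Since the pipeline's processing layer leans on Hadoop, a tiny word-count mapper/reducer pair in the Hadoop Streaming style makes the "Transform & Analyze" stage concrete. This is a minimal sketch, not from the deck; the file names are invented for illustration.

    #!/usr/bin/env python
    # wordcount_mapper.py (hypothetical): Hadoop Streaming mapper.
    # Reads raw text lines on stdin, emits "word<TAB>1" for each token.
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print("%s\t%d" % (word.lower(), 1))

    #!/usr/bin/env python
    # wordcount_reducer.py (hypothetical): Hadoop Streaming reducer.
    # Streaming sorts mapper output by key, so counts per word arrive contiguously.
    import sys

    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print("%s\t%d" % (current, count))
            current, count = word, 0
        count += int(n)
    if current is not None:
        print("%s\t%d" % (current, count))

These would run under Hadoop Streaming (roughly: hadoop jar hadoop-streaming.jar -input … -output … -mapper wordcount_mapper.py -reducer wordcount_reducer.py), and can be tested locally by piping a file through mapper, sort, then reducer.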
  • 23. Build to fail: "it is working" is not binary. The NOSQL! "I am monarch of all I survey; My right there is none to dispute; From the centre all round to the sea I am lord of the fowl and the brute" - Cowper, The Solitude of Alexander Selkirk
  • 24. Agenda. Opening gambit: NOSQL, toil, tears & sweat! The pragmas: ABCs of NOSQL (ACID, BASE & CAP). The mechanics: algorithmics & mechanisms (for reference). Referenced links @ http://doubleclix.wordpress.com/2010/06/20/nosql-talk-references/
  • 25. What is NOSQL anyway? NOSQL != NoSQL, and NOSQL != (!SQL); NOSQL = Not Only SQL. The term can be traced back to Eric Evans[2]; you can ask him during the afternoon session! An unfortunate name, but it has stuck; "non-relational" would have been better. Usually operational, definitely distributed. NOSQL has certain semantics today, but it need not stay that way.
  • 26. NOSQL taxonomy: Key-Value (in-memory: Memcached, Redis; disk-based: Tokyo Cabinet, Dynamo, Voldemort, Riak); Column (SimpleDB, Google BigTable, HBase, Cassandra, HyperTable, Azure TS); Document (CouchDB, MongoDB, Lotus Domino); Graph (Neo4j, FlockDB, InfiniteGraph). Ref: [22,51,52]
  • 27. "When I think of my own native land, In a moment I seem to be there; But, alas! recollection at hand Soon hurries me back to despair." - Cowper, The Solitude of Alexander Selkirk. NOSQL tales from the field: WHAT WORKS
  • 28. Augmenting an RDBMS with a distributed key-value store [40: a good talk by Geir]. Invitation-only designer-brand sales: limited-inventory sales start at 12:00 and members have 10 minutes to grab items; 500K mails every day; keeps brand value, hidden from search. Interesting load properties: each item is a row in the DB and BUY NOW reserves it, so you can't order more than exists. Started out as a shared-nothing Rails app; narrow peaks carry half the revenue.
  • 29. The Christian Louboutin effect: ½ amz for Louboutin. Uses Voldemort for inventory, shopping cart & checkout, partitioned by product ID, with in-memory inventory; shared infrastructure, "fog" not "cloud" (Joyent!). Not afraid of a sale anymore! And SQL DBs are still relevant!
  • 30. A typical NOSQL example: bit.ly. The bit.ly URL-shortening service uses MongoDB: user, title, URL, hash, labels[I-5], sorted by time. Scale: ~50M users, ~10K concurrent, ~1.25B shortens per month. Criteria: simple, zippy FAST, very flexible, reasonable durability, low cost of ownership. Sharded by userid.
  • 31. A new kind of "dictionary": a word repository, a GPS for English (context, pronunciations, Twitter, … developer API). Characteristics [I-6, Tony Tam's presentation]: read-centric, 10,000 reads for every write; hit a wall with MySQL (4B rows); MongoDB reads were so good that a memcached layer was not required; MongoDB used 4 times MySQL's storage. Another example: Voldemort for Unified Communications, with IP-phone data keyed off of the phone number; the data is relatively stable.
  • 32. The Large Hadron Collider @ CERN. DAS is part of a giant data-management enterprise (CMS), with polyglot persistence (SQL + NOSQL: Mongo, Couch, memcache, HDFS, Lustre, Oracle, MySQL, …). The Data Aggregation System [I-1..I-4] uses MongoDB: a distributed model with 2-6 PB of data; it combines information from different metadata sources so users can query without knowing those sources exist (the user has domain knowledge but shouldn't have to deal with various formats, interfaces and query semantics); DAS aggregates, caches and presents data as JSON documents, preserving security & integrity. And SQL DBs are still relevant!
  • 33. Scaling Twitter
  • 34. Digg: their RDBMS placed the burden on reads rather than writes[I-8]; they looked at NOSQL and selected Cassandra (column-oriented, so more structure than key-value). Heard at NoSQL Boston [http://twitter.com/#search?q=%23nosqllive]: Baidu runs a 120-node HyperTable cluster managing 600 TB of data; StumbleUpon uses HBase for analytics; Twitter's current Cassandra cluster is 45 nodes.
  • 35. Adobe is an HBase shop [I-10, I-11, 2]: Adobe's SaaS infrastructure has a dynamic schema & a huge number of records[I-5], growing from 40 million records in 2008 to 1 billion, with 50 ms response; NOSQL was not mature in 2008 but is now good enough; production analytics run on 40 nodes, and the largest cluster has 100 nodes. BBC is a CouchDB shop [I-13]: the sweet spot is tagging, content aggregation, search, storage and so forth, with multi-master, multi-datacenter replication for interactive mediums; old data moves to CouchDB, thus freeing up the DB to do work!
  • 36. Cloudkick is a Cassandra shop[I-12]. Cloudkick offers cloud-management services and stores metrics data: linear scalability for the write load; massive write performance (memory table & serial commit log); low operational costs. Data structure: metrics, rolled-up data, and statuses at a time slice, all indexed by timestamp.
  • 37. Guardian/UK runs on Redis[I-14]! "Long-term The Guardian is looking towards the adoption of a schema-free database to sit alongside its Oracle database and is investigating CouchDB. … the relational database is now just a component in the overall data management story, alongside data caching, data stores, search engines etc. NOSQL can increase performance of relational data by offloading specific data and tasks." And SQL DBs are still relevant! "The evil that SQL DBs do lives after them; the good is oft interred with their bones..."
  • 38. NOSQL at Netflix. Netflix is fully in the cloud and uses NOSQL across the globe: customer profiles, the watch log, usage logging (see next slide), with no multi-record locking. No DBA! Easier schema changes; a less complex, highly available data store; joins happen in the applications. http://www.hpts.ws/sessions/nosql-ecosystem.pdf ; http://www.hpts.ws/sessions/GlobalNetflixHPTS.pdf
  • 39. 21 NOSQL themes: web scale; scale incrementally for continuous growth; oddly shaped & exponentially connected data; structure data as it will be used, i.e. read & queried; know your queries/updates in advance[96], but you can change them later; compute attributes at run time; create a few large entities with optional parts (normalization creates many small entities); define schemas in models, not in databases; avoid impedance mismatch; narrow down & solve your core problem; solve the right problem with the right tool. Ref: [I-8]
  • 40. 21 NOSQL themes (contd.): existing solutions are clunky[1] in certain situations; scale automatically, rather than "becoming prohibitively costly (in terms of manpower) to operate" (Twitter[I-9]); distribution & partitioning are built into NOSQL; RDBMS distribution & sharding are not fun and are expensive, and you lose most functionality along the way; data at the center, flexible schema, fewer joins; the value of NOSQL is in flexibility as much as it is in "Big Data".
  • 41. 21 NOSQL themes (contd.). Requirements[3]: data will not fit on one node, so the system must partition/distribute it; nodes will fail, but the data needs to be safe: replication! Low latency for real-time use. Data locality: row-based structures must read the whole row, even for one column, while column-based structures must scan for each row. Solution: column storage with locality; keep data that is read together, and don't read what you don't care about (for example, friends vs. other data). Ref: 3
  • 42. ABCs of NOSQL: ACID, BASE & CAP. "The woods are lovely, dark, and deep, But I have promises to keep, And miles to go before I sleep, And miles to go before I sleep." - Frost
  • 43. CAP Principle: "Strong Consistency, High Availability, Partition-resilience: pick at most 2."[37] The triangle's vertices: Consistency, Availability, Partition tolerance. Which feature to discard depends on the nature of your system[41].
  • 44. C-A, no P: a single DB server, with no network partition.
  • 45. C-P, no A: block transactions in case of partition failure.
  • 46. A-P, no C: expiration-based caching, voting majority. Interesting (& controversial) from the NOSQL perspective.
  • 47. ABCs of NOSQL. ACID: Atomicity, Consistency, Isolation & Durability, the fundamental properties of a SQL DBMS. BASE[35,39]: Basically Available, Soft state (Scalable), Eventually consistent. CAP[36,39]: Consistency, Availability & Partitioning, where this C is ~A+C, i.e. atomic consistency[36].
  • 48. ACID. Atomicity: all or nothing. Consistency: from one consistent state to another (e.g. referential integrity), though it is also application-dependent (e.g. a minimum account balance; predicates, invariants, …). Isolation. Durability.
  • 49. CAP pragmas. Preconditions: the domain is scalable web apps; low latency for real-time use; a small subset of SQL functionality; horizontal scaling. Pritchett[35] talks about relaxing consistency across functional groups rather than within functional groups. Idempotency is a consideration: inc/dec updates are rarely idempotent, and order-preserving transactions are not idempotent either; MVCC is an answer for this (CouchDB).
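To see the idempotency point concretely, a toy sketch (not from the deck): re-applying a "set" is harmless, while re-applying an increment after a lost acknowledgement double-counts.

    # Hypothetical sketch: why inc/dec updates are rarely idempotent.
    balance = 100

    def set_balance(value):
        # Idempotent: applying the same update twice leaves the same state.
        global balance
        balance = value

    def increment(delta):
        # Not idempotent: a blind retry after a lost ack applies it twice.
        global balance
        balance += delta

    set_balance(150); set_balance(150)   # balance == 150 either way
    increment(50); increment(50)         # the retry turned +50 into +100 (balance == 250)

MVCC-style conditional updates (as in CouchDB) avoid this by rejecting the second write when the revision it was based on has already changed.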
  • 50. Consistency. Strict consistency: any read on data X returns the most recent write on X[42]. Sequential consistency: maintains the sequential order seen by multiple processes (no mention of time). Linearizability: adds timestamps from loosely synchronized processes.
  • 51. Consistency (contd.): write availability, not read availability[44]; even load distribution is easier in eventually consistent systems; multi-datacenter support is easier in eventually consistent systems; some problems are not solvable with eventually consistent systems; code is sometimes simpler to write in strongly consistent systems.
  • 52. CAP essentials (1 of 3). "CAP Principle → Strong Consistency, High Availability, Partition-resilience: pick at most 2"[37]: C-A, no P → a single DB server, no network partition; C-P, no A → block transactions in case of partition failure; A-P, no C → expiration-based caching, voting majority. Which feature to discard depends on the nature of your system[41].
  • 53. CAP essentials (2 of 3). Yield vs. harvest[37]: yield → the probability of completing a request; harvest → the fraction of data reflected in the response. Some systems tolerate < 100% harvest (e.g. search, where approximate answers are OK); others need 100% harvest (e.g. transactions, where correct behavior = a single well-defined response). For sub-systems that tolerate harvest degradation, CAP makes sense.
  • 54-55. CAP essentials (3 of 3). Trading harvest for yield: AP. Decompose the application & use NOSQL in the sub-systems whose state management and data semantics match its operational features & impedance (hence Not Only SQL, not No SQL): intelligent homing to tolerate partition failures[44]; multiple zones in a region (150 miles ≈ 5 ms); Twitter keeps tweets in Cassandra and MySQL; BBC uses MongoDB to offload the DBMS; polyglot persistence at the LHC @ CERN. This is the most important point in the whole presentation.
  • 56. Eventual consistency & AMZ. Distribution transparency[38]; in larger distributed systems, network partitions are a given. Consistency models: strong; weak (there is an inconsistency window between an update and its guaranteed visibility); eventual (if there are no new updates, everyone will eventually see the value).
  • 57. Eventual consistency & AMZ. Guarantee variations[38]: read-your-writes; session consistency; monotonic read consistency (an access never returns a value older than a previously returned one); monotonic write consistency (writes by the same process are serialized). Guarantee order with vector clocks or MVCC; example: the Amazon cart merge (let cart-adds succeed even with partial failure).
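Since the slide leans on vector clocks for ordering, here is a minimal sketch of the idea (an illustration, not Dynamo's code; the replica names are invented): each replica increments its own slot on a write, and two versions conflict when neither clock dominates the other.

    # Minimal vector-clock sketch.
    def increment(clock, node):
        c = dict(clock)
        c[node] = c.get(node, 0) + 1
        return c

    def merge(a, b):
        # Element-wise max: the merged clock dominates both histories.
        return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

    def descends(a, b):
        # True if clock a has seen everything clock b has.
        return all(a.get(n, 0) >= v for n, v in b.items())

    v1 = increment({}, "replica_A")      # first write
    v2 = increment(v1, "replica_B")      # causally after v1
    v3 = increment(v1, "replica_C")      # concurrent with v2
    assert descends(v2, v1)
    assert not descends(v2, v3) and not descends(v3, v2)   # conflict!
    # A Dynamo-style cart merge keeps both histories and merges the values:
    reconciled = increment(merge(v2, v3), "replica_A")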
  • 58. Eventual consistency & AMZ: SimpleDB. SimpleDB's strong-consistency semantics[49,50]: until Feb 2010, SimpleDB supported only eventual consistency, i.e. GetAttributes after PutAttributes might not reflect the write for some time (~1 second). On Feb 24, AWS added a ConsistentRead=True attribute for reads: such a read reflects all writes that received a 200 OK up to that time!
  • 59. Eventual consistency & AMZ: SimpleDB (contd.). Also added conditional put/delete: the put succeeds only if an attribute has a specified value (Expected.1.Value=) or exists/doesn't (Expected.1.Exists=true/false); the same conditional-check capability exists for delete. Only on one attribute!
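A sketch of the conditional-put semantics just described, in plain Python (illustrative only; the parameter names mirror SimpleDB's Expected.1.* request fields, this is not a real client library):

    # Hypothetical check-and-set mimicking SimpleDB's conditional put.
    class ConditionalCheckFailed(Exception):
        pass

    def conditional_put(item, attr, new_value,
                        expected_name=None, expected_value=None, expected_exists=None):
        if expected_name is not None:                 # only ONE attribute may be checked
            present = expected_name in item
            if expected_exists is not None and present != expected_exists:
                raise ConditionalCheckFailed("existence check failed")
            if expected_value is not None and item.get(expected_name) != expected_value:
                raise ConditionalCheckFailed("expected value does not match")
        item[attr] = new_value                        # check passed; apply the write

    item = {"version": "3"}
    # Succeeds: version is still "3", so nobody updated the item since we read it;
    # a concurrent retry with the same precondition now fails instead of double-applying.
    conditional_put(item, "version", "4", expected_name="version", expected_value="3")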
  • 60. Eventual consistency & AMZ: S3. S3 is an eventually consistent system: versioning; "S3 PUT & COPY synchronously store data across multiple facilities before returning SUCCESS"; it repairs lost redundancy and bit-rot; a Reduced Redundancy option exists for data that can be reproduced (99.999999999% vs. 99.99% durability, at roughly 1/3 lower cost); CloudFront for caching.
  • 61. !SQL? "We conclude that the current RDBMS code lines, while attempting to be a 'one size fits all' solution, in fact excel at nothing. Hence, they are 25-year-old legacy code lines that should be retired in favor of a collection of 'from scratch' specialized engines."[43] "Current systems were built in an era where resources were incredibly expensive, and every computing system was watched over by a collection of wizards in white lab coats, responsible for the care, feeding, tuning and optimization of the system. In that era, computers were expensive and people were cheap." "The 1970-1985 period was a time of intense debate, a myriad of ideas, & considerable upheaval. We predict the next fifteen years will have the same feel."
  • 62. Further deliberation: Daniel Abadi[45], Michael Stonebraker[46], James Hamilton[47], and Pat Helland[48] are all good reads.
  • 63. NOSQL Internals & Algorithmics
  • 64. Caveats: this is a representative subset of the mechanics and mechanisms used in the NOSQL world; they are being refined & newer ones are being tried; presented at a system level, to show how the techniques play a part in delivering a capability; see the NOSQL papers and other references for further deliberation. Even if we don't cover everything fully, that's OK; I want to introduce some of the concepts so that you get an appreciation for them.
  • 65. NOSQL mechanics. Horizontal scalability: gossip (cluster membership); failure detection; consistent hashing; replication techniques (hinted handoff); sharding in MongoDB; regions in HBase. Performance: SSTables/memtables; LSM trees with Bloom filters. Integrity/version reconciliation: timestamps; vector clocks; Merkle trees; MVCC; semantic vs. syntactic reconciliation.
  • 66. Consistent hashing. Origin: web caching, "to decrease 'hot spots'". Three goals[87]: smooth evolution (when a new machine joins, minimal rebalancing work and impact); spread (an object is assigned to a small number of nodes); load (the number of distinct objects assigned to a node is small).
  • 67. Consistent hashing (contd.). The hash keyspace/token range is divided into partitions/ranges, and partitions are assigned to nodes that are logically arranged in a circle topology. Cassandra's choice: an OrderPreservingPartitioner where key = token (for range queries); also saw a CollatingOrderPreservingPartitioner. Amazon (Dynamo) assigns sets of (random) multiple points to different machines depending on load; Cassandra monitors load & redistributes; there are specific join & leave protocols; replication goes to the next 3 consecutive nodes; Cassandra is rack-aware and datacenter-aware.
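A compact consistent-hash ring, as a sketch of the technique these slides describe (node names and the virtual-node count are invented; this is not Cassandra's or Dynamo's code). The "multiple points per machine" idea appears here as virtual nodes, which give the spread/load properties; adding a node only remaps the keys between its tokens and their predecessors, which is the smooth-evolution goal.

    # Minimal consistent-hash ring with virtual nodes.
    import bisect, hashlib

    class Ring:
        def __init__(self, vnodes=64):
            self.vnodes, self.tokens, self.token2node = vnodes, [], {}

        def _hash(self, key):
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def add(self, node):
            # Each node owns many small token ranges, evening out the load.
            for i in range(self.vnodes):
                t = self._hash("%s#%d" % (node, i))
                bisect.insort(self.tokens, t)
                self.token2node[t] = node

        def lookup(self, key, replicas=3):
            # Walk clockwise from the key's token; the next N distinct
            # nodes hold the replicas (cf. "replication: next 3 consecutive").
            i = bisect.bisect(self.tokens, self._hash(key))
            owners = []
            while len(owners) < replicas:
                node = self.token2node[self.tokens[i % len(self.tokens)]]
                if node not in owners:
                    owners.append(node)
                i += 1
            return owners

    ring = Ring()
    for n in ("node-a", "node-b", "node-c", "node-d"):
        ring.add(n)
    print(ring.lookup("user:42"))   # three distinct replica owners for this key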
  • 68. Consistent hashing: hinted handoff. What happens when a node is not available? It may be under load, or there may be a network partition. Sloppy quorum & hinted handoff: reads/writes are performed on the first N healthy nodes; a replica is sent to a stand-in node with a hint in the metadata, then transferred when the actual node comes back up. This burdens the neighboring nodes; the Cassandra 0.6.2 default is disabled (I think).
  • 69. Consistent hashing: replication. What happens when a new node joins? It gets one or more partitions: Dynamo copies the whole partition; Cassandra replicates the keyset, and is working on a BitTorrent-type protocol to copy from replicas.
  • 70. Anti-entropy. Merge and reconciliation operations take two states and return a new state[86]. Merkle trees: Dynamo uses Merkle trees to detect inconsistencies between replicas; AntiEntropy in Cassandra exchanges Merkle trees and, if they disagree, performs range repair via compaction[91,92]; Cassandra uses Scuttlebutt reconciliation[86].
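To make the Merkle-tree exchange concrete, a small sketch (illustrative, not Cassandra's AntiEntropy code; the key ranges are invented): each replica hashes its key ranges into a tree; if the roots match the replicas agree, and if not, only the divergent ranges need repair.

    # Merkle-tree sketch over per-key-range digests.
    import hashlib

    def h(data):
        return hashlib.sha1(data).hexdigest()

    def build(leaves):
        # leaves: digests of each key range; returns all levels, root last.
        tree = [leaves]
        while len(tree[-1]) > 1:
            prev, nxt = tree[-1], []
            for i in range(0, len(prev), 2):
                pair = prev[i] + (prev[i + 1] if i + 1 < len(prev) else prev[i])
                nxt.append(h(pair.encode()))
            tree.append(nxt)
        return tree

    def diff_ranges(tree_a, tree_b):
        # Simplified top-down walk: equal roots means nothing to repair;
        # otherwise report which leaf ranges disagree.
        if tree_a[-1][0] == tree_b[-1][0]:
            return []
        return [i for i, (a, b) in enumerate(zip(tree_a[0], tree_b[0])) if a != b]

    ranges = ["k0-k9", "k10-k19", "k20-k29", "k30-k39"]
    replica_a = build([h(("%s:v1" % r).encode()) for r in ranges])
    replica_b = build([h(("%s:%s" % (r, "v2" if r == "k20-k29" else "v1")).encode())
                       for r in ranges])
    print(diff_ranges(replica_a, replica_b))   # [2]: only one range needs repair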
  • 71. Gossip: membership & failure detection. Based on emergence without rigidity: pulse-coupled oscillators, biological systems like fireflies![90] Also used for state propagation, as in Dynamo/Cassandra.
  • 72. Gossip (contd.). Cassandra exchanges heartbeat state, application state and so forth: every second, each node picks a random live node and a random unreachable node and exchanges key-value structures. Some nodes play the part of seeds; the seed/initial contact points are in a static conf file (storage.conf), and could also come from a configuration service like ZooKeeper. To guard against node flap, there are explicit membership join and leave operations; now you know why hinted handoff was added.
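One round of that exchange, as a toy sketch (the versioned key-value layout is invented for illustration, not Cassandra's actual state format): each side keeps the higher version per key, which is the Scuttlebutt-style rule mentioned two slides back.

    # Gossip-round sketch: state maps key -> (version, value).
    import random

    def gossip_round(node, peers, unreachable):
        # Every second: exchange state with one random live peer and,
        # if any exist, probe one random unreachable node too.
        targets = [random.choice(peers)] if peers else []
        if unreachable:
            targets.append(random.choice(unreachable))
        for other in targets:
            # Reconciliation: the higher version wins on both sides.
            for key in set(node) | set(other):
                a, b = node.get(key, (0, None)), other.get(key, (0, None))
                winner = a if a[0] >= b[0] else b
                node[key], other[key] = winner, winner

    n1 = {"n1:heartbeat": (10, "alive"), "n1:load": (7, 0.3)}
    n2 = {"n1:heartbeat": (8, "alive"), "n2:heartbeat": (12, "alive")}
    gossip_round(n1, [n2], [])
    assert n1["n1:heartbeat"] == (10, "alive") and "n2:heartbeat" in n1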
  • 73. Membership & failure detection. Consensus & atomic broadcast are impossible to solve in a distributed system[88,89]: you cannot differentiate between a slow system and a crashed one. Completeness: every system that crashed will eventually be detected. Correctness: a correct process is never suspected. In short: if you are dead, somebody will notice it; and if you are alive, nobody will mistake you for dead!
  • 74. Ø: the accrual failure detector. Not a Boolean value but a probabilistic number that "accrues" over an exponential scale, capturing the degree of confidence that the monitored process has crashed[94], a suspicion level: Ø = 1 → prob(error) 10%; Ø = 2 → prob(error) 1%; Ø = 3 → prob(error) 0.1%. If the process is dead, Ø is monotonically increasing and Ø → ∞ as t → ∞; if the process is alive and kicking, Ø = 0. It accounts for lost messages, network latency and an actual crash of the system/process: with a well-known heartbeat period Δi, network latency Δtr can be tracked by modeling inter-arrival times.
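A sketch of the accrual detector's arithmetic (an approximation for illustration; the published detector models inter-arrival times more carefully than the plain normal tail used here): Ø is -log10 of the probability that the missing heartbeat is merely late.

    # Phi accrual failure detector, normal-tail approximation.
    import math

    class PhiDetector:
        def __init__(self):
            self.intervals, self.last = [], None

        def heartbeat(self, now):
            if self.last is not None:
                self.intervals.append(now - self.last)   # learn the arrival pattern
            self.last = now

        def phi(self, now):
            mean = sum(self.intervals) / len(self.intervals)
            var = sum((x - mean) ** 2 for x in self.intervals) / len(self.intervals)
            std = max(math.sqrt(var), 1e-3)
            t = now - self.last
            # Probability that the heartbeat still arrives after waiting t.
            p_later = 0.5 * math.erfc((t - mean) / (std * math.sqrt(2)))
            return -math.log10(max(p_later, 1e-12))

    d = PhiDetector()
    for t in range(10):
        d.heartbeat(float(t))       # steady 1-second heartbeats
    print(d.phi(9.5))               # ~0: node looks alive
    print(d.phi(15.0))              # large: suspicion accrues (recall: phi = 1 ~ 10% error)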
  • 75. Write/read mechanisms: reads & writes go to a random node (StorageProxy); the proxy coordinates the read and write strategy (R/W = any, quorum, et al.); memtables/SSTables from Bigtable; Bloom filter/index; LSM trees.
  • 76. [Diagram: the HBase write path] A write goes to the WAL (commit logs on the HDFS file system) and to the node's in-memory memstore/memtable; flushing turns memtables into SSTables on disk, each with an index and a Bloom filter (BF), which reads consult. SSTables are immutable, undergo compaction, and maintain their index & Bloom filter.
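That write path reduces to a few moving parts; here is a toy LSM-style store as a sketch of the memtable/SSTable idea (not HBase's implementation; the flush threshold is an invented toy value):

    # Minimal LSM-style write path: WAL + memtable, flushed to immutable SSTables.
    class LSMStore:
        def __init__(self, flush_at=4):
            self.commit_log, self.memtable, self.sstables = [], {}, []
            self.flush_at = flush_at

        def put(self, key, value):
            self.commit_log.append((key, value))   # durability first (the WAL)
            self.memtable[key] = value             # then the in-memory sorted table
            if len(self.memtable) >= self.flush_at:
                self._flush()

        def _flush(self):
            # SSTables are immutable and sorted; the newest is consulted first on reads.
            self.sstables.insert(0, dict(sorted(self.memtable.items())))
            self.memtable, self.commit_log = {}, []

        def get(self, key):
            if key in self.memtable:
                return self.memtable[key]
            for table in self.sstables:            # a Bloom filter would skip most of these
                if key in table:
                    return table[key]
            return None

    s = LSMStore()
    for i in range(10):
        s.put("k%d" % i, i)
    print(s.get("k2"), s.get("k9"))    # k2 comes from an SSTable, k9 from the memtable

Compaction (not shown) merges SSTables so reads touch fewer files.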
  • 77. How does HBase work again? http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html ; http://hbaseblog.com/2010/07/04/hug11-hbase-0-90-preview-wrap-up/
  • 78. Bloom filter. The BloomFilter answers the question "might there be data for this key in this SSTable?" with either "maybe" or "definitely not" [Ref: Cassandra/HBase mailer]; when the BloomFilter says "maybe", we have to go to disk and check the content of the SSTable. Details depend on the implementation: it was redone in Cassandra, and HBase 0.20.x removed it, to return in 0.90 with a "jazzy" implementation.
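A toy Bloom filter matching that description: "maybe" can be a false positive, "definitely not" is exact. The bit-array size and hash count here are toy values, not what Cassandra or HBase actually use.

    # Bloom filter: k hash positions per key over an m-bit array.
    import hashlib

    class BloomFilter:
        def __init__(self, m=1024, k=3):
            self.m, self.k, self.bits = m, k, 0

        def _positions(self, key):
            for i in range(self.k):
                d = hashlib.md5(("%d:%s" % (i, key)).encode()).hexdigest()
                yield int(d, 16) % self.m

        def add(self, key):
            for p in self._positions(key):
                self.bits |= 1 << p

        def might_contain(self, key):
            # All k bits set: "maybe" (go check the SSTable on disk).
            # Any bit clear: "definitely not" (skip the disk read).
            return all(self.bits & (1 << p) for p in self._positions(key))

    bf = BloomFilter()
    bf.add("row-123")
    print(bf.might_contain("row-123"))   # True: maybe there, read the SSTable
    print(bf.might_contain("row-999"))   # almost certainly False: skip the disk seek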
  • 79. "Was it a vision, or a waking dream? Fled is that music:—do I wake or sleep?" - Keats, Ode to a Nightingale
  • References: http://www.readwriteweb.com/enterprise/2011/11/infographic-data-deluge---8-ze.php ; http://www.crn.com/news/data-center/232200061/efficiency-or-bust-data-centers-drive-for-low-power-solutions-prompts-channel-growth.htm ; http://www.quantumforest.com/2011/11/do-we-need-to-deal-with-big-data-in-r/ ; http://www.forbes.com/special-report/2011/migration.html ; http://www.mercurynews.com/bay-area-news/ci_19368103 ; http://www.businessinsider.com/apple-new-data-center-north-carolina-created-50-jobs-2011-11