The Art of Big Data
Slides for my talk at the Naval Postgraduate School PhD Seminar

Presentation Transcript

  • The road lies plain before me; 'tis a theme Single and of determined bounds; … - Wordsworth, The Prelude. Krishna Sankar, http://doubleclix.wordpress.com – Naval Postgraduate School PhD Seminar (guest lecture), Nov 29, 2011
  • Agenda: What is Big Data? Big Data to smart data; the Big Data Pipeline; Storage (NOSQL); Processing (Hadoop); Analytics/Modeling (R); Analytic Algorithms; Visualization. o To cover the broad picture o Understand the waypoints o Drill down into one area (NOSQL) o Can do others later … of the Big Data domain …
  • Thanks to … the giants whose shoulders I am standing on. Special thanks to: Peter Ateshian, NPS; Prof. Murali Tummala, NPS; Shirley Bailes, O'Reilly; Ed Dumbill, O'Reilly; Jeff Barr, AWS; Jenny Kohr Chynoweth, AWS
  • When I think of my own native land, In a moment I seem to be there; But, alas! recollection at hand Soon hurries me back to despair. - Cowper, The Solitude of Alexander Selkirk
  • What is Big Data? "Big data" is data that becomes large enough that it cannot be processed using conventional methods. "Big data" is less about size, more about flow & velocity - persisting petabytes per year is easier than processing terabytes per hour. @twitter Ref: http://radar.oreilly.com/2010/09/the-smaq-stack-for-big-data.html
  • What is Big Data? Vinod Khosla's Cool Dozen! Consumers: "Widespread innovation in technologies that reduce data overload for users" ~ Data Reduction. Businesses: "Simple solutions to handle the deluge of data generated from various sources …" ~ Big Data Analytics. TV 2.0, Education, Social NEXT, tools for sharing interests, publishing, … Ref: http://www.ciol.com/News/News/News-Reports/Vinod-Khosla%E2%80%99s-cool-dozen-tech-innovations/156307/0/ http://yourstory.in/2011/11/vinod-khoslas-keynote-at-nasscom-product-conclave-reject-punditry-believe-in-an-idea-take-risk-and-succeed/
  • EBC322. Volume (scale); Velocity (data change rate vs. decision window); Variety (different sources & formats; structured vs. unstructured); Variability (breadth of interpretation & depth of analytics); Contextual (dynamic variability; recommendation); Connectedness. Refs: http://doubleclix.wordpress.com/2011/09/13/when-is-big-data-really-big-data/ http://www.hpts.ws/posters/Poster2011_13_Bulkowski.pdf
  • I. Two main types – based on collection: (i) Big Data Streams – data in "motion" – Twitter fire hose, Facebook, G+; (ii) Big Data Logs – data "at rest" – logs, DW, external market data, POS, … II. Typically, Big Data has a non-deterministic angle as well: creative discovery; iterative, model-based analytics; explore the questions to ask. III. Smart Data = Big Data + context + embedded/interactive (inference, reasoning) models – model driven, declaratively interactive. http://www.slideshare.net/leonsp/hadoop-slides-11-what-is-big-data http://www.slideshare.net/Dataversity/wed-1550-bacvanskivladimircolor
  • AWS – 600 billion objects! Twitter: 200 million tweets/day, peak 10,000/second – how would you handle the fire hose for social network analytics? Zynga: "Analytics company, not a gaming company!" – harvests 15 TB of data/day to test new features and target advertising; 230 million players/month. Storage: a 4U box = 40 TB, so 1 PB = 25 boxes! http://goo.gl/dcBsQ
  • 6 billion messages per day; 2 PB (w/ compression) online; 6 PB w/ replication; 250 TB/month growth; HBase infrastructure
  • 50 TB/day; 240 nodes, 84 PB Teradata installation; very systematic – the diagram speaks volumes! Path analysis, A/B testing. Ref: http://www.hpts.ws/sessions/2011HPTS-TomFastner.pdf
  • "… they didn't need a genius, … but build the world's most impressive dilettante … battling the efficient human mind with spectacular flamboyant inefficiency" – Final Jeopardy by Stephen Baker. 15 TB memory across 90 IBM 760 servers in 10 racks; 1 TB of dataset; 200 million pages processed by Hadoop. This is a good example of connected data – contextual w/ variability, breadth of interpretation, analytics depth. http://doubleclix.wordpress.com/2011/03/01/the-education-of-a-machine-%E2%80%93-review-of-book-%E2%80%9Cfinal-jeopardy%E2%80%9D-by-stephen-baker/ http://doubleclix.wordpress.com/2011/02/17/watson-at-jeopardy-a-race-of-machines/
  • [Big Data domain map] Big Data applications: warehouse-style applications, web analytics, social media & log analytics, social graph, recommendation/inference engines, search & indexing. Distributed storage: block store, object store, NOSQL. Parallelism: Map/Reduce, HPC, cloud architecture. Analytics & machine learning: inference, knowledge graph, Mahout, classification, clustering.
  • "A towel is about the most massively useful thing an interstellar hitchhiker can have … any man who can hitch the length and breadth of the Galaxy, rough it … win through, and still know where his towel is, is clearly a man to be reckoned with." - From The Hitchhiker's Guide to the Galaxy, by Douglas Adams. Published by Harmony Books in 1979. Big Data to Smart Data
  • Big data to smart data – summary: (1) Don't throw away any data! (2) Be ready for different ways of organizing the data. http://goo.gl/fGw7r
  • Big Data Pipeline. If a problem has no solution, it is not a problem, but a fact, not to be solved but to be coped with, over time … - Peres's Law
  • Big Data Pipeline. Stages: collect; store; transform & analyze; model & reason; predict, recommend & visualize (see the sketch below). Different systems have different characteristics: infrastructure optimization based on correlating application/hardware attributes (short term) – Hadoop, Splunk, internal dashboards; application performance trends (medium term) – analytics, modeling, …; product metrics – feature set vs. usage, what is important to users, stratification – modeling using R, visualization layers like Tableau.
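A minimal sketch (in Python, with illustrative placeholder names not taken from the talk) of how the five pipeline stages compose: each stage consumes the previous stage's output, so the tools at any stage can be swapped without changing the shape of the pipeline.

# Minimal sketch of the pipeline stages above as composable steps.
# All function names are illustrative placeholders, not tools named in the talk.
from typing import Callable, Iterable, List


def collect(sources: Iterable[str]) -> List[dict]:
    """Pull raw events from log files, feeds, APIs, ..."""
    return [{"source": s, "payload": f"raw data from {s}"} for s in sources]


def store(events: List[dict]) -> List[dict]:
    """Persist raw events (HDFS / NOSQL in the talk); here a pass-through."""
    return events


def transform_and_analyze(events: List[dict]) -> dict:
    """Aggregate / summarize (Hadoop, Pig, Hive in the talk)."""
    return {"event_count": len(events)}


def model_and_reason(summary: dict) -> dict:
    """Fit a model over the summaries (R, Mahout in the talk)."""
    summary["score"] = summary["event_count"] * 0.1  # toy 'model'
    return summary


def predict_and_visualize(model: dict) -> str:
    """Surface the result (dashboards, Tableau in the talk)."""
    return f"predicted score: {model['score']:.1f}"


if __name__ == "__main__":
    stages: List[Callable] = [store, transform_and_analyze,
                              model_and_reason, predict_and_visualize]
    data = collect(["weblogs", "pos", "twitter"])
    for stage in stages:
        data = stage(data)
    print(data)  # -> predicted score: 0.3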
  • Big Data Pipeline (Ref: http://goo.gl/Mm83k). The pipeline climbs from Volume and Velocity through Variety, Variability, Context and Connectedness to Infer-ability, with a tool stack at each stage: logs, Scribe, Flume, XML, files, … → NOSQL, HDFS, Hadoop → SQL, Hadoop, Pig, Hive, Dryad, various tools → hand-coded programs, R, Mahout, SQL, BI tools, .NET → internal dashboards, Tableau, models. Decomplexify! Contextualize! Network! Reason! Infer!
  • Build to fail - "it is working" is not binary. The NOSQL! I AM monarch of all I survey; My right there is none to dispute; From the centre all round to the sea I am lord of the fowl and the brute - Cowper, The Solitude of Alexander Selkirk
  • Agenda: Opening Gambit – NOSQL: Toil, Tears & Sweat! The Pragmas – ABCs of NOSQL [ACID, BASE & CAP]. The Mechanics – Algorithmics & Mechanisms (for reference). Referenced links @ http://doubleclix.wordpress.com/2010/06/20/nosql-talk-references/
  • What is NOSQL anyway? NOSQL != NoSQL and NOSQL != (!SQL); NOSQL = Not Only SQL. Can be traced back to Eric Evans[2] – you can ask him during the afternoon session! Unfortunate name, but it is stuck now; Non-Relational could have been better. Usually operational, definitely distributed. NOSQL has certain semantics – need not stay that way.
  • NOSQL families (spanning in-memory and disk-based stores). Key-Value: Memcached, Redis, Tokyo Cabinet, Dynamo, Voldemort, SimpleDB, Azure Table Storage. Column: Google BigTable, HBase, Cassandra, HyperTable. Document: CouchDB, MongoDB, Lotus Domino, Riak. Graph: Neo4j, FlockDB, InfiniteGraph. Ref: [22,51,52]
  • When I think of my own native land, In a moment I seem to be there; But, alas! recollection at hand Soon hurries me back to despair. - Cowper, The Solitude of Alexander Selkirk. NOSQL tales from the field – WHAT WORKS
  • Augmenting an RDBMS with a distributed key-value store [40: a good talk by Geir]. Invitation-only designer-brand sales; limited-inventory sales start at 12:00 and members have 10 minutes to grab items; 500K mails every day. Keeps brand value, hidden from search. Interesting load properties: each item is a row in the DB – BUY NOW reserves it, so you can't order more than the inventory. Started out as a Rails app – shared nothing. Narrow peaks account for half of revenue.
  • Christian Louboutin effect: ½ amz for Louboutin. Uses Voldemort for inventory, shopping cart, checkout; partitioned by product ID; in-memory inventory. Shared infrastructure – "fog" not "cloud" - Joyent! Not afraid of a sale anymore! And SQL DBs are still relevant!
  • Typical NOSQL example: Bit.ly. The Bit.ly URL-shortening service uses MongoDB. Stores user, title, URL, hash, labels[I-5], sorted by time. Scale: ~50M users, ~10K concurrent, ~1.25B shortens per month. Criteria: simple, zippy FAST, very flexible, reasonable durability, low cost of ownership. Sharded by userid.
  • A new kind of "dictionary" – a word repository, GPS for English – context, pronunciations, Twitter … developer API. Characteristics [I-6, Tony Tam's presentation]: read-centric, 10,000 reads for every write; hit a wall with MySQL (4B rows); MongoDB reads were so good that the memcached layer was not required; MongoDB used 4 times MySQL's storage. Another example: Voldemort for Unified Communications – IP-phone data stored keyed off of phone number; data relatively stable.
  • Large Hadron Collider @ CERN. DAS is part of a giant data management enterprise (CMS) – polyglot persistence (SQL + NOSQL: Mongo, Couch, memcache, HDFS, Lustre, Oracle, MySQL, …). Data Aggregation System [I-1,I-2,I-3,I-4]: uses MongoDB; distributed model, 2–6 PB of data; combines information from different metadata sources so users can query without knowing they exist – the user has domain knowledge but shouldn't have to deal with the various formats, interfaces and query semantics. DAS aggregates, caches and presents data as JSON documents – preserving security & integrity. And SQL DBs are still relevant!
  • Scaling Twitter
  • Digg: the RDBMS places the burden on reads rather than writes[I-8]; looked at NOSQL, selected Cassandra (column oriented, so more structure than key-value). Heard from NoSQL Boston [http://twitter.com/#search?q=%23nosqllive]: Baidu – a 120-node HyperTable cluster managing 600 TB of data; StumbleUpon uses HBase for analytics; Twitter's current Cassandra cluster: 45 nodes.
  • Adobe is an HBase shop [I-10,I-11,2]: Adobe SaaS infrastructure – multi-master, multi-datacenter replication; dynamic schema & huge number of records[I-5]; from 40 million records in 2008 to 1 billion, with 50 ms response; NOSQL was not mature in 2008, now good enough; production analytics: 40 nodes, largest cluster has 100 nodes. BBC is a CouchDB shop [I-13]: sweet spot is tagging, content aggregation, search, storage and so forth; interactive mediums; old data goes to CouchDB, thus freeing up the DB to do work!
  • Cloudkick is a Cassandra shop[I-12]. Cloudkick offers cloud management services and stores metrics data. Linear scalability for write load; massive write performance (memory table & serial commit log); low operational costs. Data structure: metrics, rolled-up data, statuses at a time slice – all indexed by timestamp.
  • Guardian/UK runs on Redis[I-14]! "Long-term The Guardian is looking towards the adoption of a schema-free database to sit alongside its Oracle database and is investigating CouchDB. … the relational database is now just a component in the overall data management story, alongside data caching, data stores, search engines etc." NOSQL can increase the performance of relational data by offloading specific data and tasks. And SQL DBs are still relevant! "The evil that SQL DBs do lives after them; the good is oft interred with their bones..."
  • NOSQL at Netflix•  Netflix is fully in the cloud•  Uses NOSQL across the globe•  Customer Profiles, watchlog, usage logging (see next slide) –  No multi-record locking•  No DBA !•  Easier Schema Changes•  Less complex, Highly Available data store•  Joins happen in the applications http://www.hpts.ws/sessions/nosql-ecosystem.pdf http://www.hpts.ws/sessions/GlobalNetflixHPTS.pdf
  • 21 NOSQL Themes: web scale; scale incrementally / continuous growth; oddly shaped & exponentially connected data; structure data as it will be used – i.e. read, query; know your queries/updates in advance[96], but you can change them later; compute attributes at run time; create a few large entities with optional parts (normalization creates many small entities); define schemas in models (not in databases); avoid impedance mismatch; narrow down & solve your core problem; solve the right problem with the right tool. Ref: [I-8]
  • 21 NOSQL Themes (contd.): existing solutions are clunky[1] (in certain situations); scale automatically – "becoming prohibitively costly (in terms of manpower) to operate" – Twitter[I-9]; distribution & partitioning are built into NOSQL; RDBMS distribution & sharding is not fun and is expensive – you lose most functionality along the way; data at the center, flexible schema, fewer joins; the value of NOSQL is in flexibility as much as it is in "Big Data".
  • 21 NOSQL Themes (contd.): Requirements[3] – data will not fit in one node, so the system needs data partitioning/distribution; nodes will fail, but data needs to be safe – replication!; low latency for real-time use. Data locality – row-based structures need to read the whole row, even for a single column; column-based structures need to scan for each row. Solution: column storage with locality – keep data that is read together, don't read what you don't care about (for example, friends vs. other data). Ref: 3
  • ABCs of NOSQL - ACID, BASE & CAP. The woods are lovely, dark, and deep, But I have promises to keep, And miles to go before I sleep, And miles to go before I sleep. - Frost
  • CAP Principle: "CAP Principle → Strong Consistency, High Availability, Partition-resilience: Pick at most 2"[37]. C-A, no P → single DB server, no network partition. C-P, no A → block transactions in case of partition failure. A-P, no C → expiration-based caching, voting majority (the interesting, and controversial, case from the NOSQL perspective). Which feature to discard depends on the nature of your system[41].
  • ABCs of NOSQL. ACID: Atomicity, Consistency, Isolation & Durability – the fundamental properties of a SQL DBMS. BASE[35,39]: Basically Available, Soft state (Scalable), Eventually consistent. CAP[36,39]: Consistency, Availability & Partitioning – this C is ~A+C, i.e. atomic consistency[36].
  • ACID. Atomicity – all or nothing. Consistency – from one consistent state to another, e.g. referential integrity; but it is also application dependent, e.g. a minimum account balance, predicates, invariants, … Isolation. Durability.
  • CAP Pragmas. Preconditions: the domain is scalable web apps; low latency for real-time use; a small subset of SQL functionality; horizontal scaling. Pritchett[35] talks about relaxing consistency across functional groups rather than within functional groups. Idempotency to consider: updates like inc/dec are rarely idempotent; order-preserving transactions are not idempotent either; MVCC is an answer for this (CouchDB).
  • Consistency. Strict consistency – any read on data X will return the most recent write on X[42]. Sequential consistency – maintains sequential order from multiple processes (no mention of time). Linearizability – adds timestamps from loosely synchronized processes.
  • Consistency. Write availability, not read availability[44]. Even load distribution is easier in eventually consistent systems. Multi-data-center support is easier in eventually consistent systems. Some problems are not solvable with eventually consistent systems. Code is sometimes simpler to write in strongly consistent systems.
  • CAP Essentials – 1 of 3. "CAP Principle → Strong Consistency, High Availability, Partition-resilience: Pick at most 2"[37]: C-A, no P → single DB server, no network partition; C-P, no A → block transactions in case of partition failure; A-P, no C → expiration-based caching, voting majority. Which feature to discard depends on the nature of your system[41].
  • CAP Essentials – 2 of 3. Yield vs. harvest[37]: yield → probability of completing a request; harvest → fraction of data reflected in the response (see the toy calculation below). Some systems tolerate < 100% harvest (e.g. search, where approximate answers are OK), others need 100% harvest (e.g. transactions, where correct behavior = a single well-defined response). For sub-systems that tolerate harvest degradation, CAP makes sense.
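A toy calculation (not from the slides) to make yield vs. harvest concrete: a search tier with 10 index shards answers two of three requests, one of them with a shard missing.

# Toy illustration of Fox & Brewer's yield vs. harvest:
# yield   = completed requests / total requests
# harvest = fraction of the data reflected in a completed response
requests = [
    {"answered": True,  "shards_available": 10, "shards_total": 10},
    {"answered": True,  "shards_available": 9,  "shards_total": 10},  # degraded answer
    {"answered": False, "shards_available": 0,  "shards_total": 10},  # dropped request
]

completed = [r for r in requests if r["answered"]]
yield_ = len(completed) / len(requests)
harvest = sum(r["shards_available"] / r["shards_total"] for r in completed) / len(completed)

print(f"yield = {yield_:.0%}, average harvest = {harvest:.0%}")
# yield = 67%, average harvest = 95%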
  • CAP Essentials – 3 of 3. Trading harvest for yield – AP. Application decomposition: use NOSQL in the sub-systems whose state management and data semantics match the operational features & impedance – hence Not Only SQL, not No SQL. Intelligent homing to tolerate partition failures[44]; multiple zones in a region (150 miles - 5 ms); Twitter tweets in Cassandra and MySQL; BBC using MongoDB for offloading the DBMS; polyglot persistence at LHC@CERN. (Most important point in the whole presentation.)
  • Eventual Consistency & AMZ. Distribution transparency[38]. In larger distributed systems, network partitions are a given. Consistency models: strong; weak – has an inconsistency window between an update and a guaranteed view; eventual – if there are no new updates, all readers will see the value, eventually.
  • Eventual Consistency & AMZ. Guarantee variations[38]: read-your-writes; session consistency; monotonic read consistency – an access will not return a previous value; monotonic write consistency – serialize writes by the same process. Guarantee order (vector clocks, MVCC – see the sketch below); example: the Amazon cart merge (let cart adds succeed even with partial failure).
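A minimal vector-clock sketch in Python, assuming the usual increment/compare semantics (illustrative, not Dynamo's implementation); it shows how the cart-merge case surfaces as two concurrent versions that the application must reconcile.

# Minimal vector-clock sketch: each replica increments its own counter on a
# write; a clock "descends" from another if it is >= component-wise,
# otherwise the writes conflict and must be reconciled (e.g. merge the carts).
def increment(clock: dict, node: str) -> dict:
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock


def descends(a: dict, b: dict) -> bool:
    """True if clock `a` has seen everything in clock `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


cart_v1 = increment({}, "replica-A")          # {A:1}
cart_v2 = increment(cart_v1, "replica-A")     # {A:2}      later write on A
cart_v3 = increment(cart_v1, "replica-B")     # {A:1, B:1} concurrent write on B

print(descends(cart_v2, cart_v1))   # True  -> v2 supersedes v1
print(descends(cart_v2, cart_v3) or descends(cart_v3, cart_v2))
# False -> concurrent versions: the application merges them (the Amazon cart example)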
  • Eventual Consistency & AMZ - SimpleDB. SimpleDB strong consistency semantics[49,50]: until Feb 2010, SimpleDB only supported eventual consistency, i.e. GetAttributes after PutAttributes might not return the same value for some time (~1 second). On Feb 24, AWS added a ConsistentRead=True attribute for reads; such a read will reflect all writes that got a 200 OK up to that time!
  • Eventual Consistency & AMZ - SimpleDB. SimpleDB strong consistency semantics[49,50]: also added conditional put/delete – the put succeeds only if an attribute has a specified value (Expected.1.Value=) or (Expected.1.Exists=true/false); the same conditional check capability exists for delete; only on one attribute! (A rough request sketch follows.)
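Roughly what the SimpleDB request parameters behind these two features look like, sketched as Python dicts; the domain, item and attribute names are made up for illustration.

# Rough shape of the SimpleDB request parameters for the two features above
# (sketch only; "orders", "order-123" and the attribute names are invented).
consistent_get = {
    "Action": "GetAttributes",
    "DomainName": "orders",
    "ItemName": "order-123",
    "ConsistentRead": "true",      # added Feb 2010: read reflects all acknowledged writes
}

conditional_put = {
    "Action": "PutAttributes",
    "DomainName": "orders",
    "ItemName": "order-123",
    "Attribute.1.Name": "status",
    "Attribute.1.Value": "shipped",
    "Attribute.1.Replace": "true",
    # the condition: only succeed if 'status' currently equals 'paid'
    "Expected.1.Name": "status",
    "Expected.1.Value": "paid",    # or Expected.1.Exists=false for create-if-absent
}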
  • Eventual Consistency & AMZ – S3. S3 is an eventual-consistency system. Versioning; "S3 PUT & COPY synchronously store data across multiple facilities before returning SUCCESS"; repairs lost redundancy, repairs bit-rot. Reduced Redundancy option for data that can be reproduced (99.999999999% vs. 99.99%) – approx. 1/3rd less. CloudFront for caching.
  • !SQL? "We conclude that the current RDBMS code lines, while attempting to be a "one size fits all" solution, in fact, excel at nothing. Hence, they are 25 year old legacy code lines that should be retired in favor of a collection of "from scratch" specialized engines."[43] "Current systems were built in an era where resources were incredibly expensive, and every computing system was watched over by a collection of wizards in white lab coats, responsible for the care, feeding, tuning and optimization of the system. In that era, computers were expensive and people were cheap." "The 1970 - 1985 period was a time of intense debate, a myriad of ideas, & considerable upheaval. We predict the next fifteen years will have the same feel."
  • Further deliberation: Daniel Abadi[45], Mike Stonebraker[46], James Hamilton[47] and Pat Helland[48] are all good reads for further deliberation.
  • NOSQL Internals & Algorithmics
  • Caveats: a representative subset of the mechanics and mechanisms used in the NOSQL world; they are being refined & newer ones are being tried. Presented at a system level – to show how the techniques play a part in delivering a capability. See the NOSQL papers and other references for further deliberation. Even if we don't cover everything fully, it is OK – I want to introduce some of the concepts so that you get an appreciation …
  • NOSQL Mechanics. Horizontal scalability: gossip (cluster membership), failure detection, consistent hashing, replication techniques (hinted handoff), sharding (MongoDB), regions in HBase. Performance: SSTables/memtables, LSM trees w/ Bloom filters. Integrity/version reconciliation: timestamps, vector clocks, Merkle trees, MVCC, semantic vs. syntactic reconciliation.
  • Consistent Hashing. Origin: web caching – "to decrease 'hot spots'". Three goals[87]: smooth evolution – when a new machine joins, minimal rebalancing work and impact; spread – objects are assigned to a minimal number of nodes; load – the number of distinct objects assigned to a node is small.
  • Consistent Hashing. The hash keyspace/token range is divided into partitions/ranges; partitions are assigned to nodes that are logically arranged in a ring topology (a toy ring sketch follows). Cassandra offers a choice: an OrderPreservingPartitioner – key = token (for range queries); there is also a CollatingOrderPreservingPartitioner. Amazon (Dynamo) assigns sets of (random) multiple points to different machines depending on load; Cassandra monitors load & redistributes. Specific join & leave protocols. Replication – the next 3 consecutive nodes; Cassandra is rack-aware and datacenter-aware.
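A toy consistent-hash ring in Python, assuming a single token per node (real systems add virtual nodes, replication and rack awareness as noted above); adding a node moves only the keys on the arc it takes over.

# Minimal consistent-hash ring sketch (illustrative only).
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes):
        self._points = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the first node token."""
        tokens = [t for t, _ in self._points]
        i = bisect.bisect(tokens, _hash(key)) % len(self._points)
        return self._points[i][1]

    def add(self, node: str):
        bisect.insort(self._points, (_hash(node), node))


ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in ("alice", "bob", "carol", "dave")}
ring.add("node-d")   # only keys on the arc node-d now owns move: minimal rebalancing
after = {k: ring.node_for(k) for k in before}
moved = [k for k in before if before[k] != after[k]]
print(before, after, "moved:", moved)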
  • Consistent Hashing - Hinted handoff. What happens when a node is not available? It may be under load, or there may be a network partition. Sloppy quorum & hinted handoff: reads/writes are performed on the first N healthy nodes; the replica is sent to a host node with a hint in the metadata & then transferred when the actual node comes back up. This burdens neighboring nodes. In Cassandra 0.6.2 the default is disabled (I think).
  • Consistent Hashing - Replication. What happens when a new node joins? It gets one or more partitions. Dynamo: copy the whole partition. Cassandra: replicate the keyset; Cassandra is working on a BitTorrent-type protocol to copy from replicas.
  • Anti-entropy. Merge and reconciliation operations – operate on two states and return a new state[86]. Merkle trees: Dynamo uses Merkle trees to detect inconsistencies between replicas; AntiEntropy in Cassandra exchanges Merkle trees and, if they disagree, performs a range repair via compaction[91,92]; Cassandra uses the Scuttlebutt reconciliation[86]. (A toy Merkle-tree sketch follows.)
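A toy Merkle-tree sketch in Python (illustrative, not Dynamo's or Cassandra's code): replicas compare root hashes and descend only into mismatching subtrees, so only the divergent range needs repair.

# Toy Merkle tree for anti-entropy between two replicas.
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def merkle_root(leaves):
    """Build the tree bottom-up; leaves are per-range (key, value) pairs."""
    level = [h(f"{k}={v}".encode()) for k, v in leaves]
    while len(level) > 1:
        if len(level) % 2:                      # duplicate last node if the level is odd
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


replica_a = [("k1", "v1"), ("k2", "v2"), ("k3", "v3"), ("k4", "v4")]
replica_b = [("k1", "v1"), ("k2", "v2"), ("k3", "STALE"), ("k4", "v4")]

if merkle_root(replica_a) != merkle_root(replica_b):
    # roots differ -> compare child hashes to localize the divergent range,
    # then stream/repair only that range instead of the whole replica
    print("replicas diverge: repair needed")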
  • Gossip. Membership & failure detection. Based on emergence without rigidity – pulse-coupled oscillators, biological systems like fireflies![90] Also used for state propagation – used in Dynamo/Cassandra.
  • Gossip. Cassandra exchanges heartbeat state, application state and so forth. Every second, it picks a random live node and a random unreachable node and exchanges key-value structures (sketched below). Some nodes play the part of seeds; seed/initial contact points are in a static conf file (storage.conf), but could also come from a configuration service like ZooKeeper. To guard against node flap, there are explicit membership join and leave operations – now you know why hinted handoff was added.
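A sketch of one gossip round in Python, under the simplifying assumption that per-endpoint state is just a version number; Cassandra's Gossiper is more elaborate (generations, digests, seeds), but the shape is the same.

# One gossip round: pick a random live peer (and sometimes an unreachable
# one) and converge both sides to the newest version of every entry.
import random


def merge(a: dict, b: dict) -> dict:
    """Keep the higher (newer) version for every endpoint."""
    return {k: max(a.get(k, -1), b.get(k, -1)) for k in a.keys() | b.keys()}


def gossip_round(my_state, peer_states, unreachable_states):
    chosen = [random.choice(list(peer_states))]                 # random live node
    if unreachable_states and random.random() < 0.3:
        chosen.append(random.choice(list(unreachable_states)))  # occasionally retry a "dead" node
    for peer in chosen:
        synced = merge(my_state, peer_states[peer])
        my_state.update(synced)
        peer_states[peer].update(synced)
    return my_state


# state = endpoint -> heartbeat generation/version seen so far
me = {"10.0.0.1": 42, "10.0.0.2": 7}
peers = {"10.0.0.2": {"10.0.0.2": 9, "10.0.0.3": 3}}
print(gossip_round(me, peers, {}))   # versions converge toward the max seen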
  • Membership & failure detection. Consensus & atomic broadcast are impossible to solve in a distributed system[88,89] – you cannot differentiate between a slow system and a crashed system. Completeness – every system that crashed will eventually be detected. Correctness – a correct process is never suspected. In short, if you are dead somebody will notice it, and if you are alive, nobody will mistake you for dead!
  • Ø (phi) Accrual Failure Detector. Not a Boolean value but a probabilistic number that "accrues" on an exponential scale; it captures the degree of confidence that the corresponding monitored process has crashed[94] – a suspicion level: Ø = 1 → prob(error) 10%; Ø = 2 → prob(error) 1%; Ø = 3 → prob(error) 0.1%. If the process is dead, Ø is monotonically increasing and Ø → ∞ as t → ∞; if the process is alive and kicking, Ø = 0. Accounts for lost messages, network latency and actual crashes of the system/process. With a well-known heartbeat period Δi, the network latency Δtr can be tracked by modeling inter-arrival times. (A small sketch follows.)
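A small sketch of the phi calculation, assuming exponentially distributed heartbeat inter-arrival times (Cassandra's detector uses a sliding window of observed intervals; this is a simplification).

# Sketch of the phi accrual failure detector described above.
import math


def phi(time_since_last_heartbeat: float, mean_interval: float) -> float:
    """
    phi = -log10( P(a heartbeat still arrives later than this) ), assuming
    exponentially distributed inter-arrival times. phi=1 -> ~10% chance the
    node is actually alive, phi=2 -> ~1%, phi=3 -> ~0.1%, ...
    """
    p_later = math.exp(-time_since_last_heartbeat / mean_interval)
    return -math.log10(p_later)


mean = 1.0                      # heartbeats expected roughly every second
for silent in (0.5, 2.3, 4.6, 9.2):
    print(f"silent {silent:>4}s -> phi = {phi(silent, mean):.2f}")
# phi keeps growing the longer the node stays silent; suspect it once phi
# crosses a configured threshold instead of using a hard timeout.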
  • Write/Read Mechanisms. Reads & writes go to a random node (StorageProxy); the proxy coordinates the read and write strategy (R/W = any, quorum, et al.). Memtables/SSTables from Bigtable. Bloom filter/index. LSM trees.
  • HBase write path (WAL, Memstore, HDFS): a write goes to the commit log (write-ahead log on the HDFS file system) and to the in-memory MemTable/Memstore; when full, the memtable is flushed to disk as an SSTable, each with its own index and Bloom filter (BF). Reads consult the memtable and the SSTables. SSTables are immutable; compaction merges them while maintaining the index & Bloom filter. (A toy write-path sketch follows.)
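A toy write-path sketch in Python of the memtable → SSTable flow in the figure (not HBase's code): writes append to a commit log and an in-memory table, flushes produce immutable sorted runs, and reads check the memtable first and then skip SSTables via a stand-in Bloom filter.

# Toy LSM-style store: WAL + memtable, flushed to immutable SSTables.
import hashlib


class SSTable:
    """Immutable sorted run flushed from a full memtable."""
    def __init__(self, entries):
        self.rows = dict(sorted(entries.items()))          # sorted, immutable run
        self.filter = {self._h(k) for k in self.rows}      # stand-in Bloom filter

    @staticmethod
    def _h(key):
        return hashlib.md5(key.encode()).hexdigest()[:4]

    def maybe_has(self, key):
        return self._h(key) in self.filter                 # "maybe" / "definitely not"

    def get(self, key):
        return self.rows.get(key)


class Store:
    def __init__(self, flush_at=3):
        self.wal, self.memtable, self.sstables, self.flush_at = [], {}, [], flush_at

    def put(self, key, value):
        self.wal.append((key, value))                      # 1. append to commit log
        self.memtable[key] = value                         # 2. update in-memory table
        if len(self.memtable) >= self.flush_at:            # 3. flush when full
            self.sstables.append(SSTable(self.memtable))
            self.memtable, self.wal = {}, []

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sst in reversed(self.sstables):                # newest first
            if sst.maybe_has(key):                         # skip runs via Bloom filter
                v = sst.get(key)
                if v is not None:
                    return v
        return None


s = Store()
for i in range(5):
    s.put(f"row{i}", f"val{i}")
print(s.get("row1"), s.get("row4"), s.get("missing"))      # val1 val4 None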
  • How does HBase work again? http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html http://hbaseblog.com/2010/07/04/hug11-hbase-0-90-preview-wrap-up/
  • Bloom Filter. The Bloom filter answers the question "Might there be data for this key in this SSTable?" [Ref: Cassandra/HBase mailer] – with "maybe" or "definitely not"; when the Bloom filter says "maybe" we have to go to disk to check the content of the SSTable (a minimal sketch follows). Depends on the implementation: redone in Cassandra; HBase 0.20.x removed it, and it will be back in 0.90 with a "jazzy" implementation.
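A minimal Bloom filter in Python to make the "maybe / definitely not" answer concrete (the size and hash count here are arbitrary; real implementations tune both for a target false-positive rate).

# Minimal Bloom filter: k hash positions set in a bit array per key.
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size, self.k = size_bits, num_hashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False -> "definitely not" (skip the SSTable); True -> "maybe"
        return all(self.bits & (1 << pos) for pos in self._positions(key))


bf = BloomFilter()
for row_key in ("user:1", "user:2", "user:3"):
    bf.add(row_key)
print(bf.might_contain("user:2"))    # True  -> go read the SSTable
print(bf.might_contain("user:99"))   # False (almost surely) -> no disk seek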
  • Was it a vision, or a waking dream? Fled is that music:—do I wake or sleep? - Keats, Ode to a Nightingale
  • http://www.readwriteweb.com/enterprise/2011/11/infographic-data-deluge---8-ze.php • http://www.crn.com/news/data-center/232200061/efficiency-or-bust-data-centers-drive-for-low-power-solutions-prompts-channel-growth.htm • http://www.quantumforest.com/2011/11/do-we-need-to-deal-with-big-data-in-r/ • http://www.forbes.com/special-report/2011/migration.html • http://www.mercurynews.com/bay-area-news/ci_19368103 • http://www.businessinsider.com/apple-new-data-center-north-carolina-created-50-jobs-2011-11