Big Data Israel Meetup : Couchbase and Big Data
 


See how you can work with Couchbase and Hadoop to provide real time data analysis at the top of your BigData infrastructure.

    Big Data Israel Meetup : Couchbase and Big Data Presentation Transcript

    • Thursday, March 7, 13
    • BigData, NoSQL, Couchbase. Tugdual "Tug" Grall, Technical Evangelist. email: tug@couchbase.com, twitter: @tgrall
    • About me: Tugdual "Tug" Grall
      • Couchbase: Technical Evangelist
      • eXo: CTO
      • Oracle: Developer/Product Manager, mainly Java/SOA
      • Developer in consulting firms
      • Web: @tgrall, http://blog.grallandco.com, tgrall
      • NantesJUG co-founder
      • Pet project: http://www.resultri.com
    • $30B Database Market Being Disrupted. Chart: relational technology at 95% of the market in 2012 and "<50%?" in 2027, with NoSQL and other technologies making up the rest. All new database growth will be NoSQL.
    • Operational vs. Analytic Databases
      • Real-time, interactive databases (NoSQL): fast access to data
      • Analytic databases: get insights from data
      • Products shown: Couchbase, Cassandra, Cloudera, MongoDB, HBase, Hortonworks, MapR
    • Couchbase Server Core Principles
      • Easy scalability: grow cluster without application changes, without downtime, with a single click
      • Consistent high performance: consistent sub-millisecond read and write response times with consistent high throughput
      • Always on, 24x365: no downtime for software upgrades, hardware maintenance, etc.
      • Flexible data model: JSON document model with no fixed schema
    • What Is Biggest Data Management Problem Driving Use of NoSQL in Coming Year?
      • Lack of flexibility/rigid schemas: 49%
      • Inability to scale out data: 35%
      • Performance challenges: 29%
      • Cost: 16%
      • All of these: 12%
      • Other: 11%
      • Source: Couchbase Survey, December 2011, n = 1351.
    • Couchbase 2.0 Launch! 12/12/12
    • Couchbase 2.0 New Features: JSON support; indexing and querying; incremental Map Reduce; cross data center replication
    • Couchbase Handles Real World Scale
    • Couchbase Server 2.0 Architecture
      • Ports: 8092 (Query API), 11211 (Memcapable 1.0, Moxi), 11210 (Memcapable 2.0), 8091 (HTTP: REST management API/Web UI), 4369 (Erlang port mapper), 21100-21199 (Distributed Erlang)
      • Data Manager (RAM cache, indexing and persistence management; C & V8): Query Engine, Memcached (object-level cache), Couchbase EP Engine, storage interface, new persistence layer (disk persistence)
      • Cluster Manager (server/cluster management and communication; Erlang/OTP): REST management API/Web UI, vBucket state and replication manager, global singleton supervisor, rebalance orchestrator, configuration manager, node health monitor, process monitor, heartbeat
      • Diagram legend: components run either on each node or one per cluster
      • Reference: "The Unreasonable Effectiveness of C" by Damien Katz
    • Open Source Project: Apache 2.0; https://github.com/couchbase/; Gerrit: http://review.couchbase.org/; https://github.com/couchbaselabs/
    • Official SDKs: Ruby, libcouchbase, Clojure, Python, Go. See www.couchbase.com/develop
    • Write Operation (animated diagram of a Couchbase Server node): the app server sends Doc 1 to the node, where it lands in the managed cache (step 2); it is then placed on the replication queue, which sends it to another node, and on the disk queue, which writes it to disk (step 3).
    • Basic Operations
      • Docs distributed evenly across servers
      • Each server stores both active and replica docs; only one doc active at a time
      • Client library provides app with simple interface to database
      • Cluster map provides map to which server doc is on; app never needs to know
      • App reads, writes, updates docs
      • Multiple app servers can access same document at same time
      • Diagram: App Server 1 and App Server 2, each with a Couchbase client library and a cluster map, send READ/WRITE/UPDATE requests to a Couchbase Server cluster of three servers (Server 1, Server 2, Server 3); each server holds a set of ACTIVE docs and a set of REPLICA docs.
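The routing idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the CRC32 hash, the partition count of 8, the round-robin placement, and all function names are hypothetical and are not taken from the slides or from any Couchbase client library.

```python
import zlib

SERVERS = ["server1", "server2", "server3"]
NUM_PARTITIONS = 8  # assumption: a small number chosen for readability


def build_cluster_map(servers, num_partitions):
    """Give every partition one active server and a different replica server."""
    cluster_map = {}
    for partition in range(num_partitions):
        active = servers[partition % len(servers)]
        replica = servers[(partition + 1) % len(servers)]
        cluster_map[partition] = {"active": active, "replica": replica}
    return cluster_map


def partition_for(key, num_partitions):
    """Hash the document key to a partition (hash choice is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def locate(key, cluster_map):
    """Return the active and replica servers for a document key."""
    return cluster_map[partition_for(key, len(cluster_map))]


if __name__ == "__main__":
    cluster_map = build_cluster_map(SERVERS, NUM_PARTITIONS)
    print(locate("u::jasdeep@couchbase.com", cluster_map))
```

Because the application only calls `locate` (or, in a real client, just `get`/`set`), it never needs to know which server holds a document; only the map changes when the cluster changes.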
    • Store & Retrieve Operations
      • get (key): retrieve a document
      • set (key, value): store a document, overwrites if exists
      • add (key, value): store a document, error/exception if exists
      • replace (key, value): store a document, error/exception if doesn't exist
      • cas (key, value, cas): compare and swap, mutate document only if it hasn't changed while executing this operation
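The semantics of these five operations can be sketched with a toy in-memory store. This is not a Couchbase SDK: the class name, the exception names, and the integer CAS counter are assumptions made for illustration.

```python
class KeyExistsError(Exception):
    pass


class KeyNotFoundError(Exception):
    pass


class CasMismatchError(Exception):
    pass


class ToyBucket:
    """In-memory stand-in for a bucket; every stored value gets a new CAS id."""

    def __init__(self):
        self._items = {}   # key -> (value, cas)
        self._next_cas = 1

    def _store(self, key, value):
        cas = self._next_cas
        self._next_cas += 1
        self._items[key] = (value, cas)
        return cas

    def get(self, key):
        """Retrieve a document as (value, cas)."""
        if key not in self._items:
            raise KeyNotFoundError(key)
        return self._items[key]

    def set(self, key, value):
        """Store a document, overwriting any existing one."""
        return self._store(key, value)

    def add(self, key, value):
        """Store a document, failing if the key already exists."""
        if key in self._items:
            raise KeyExistsError(key)
        return self._store(key, value)

    def replace(self, key, value):
        """Store a document, failing if the key does not exist."""
        if key not in self._items:
            raise KeyNotFoundError(key)
        return self._store(key, value)

    def cas(self, key, value, cas):
        """Store a document only if it is unchanged since `cas` was read."""
        _, current_cas = self.get(key)
        if current_cas != cas:
            raise CasMismatchError(key)
        return self._store(key, value)
```

A typical read-modify-write loop reads `(value, cas)` with `get`, changes the value, and calls `cas`; on `CasMismatchError` another writer got there first, so the caller re-reads and retries.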
    • JSON Document Structure
      • meta (meta information including key; all keys unique and kept in RAM): { "id": "u::jasdeep@couchbase.com", "rev": "1-0002bce0000000000", "flags": 0, "expiration": 0, "type": "json" }
      • document (document value; most recent in RAM and persisted to disk): { "uid": 123456, "firstname": "jasdeep", "lastname": "Jaitla", "age": 22, "favorite_colors": ["blue", "black"], "email": "jasdeep@couchbase.com" }
    • DEMONSTRATION
    • HADOOP & NOSQL
    • What is Sqoop? "Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS." (sqoop.apache.org)
    • What is Sqoop? Traditional ETL (diagram: Application, Data, and a single transform step T)
    • What is Sqoop? A different paradigm (diagram: Application, Data, Data)
    • What is Sqoop? A very scalable different paradigm (diagram: three Application/Data pairs and one further Data store)
    • What is Sqoop? Where did the Transform go? (diagram: many T steps alongside Application and Data)
    • Sqoop Details
      • Default connection is via JDBC
      • Lots of custom connectors
        - Couchbase, VoltDB, Vertica
        - Teradata, Netezza
        - Oracle, MySQL, Postgres
    • 40 milliseconds to respond with the decision. Diagram steps: (1) events; (2) profiles, campaigns; (3) profiles, real time campaign statistics
    • Couchbase Import and Export
        $ sqoop import --connect http://localhost:8091/pools --table DUMP
        $ sqoop import --connect http://localhost:8091/pools --table BACKFILL_5
        $ sqoop export --connect http://localhost:8091/pools --table DUMP --export-dir DUMP
      • For imports, table must be:
        - DUMP: all keys currently in Couchbase
        - BACKFILL_n: all key mutations for n minutes
      • Specified --username maps to bucket
        - By default set to "default" bucket
    • COUCHBASE "ANALYTICS"
    • Relational vs Document
      • Relational data model: highly-structured table organization with rigidly-defined data formats and record structure.
      • Document data model: collection of complex documents with arbitrary, nested data formats and varying "record" format.
    • JSON Only? Nope! Couchbase can store straight strings, binary blobs, JSON documents and a special structure called an Atomic Counter (a positive integer value with its own special operations). Maximum size of a document is 20MB.
    • Document Keys: document keys come in many flavors
      • Human readable
      • Incremental counter index
      • UUID
      • Timestamp based
      • Social media account ID
      • Random numbers
    • Document Keys: If your keys are indeterminable, you will need Secondary Indexes, that is Views (Map or Map/Reduce) or Elastic Search, to find Documents. There are many patterns for key creation; it's a skill and an art to design your keys.
    • Document Keys: If you want to find Documents based on more than one parameter, you may need Views as well. In many cases lookups can also be done without Views, using a Lookup Pattern, but that's not always the case, especially for time based or geo based values.
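One common shape of a lookup pattern is to store a second, small document whose key is the searchable value and whose value is the main document's key. The sketch below uses a plain dictionary in place of a bucket; the `email::` prefix, the `u::count` counter key, and the function names are assumptions for illustration (only the `u::` prefix appears in the slides).

```python
def next_id(store, counter_key="u::count"):
    """Mimic an atomic counter: increment and return the new value."""
    store[counter_key] = store.get(counter_key, 0) + 1
    return store[counter_key]


def create_user(store, email, firstname):
    """Write the user document plus a lookup document keyed by email."""
    uid = next_id(store)
    user_key = f"u::{uid}"
    store[user_key] = {"uid": uid, "email": email, "firstname": firstname}
    store[f"email::{email}"] = user_key  # lookup document: email -> user key
    return user_key


def find_user_by_email(store, email):
    """Two key-value reads, no view required."""
    user_key = store.get(f"email::{email}")
    if user_key is None:
        return None
    return store[user_key]


if __name__ == "__main__":
    bucket = {}
    create_user(bucket, "jasdeep@couchbase.com", "jasdeep")
    print(find_user_by_email(bucket, "jasdeep@couchbase.com"))
```

This works because the email is known exactly at query time; range queries over time or location cannot be answered by a single key read, which is where views come in.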
    • Map Function
    • DEMONSTRATION
    • View Indexes are append-only B+ trees, so new data is just added to them, and they are compacted and optimized automatically. Views are only re-indexed if you change their definition and republish them. The original index stays available until the new redefined index completes indexing.
    • Text or Numeric Based Keys
      • Map: function(doc, meta) { emit(doc.email, meta.id) }
      • The emitted key, doc.email, is a text key. Resulting index rows (doc.email, meta.id):
        - abba@couchbase.com, u::1
        - jasdeep@couchbase.com, u::2
        - zorro@couchbase.com, u::3
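What the map function above produces can be imitated outside the server. The following sketch runs a Python equivalent of the map function over a few documents and sorts the emitted rows by key; `build_index`, `query`, and the sample documents are hypothetical and only approximate how a view index behaves.

```python
def map_email(doc, meta, emit):
    """Python equivalent of: function(doc, meta) { emit(doc.email, meta.id) }"""
    if "email" in doc:
        emit(doc["email"], meta["id"])


def build_index(documents, map_fn):
    """Run the map function over every document and sort rows by key."""
    rows = []

    def emit(key, value):
        rows.append((key, value))

    for doc_id, doc in documents.items():
        map_fn(doc, {"id": doc_id}, emit)
    rows.sort()
    return rows


def query(index, key):
    """Return the values of all rows whose key matches exactly."""
    return [value for row_key, value in index if row_key == key]


if __name__ == "__main__":
    documents = {
        "u::3": {"email": "zorro@couchbase.com"},
        "u::1": {"email": "abba@couchbase.com"},
        "u::2": {"email": "jasdeep@couchbase.com"},
    }
    index = build_index(documents, map_email)
    for row in index:
        print(row)
    print(query(index, "jasdeep@couchbase.com"))
```

Sorting by the emitted key is what makes exact-match and range lookups on `doc.email` cheap, regardless of how the document keys themselves were chosen.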
    • View Updating
      • A Couchbase bucket contains design documents (Design Document 1, Design Document 2), each holding one or more views
      • Updates every 3 seconds or 5000 document operations; this is a configurable setting
      • Can also be triggered to update by client queries by using the stale=false parameter
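The update rule on this slide can be written as a small decision function. This is a sketch of one reading of the slide, not the server's actual scheduler: it assumes the two thresholds are combined with "or" as the slide states, and the function and parameter names (`should_update_index`, `stale_ok`) are hypothetical.

```python
def should_update_index(seconds_since_update, pending_ops, stale_ok=True,
                        interval_seconds=3, max_pending_ops=5000):
    """Decide whether a view index should be brought up to date now.

    stale_ok=False models a client query sent with stale=false,
    which asks for the index to be updated before results are returned.
    """
    if pending_ops == 0:
        return False  # nothing new to index
    if not stale_ok:
        return True   # the query itself triggers the update
    return (seconds_since_update >= interval_seconds
            or pending_ops >= max_pending_ops)


if __name__ == "__main__":
    print(should_update_index(1, 10))                  # within both limits
    print(should_update_index(1, 10, stale_ok=False))  # forced by the query
```

Queries that can tolerate slightly stale results leave the default behavior in place and avoid paying the indexing cost at query time.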
    • Indexing & Querying
      • Define materialized views on JSON documents and then query across the data set
      • Using views you can define:
        - Primary indexes
        - Simple secondary indexes (most common use case)
        - Complex secondary, tertiary and composite indexes
        - Aggregations (reduction)
      • Documents are eventually indexed
      • Queries are eventually consistent with respect to documents
      • Built using Map/Reduce technology
        - Map and Reduce functions are written in JavaScript
    • reddalyzr.com (http://reddalyzr.com/): Real-Time Analysis of Reddit using Couchbase & Clojure
    • I'm Excited to See What You Build, Q & A
      • Couchbase Docs: www.couchbase.com/docs/index-full.html
      • Couchbase Forums: www.couchbase.com/forums
      • IRC: #couchbase, #libcouchbase
      • Contact me on Twitter: @tgrall
      • Contact me by email: tug@couchbase.com
      • Learn more about design patterns: CouchbaseModels.com
      • Setting up for Ruby on Rails: CouchbaseOnRails.com
    • Q&A