NoSQL for SQL Professionals
 

  • For more implementation details about the McGraw Hill Case Study, you can view these slides: https://speakerdeck.com/u/christse/p/edsense-building-a-self-adapting-interactive-learning-portal-with-couchbase
    NoSQL for SQL Professionals Presentation Transcript

    • Unlock Potential: NoSQL for SQL Professionals. William McKnight, President, McKnight Consulting Group; Dipti Borkar, Director, Product Management, Couchbase. October 16, 2012. Copyright © 2012 McKnight Consulting Group, LLC. All Rights Reserved – Confidential and Proprietary
    • William McKnight, President, McKnight Consulting Group
        • Frequent keynote speaker and trainer internationally
        • Consulted to Pfizer, Scotiabank, Teva Pharmaceuticals, Verizon, and many other Global 1000 companies
        • A prolific writer with hundreds of articles, blogs and white papers in publication
        • Focused on delivering business value and solving business problems utilizing proven, streamlined approaches to information management
        • Former Fortune 50 Information Technology executive
    • Former Enterprise Information Holy Grail [diagram: legacy sources and operational applications and users (RDBMS) feed, through data integration, into data warehouses, data marts and MDBS serving users/reports; the picture is split into an operational side and an analytical side]
    • No More
    • The Relational Database Data Page [diagram: a data page made up of a page header, records, row IDs and a page footer]
        • Sample records:
            – 1120, Aris, Doug Johnson, Practice Director, 206-676-5636, doug.johnson@aris.com
            – 1121, Stolt Offshore MS, Craig Lennox, Mr, +66 1226 71269, craig.lennox@stoltoffshore.com
            – 1122, Medtronic, Inc., Mark Kohls, Principal Database Administrator, 763.516.2557, mark.kohls@medtronic.com
        • © McKnight Consulting Group, 2010
    • What Does Big Data Mean?
        • Data in NoSQL: “No SQL allowed” or “Not Only SQL”?
        • Sensor, social and web data?
        • Data in a system that does not support SQL?
        • A system with petabytes?
        • Hadoop?
    • Why the Sudden Explosion of Interest?
        • An increased number and variety of data sources that generate large quantities of data
            – Sensors (e.g. location, RFID, …)
            – Social (e.g. Twitter, wikis, …)
            – Web clicks
        • Realization that data was “too valuable” to delete
            – Even when there is little signal amid lots of noise
        • Dramatic decline in the cost of hardware, especially storage
            – If storage were still $100/GB there would be no big data revolution underway
    • Why NoSQL for Big Data
        • More data model flexibility
            – JSON as a data model (think XML)
            – No “schema first” requirement; load first
        • Faster time to insight from data acquisition
        • Relaxed ACID
            – Eventual consistency
            – Willing to trade consistency for availability
            – ACID would crush things like storing clicks on Google
        • Low upfront software costs
        • Utilizes Java
        • Full scans
        • Programmers love the freedoms
    • Hadoop, MapReduce and “Big Data”
        • Parallel programming framework
        • Hadoop is an open source distributed file system (HDFS) plus MapReduce
        • Hadoop is used by those facing web-scale data challenges
    • Who Uses Hadoop [logo slide; the company names were not captured in the transcript]
        • 40,000+ nodes running Hadoop
        • Research for ad systems and web search
        • Product search indexes
        • Analytics from user sessions
        • Log analysis for reporting and analytics and machine learning
        • Log analysis, data mining, and machine learning
        • Large scale image conversion
        • High energy physics, genomics, Digital Sky Survey
    • ACID
        • Atomicity – full transactions pass or fail
        • Consistency – database in valid state after each transaction
        • Isolation – transactions do not interfere with one another
        • Durability – transactions remain committed no matter what (e.g., crashes)
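The atomicity and consistency properties above can be seen in a few lines against a relational engine. This is a minimal sketch using Python's built-in `sqlite3` module; the `accounts` table, the names in it and the `transfer` helper are assumptions made up for illustration and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the "valid state" the database must preserve.
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


overdraw_ok = transfer(conn, "alice", "bob", 500)  # violates the CHECK constraint
after_failed = balances(conn)
small_ok = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
```

The failed transfer credits `bob` first and then trips the constraint on `alice`; the rollback undoes the credit, so no partial transaction is ever visible.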
    • What Gives the CIO Heartburn About NoSQL
        • Developer skills
        • Lack of ACID compliance
        • Tools lacking and projects flawed
        • Fast nature of unburdened projects
        • Different developers
        • Schema-less/lite models
        • Lack of payback methodology
    • MapReduce
        1. Take a large problem and divide it into sub-problems
        2. Perform the same function on all sub-problems
        3. Combine the output
        [diagram: MAP fans the problem out to parallel DoWork() calls; REDUCE combines their results into the output]
    • MapReduce (MR)
        • Programming framework (library and runtime) for analyzing data sets stored in HDFS
        • MapReduce jobs are composed of two functions
            – Map
            – Reduce
        • User only writes the Map and Reduce functions
        • MR framework provides all the “glue” and coordinates the execution of the Map and Reduce jobs on the cluster
            – Fault tolerant
            – Scalable
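To make the division of labor concrete, here is a single-process sketch of the pattern in Python. It is not Hadoop code: `run_job` is a hypothetical stand-in for the framework "glue", and word counting is only an assumed example job. The user-written parts are just `map_fn` and `reduce_fn`.

```python
from collections import defaultdict


def map_fn(line):
    """User-written map: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """User-written reduce: combine all values emitted for one key."""
    return word, sum(counts)


def run_job(records, map_fn, reduce_fn):
    """Hypothetical stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # the "shuffle": group values by key
    return dict(reduce_fn(key, values) for key, values in groups.items())


lines = ["big data big clusters", "data in clusters"]
word_counts = run_job(lines, map_fn, reduce_fn)
```

In a real cluster the map calls run in parallel on the nodes holding the data and the grouping step moves data across the network; the sketch only preserves the shape of the two functions.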
    • A Quick Summary: Parallel DB Systems vs. NoSQL
        • Data model
            – Parallel DB systems: structured data with known schema
            – NoSQL: any data will fit in any format; (un)(semi)structured
        • Hardware configuration
            – Parallel DB systems: purchased as an appliance
            – NoSQL: “user assembled” from commodity machines
        • Fault tolerance
            – Parallel DB systems: failures assumed to be rare; no query-level fault tolerance
            – NoSQL: failures assumed to be common; simple, yet efficient, fault tolerance
        • Where to do big data analytics?
    • Key-Value Stores
        • NoSQL OLTP
        • A record may look like:
            – Book: “Of Mice and Men”: Author: “Steinbeck”
        • Great for unstructured data centered on a single object
        • Typically used as a cache for data frequently requested by web applications such as online shopping carts or social-media sites
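The caching use mentioned above usually follows a cache-aside pattern: look the object up by its key, and only go to the slower system of record on a miss. The sketch below is a toy in-memory store, not any product's client API; `KeyValueStore`, `load_cart_from_database` and the `cart::` key prefix are all hypothetical names chosen for illustration.

```python
class KeyValueStore:
    """Toy in-memory key-value store: opaque values looked up by a single key."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


def load_cart_from_database(user_id):
    # Hypothetical slow lookup standing in for a relational query.
    load_cart_from_database.calls += 1
    return {"user_id": user_id, "items": ["book-42"]}


load_cart_from_database.calls = 0


def get_cart(cache, user_id):
    """Cache-aside read: try the key-value store first, then the database."""
    key = f"cart::{user_id}"
    cart = cache.get(key)
    if cart is None:
        cart = load_cart_from_database(user_id)
        cache.set(key, cart)
    return cart


cache = KeyValueStore()
first = get_cart(cache, "u1")   # miss: loads from the database and caches
second = get_cart(cache, "u1")  # hit: served from the key-value store
```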
    • Document Stores
        • A record may look like:
            – “id” => 12345,
            – “name” => “Jane”,
            – “age” => 22,
            – “address” =>
                number => 123
                street => Main
        • Often deployed for web-traffic analysis, social gaming, content stores, user-behavior/action analysis, or log-file analysis in real time
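The record on this slide maps directly onto a JSON document. The sketch below uses only Python's standard `json` module to show the nested address and the fact that a second document in the same collection may carry different fields; the second document and the `collection` dictionary are assumptions added for illustration.

```python
import json

# The record from the slide, with the address nested inside the document.
jane = {
    "id": 12345,
    "name": "Jane",
    "age": 22,
    "address": {"number": 123, "street": "Main"},
}

serialized = json.dumps(jane)      # what a document store would persist
restored = json.loads(serialized)  # what the application reads back
street = restored["address"]["street"]

# A hypothetical second document with a different shape: no age or address,
# but a new "badges" field. No schema change is needed to store it.
other = {"id": 12346, "name": "Raj", "badges": ["early-adopter"]}

collection = {doc["id"]: doc for doc in (restored, other)}
names_with_badges = [d["name"] for d in collection.values() if "badges" in d]
```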
    • Graph Stores: Emphasizing Relationships as Primary Data
        • Based on graph theory
            – Vertices (nodes), edges (relations) and properties
        • Navigating social networks, configurations and recommendations
            – e.g., get the cheapest flights from DFW to SYD leaving on 7/12/12 with a minimum number of stops and each stop less than 2 hours
        • e.g., social networks
            – Churn and offer management
    • Picking the Right NoSQL Database. From “Picking the Right NoSQL Database Tool” by Mikayel Vardanyan
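The flight query above is a shortest-path search over vertices (airports) and weighted edges (flights). As a rough sketch, the Python below runs Dijkstra's algorithm over (airport, legs flown) states to find the cheapest route within a stop limit. The fares and the intermediate airports are invented for illustration, and the slide's date and layover-time constraints are left out to keep it short.

```python
import heapq


def cheapest_route(flights, origin, destination, max_stops):
    """Return (cost, path) for the cheapest route using at most max_stops stops."""
    max_legs = max_stops + 1
    heap = [(0, 0, origin, [origin])]  # (cost so far, legs flown, airport, path)
    seen = set()
    while heap:
        cost, legs, airport, path = heapq.heappop(heap)
        if airport == destination:
            return cost, path
        if (airport, legs) in seen or legs == max_legs:
            continue
        seen.add((airport, legs))
        for next_airport, price in flights.get(airport, {}).items():
            heapq.heappush(
                heap, (cost + price, legs + 1, next_airport, path + [next_airport])
            )
    return None


# Hypothetical fares: edges of the graph with a price property.
flights = {
    "DFW": {"LAX": 150, "HNL": 400, "SYD": 1400},
    "LAX": {"HNL": 200, "SYD": 900},
    "HNL": {"SYD": 500},
}

nonstop = cheapest_route(flights, "DFW", "SYD", max_stops=0)
one_stop = cheapest_route(flights, "DFW", "SYD", max_stops=1)
two_stops = cheapest_route(flights, "DFW", "SYD", max_stops=2)
```

A graph store keeps these edges as first-class data, so a traversal like this does not have to be expressed as a chain of self-joins.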
    • The NoSQL Challenge
    • There’s No Technology Silver Bullet. Source: eBay, eBay Extreme Analytics in a Virtual World, Nov 10, 2010
    • Hybrid Information Universe [diagram: legacy sources, operational applications and users (RDBMS), elements in the cloud, master data and syndicated data are connected by data integration to Hadoop, NoSQL, data stream processing, columnar databases, the data warehouse, the data warehouse appliance, data marts and MDBS, which serve users/reports; the picture is split into an operational side and an analytical side]
    • Data Integration [diagram: Big (NoSQL) alongside UnBig (RDBMS)]
        • Increasingly data first lands in the unstructured universe
        • NoSQL stores are big data “EL” tools
        • The need for data integration with the enterprise
    • Agile Approaches [diagram of project steps. Source: Business Intelligence Roadmap, Larissa Moss & Shaku Atre]
        • Stages: Justification, Planning, Business Analysis, Design, Construction, Deployment, Support
        • Steps:
            1. Business Case Assessment
            2. Enterprise Infrastructure Evaluation
            3. Project Planning
            4. Project Requirements Definition
            5. Data Analysis
            6. Application Prototyping
            7. Metadata Repository Analysis
            8. Database Design
            9. ETL Design
            10. Metadata Repository Design
            11. ETL Development
            12. Application Development
            13. Data Mining
            14. Metadata Repository Development
            15. Implementation
            16. Release Evaluation
            17. Operate and Maintain
    • Cloud Services
        • The benefits of cloud computing are:
            – On-Demand and Self Service
            – Broad Network Access
            – Resource Pooling
            – Rapid Elasticity
            – Measured Service
        • Source: Cloud Security and Privacy: An Enterprise Perspective on Risks & Compliance (Mather, Kumaraswamy & Latif)
    • Information Store Guidance [matrix of information stores against workload characteristics; the cell markings were not captured in the transcript]
        • Columns: Real-Time; Small Data OK; Terabytes; Petabytes; Historical Data; Unstructured Data; Source Data Supplier to Other Systems; Random Queries; Ad-hoc
        • Rows: Operational Systems; Columnar Database; Data Mart (relational); Data Stream Processing; Data Warehouse; NoSQL; Master Data Management; Multidimensional Mart
    • What Will Motivate IT to Adopt NoSQL?
        • Continuation of big vendor legacy seen as too expensive
        • Scaling: data > 1 machine
        • Schema flexibility
        • Mandatory requirements to keep multiple years of highly detailed data
        • Tired of losing “deals” to more agile hybrid IT organizations
        • NoSQL tool marketplace innovations
    • NoSQL for Interactive Applications
    • Couchbase Server 2.0: NoSQL Document Database
    • Market Adoption
        • Internet companies: Social Gaming, Ad Networks, Social Networks, Online Business Services, E-Commerce, Online Media, Content Management, Cloud Services
        • Enterprises: Communications, Retail, Financial Services, Health Care, Automotive/Airline, Agriculture, Consumer Electronics, Business Systems
    • Market Adoption – Customers [logo slide: Internet companies and enterprises]
        • More than 300 customers, 5,000 production deployments worldwide
    • RELATIONAL VS NOSQL DOCUMENT DATABASES
    • Relational vs Document Data Model [diagram: a table with columns C1 to C4 beside a set of JSON documents]
        • Relational data model: highly-structured table organization with rigidly-defined data formats and record structure
        • Document data model: collection of complex documents with arbitrary, nested data formats and varying “record” format
    • Example: User Profile
        User Info
            KEY  First  Last    ZIP_id
            1    Dipti  Borkar  2
            2    Joe    Smith   2
            3    Ali    Dodson  2
            4    John   Doe     3
        Address Info
            ZIP_id  CITY  STATE  ZIP
            1       DEN   CO     30303
            2       MV    CA     94040
            3       CHI   IL     60609
            4       NY    NY     10010
        To get information about a specific user, you perform a join across two tables
    • Document Example: User Profile
        All data in a single document (user info + address info), as JSON:
        {
            "ID": 1,
            "FIRST": "Dipti",
            "LAST": "Borkar",
            "ZIP": "94040",
            "CITY": "MV",
            "STATE": "CA"
        }
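The contrast between the two user-profile slides can be reproduced with Python's built-in `sqlite3` module: the relational side needs a join across two tables, while the document side is a single self-contained object. The table and column names below (`user_info`, `address_info`, `first_name`, and so on) are assumptions adapted from the slide headings, not a schema given by the presenters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows behave like mappings keyed by column name
conn.executescript(
    """
    CREATE TABLE user_info (
        id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, zip_id INTEGER);
    CREATE TABLE address_info (
        zip_id INTEGER PRIMARY KEY, city TEXT, state TEXT, zip TEXT);
    INSERT INTO user_info VALUES
        (1, 'Dipti', 'Borkar', 2), (2, 'Joe', 'Smith', 2),
        (3, 'Ali', 'Dodson', 2), (4, 'John', 'Doe', 3);
    INSERT INTO address_info VALUES
        (1, 'DEN', 'CO', '30303'), (2, 'MV', 'CA', '94040'),
        (3, 'CHI', 'IL', '60609'), (4, 'NY', 'NY', '10010');
    """
)

# Relational read: the profile is assembled with a join at query time.
row = conn.execute(
    """
    SELECT u.id, u.first_name, u.last_name, a.zip, a.city, a.state
    FROM user_info AS u
    JOIN address_info AS a ON a.zip_id = u.zip_id
    WHERE u.id = ?
    """,
    (1,),
).fetchone()

# Document read: the same facts already live together under one key.
document = dict(row)
```

Storing `document` as-is under the key `1` is the document-model equivalent of the two tables; the trade-off is that the shared address data is now duplicated per user.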
    • Relational Technology Scales Up
        • Application scales out: just add more commodity web servers [chart: system cost and application performance against users for the web/app server tier]
        • RDBMS scales up: get a bigger, more complex server [chart: system cost and application performance against users for the relational database, marked “won’t scale beyond this point”]
        • Expensive and disruptive sharding, doesn’t perform at web scale
    • Couchbase Server Scales Out Like App Tier
        • Application scales out: just add more commodity web servers [chart: system cost and application performance against users for the web/app server tier]
        • NoSQL database scales out: cost and performance mirror the app tier [chart: system cost and application performance against users for the Couchbase distributed data store]
        • Scaling out flattens the cost and performance curves
    • NoSQL Database Considerations
        • Easy scalability: grow cluster without application changes, without downtime, when needed
        • Consistent high performance: always awesome experience for your application users
        • Always on, 24x7x365: the sun never sets on the Internet; your application needs the database to always serve data
        • Flexible data model: keep developers productive and allow fast and easy addition of new features
    • USE CASE AND APPLICATION EXAMPLES
    • Data-Driven Use Cases
        • Support for unlimited data growth
        • Data with non-homogeneous structure
        • Need to quickly and often change data structure
        • 3rd party or user defined structure
        • Variable length documents
        • Sparse data records
        • Hierarchical data
    • Performance-Driven Use Cases
        • Low latency matters
        • High throughput matters
        • Large number of users
        • Unknown demand with sudden growth of users/data
        • Predominantly direct document access
        • Workloads with very high mutation rate per document
    • Use Case Examples (web app or use case: Couchbase solution; example customer)
        • Content and Metadata Management System: Couchbase document store + Elastic Search; McGraw-Hill…
        • Social Game or Mobile App: Couchbase stores game and player data; Zynga…
        • Ad Targeting: Couchbase stores user information for fast access; AOL…
        • User Profile Store: Couchbase Server as a key-value store; TuneWiki…
        • Session Store: Couchbase Server as a key-value store; Concur…
        • High Availability Caching Tier: Couchbase Server as a memcached tier replacement; Orbitz…
        • Chat/Messaging Platform: Couchbase Server; DOCOMO…
    • Use Case: Social Gaming (Social and Mobile Gaming)
        • Types of data
            – User account information
            – User game profile info
            – User’s social graph
            – State of the game
            – Player badges and stats
        • Application requirements
            – Ability to support rapid growth
            – Fast response times for awesome user experience
            – Game uptime: 24x7x365
            – Easy to update apps with new features
        • Why NoSQL and Couchbase
            – Scalability ensures that games are ready to handle the millions of users that come with viral growth.
            – High performance guarantees players are never left waiting to make their next move.
            – Always-on operations means zero interruption to game play (and revenue).
            – Flexible data model means games can be developed rapidly and updated easily with new features.
    • Use Case: Ad Targeting
        • Types of data
            – User profile: preferences and psychographic data
            – Ad serving history by user
            – Ad buying history by advertiser
            – Ad serving history by advertiser
        • Application requirements
            – High performance to meet limited ad serving budget; time allowance is typically <40 msec
            – Scalability to handle hundreds of millions of user profiles and rapidly growing amount of data
            – 24x7x365 availability to avoid ad revenue loss
        • Why NoSQL and Couchbase
            – Sub-millisecond reads/writes means less time is needed for data access, more time is available for ad logic processing, and more highly optimized ads will be served.
            – Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows.
            – Always-on operations = always-on revenue. You will never miss the opportunity to serve an ad because of downtime.
    • Use Case: Content and Metadata Store. Building a self-adapting, interactive learning portal with Couchbase
    • The Problem
        • As learning moves online in great numbers, there is a growing need to build interactive learning environments that:
            – Scale to millions of learners
            – Serve MHE as well as third-party content
            – Include open content
            – Support learning apps
            – Self-adapt via usage data
    • The Challenge
        • Hmmm... this looks kinda like:
            – Content Caching (Scale)
            – Social Gaming (Stats)
            – Ad Targeting (Smarts)
        • Backend is an Interactive Content Delivery Cloud that must:
            – Allow for elastic scaling under spike periods
            – Catalog and deliver content from many sources
            – Provide consistent low latency for metadata and stats access
            – Support full-text search for content discovery
            – Offer tunable content ranking and recommendation functions
        • Experimented with a combination of:
            – XML databases
            – In-memory data grids
            – SQL/MR engines
            – Enterprise search servers
    • The Technologies
    • The Learning Portal
        • Designed and built as a collaboration between MHE Labs and Couchbase
        • Serves as proof-of-concept and testing harness for Couchbase + ElasticSearch integration
        • Available for download and further development as open source code
    • COUCHBASE SOLUTION: “THE BASICS”
    • Basic Operation [diagram: two app servers, each with a Couchbase client library and a cluster map, send read/write/update requests to a three-server Couchbase Server cluster; every server holds a set of active docs and a set of replica docs; user-configured replica count = 1]
        • Docs distributed evenly across servers
        • Each server stores both active and replica docs
            – Only one server is active for a given doc at a time
        • Client library provides app with simple interface to database
        • Cluster map provides map to which server a doc is on
            – App never needs to know
        • App reads, writes, updates docs
        • Multiple app servers can access the same document at the same time
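The role of the cluster map can be sketched as a two-step lookup: hash the document key to a partition, then look up which server is active for that partition. This is not the Couchbase client library; the partition count of 16, the use of `zlib.crc32`, the round-robin placement and every function name here are assumptions chosen for illustration.

```python
import zlib

NUM_PARTITIONS = 16  # illustrative only; a real cluster would use far more


def partition_for(key):
    """Hash a document key to a partition number (assumed hashing scheme)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_cluster_map(servers, replica_count=1):
    """Assign each partition one active server and replica_count replicas."""
    cluster_map = {}
    for partition in range(NUM_PARTITIONS):
        active = servers[partition % len(servers)]
        replicas = [
            servers[(partition + 1 + r) % len(servers)] for r in range(replica_count)
        ]
        cluster_map[partition] = {"active": active, "replicas": replicas}
    return cluster_map


def server_for(cluster_map, key):
    """What a client library would do: route the request without asking the app."""
    return cluster_map[partition_for(key)]["active"]


servers = ["server1", "server2", "server3"]
cluster_map = build_cluster_map(servers, replica_count=1)
target = server_for(cluster_map, "user::1")
```

Because every app server holds the same map, two app servers asking for the same key independently arrive at the same active server.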
    • Add Nodes to Cluster [diagram: the same two app servers and cluster maps, now in front of five servers (Server 1 to Server 5), each with active and replica docs; user-configured replica count = 1]
        • Two servers added with one-click operation
        • Docs automatically rebalance across cluster
            – Even distribution of docs
            – Minimum doc movement
        • Cluster map updated
        • App database calls now distributed over larger number of servers
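One way to picture “even distribution” with “minimum doc movement” is a greedy rebalance that keeps moving one partition at a time from the most loaded server to the least loaded one until the loads differ by at most one. The sketch below is an assumed illustration of that idea, not Couchbase's rebalance algorithm; it also assumes servers are only added, never removed.

```python
from collections import Counter


def rebalance(assignment, servers):
    """Return a new partition -> server map with loads evened out.

    assignment maps partition -> current server; every current server must
    also appear in servers (this sketch handles additions only).
    """
    new_assignment = dict(assignment)
    owned = {server: [] for server in servers}
    for partition, server in sorted(new_assignment.items()):
        owned[server].append(partition)
    while True:
        busiest = max(servers, key=lambda s: len(owned[s]))
        idlest = min(servers, key=lambda s: len(owned[s]))
        if len(owned[busiest]) - len(owned[idlest]) <= 1:
            return new_assignment
        partition = owned[busiest].pop()  # move a single partition per step
        owned[idlest].append(partition)
        new_assignment[partition] = idlest


# Twelve partitions spread over three servers, then two servers are added.
before = {p: f"server{p % 3 + 1}" for p in range(12)}
after = rebalance(before, ["server1", "server2", "server3", "server4", "server5"])
moved = [p for p in before if before[p] != after[p]]
loads = Counter(after.values())
```

In this example only 4 of the 12 partitions change owner; the other 8 stay where they were, which is the point of the “minimum doc movement” bullet.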
    • Fail Over Node [diagram: the five-server cluster after Server 3 fails; its docs are served from promoted replicas on the remaining servers; user-configured replica count = 1]
        • App servers accessing docs
        • Requests to Server 3 fail
        • Cluster detects server failed
            – Promotes replicas of docs to active
            – Updates cluster map
        • Requests for docs now go to appropriate server
        • Typically rebalance would follow
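The failover steps above amount to a rewrite of the cluster map: wherever the failed server was active, a replica is promoted, and the failed server is dropped from every replica list. The sketch below assumes a hypothetical map layout (`partition -> {"active", "replicas"}`) and is not the Couchbase implementation.

```python
def fail_over(cluster_map, failed_server):
    """Return a new cluster map with failed_server removed and replicas promoted."""
    new_map = {}
    for partition, entry in cluster_map.items():
        active = entry["active"]
        replicas = [s for s in entry["replicas"] if s != failed_server]
        if active == failed_server:
            if not replicas:
                raise RuntimeError(f"partition {partition} has no surviving replica")
            # Promote the first surviving replica to active.
            active, replicas = replicas[0], replicas[1:]
        new_map[partition] = {"active": active, "replicas": replicas}
    return new_map


# Hypothetical three-partition map with replica count = 1.
cluster_map = {
    0: {"active": "server1", "replicas": ["server2"]},
    1: {"active": "server2", "replicas": ["server3"]},
    2: {"active": "server3", "replicas": ["server1"]},
}
after_failure = fail_over(cluster_map, "server3")
```

After the failover, partitions 1 and 2 have no replica left, which is why a rebalance would typically follow to restore the configured replica count.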
    • Couchbase Server Admin Console
    • Q & A
    • William McKnight: wmcknight@mcknightcg.com, www.mcknightcg.com. Dipti Borkar: dipti@couchbase.com, www.couchbase.com