DataStax & 451 Group Webinar - Real NoSQL Applications in the Enterprise Today


Published in: Technology, Business

  1. 1. Dec. 7, 2011. Real NoSQL Applications in the Enterprise Today. Apache Cassandra. Jonathan Ellis, CTO, DataStax; Matt Aslett, 451 Group
  2. 2. Welcome and Housekeeping: We will email the presentation after the webinar • Please ask questions using the Q&A panel. I will ask the panelists at the end of the presentation. • You can contact me at
  3. 3. Our presenters
     Matt Aslett - Senior Analyst, 451 Group: Matthew covers data management software for The 451 Group's Information Management practice, including relational and non-relational databases, data warehousing and data caching. Matthew is also an expert in open source software and contributes regularly to reports produced through the 451 Commercial Adoption of Open Source (CAOS) Research Service, as well as to the 451 CAOS Theory blog.
     Jonathan Ellis – CTO, DataStax: Jonathan is CTO and co-founder at DataStax. Prior to DataStax, Jonathan worked extensively with Apache Cassandra while employed at Rackspace. Prior to Rackspace, Jonathan built a multi-petabyte, scalable storage system based on Reed-Solomon encoding for backup provider Mozy. In addition to his work with DataStax, Jonathan is project chair of Apache Cassandra.
  4. 4. The 451 Group
     451 Research is focused on the business of enterprise IT innovation. The company’s analysts provide critical and timely insight into the competitive dynamics of innovation in emerging technology segments.
     Tier1 Research is a single-source research and advisory firm covering the multi-tenant datacenter, hosting, IT and cloud-computing sectors, blending the best of industry and financial research.
     The Uptime Institute is ‘The Global Data Center Authority’ and a pioneer in the creation and facilitation of end-user knowledge communities to improve reliability and uninterruptible availability in datacenter facilities.
     TheInfoPro is a leading IT advisory and research firm that provides real-world perspectives on the customer and market dynamics of the enterprise information technology landscape, harnessing the collective knowledge and insight of leading IT organizations worldwide.
     ChangeWave Research is a research firm that identifies and quantifies ‘change’ in consumer spending behavior, corporate purchasing, and industry, company and technology trends.
     © 2011 by The 451 Group. All rights reserved
  5. 5. 451 Research: Matthew Aslett • Senior analyst, enterprise software • With The 451 Group since 2007
  6. 6. Relevant reports: NoSQL, NewSQL and Beyond • Assessing the drivers behind the development and adoption of NoSQL and NewSQL databases, as well as data grid/caching technologies • Released April 2011 • Role of open source in driving innovation
  7. 7. NoSQL, NewSQL and Beyond
     NoSQL: New breed of non-relational database products • Rejection of fixed table schema and join operations • Designed to meet scalability requirements of distributed architectures • And/or schema-less data management requirements
  8. 8. NoSQL, NewSQL and Beyond
     NoSQL: New breed of non-relational database products • Rejection of fixed table schema and join operations • Designed to meet scalability requirements of distributed architectures • And/or schema-less data management requirements
     NewSQL: New breed of relational database products • Retain SQL and ACID • Designed to meet scalability requirements of distributed architectures • Or improve performance so horizontal scalability is no longer a necessity
  9. 9. NoSQL, NewSQL and Beyond
     NoSQL: New breed of non-relational database products • Rejection of fixed table schema and join operations • Designed to meet scalability requirements of distributed architectures • And/or schema-less data management requirements
     NewSQL: New breed of relational database products • Retain SQL and ACID • Designed to meet scalability requirements of distributed architectures • Or improve performance so horizontal scalability is no longer a necessity
     … and Beyond: In-memory data grid/cache products • Potential primary platform for distributed data management
  10. 10. NoSQL, NewSQL and Beyond
      NoSQL: Big tables – data mapped by row key, column key and time stamp • Key-value stores - store keys and associated values • Document store - stores all data as a single document • Graph databases - use nodes, properties and edges to store data and the relationships between entities
  11. 11. NoSQL, NewSQL and Beyond
      NoSQL: Big tables – data mapped by row key, column key and time stamp • Key-value stores - store keys and associated values • Document store - stores all data as a single document • Graph databases - use nodes, properties and edges to store data and the relationships between entities
      NewSQL: MySQL storage engines - scale-up and scale-out • Transparent sharding - reduce the manual effort required to scale • Appliances - take advantage of improved hardware performance, solid state drives • New databases - designed specifically for scale-out
  12. 12. NoSQL, NewSQL and Beyond
      NoSQL: Big tables – data mapped by row key, column key and time stamp • Key-value stores - store keys and associated values • Document store - stores all data as a single document • Graph databases - use nodes, properties and edges to store data and the relationships between entities
      NewSQL: MySQL storage engines - scale-up and scale-out • Transparent sharding - reduce the manual effort required to scale • Appliances - take advantage of improved hardware performance, solid state drives • New databases - designed specifically for scale-out
      Data grid/cache: spectrum of data management capabilities, from non-persistent data caching to persistent caching, replication, and distributed data and compute grid
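
The "big tables" description on the slides above (data mapped by row key, column key and time stamp) can be pictured as a nested map. The following is a minimal in-memory sketch of that idea only; the class name `SparseTable` and its methods are hypothetical and are not the API of Bigtable, HBase or Cassandra.

```python
from collections import defaultdict


class SparseTable:
    """Toy model of a big-table style map: (row key, column key, timestamp) -> value.

    Hypothetical illustration only; not a real database client.
    """

    def __init__(self):
        # row key -> column key -> list of (timestamp, value) versions
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp):
        """Store one version of a cell."""
        self._rows[row][column].append((timestamp, value))

    def get(self, row, column):
        """Return the value with the highest timestamp, or None if the cell is absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]


table = SparseTable()
table.put("user:1", "email", "old@example.com", timestamp=1)
table.put("user:1", "email", "new@example.com", timestamp=2)
latest = table.get("user:1", "email")  # the version written with timestamp=2
```

Rows need not share the same columns, which is one way to read the "rejection of fixed table schema" point made earlier in the deck.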
  13. 13. Photo credit: Foxtongue on Flickr
  14. 14. SPRAIN: Scalability - Hardware economics
      Example project/service/vendor: BigTable, HBase, Riak, MongoDB, Couchbase, Hadoop, Cassandra • Amazon RDS, Xeround, SQL Azure, NuoDB • Data grid/cache
      Associated use case: Large-scale distributed data storage • Analysis of continuously updated data • Multi-tenant PaaS data layer
  15. 15. SPRAIN: Performance - MySQL limitations
      Example project/service/vendor: Hypertable, Couchbase, Riak, Membrain, MongoDB, Redis • Data grid/cache • VoltDB, Clustrix
      Associated use case: Real time data processing of mixed read/write workloads • Data caching • Large-scale data ingestion
  16. 16. SPRAIN: Relaxed consistency - CAP Theorem
      Example project/service/vendor: Dynamo, Voldemort, Cassandra, Riak • Amazon SimpleDB
      Associated use case: Multi-data center replication • Service availability • Non-transactional data off-load
  17. 17. SPRAIN: Agility - polyglot persistence, schema-less
      Example project/service/vendor: MongoDB, CouchDB, Cassandra, Riak • Google App Engine, SimpleDB, SQL Azure
      Associated use case: Mobile/remote device synchronization • Agile development • Data caching
  18. 18. SPRAIN: Intricacy - big data, total data
      Example project/service/vendor: Neo4j, GraphDB, InfiniteGraph • Apache Cassandra, Hadoop, Riak • VoltDB, Clustrix
      Associated use case: Social networking applications • Geo-locational applications • Configuration management database
  19. 19. SPRAIN: Necessity - open source
      The failure of existing suppliers to address emerging requirements
      Example projects: BigTable: Google • Dynamo: Amazon • Cassandra: Facebook • HBase: Powerset • Voldemort: LinkedIn • Hypertable: Zvents • Neo4j: Windh Technologies
  20. 20. Use cases – database types
  21. 21. Use cases – new applications: Web applications • social games • SaaS • e-commerce systems • clickstream analysis • ad and offer targeting
  22. 22. Use cases – new requirements: Web applications • social games • SaaS • e-commerce systems • clickstream analysis • ad and offer targeting
  23. 23. Requirements: Data analysis • read heavy • batch processing • analytics-optimized • data locality model
  24. 24. Use cases – new solutions: Data analysis • read heavy • batch processing • analytics-optimized • data locality model
  25. 25. Requirements: Data analysis • batch processing • aggregation of mixed data sources • structured and un/semi-structured data • transform and load
  26. 26. Use cases: Data analysis • batch processing • aggregation of mixed data sources • structured and un/semi-structured data • transform and load
  27. 27. Target markets: Web applications • social games • SaaS • e-commerce systems • clickstream analysis • ad and offer targeting
  28. 28. Real NoSQL Applications in the Enterprise Today. APACHE CASSANDRA. JONATHAN ELLIS
  29. 29. Today’s Database Challenge
  30. 30. Navigating the NoSQL waters: Distributed • Horizontally scalable • Eventually consistent • Non-relational (column store, document stores, key-value, graph, … and more)
  31. 31. Cassandra: the best for “big data”
      Elegant architecture • Operational flexibility • Industry-leading performance
      You should be using Cassandra for applications requiring: high-performance, realtime queries • scalability past one machine • bulletproof reliability
  32. 32. Bigtable, 2006 • Dynamo, 2007 • OSS, 2008 • Incubator, 2009 • TLP, 2010 • 1.0, October 2011
  33. 33. Cassandra Highlights: Multi-master, multi-DC • Linearly scalable • Larger-than-memory datasets • High performance • Full durability • Integrated caching • Tuneable consistency
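
"Tuneable consistency" on the slide above is usually explained with quorum arithmetic: with N replicas, a read that contacts R replicas is guaranteed to overlap the most recent successful write to W replicas whenever R + W > N. The sketch below works through that arithmetic for the ONE, QUORUM and ALL levels. The function names are hypothetical and this is not the Cassandra driver API, only an illustration of the rule.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def read_overlaps_latest_write(read_level: str, write_level: str,
                               replication_factor: int) -> bool:
    """True when R + W > N, so every read set intersects every write set."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


# With three replicas: QUORUM reads plus QUORUM writes overlap (2 + 2 > 3),
# while ONE plus ONE does not (1 + 1 <= 3) and is only eventually consistent.
strong = read_overlaps_latest_write("QUORUM", "QUORUM", 3)
eventual = read_overlaps_latest_write("ONE", "ONE", 3)
```

Choosing lower levels trades that overlap guarantee for lower latency and higher availability, which is the tuning knob the slide refers to.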
  34. 34. Performance: A single four-core machine; one million inserts + one million updates
  35. 35. The Cassandra Difference
                          Scalable Performance   Operational Ease   Cost Effective
      Cassandra *         ✔                      ✔                  ✔
      Oracle Exadata      ✔                      ✔                  ✖
      MySQL               ✖                      ✔                  ✔
      MongoDB             ✖                      ✔                  ✔
      Sharding            ✔                      ✖                  ✔
      HBase               ✔                      ✖                  ✔
      * And when it comes to Performance, we’re unmatched.
  36. 36. Why Businesses Choose Cassandra
      Columns: Vertical • Big-Data Scale • Never Down • Very Fast • Easy to Operate • Non-Structured Data • Flexible Schema • Multi-DC / Cloud • Cost Effective
      Media / Advertising: ✔ ✔ ✔ ✔ ✔ ✔ ✔
      Telecomm: ✔ ✔ ✔ ✔ ✔ ✔ ✔
      Financial: ✔ ✔ ✔ ✔ ✔ ✔
      Social: ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔
      IT (DaaS): ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔
      Healthcare: ✔ ✔ ✔ ✔ ✔
      Online Retail: ✔ ✔ ✔ ✔ ✔ ✔
      The most popular types of applications that use Cassandra are those that: Are web/SaaS-based, and/or • Collect high volumes of “Data Exhaust” from machine-generated sources
  37. 37. “With Cassandra, we get better business agility, and we don’t have to plan capacity in advance, we don’t need to ask permission of other people to build things for us, and we don’t worry about running out of space or power.”  Adrian Cockcroft, Cloud Architect
  38. 38. Netflix’s problems: Could not build datacenters fast enough • Made decision to go to cloud (AWS) • Cassandra on AWS is a key infrastructure component of its globally distributed streaming product • Applications include Netflix’s subscriber system, A/B testing, and viewing history service (including positions at which members stopped watching a streaming program)
  39. 39. Netflix on Cassandra: Fast • Cheap • Scalable • Flexible • No SPOF
  40. 40. “Without Cassandra, our engineers would’ve had to create something that could scale to our needs, that would’ve prevented us from focusing on building product and solving problems for Backupify’s users, which are far more important tasks.” Matt Conway, VP Engineering
  41. 41. Backupify’s problem: Cloud-based utility that enables businesses and consumers to back up, search and restore the content of popular online applications such as Google Apps, Gmail, Facebook, Twitter, and Blogger • Needs: horizontal scaling; ability to handle high write loads; elasticity with no manual sharding
  42. 42. Backupify on Cassandra: Ease of scale enabled engineers to focus on building great applications • DataStax OpsCenter made it easy to monitor the health and performance of their cluster • Reliable, redundant and scalable low-balance data storage helped eliminate downtime • Ability to offer not only backup and storage but, eventually, analysis of data
  43. 43. “You can seamlessly add new nodes and expand your total capacity without deteriorating the performance of the data store. Cassandra has allowed us to scale very effectively.” Harry Robertson, Tech Lead
  44. 44. Ooyala’s problem: Ooyala provides a suite of technologies and services that support content owners in managing, analyzing and monetizing the digital video they publish online • Needs: elasticity, to respond to spikes in data scale; ability to respond to increasingly sophisticated analytic needs of customers
  45. 45. Ooyala on Cassandra: Classic “Big Data” problem did not require re-architecting • Application agility was enabled – developers spend time building cool apps, not figuring out how to scale • Enabled more powerful and granular analytics for their customers
  46. 46. “Cassandra has allowed us to build bigger features faster and more reliably, while using less money and without needing to expand our staff.”  Kyle Ambroff, Sr. Engineer
  47. 47. Formspring’s problem: Users of Formspring engage with and learn more about each other by asking and responding to questions. With close to 4B responses in the system and 30M unique users, they needed: to support explosive growth; to seamlessly syndicate user content; to avoid sharding; application flexibility
  48. 48. Formspring on Cassandra: No sharding needed – just add nodes to scale • Performance – the popular users with many followers saw no speed reduction; no more memcached! • Flexibility of a schema-optional architecture is very developer friendly
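
"No sharding needed, just add nodes" rests on the Dynamo-style idea of placing both nodes and keys on a hash ring, so that a new node takes over only one slice of the key space. The sketch below is a simplified illustration under stated assumptions: one token per node, no replication, MD5 as the hash, and a hypothetical `Ring` class that is not Cassandra's implementation.

```python
import bisect
import hashlib


def token(name: str) -> int:
    """Map a node name or row key to a position on the ring (MD5 is an assumption here)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring: each key belongs to the first node token at or after it."""

    def __init__(self, nodes=()):
        self._tokens = []  # sorted node tokens
        self._owners = {}  # node token -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        position = token(node)
        bisect.insort(self._tokens, position)
        self._owners[position] = node

    def owner(self, key: str) -> str:
        if not self._tokens:
            raise LookupError("ring has no nodes")
        index = bisect.bisect_left(self._tokens, token(key))
        # Wrap around to the first node when the key hashes past the last token.
        return self._owners[self._tokens[index % len(self._tokens)]]


keys = [f"user:{i}" for i in range(1000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {key: ring.owner(key) for key in keys}
ring.add_node("node-d")
after = {key: ring.owner(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

Every key in `moved` now belongs to `node-d`; all other keys stay where they were, so the application never has to re-partition its data by hand.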
  49. 49. Why DataStax?
      DataStax delivers database products and services based on Apache Cassandra from experts who are at the forefront of today’s data revolution.
      Database Software & Tools: DataStax Enterprise • DataStax Community • DataStax OpsCenter • Drivers & Connectors
      Support & Services: Production Support • Consultative Help • Professional Training • Online Documentation
  50. 50. DataStax Overview: Founded in April 2010 • Commercial leader in Apache Cassandra™, the popular open-source “big data” database • Headquartered in San Francisco Bay area • 100+ customers • 35+ employees (split between San Fran and Austin) • Home to Apache Cassandra Chair & most committers • Secured $11M in Series B funding in Sep 2011
  51. 51. 100+ customers
  52. 52. DataStax Value: The simplest way to get started with Apache Cassandra: DataStax Community Edition • A smart, integrated platform that provides Analytics and Real-Time capabilities in the same database, without any resource contention: DataStax Enterprise • The backing of the Cassandra Experts
  53. 53. DataStax Enterprise: (1) DataStax Enterprise Database Server (2) OpsCenter Enterprise Management solution (3) Expert production support & consultative services
  54. 54. Enterprise Database Server
      Enterprise-class database built to handle today’s big-data needs in a cost-effective, easy, and reliable way.
      Leverages resources on-premise or in the cloud • Guarantees uptime with a master-less distributed architecture • Allows for fast application changes via flexible schemas • Handles structured, semi-structured, and unstructured data • Provides advanced security • Eliminates the need for separate analytics system
      (Diagram labels: Real-Time, Replication, Analytics)
  55. 55. OpsCenter Enterprise
      OpsCenter Enterprise supplies management, monitoring, and control over DataStax Enterprise.
      Visual, browser-based user interface • Administration tasks carried out in point-and-click fashion • Allows for visual rebalance of data across a cluster when new nodes are added • Proactive alerts that warn of impending issues • Built-in external notification abilities
  56. 56. Expert Production Support
      DataStax Enterprise includes production support and consultative services from the Cassandra experts.
      Support service level agreements that range from business hours to 24x7x365 • Consultative support for assistance on architecture, design, and tuning • Certified quarterly service packs • Hot-fix support
  57. 57. DataStax Enterprise Compared
                            Scalable Performance   Operational Ease   Cost Effective   Real-Time + Analytics
      DataStax Enterprise   ✔                      ✔                  ✔                ✔
      Oracle Exadata        ✔                      ✔                  ✖                ✔
      MySQL                 ✖                      ✔                  ✔                ✖
      MongoDB               ✖                      ✔                  ✔                ✖
      Sharding              ✔                      ✖                  ✔                ✖
      HBase                 ✔                      ✖                  ✔                ✖
      Oracle NoSQL DB       ✔                      ✖                  ?                ✔
  58. 58. DataStax – Your One-Stop Shop: DataStax Enterprise and Community Editions • Professional Training, Expert Consulting • Documentation and Dev Center • Whitepapers, Case Studies, FAQs and more • you!