DataStax C*ollege Credit: What and Why NoSQL?

In the first of our bi-weekly C*ollege Credit series, Aaron Morton, DataStax MVP for Apache Cassandra and Apache Cassandra committer, and Robin Schumacher, VP of product management at DataStax, will take a look back at the history of NoSQL databases and provide a foundation of knowledge for people looking to get started with NoSQL, or just wanting to learn more about this growing trend. You will learn how to tell whether NoSQL is right for your application and how to pick a NoSQL database. This webinar is C* 101 level.

  • http://www.infoworld.com/d/data-management/the-time-nosql-standards-now-194998

Transcript

  • 1. Aaron Morton, Robin Schumacher
  • 2.
    • 40 minute webinar
    • 15 minute Q+A
      • #CassandraQA
      • WebEx Q&A window
    • Slides and recording will be available
    • Next webcast:
      • Time for a new relationship? (Information Week)
      • September 26th
  • 3. Aaron Morton (@aaronmorton), DataStax MVP for Apache Cassandra
    Aaron Morton is a freelance developer based in New Zealand and a committer on the Apache Cassandra project. In 2010 he gave up the RDBMS world for the scale and reliability of Cassandra. He now spends his time advancing the Cassandra project and helping others get the best out of it.
    www.thelastpickle.com
  • 4. Robin Schumacher, VP of Products @ DataStax
    Robin Schumacher has spent the last 20 years working with databases and big data. Before DataStax he was at EnterpriseDB, where he built and led a market-driven product management group. Previously, Robin started and led the product management team at MySQL for three years before they were bought by Sun, and then by Oracle. He also started and led the product management team at Embarcadero Technologies.
    Robin is the author of three database performance books and a frequent speaker at industry events. Robin holds BS, MA, and Ph.D. degrees from various universities.
  • 5. 5
  • 6. 6
  • 7. 7
  • 8. 8
  • 9.
    • 1986: First ANSI standard
    • 1989: FOREIGN KEY
    • 1992: New types, JOIN, DDL, Transaction Isolation Levels
    • 1999: Triggers
  • 10.
    • 1996, v3.19: First public release
    • 1999, v3.23: MyISAM engine, no Transactions
    • 2001, v4.X: InnoDB, ACID Transactions, FOREIGN KEY
  • 11.
    • 1995, v6.0: PRIMARY KEY, FOREIGN KEY
    • 1996, v6.5: JOIN
    • 1998, v7.0: NVARCHAR, replication
    • 2000, v2000: Referential Integrity actions
  • 12.
    • 1989, v1.0: Small limited release
    • 1997, v6.2: Triggers
    • 1998, v6.3: Sub selects
    • 1999, v6.5.3: MVCC Transactions
    • 2000, v7.0.3: FOREIGN KEY, JOIN
  • 13.
    • Adds application complexity
    • Adds operational complexity
    • Thundering herds
    • “There are 2 hard problems in computer science: caching, naming, and off-by-1 errors”
  • 14.
    • Adds application complexity
    • Adds operational complexity
    • Schema defined in multiple databases
    • SPOF for shard
    • Hard to grow and keep balanced
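To illustrate why manually sharded data is "hard to grow and keep balanced", here is a minimal Python sketch. It assumes the simplest placement rule, a hash of the key modulo the shard count; the slides do not say which scheme is meant, and the names `shard_for` and `moved_fraction` and the `user:` keys are hypothetical.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Place a key on a shard using hash-mod-N (an assumed, naive scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for key in keys
        if shard_for(key, old_count) != shard_for(key, new_count)
    )
    return moved / len(keys)


# Hypothetical keys; any reasonably large key set behaves similarly.
keys = [f"user:{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_count=4, new_count=5)
print(f"{fraction:.0%} of keys change shard when growing from 4 to 5 shards")
```

With a uniform hash, a key stays put only when its hash gives the same remainder modulo 4 and modulo 5, which happens for 4 of every 20 hash values. Adding one shard to four therefore relocates roughly 80% of the data under this scheme, and the application has to coordinate that move itself.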
  • 15.
    • Fail over may add application complexity
    • Unknown asynchronous delay in replication
    • Potentially wasting resources on Slave
    • Reliability of passive Slave is unknown
    • “We failed to fail over to the slave.”
  • 16.
    • Adds application complexity
    • Unknown asynchronous delay in replication
    • SPOF for writes
  • 17.
    • ALTER TABLE locks the table
    • Must be applied to many individual servers
    • “foo varchar(50) DEFAULT NULL”
  • 18. 18
  • 19. 19
  • 20. 20
  • 21. 21
  • 22.
    • 2007: Tokyo Cabinet
    • 2009: Redis
    • 2009: Voldemort
    • 2009: Riak
  • 23.
    • 2008: Apache CouchDB
    • 2009: MongoDB
  • 24.
    • 2007: Neo4j
    • 2009: InfoGrid
    • 2010: InfiniteGraph
  • 25.
    • 2007: Apache HBase (as part of Lucene)
    • 2008 / 2011: BigTable as part of Google App Engine
    • 2009: Apache Cassandra
    • 2012: Amazon DynamoDB
  • 26.
    • Cluster based
    • Replication built in
    • No schema or flexible schema
    • Expect node failure
  • 27.
    • Aaron Morton
    • @aaronmorton
    • www.thelastpickle.com
    Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
  • 28. Why NoSQL?
  • 29. “NoSQL is the stuff of the Internet Age.” - Andrew Oliver, InfoWorld
  • 30. What Characterizes the “Internet Age” with data?
    1. Big Data – Concerns…
      • Scaling data velocity, variety, volume
    2. Data in the Cloud – Promises…
      • Transparent elasticity
      • Scalability
      • Availability
      • Ease of use (data distribution, redundancy, etc.)
      • All these also needed on premise…
    3. Data “everywhere” – needing to support multiple data centers, geographies, etc.
  • 31. Why NoSQL? You have Big Data use cases.
    • Volume, variety, velocity
    • Complexity of data distribution
    • Future proof apps where scaling is concerned
    “Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis” - IDC
  • 32. Why NoSQL? Cassandra – a massively scalable NoSQL database
    • Superior write performance for data velocity
    • Strong data type support for data variety
    • Linear scalability/scale out for data volume
    • Fast for both reads and writes
    “We’ve seen a 700% performance improvement, while our database grew over 500% at the same time. Plus we’ve saved 40% in operational costs.” - SourceNinja
  • 33. Why NoSQL? Cassandra and Performance
    “In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput.”
    Solving Big Data Challenges for Enterprise Application Performance Management, Tilmann Rabl, et al., August 2012, p. 10. Benchmark paper presented at the Very Large Database Conference, 2012. http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf
    • In the Cloud… http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
    • In Web Apps… YCSB Benchmark. Source: http://blog.cubrid.org/dev-platform/nosql-benchmarking/?utm_source=NoSQL+Weekly+List&utm_campaign=143fae86b2-NoSQL_Weekly_Issue_41_September_8_2011&utm_medium=email
  • 34. Why NoSQL? You need continuous availability.
    • Different than high availability
    • For applications that can’t go down
    • May involve one or multiple locations
  • 35. Why NoSQL? Cassandra – a continuously available NoSQL DBMS
    • Built to overcome the fact that hardware failures can and do occur
    • No single point of failure
    • Out-of-the-box redundancy of function and data
    “For us, the primary motivating factors are continuous availability and multi-data center support. We also like the fact that we can trust Cassandra; when we need to write data, we don’t have to worry that it’s going to get written and be there no matter what.” - RightScale
  • 36. Why NoSQL? You need true location independence.
    • Need to read AND write data anywhere
    • Data is eventually synchronized in all locations
    • Keep data local for fast access
  • 37. Why NoSQL? Cassandra – a location independent database
    • Replication is multi-data center, multi-directional capable
    • Handles multiple cloud geo-zones
    • Supports hybrid on-premise/cloud deployments
    • Tunable data consistency
    “I can create a Cassandra cluster in any region of the world in 10 minutes. When marketing decide we want to move into a certain part of the world, we’re ready.” - Netflix
  • 38. Why NoSQL? You need real-time, transactional capabilities
    • For applications needing ACID, use RDBMS
    • For applications without ACID requirements, but with transactional needs, use NoSQL
    • The “C” in ACID does not apply to NoSQL; the “C” in the CAP theorem does
    “Ninety-five percent (95%) of database-driven systems today don’t need ACID transactions.” – Dan McCreary, The CIO’s Guide to NoSQL Webinar
  • 39. Why NoSQL? Cassandra – real-time NoSQL transactions
    • Supports AID transactions: atomic, isolated, and durable
    • Provides tunable data consistency – per operation – to handle the “C” in the CAP theorem
    • No ACID “C” as there are no referential integrity/foreign key constraints
    “Cassandra stands at the front of the NoSQL pack when it comes to supporting real-time, Big Data applications.” – Wikibon
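The "tunable data consistency, per operation" point on slide 39 can be made concrete with a small Python sketch. ONE, QUORUM, and ALL are Cassandra consistency level names; the helper functions below are hypothetical illustrations of the replica arithmetic, not part of any Cassandra driver API. The rule they encode is the usual one: a read is guaranteed to touch at least one replica that acknowledged the latest write when the write and read replica counts add up to more than the replication factor.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must acknowledge an operation at a given level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level: {level}")


def read_overlaps_write(write_level: str, read_level: str,
                        replication_factor: int) -> bool:
    """True when every read replica set must intersect every write set."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return w + r > replication_factor


# Compare a few per-operation choices at replication factor 3.
for write_level, read_level in [("ONE", "ONE"),
                                ("QUORUM", "QUORUM"),
                                ("ONE", "ALL")]:
    ok = read_overlaps_write(write_level, read_level, replication_factor=3)
    print(f"write={write_level:<6} read={read_level:<6} overlap guaranteed: {ok}")
```

The trade-off is that lower levels need fewer replicas to respond, so they tolerate more node failures and respond faster, at the cost of possibly reading a stale value.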
  • 40. Why NoSQL? You need a more flexible/agile data model.
    • Escape the rigidity of the relational data model
    • Able to easily store and access all data types
    • Few worries about performance of “wide” rows
  • 41. Why NoSQL? The Cassandra Data Model - Bigtable
    • A row-oriented, column structure
    • A column family is similar to an RDBMS table but is more flexible/dynamic
    • A row in a column family is indexed by its key. Other columns may be indexed as well
    “Cassandra’s NoSQL data model allows us to insert and query data much more naturally than what we had previously. The analysts who routinely use this data were impressed with the flexibility and speed at which the queries came back.” - NASA
    Diagram labels: Keyspace; Column Family; columns ID, Name, SSN, DOB
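The data model on slide 41 can be pictured as nested maps: keyspace, then column family, then row key, then columns. The Python sketch below is only an in-memory analogy under that assumption, not how Cassandra stores data. The column names Name, SSN, and DOB come from the slide's diagram; the `people` column family, the `Email` column, the row keys, and the values are made up for illustration.

```python
from collections import defaultdict

# keyspace -> column family -> row key -> {column name: value}
keyspace = defaultdict(lambda: defaultdict(dict))


def insert(column_family: str, row_key: str, **columns) -> None:
    """Add or overwrite columns on one row; rows need not share columns."""
    keyspace[column_family][row_key].update(columns)


def get(column_family: str, row_key: str) -> dict:
    """Look a row up by its key, the index every row has."""
    return keyspace[column_family].get(row_key, {})


# Hypothetical rows: each one carries only the columns it actually has.
insert("people", "1001", Name="Ada", DOB="1815-12-10")
insert("people", "1002", Name="Alan", SSN="placeholder")

# A new column can be added to one row without altering any other row.
insert("people", "1001", Email="ada@example.com")

print(get("people", "1001"))
print(get("people", "1002"))
```

This is the sense in which a column family is "more flexible/dynamic" than a relational table: there is no table-wide ALTER step before a row can gain a column.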
  • 42. Why NoSQL? You need a better architecture.
    • Master/slave – inherent issues; write bottlenecks
    • Sharding – difficult to set up/maintain
    • Shared storage – has availability concerns
  • 43. Why NoSQL? Cassandra – a “masterless” architecture
    • Peer-to-peer design
    • No write bottlenecks
    • No manual sharding or shared storage issues
    • Less operational overhead
    “Cassandra was just a better design all around – more truly horizontally scalable and with less management overhead – and there’s no single point of failure. I looked at Cassandra’s architecture and thought, ‘Yeah, that’s how you do it.’” - Backupify
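One way to picture a peer-to-peer design with no manual sharding is a hash ring: every node owns a position on the ring, and a key is stored on the first few nodes at or after its own position. The Python sketch below is a deliberately simplified model of that idea, assuming one position per node and MD5-derived positions; it is not Cassandra's actual partitioner, and the `Ring` class and node names are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Position of a node name or key on the ring (assumed MD5-based)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Simplified token ring: every node is a peer, there is no master."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [position for position, _ in self._ring]

    def replicas(self, key: str):
        """Nodes holding a key: walk the ring clockwise from the key's token."""
        start = bisect.bisect_left(self._tokens, token(key))
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = Ring(nodes)
before = ring.replicas("user:42")

# Lose the first replica: the remaining replicas still hold the key.
survivors = [node for node in nodes if node != before[0]]
after = Ring(survivors).replicas("user:42")
print("replicas before failure:", before)
print("replicas after failure: ", after)
```

Because placement follows from the ring alone, any node can work out where a key lives, and losing one node leaves the other replicas of its keys in place.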
  • 44. Why NoSQL? Because you need…
    • The ability to handle big data use cases
    • Continuous availability vs. high availability
    • A location independent database
    • A real-time, transactional database
    • A more flexible/agile data model
    • A better architecture
  • 45. Key Cassandra Use Cases
    • Real-time, big data workloads
    • Time series data management
    • High-velocity device data consumption and analysis
    • Media streaming management (e.g., music, movies)
    • Social media (i.e., unstructured data) input and analysis
    • Online web retail (e.g., shopping carts, user transactions)
    • Real-time data analytics
    • Online gaming (e.g., real-time messaging)
    • Software as a Service (SaaS) applications that utilize web services
    • Online portals (e.g., healthcare provider/patient interactions)
    • Most write-intensive systems
  • 46. Why NoSQL? - The CIO’s Guide to NoSQL, Dan McCreary
  • 47.
    • Cassandra.Apache.org
    • PlanetCassandra.org
    • Datastax.com