Enterprise NoSQL: Silver Bullet or Poison Pill

2,698 views

This is a slightly revised version of the keynote I first gave at StrangeLoop 2010. It tries to show the pros and cons of NoSQL versus SQL and to highlight what's easy and what's not so easy to do, so people have a better understanding of typical NoSQL-type products.

Published in: Technology

  1. Enterprise NoSQL: Silver bullet or poison pill? @billynewport, IBM Distinguished Engineer
  2. NoSQL = not only SQL
     - Rumors of the SQL DBMS's demise are greatly exaggerated.
     - SQL databases will be around for a long, long time.
     - However...
     - NoSQL offers additional competing and/or complementary technologies that organize stored data differently from traditional SQL, with different sets of pros and cons.
  3. Agenda
     - Discuss the SQL mindset
     - Discuss the NoSQL mindset and contrast
  4. SQL benefits
     - Centralized schema managed by a DBA.
       - Relatively static schema
       - Easy ad hoc query support
       - Normalized data
  5. SQL benefits
     - Relationships through joins
     - Easy indexing
     - No consistency issues: one copy, one system of record
     - No need to partition the data model.
  6. SQL means domain centric
     - Think about the data, find the nouns
     - Nouns become tables
     - Identify attributes/keys
     - Normalize the tables to Nth normal form…
  7. Domain centric
     - Use SQL to ask any question
     - Use indexes to speed up SQL queries.
     - Think data model first, worry about questions/access patterns later.
  8. SQL ecosystem
     - Standards for programming (SQL/JDBC/ODBC/ESQL)
     - Easy to port applications between different SQL databases using the right standard.
     - IDE support
  9. Ecosystem
     - Availability of reporting tools.
     - Availability of ETL (extract/transform/load) tools.
     - SQL-centric brainwashing of engineers starts at a young age.
  10. SQL implementations
      - However, these choices lead to vertical, single-machine implementations
      - Or, at best, shared-everything, limited scale-out implementations on exotic (expensive) hardware.
  11. But
      - A machine with:
        - dual-socket Intel multi-core
        - 256GB memory
        - SSD storage
      - can likely run >90% of all the SQL databases out there really FAST.
  12. Types of NoSQL
      - Static key-value store (memcached)
      - DataGrid KV store (IBM WebSphere eXtreme Scale, Oracle Coherence, GigaSpaces)
      - Row-oriented sparse column store (Cassandra, HBase, ...)
      - Remote shared memory (Terracotta, IBM Cluster Accelerator, IBM 390 Coupling Facility)
      - Key document store (MongoDB)
      - Network store (Neo4j)
  13. NoSQL means choices
      - BUT, as Spiderman said: With great power comes GREAT RESPONSIBILITY!
      - NoSQL solutions typically relax some of the established constraints in return for implementation flexibility, which suits certain solutions that are difficult to implement with SQL.
  14. Relax constraints for flexibility
      - Relaxing some of these choices leads to different possible store implementation strategies.
        - Simplest is a shared key/blob store
      - Partitioning the data model leads to sharding and linear scale-out, but:
        - No cross-shard query support
        - No cheap global indexes
        - No joins across shards
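
To make the sharding trade-off on slide 14 concrete, here is a minimal sketch, not taken from the talk. It assumes a hypothetical four-shard cluster in which each shard is modelled as a plain dict, and the names `shard_for`, `put`, `get`, and `find_by_city` are illustrative only. A keyed read touches one shard, while a query on a non-key attribute has to visit all of them.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size, chosen for illustration


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard using a stable hash (built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each dict stands in for one machine in the cluster.
shards = [dict() for _ in range(NUM_SHARDS)]


def put(key, value):
    shards[shard_for(key)][key] = value


def get(key):
    # A keyed read touches exactly one shard, which is why it scales out.
    return shards[shard_for(key)].get(key)


def find_by_city(city):
    # No partition key to aim at: the query must scatter to every shard and gather results.
    return sorted(k for shard in shards for k, v in shard.items() if v["city"] == city)


put("user:1", {"name": "Ann", "city": "Austin"})
put("user:2", {"name": "Bob", "city": "Boston"})
put("user:3", {"name": "Cy", "city": "Austin"})
```

Here `get("user:2")` costs the same no matter how many shards exist, whereas the cost of `find_by_city("Austin")` grows with the size of the cluster.
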
  15. Pro/con
      - This may allow linear scaling
      - This may allow fast relationship traversal
      - This may allow more flexible schemas
      - This may allow more consistency choices
      - But you must make trade-offs to get here
      - This is not obvious at all to most people!
  16. Question centric
      - NoSQL seems to start with the questions rather than the data.
      - Once we know the questions, we can lay out the data using some partitioned model.
      - We can now scale it out and all is good
      - What could you do if scale wasn't an issue?
  17. Question centric: Ask a different question maybe?
  18. Issues
      - The new questions may require a different partitioning scheme to be efficient.
      - Now it doesn't scale at all.
      - Repartitioning is extremely hard.
      - Offline questions can be solved with map/reduce or similar batch approaches, maybe on a copy of the data.
  19. Multiple clusters
      - You can try storing the data partitioned in different ways in different NoSQL clusters.
      - Pick the cluster you want depending on the question.
  20. Multiple clusters
      - Data consistency?
      - You'd better not have a lot of questions, because this gets expensive fast.
      - Lots of different online questions don't suit sharded NoSQL.
  21. Don't normalize
      - You can't easily do joins with NoSQL.
      - This means you want to denormalize and keep the needed data in the rows, even if this means duplicating it.
        - Remember, storage/DASD is super cheap in a scale-out model.
      - Consistency?
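
The denormalization advice on slide 21, and the "Consistency?" question that closes it, can be sketched as follows. The customer and order records and the helpers `read_order` and `move_customer` are hypothetical and exist only for illustration. The read becomes a single key lookup, and the write has to chase down every duplicated copy.

```python
# Normalized: an order refers to a customer by id, so reading it needs a join.
customers = {"c1": {"name": "Ann", "city": "Austin"}}
orders_normalized = {"o1": {"customer_id": "c1", "total": 40}}

# Denormalized: the customer fields the reader needs are copied into the order row.
orders_denormalized = {
    "o1": {"customer_id": "c1", "customer_name": "Ann", "customer_city": "Austin", "total": 40},
}


def read_order(order_id):
    # One key lookup, no join.
    return orders_denormalized[order_id]


def move_customer(customer_id, new_city):
    # The price of duplication: every copy has to be found and rewritten,
    # and readers may see stale copies until that work completes.
    customers[customer_id]["city"] = new_city
    for row in orders_denormalized.values():
        if row["customer_id"] == customer_id:
            row["customer_city"] = new_city
```
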
  22. System of Record (SOR)
      - SQL means the DBMS is the System of Record
      - People are used to this.
      - It's the first problem when implementing any kind of cache on top of a DBMS.
        - How do I keep the cache in sync with the database?
  23. NoSQL SOR
      - Usually, in a NoSQL world, multiple systems of record are NORMAL.
      - The application defines consistency rules and just gets on with it.
      - Inconsistency is handled with a business process of some kind.
      - This is a big mind shift for normal SQL programmers…
  24. Benefits of multiple SOR
      - You can SCALE!
      - No concurrency bottlenecks
      - You can locate data sets around the planet and use the closest one.
      - More highly available, as there are multiple copies and replicas are typically multi-master.
  25. Drawbacks of multiple SOR
      - Consistency is a problem.
      - Conflicts need to be reconciled.
      - Most products only have rudimentary support for this:
        - Imagine bank balances using last write wins…
        - But, even with bank balances, inconsistencies can be handled correctly.
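
The bank balance remark on slide 25 can be illustrated with a small sketch. The slides do not say how the inconsistency should be handled, so the second approach below, where each replica keeps a set of uniquely identified ledger entries and replicas are merged by set union, is an assumption chosen for illustration. The function names are hypothetical.

```python
def last_write_wins(a, b):
    """Each replica value is (timestamp, balance); keep the newer one."""
    return a if a[0] >= b[0] else b


def merge_ledgers(a, b):
    """Each replica holds a set of (txn_id, amount) entries; merging is a set union."""
    return a | b


def balance(ledger):
    return sum(amount for _txn_id, amount in ledger)


# Both replicas start at 100. Replica A sees a +50 deposit, replica B a -30 withdrawal.
lww_a = (1, 150)
lww_b = (2, 70)
lww_result = last_write_wins(lww_a, lww_b)[1]   # 70: the deposit is silently lost

opening = {("t0", 100)}
ledger_a = opening | {("t1", 50)}
ledger_b = opening | {("t2", -30)}
ledger_result = balance(merge_ledgers(ledger_a, ledger_b))   # 120: both operations survive
```

Reconciling operations rather than final values is what lets both updates survive. It does not prevent an overdraft taken concurrently on two replicas, which is the kind of case a business process would still have to handle.
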
  26. Operations
      - SQL
        - Insert
        - Select
        - Update
        - Delete
      - NoSQL
        - Put
        - Retrieve by key
        - Delete
        - Complex search typically means map/reduce…
  27. Search -> Retrieve
      - For online queries, try to convert every search to a retrieve operation.
        - Cache query results
        - Precalculate every possible query
        - Maintain these query caches
        - In other words, use some kind of global index for simple search, but maintaining it may be expensive.
      - Joins/Group By/Limit and so on are more difficult
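
A minimal sketch of the search-to-retrieve idea on slide 27, assuming a single in-process store with one hypothetical index on a `city` attribute. The names `rows`, `by_city`, `put`, `delete`, and `users_in` are illustrative. Every write pays to keep the index current so that the search becomes a keyed retrieve.

```python
from collections import defaultdict

rows = {}                     # primary store: key -> row
by_city = defaultdict(set)    # maintained index: city -> set of primary keys


def put(key, row):
    old = rows.get(key)
    if old is not None:
        by_city[old["city"]].discard(key)   # remove the stale index entry first
    rows[key] = row
    by_city[row["city"]].add(key)


def delete(key):
    old = rows.pop(key, None)
    if old is not None:
        by_city[old["city"]].discard(key)


def users_in(city):
    # The "search" is a keyed retrieve on the index, then keyed retrieves on the rows.
    return [rows[k] for k in sorted(by_city.get(city, ()))]
```

In a sharded store the index itself has to live somewhere, and updating it together with the row is where the maintenance cost mentioned on the slide comes from.
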
  28. Table scans
      - Multi-machine table scans won't work for anything online.
        - Google doesn't map/reduce for every Google search!
      - Offline complex queries can be done using map/reduce
      - You need to write code for most complex searches!
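
The point on slide 28 that complex searches require code can be sketched with a toy, single-process map/reduce. The order records, the partition layout, and the names `map_order`, `reduce_totals`, and `map_reduce` are assumptions for illustration. A real framework would run the map phase on each partition in parallel and shuffle data across machines.

```python
from collections import defaultdict


def map_order(order):
    # Emit (key, value) pairs; here one pair per order.
    yield order["city"], order["total"]


def reduce_totals(city, totals):
    return city, sum(totals)


def map_reduce(partitions, mapper, reducer):
    grouped = defaultdict(list)
    for partition in partitions:              # each partition stands in for one machine
        for record in partition:
            for key, value in mapper(record):
                grouped[key].append(value)    # the "shuffle": group values by key
    return dict(reducer(k, vs) for k, vs in grouped.items())


partitions = [
    [{"city": "Austin", "total": 40}, {"city": "Boston", "total": 10}],
    [{"city": "Austin", "total": 5}],
]
revenue_by_city = map_reduce(partitions, map_order, reduce_totals)
```

What SQL would express as a one-line `GROUP BY` becomes a mapper, a reducer, and a full scan of every partition, which is why this approach fits offline questions rather than online ones.
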
  29. Transaction integrity
      - Used to be just 'normal' transactions
      - Not any more. Not all transactions are equal.
      - Synchronous versus write-behind.
      - Chained or asynchronous versus 2PC (two-phase commit)
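
The synchronous versus write-behind distinction on slide 29 can be sketched as below. Both classes are hypothetical, and a plain dict stands in for the backing database. A real write-behind implementation would flush on a timer or background thread and would have to deal with failures between `put` and `flush`.

```python
class WriteThroughCache:
    """Synchronous: the backing store is updated before put() returns."""

    def __init__(self, store):
        self.store, self.cache = store, {}

    def put(self, key, value):
        self.store[key] = value
        self.cache[key] = value


class WriteBehindCache:
    """Write-behind: put() only touches the cache; the store catches up on flush()."""

    def __init__(self, store):
        self.store, self.cache, self.dirty = store, {}, {}

    def put(self, key, value):
        self.cache[key] = value
        self.dirty[key] = value     # repeated writes to one key collapse into one store write

    def flush(self):
        self.store.update(self.dirty)
        self.dirty.clear()


db = {}
wb = WriteBehindCache(db)
wb.put("k", 1)
wb.put("k", 2)
pending = dict(db)   # snapshot before the flush: the store has not seen the writes yet
wb.flush()
```

Write-behind gives lower write latency and fewer store writes, at the cost of a window in which the store lags the cache and a crash can lose acknowledged writes.
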
  30. Schema
      - Does the store understand the schema?
      - Is a row just a blob or does it have shape?
      - Is the schema an application-only idea?
      - Do DBAs or app developers own the schema?
      - Can application developers be trusted?
  31. Skill level
      - More flexibility and application control
      - Typically means a higher skill level on the development side
      - Single-app company means a highly skilled team.
      - Multi-app company means less highly skilled teams.
      - The law of large numbers at work: the fewer the developers, the better the chance of a high skill level.
  32. Thank you: Driving a race car under control is fun!
  33. Being a passenger in an unguided missile is not! Go in with your eyes open!
  34. Thank you @billynewport