No sql

Published in: Technology
  • RDBMS assumes a well-defined structure in data. It assumes that the data is dense and is largely uniform. RDBMS builds on a prerequisite that the properties of the data can be defined up front and that its interrelationships are well established and systematically referenced. It also assumes that indexes can be consistently defined on data sets and that such indexes can be uniformly leveraged for faster querying. In the context of massive sparse data sets with loosely defined structures, RDBMS appears a forced fit. With massive data sets the typical storage mechanisms and access methods also get stretched. Denormalizing tables, dropping constraints, and relaxing transactional guarantees can help an RDBMS scale, but after these modifications an RDBMS starts resembling a NoSQL product.
  • No sql

    1. 1. RDBMS ? Prateek Jain 12-Jul-2012
    2. 2. Solutions…
    3. 3. Common Architecture: Web server, Web server, Web server; App server, App server; Cache server; RDBMS; CMS; Data Feeds
    4. 4. SQL - Story till now… Stable environment. No more discussions on Data stores. Easy to train and employ people. SQL running effectively at core.
    5. 5. SQL - Story till now… For dealing with lists (as tables) it’s a great language, dynamic and relatively fast • Sure it has a few problems but give me a language that doesn’t
    6. 6. What Next…? We need to be fast, scale, and be part of the web
    7. 7. ORM - OMG! The effort of trying to convert something inherently hierarchical into something relational. ORM is probably the biggest waste of programming time and lines of code, and the biggest source of bugs and latency.
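The mismatch this slide complains about can be sketched with Python's standard library alone. This is a minimal illustration, not any ORM's actual behavior: the `order` object, the table names, and the dict standing in for a document store are all hypothetical.

```python
import json
import sqlite3

# Hypothetical hierarchical object: one order with nested line items.
order = {
    "id": 1,
    "customer": "alice",
    "items": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

def save_relational(conn, order):
    """Flatten the nested object into two tables (the work an ORM hides)."""
    conn.execute(
        "INSERT INTO orders (id, customer) VALUES (?, ?)",
        (order["id"], order["customer"]),
    )
    conn.executemany(
        "INSERT INTO order_items (order_id, sku, qty) VALUES (?, ?, ?)",
        [(order["id"], item["sku"], item["qty"]) for item in order["items"]],
    )

def load_relational(conn, order_id):
    """Rebuild the hierarchy from rows; a second query (or a join) is needed."""
    row = conn.execute(
        "SELECT id, customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    items = conn.execute(
        "SELECT sku, qty FROM order_items WHERE order_id = ? ORDER BY rowid",
        (order_id,),
    ).fetchall()
    return {
        "id": row[0],
        "customer": row[1],
        "items": [{"sku": sku, "qty": qty} for sku, qty in items],
    }

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")
save_relational(conn, order)
relational_copy = load_relational(conn, 1)

# Document style (a plain dict stands in for the store): the hierarchy is
# written and read back as a single value, with no mapping layer.
document_store = {}
document_store[order["id"]] = json.dumps(order)
document_copy = json.loads(document_store[1])
```

Both copies equal the original object; the difference is how much mapping code sits between the object and the store.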
    8. 8. Challenges Data grows exponentially. Data is unstructured. Data is huge and spread across 100’s/1000’s of nodes. SQL is useful - when things are flat
    9. 9. Lots of data In the banking world we have a lot of data Today 50-100,000 quotes a second isn’t unusual It gets more complex... • 10,000 portfolios, each with 1,000 buy/sell orders at specific prices • We now have 100,000 prices coming in every second and 10 million orders to watch
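The order count on this slide follows from simple multiplication. The quick check below uses the slide's own figures; the last line is a hypothetical worst case (every incoming price compared against every order) that the slide does not state.

```python
portfolios = 10_000
orders_per_portfolio = 1_000
prices_per_second = 100_000

# Total orders that must be watched: 10,000 portfolios x 1,000 orders each.
orders_to_watch = portfolios * orders_per_portfolio

# Assumption, not from the slide: a naive matcher compares each incoming
# price with every watched order.
naive_comparisons_per_second = prices_per_second * orders_to_watch
```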
    10. 10. Time is critical In the world of trading only the first one gets the deal; there is no second place. While being first to have the order is what makes the money, banks now have a “new” problem: “RISK”
    11. 11. Lots of data, lots of calculations • There are two main flavors of distributed computing – Data – Computation • Often they are closely related but not always. • To achieve either we usually need lots of memory and CPUs • We don’t stack them or put them in clusters these days, we distribute them
    12. 12. Why not RDBMS? Not designed to scale out. Strongly ACID compliant. Slower running queries (especially in joins). Schema based. Not suited for changing data structure.
    13. 13. CAP Theorem: C – consistency, A – availability, P – partition tolerance. You must make trade-offs and sacrifice at least one in favor of the other two.
    14. 14. NoSQL: Not Just SQL
    15. 15. Categories Document Based Graph Based Column Based Key/Value Based Data Structure Based
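To make the categories less abstract, here is a rough sketch of how one hypothetical user record might be shaped in each kind of store. Plain Python structures stand in for the databases; none of this is a specific product's API, and all keys and field names are invented for illustration.

```python
# Key/value: an opaque value looked up by a single key.
key_value = {"user:42": '{"name": "Ada", "city": "London"}'}

# Document: a nested, self-describing structure the store can query into.
document = {
    "_id": 42,
    "name": "Ada",
    "address": {"city": "London"},
    "tags": ["admin"],
}

# Column-family: row key -> column family -> column -> value; rows may be sparse.
column_family = {
    "42": {"profile": {"name": "Ada"}, "location": {"city": "London"}},
    "43": {"profile": {"name": "Bob"}},  # this row has no "location" family
}

# Graph: nodes plus explicit, labeled edges between them.
graph = {
    "nodes": {42: {"name": "Ada"}, 43: {"name": "Bob"}},
    "edges": [(42, "FOLLOWS", 43)],
}

# Data-structure store: values are typed structures such as sets or lists.
data_structures = {"followers:43": {42}, "timeline:42": ["login", "post"]}

def followers_of(graph, node_id):
    """Return ids of nodes that have a FOLLOWS edge pointing at node_id."""
    return [
        src
        for src, label, dst in graph["edges"]
        if label == "FOLLOWS" and dst == node_id
    ]
```

For example, `followers_of(graph, 43)` walks the edge list, while the data-structure form answers the same question by reading the set stored under `"followers:43"`.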
    16. 16. Example Products
    17. 17. Eventual Consistency
    18. 18. Eventual Consistency Given a sufficiently long period of time, over which no updates are sent, one can expect that all updates will, eventually, propagate through the system and all the replicas will be consistent. In the presence of continuing updates, an accepted update eventually either reaches a replica or the replica retires from service.
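The definition above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not how any particular database replicates: conflicts are resolved by last-write-wins on a caller-supplied timestamp, and replicas synchronize around a fixed ring. The `Replica` class and the `sync_round` helper are hypothetical names.

```python
class Replica:
    """Toy replica holding key -> (timestamp, value); newest timestamp wins."""

    def __init__(self):
        self.data = {}

    def apply(self, key, timestamp, value):
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def merge_from(self, other):
        """Pull every entry from another replica, keeping the newest per key."""
        for key, (timestamp, value) in other.data.items():
            self.apply(key, timestamp, value)

def sync_round(replicas):
    """Each replica pulls from its neighbor in a ring (assumed topology)."""
    for i, replica in enumerate(replicas):
        replica.merge_from(replicas[(i + 1) % len(replicas)])

def converged(replicas):
    return all(r.data == replicas[0].data for r in replicas)

replicas = [Replica() for _ in range(5)]

# Updates are accepted by different replicas, so the system starts inconsistent.
replicas[0].apply("price", 1, 100)
replicas[3].apply("price", 2, 101)
replicas[1].apply("qty", 1, 7)

# No further updates are sent; keep synchronizing until all replicas agree.
rounds = 0
while not converged(replicas):
    sync_round(replicas)
    rounds += 1
```

Once updates stop, every entry moves at least one hop around the ring per round, so five replicas agree within four rounds. Reads issued before that point may still return the older `price`.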
    19. 19. Scalability
    20. 20. Scalability Scalability is the ability of a system to increase throughput with addition of resources to address load increases. Scalability can be achieved by: – Provisioning a large and powerful resource to meet the additional demands. – It can be achieved by relying on a cluster of ordinary machines to work as a unit.
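The second option, a cluster of ordinary machines working as a unit, usually means partitioning data across nodes. A minimal sketch, assuming simple hash-modulo partitioning; the node names and the `node_for` and `distribute` helpers are hypothetical.

```python
import hashlib

def node_for(key, nodes):
    """Pick a node by hashing the key (simple modulo partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

def distribute(keys, nodes):
    """Return a mapping of node -> list of keys assigned to it."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement

keys = [f"order:{i}" for i in range(10_000)]
three_nodes = distribute(keys, ["node-a", "node-b", "node-c"])
six_nodes = distribute(
    keys, ["node-a", "node-b", "node-c", "node-d", "node-e", "node-f"]
)
```

Doubling the node count roughly halves the keys each node holds, which is the sense in which adding resources absorbs load. Modulo partitioning moves most keys when the node count changes, so real systems often use consistent hashing or a similar scheme instead.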
    21. 21. How to choose? Scalability. Transactional integrity and consistency. Data modeling. Query support. Access and interface availability.
    22. 22. Scalability Column-family-centric NoSQL databases are a good choice if extreme scalability is a requirement. They are not well suited for real-time transaction processing (an RDBMS is best there). Eventually consistent NoSQL options, like Cassandra or Riak, may be workable.
    23. 23. Transactional Integrity and Consistency Batch-centric analytics on warehoused data is also not subject to transactional requirements. The same holds for data sets that are written once, e.g., web traffic log files, social networking status updates, advertisement click-through imprints, road-traffic data, stock market tick data, game scores, etc.
    24. 24. Transactional Integrity and Consistency If range operations are common and integrity of updates is required, an RDBMS is the best choice. If atomicity at an individual item level is sufficient, then column-family databases or document databases can be enough.
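What "atomicity at an individual item level" buys can be sketched with a toy in-memory store. This is an illustration only, not the mechanism of any particular database; the `ItemStore` class and its methods are hypothetical.

```python
import threading

class ItemStore:
    """Toy store: each single-key update is atomic, and nothing more."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def update(self, key, fn, default=0):
        # The read-modify-write on one item happens entirely under the lock.
        with self._lock:
            self._data[key] = fn(self._data.get(key, default))
            return self._data[key]

    def get(self, key, default=0):
        with self._lock:
            return self._data.get(key, default)

store = ItemStore()

def worker():
    for _ in range(1_000):
        store.update("clicks", lambda n: n + 1)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total_clicks = store.get("clicks")
```

A counter like this never loses an increment. Moving an amount between two keys would take two separate `update` calls, and another reader could observe the state in between; that multi-item case is where the slide points to an RDBMS.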
    25. 25. Data Modeling RDBMS offers a consistent way of modeling data. Relational algebra underlies the data model. In the NoSQL world there is no such standardized and well-defined data model.
    26. 26. Data Modeling If relaxed schema is your primary reason for using NoSQL, then MongoDB is a great option for getting started with NoSQL. MongoDB is used by many web-centric businesses.
    27. 27. Querying Support An RDBMS thrives on SQL support, which makes accessing and querying data easy. Among document databases, MongoDB provides the best querying capabilities. For key/value pairs and in-memory stores, nothing is more feature-rich than Redis as far as querying capabilities go.
    28. 28. Querying Support Column-family stores like HBase have little to offer as far as rich querying capabilities go. A project called Hive makes it possible to query HBase using SQL-like syntax and semantics.
    29. 29. Access and Interface Availability MongoDB has the notion of drivers. CouchDB always has the RESTful HTTP interface available. Redis, Membase, Riak, HBase, Hypertable, Cassandra, and Voldemort have support for language bindings to connect from most mainstream languages.
    30. 30. Performance
    31. 31. 50/50 Read and Update Results show that under this test case Apache Cassandra outperforms the competition on both read and update latencies. HBase comes close but stays behind Cassandra.
    32. 32. 95/5 Read and Update Sorted, ordered column-family stores perform best for contiguous range reads. HBase seems to deliver consistent performance for reads, irrespective of the number of operations per second. MySQL delivers the best performance for read-only cases.
    33. 33. Future? Coexistence
    34. 34. Future Getting ready for polyglot persistence. Understanding the database technologies suitable for immutable data sets. Choosing the right database to facilitate ease of application development.
    35. 35. Examples LinkedIn uses Hadoop for many large-scale analytics jobs like probabilistically predicting people you may know. Facebook (MySQL + HBase, Cassandra, ZooKeeper). Twitter (MySQL + Cassandra + FlockDB).
    36. 36. Questions?
    37. 37. Feedback trainer.prateek@gmail.com
