New trends in data

Key trends in data including NoSQL and NewSQL

Published in: Technology

  • Why is data so in vogue these days? This resurgence is driven by an unprecedented explosion of data, as organizations capture a greater quantity and diversity of data than ever before. The challenge is how to both manage and extract more value from the data assets being accumulated. We're seeing a surge of new tools and new techniques as traditional databases are being stretched beyond their capabilities. Key themes: explosion of data; new techniques and technologies to manage data; deriving new value from the data.
  • The current lay of the land. Before we start talking about NoSQL, NewSQL, etc., where are we today? Benefits of virtualizing data: relational databases have been used by enterprises for packaged apps like SAP, BI apps such as consumer sales analysis, and custom business apps like a payroll system. The same databases will continue to cater to these application types even with the emergence of all the new data types. Traditionally, these databases have been deployed on dedicated physical hardware, overprovisioned for each application's peak usage. With virtualization, you can also deploy them on a shared pool of virtual resources and abstract those resources from the applications that use them.
  • How do I architect my data tier for highly variable application usage? My app has 10,000 users on a normal day but 10,000,000 on Mother's Day. How do I distribute data efficiently to my compute clouds? I have applications and users in multiple places that need access to the same data in real time. How do I process these large quantities of data in an efficient manner to allow for better real-time decision-making? I need to build a very high scale application with transactional consistency.
  • Modern applications are accessed by users from a variety of compute systems: traditional desktop computers and, more frequently, tablets and mobile devices.
  • RDBMSs were not designed for today's automated, data-heavy transactional environments. They have acquired decades' worth of questionable new features: bloat. CAP theorem: pick two of three. Consistency (all nodes see the same data at the same time). Availability (node failures do not prevent survivors from continuing to operate). Partition tolerance (the system continues to operate despite arbitrary message loss, that is, it tolerates network partitions). Database developers all know the ACID acronym. It says that database transactions should be: Atomic (everything in a transaction succeeds or the entire transaction is rolled back); Consistent (a transaction cannot leave the database in an inconsistent state); Isolated (transactions cannot interfere with each other, so other operations cannot access data modified by a transaction that has not yet completed); Durable (completed transactions persist even when servers restart, and committed updates can be recovered after any kind of system failure, typically from a transaction log). An alternative to ACID is BASE: Basic Availability, Soft state, Eventual consistency.
  • NoSQL has emerged as a common name for databases that store key-value pairs and support simple operations. These are the de facto databases for Web 2.0 applications such as Craigslist and Foursquare, due to the simplicity, flexibility, and high productivity they enable. There are no major commercial players in the space, which is dominated by a number of open source communities. The most common ones you will run into are MongoDB, Cassandra, and Couchbase. These databases differ by the type of data they can store, e.g., documents, graphs, programming objects. The primary use cases for NoSQL are high-productivity web and mobile applications, which constitute most new development today. These applications typically require simple transactions on large data sets. The deployment of MongoDB at Craigslist is a typical example. Craigslist has archived all 5 billion user posts and can search across them. It uses MongoDB deployed on commodity hardware to store and retrieve the post documents.
  • ODS: operational data store.
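The "Atomic" property from the ACID notes above can be illustrated with a minimal sketch. It assumes Python's built-in `sqlite3` module and a hypothetical `accounts` table; the names `make_bank`, `transfer`, and `balances` are invented for this illustration and do not come from the deck.

```python
import sqlite3


def make_bank():
    """Create a throwaway in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        # Used as a context manager, the connection commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer that would overdraw the source account raises inside the transaction, so both updates are rolled back and the balances are left unchanged.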
  • New trends in data

    1. NoSQL, NewSQL, whose SQL? Key trends in data you need to know
    2. Agenda
         • What's driving new data offerings
         • Key trends in harnessing data
         • Learn more this week
    3. Agenda
         • What's driving new data offerings
         • Key trends in harnessing data
         • Learn more this week
    4. Traditional Relational Database Management System (RDBMS)
         • Primary use cases: packaged enterprise applications; classic Business Intelligence (BI) applications; custom business apps with OLTP database needs
         • [Diagram: Billing, Expense, and SAP apps; VMware vSphere]
    5. Data Challenges for the Cloud Era
         • Elastic scalability / Low latency
         • Multi-Site / Multi-Cloud
         • Distributed processing
         • [Diagram label: WAN]
    6. Modern Apps: Indeterminate usage
         • Spike in usage
    7. Agenda
         • What's driving new data offerings
         • Key trends in harnessing data
         • Learn more this week
    8. Trend #1: One size no longer fits all
         • Traditional RDBMS not designed for modern distributed systems
         • How to scale out elastically? This is important to achieve performance under shifting load
         • How to reduce disk latency?
         • How to provide a consistent view of data across geographies?
         • Complex to scale out
         • Costly to scale up
    9. Trend #2: Modern data questions, in-memory answers
         • In-memory systems
         • Pooling and sharing memory (and compute, disk resources)
         • Data lives in memory, asynchronous write to disk
         • Low latency: memory is faster than disk
    10. Elastic Scalability / Low Latency
         • Before: as load increases, virtualization allows the stateless web and app tiers to be rapidly scaled, but the stateful database tier must be over-provisioned in advance and sits idle
         • After: data in a memory pool; as load increases, virtualization still allows the stateless web and app tiers to be rapidly scaled
         • Data in-memory: reduces databases required; allows for linear application scalability
    11. Trend #3: New ways to work with data
         • NoSQL (in-memory): key/value pairs, simplicity, high productivity; different offerings, different data models: document, graph, big table, column
         • NewSQL (in-memory + SQL): scalability benefits of in-memory systems with standardized SQL
    12. VMware solutions
         • vFabric GemFire: memory-oriented distributed data grid; cloud scale with database-like reliability; object interface
         • vFabric SQLFire (public beta): in-memory SQL database; leverage SQL knowledge; horizontal scale, speed and high availability
         • Cloud Foundry: MongoDB, Redis
    13. Multi-Site / Multi-Cloud
         • Before: system of record; batch load to ODS; nightly replication over WAN
         • After: real-time; GemFire nodes connected over WAN
         • Object interface: GemFire
         • SQL interface: SQLFire
    14. Distributed processing
         • Memory-oriented database with elastic scalability, lightning-fast performance & HA
         • [Diagram: clients connecting over WAN to GemFire nodes]
         • Co-locate compute with data
         • Object interface: GemFire
         • SQL interface: SQLFire
    15. Agenda
         • What's driving new data offerings
         • Key trends in harnessing data
         • Learn more this week
    16. Learn more
         • Demos in the VMware booth: vFabric SQLFire, vFabric GemFire, Cloud Foundry
         • Hands-on Lab: "Optimizing Data Access for Your Cloud Infrastructure" (SQLFire); ongoing, HOL12, Twitter hashtag #HOL12
         • Expert One-on-One: schedule 15 minutes with Jags Ramnarayan, Chief Architect; Tuesday 4pm, Experts-06, Twitter hashtag #Experts-06
         • Group Discussion: hosted by Jags; Wednesday, 3:30 pm, GD31, Twitter hashtag #GD31
    17. Learn more this week and beyond
         • Sessions / Panel
         • Managing High Performance Data with vFabric SQLFire: Tuesday, 1pm, CAP1942, Twitter hashtag #CAP1942
         • A Customer Scenario for Next-Generation Data Management with vFabric: Wednesday, 4pm, CAP2471, Twitter hashtag #CAP2471
         • Building Resilient, High Performance, Distributed Applications That Are Data Intensive (GemFire): replay ~2 weeks from now on the VMworld 2011 website, CAP1992, Twitter hashtag #CAP1992
         • Big Compute and Big (NoSQL) Data Panel: replay ~2 weeks from now on the VMworld 2011 website, CAP3362, Twitter hashtag #CAP3362
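The "data lives in memory, asynchronous write to disk" idea from Trend #2 can be sketched roughly as follows. This is an illustrative write-behind store built only on the Python standard library. It is not how GemFire or SQLFire work internally, and the class name `WriteBehindStore` and its methods are hypothetical.

```python
import json
import queue
import threading


class WriteBehindStore:
    """Hypothetical in-memory key-value store that persists writes asynchronously."""

    def __init__(self, path):
        self._data = {}
        self._lock = threading.Lock()
        self._pending = queue.Queue()
        self._path = path
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def put(self, key, value):
        with self._lock:
            self._data[key] = value      # visible to readers immediately
        self._pending.put((key, value))  # the disk write happens later

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def _drain(self):
        # Background thread: append each queued write to a log file on disk.
        with open(self._path, "a", encoding="utf-8") as log:
            while True:
                item = self._pending.get()
                if item is None:         # sentinel sent by close()
                    self._pending.task_done()
                    return
                log.write(json.dumps(item) + "\n")
                log.flush()
                self._pending.task_done()

    def flush(self):
        """Block until every queued write has reached the log file."""
        self._pending.join()

    def close(self):
        """Stop the background writer after it drains the queue."""
        self._pending.put(None)
        self._writer.join()
```

Reads and writes touch only memory, so their latency does not depend on the disk. The trade-off is that writes still sitting in the queue are lost if the process dies before they are drained, which is why systems of this kind usually pair the approach with replication.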
