Choosing which big data, NoSQL or database technology to use

Basic overview of how to evaluate and match workloads to the various database technologies available, from NoSQL to relational. Workloads have different characteristics. If you don’t understand them you can end up implementing the wrong solution for the problem you have.

The video from this presentation is available at https://bloorgroup.webex.com/bloorgroup/lsr.php?AT=pb&SP=EC&rID=4953842&rKey=d03b10ecd9163770

Published in: Technology, Business

  1. One Size Doesn’t Fit All: Choosing which big data, NoSQL or database technology to use. March 14, 2012. Mark R. Madsen, http://ThirdNature.net
  2. The problem of “big” is three problems of volume: amount of data, number of users, and computations.
  3. Big data? Unstructured data isn’t really unstructured. The problem is that this data is unmodeled. The real challenge is complexity.
  4. The holy grail of databases under current market hype. A key problem is that we’re talking mostly about computation over data when we talk about “big data” and analytics, a potential mismatch for both relational and NoSQL.
  5. Solving the Problem Depends on the Diagnosis
  6. You must understand your workload: throughput and response time requirements aren’t enough.
       ▪ 100 simple queries accessing month-to-date data
       ▪ 90 simple queries accessing month-to-date data plus 10 complex queries using two years of history
       ▪ Hazard calculation for the entire customer master
       ▪ Performance problems are rarely due to a single factor.
  7. Workload: one big query or many small queries? Retrieval: small return set or large? Selectivity: large volume of data scanned or small?
  8. Important workload parameters to know: • Read-intensive vs. write-intensive
  9. Important workload parameters to know: • Read-intensive vs. write-intensive • Mutable vs. immutable data
  10. Important workload parameters to know: • Read-intensive vs. write-intensive • Mutable vs. immutable data • Immediate vs. eventual consistency
  11. Important workload parameters to know: • Read-intensive vs. write-intensive • Mutable vs. immutable data • Immediate vs. eventual consistency • Short vs. long access latency
  12. Important workload parameters to know: • Read-intensive vs. write-intensive • Mutable vs. immutable data • Immediate vs. eventual consistency • Short vs. long access latency • Predictable vs. unpredictable data access patterns
  13. Types of workloads
       Write-biased: OLTP; OLTP, batch; OLTP, lite; Object persistence; Data ingest, batch; Data ingest, real-time
       Read-biased: Query; Query, simple retrieval; Query, complex; Query-hierarchical / object / network; Analytic
       Mixed? Inline analytic execution, operational BI
  14. Matching to parameters, at assumption of data scale
       Chart rows (technologies): Standard RDBMS; Parallel RDBMS; NoSQL (kv, dht, obj); Hadoop*; Streaming database
       Chart columns (workload parameters): Write-biased; Read-biased; Updateable data; Eventual consistency ok; Unpredictable query path; Compute intensive
       [Chart; the cell ratings are not recoverable from this transcript]
       You see the problem: it’s an intersection of multiple parameters, and this chart only includes the first tier of parameters. Plus, workload factors can completely invert these general rules of thumb.
  15. Matching to parameters, at assumption of data scale
       Chart rows (technologies): Standard RDBMS; Parallel RDBMS; NoSQL (kv, dht, obj); Hadoop; Streaming database
       Chart columns (workload parameters): Complex queries; Selective queries; Low latency queries; High concurrency; High ingest rate
       [Chart; the cell ratings are not recoverable from this transcript]
       You have to look at the combination of workload factors: data scale, concurrency, latency & response time, then chart the parameters.
  16. Always build a proof of concept!
  17. Image Attributions. Thanks to the people who supplied the images used in this presentation:
       Holy Grail - © Monty Python Ltd.
       Cupcakes - <lost attribution on Flickr>
       rock-fall-roadblock.jpg - http://www.flickr.com/photos/wsdot/4679360979/
       roadblock-sheep.jpg - http://www.flickr.com/photos/brizo_the_scot/4013939756/
  18. About the Presenter Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, analytics and information management. Mark is an award-winning author, architect and former CTO whose work has been featured in numerous industry publications. During his career Mark received awards from the American Productivity & Quality Center, TDWI, Computerworld and the Smithsonian Institute. He is an international speaker, contributing editor at Intelligent Enterprise, and manages the open source channel at the Business Intelligence Network. For more information or to contact Mark, visit http://ThirdNature.net.
  19. About Third Nature. Third Nature is a research and consulting firm focused on new and emerging technology and practices in analytics, business intelligence, and performance management. If your question is related to data, analytics, information strategy and technology infrastructure then you’re at the right place. Our goal is to help companies take advantage of information-driven management practices and applications. We offer education, consulting and research services to support business and IT organizations as well as technology vendors. We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions.
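
Slides 6 and 7 argue that a query count alone says little about a workload. The point can be illustrated with a minimal Python sketch that compares the first two query mixes from slide 6. The `Query` class, the `summarize` helper, and every row count are hypothetical values chosen for illustration; none of them come from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """One query in a workload. All sizes used below are illustrative assumptions."""
    name: str
    rows_scanned: int   # rows the engine has to read (slide 7: volume scanned)
    rows_returned: int  # rows sent back to the client (slide 7: return set size)


def summarize(queries: list[Query]) -> dict[str, int]:
    """Total up query count, rows scanned, and rows returned for a mix."""
    return {
        "queries": len(queries),
        "rows_scanned": sum(q.rows_scanned for q in queries),
        "rows_returned": sum(q.rows_returned for q in queries),
    }


# Hypothetical data sizes: one month of data is 1M rows, two years is 24 months.
MTD_ROWS = 1_000_000
HISTORY_ROWS = 24 * MTD_ROWS

simple = Query("simple, month-to-date", rows_scanned=MTD_ROWS, rows_returned=100)
complex_q = Query("complex, two years", rows_scanned=HISTORY_ROWS, rows_returned=10_000)

# Slide 6, first bullet: 100 simple queries.
mix_a = [simple] * 100
# Slide 6, second bullet: 90 simple queries plus 10 complex ones.
mix_b = [simple] * 90 + [complex_q] * 10

a = summarize(mix_a)
b = summarize(mix_b)

print("mix A:", a)
print("mix B:", b)
print("scan ratio B/A:", b["rows_scanned"] / a["rows_scanned"])
```

Under these assumed sizes both mixes contain 100 queries, yet the second reads several times as much data, which is why throughput and response time targets alone do not describe a workload. Real figures would have to come from measuring the actual system, in line with slide 16's advice to build a proof of concept.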
