Big Data in 10
What’s real and what’s fluff
Abhishek Pamecha, Mar-2013
Explains Big Data in 10 slides. It separates the hype from the reality: what Big Data is, and what already existed before it. No big numbers, no big claims: just the plain, simple truth.

The "red pill"

Published in: Technology
Bigdata

  1. Big Data in 10
     What’s real and what’s fluff
     Abhishek Pamecha, Mar-2013
  2. What is Big Data
     • It is all about data
       – But not about “how much”
       – But about correlations and increased reach
  3. BigData Architecture
     It influences or changes your
     • Data source choices
     • Data storing choices
     • Data analyzing/mining approaches
     It helps
     • Address highly focused use cases
     • Correlate more data sources
     • Address scale and fault tolerance issues
  4. Caution!
     BigData is not a “substitute” for existing warehousing practices.
     It complements existing practices.
  5. Architectures – Data sources
     • Traditional DW
       – Production DB
       – Dictionaries
       – ETL/ELT pipelines
       – External Data marts
     • BigData adds
       – Log files
       – Social graphs
       – Streaming data
  6. Architectures – Data Storage
     • Traditional DW
       – Production DB
         • Flatten hierarchies
         • Resolved references
       – ROLAP or MOLAP databases
         • Star schema
         • Materialized views
         • Virtual data marts
         • Partitioned tables
       – Still relational
     • BigData adds
       – Distributed file storage
       – Distributed hash maps
       – Columnar representations
       – Graph databases
       – Document collections
       – Other NoSQL variants
  7. Architectures – Analytic approaches
     • Traditional DW
       – Production DB
         • Flatten hierarchies
         • Resolved references – Pre-generate results
       – ROLAP databases
         • Star schema – Multidimensional queries
         • Materialized views
         • Still relational
         • Virtual data marts
         • Partitioned tables
     • BigData adds
       – Distributed file storage
         • Map reduce frameworks and chaining
       – Distributed hash maps
         • Single key predominant
       – Columnar representations
         • Extracts select columns per row – ad hoc explorations on subsets
       – Graph databases
         • Navigate links – ad hoc explorations on subsets
       – Document collections
         • Simplified schemas
       – Other NoSQL approaches
         • Stream pattern matching and pipelining
  8. Big Data Architectures – Pros and Cons
     • Pros
       – Incorporate low value and social data in analysis
       – Increase analysis reach to non-structured data
       – Correlate across data sources on the same platform
       – Very strong in their sweet spots
       – Efficiency in terms of
         • data movement volume
         • scale
         • fault tolerance
         • responsiveness
     • Cons
       – Not relational. Gives up some of the relational advantages.
         • Joins
         • Aggregations etc.
       – Few standards
       – Non-portable solutions
       – Less support from end-user tools and applications (though growing)
       – Not a replacement for DW, just an extension to it
       – Incompatible with different classes of use cases; each has its sweet spots
       – Heterogeneous setup in Development and Operations
  9. Challenges
     • Architectural
       – “Big” data management
       – Data consistency
       – Read-heavy or write-heavy
       – Scaling
       – Distributed deployment
     • Functional
       – Data quality
       – Problem set choice
     • Organizational
       – Data-backed decisions
       – Going overboard
       – SLAs and operations management
       – Data Privacy
  10. Thank you!
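
Slide 7 lists "Map reduce frameworks and chaining" among the analytic approaches that BigData adds. The sketch below is one possible in-memory illustration of that idea, not the deck's own code and not any particular framework's API. The log lines, the run_job helper, and the two jobs are all hypothetical. The first job counts hits per (user, page); the second job is chained on the first job's output and counts distinct pages per user.

```python
from collections import defaultdict

# Hypothetical toy data standing in for web-server log lines: "user page"
LOG_LINES = [
    "alice /home",
    "bob /home",
    "alice /cart",
    "carol /home",
    "bob /cart",
    "alice /home",
]


def run_job(records, mapper, reducer):
    """Run one map -> shuffle -> reduce pass entirely in memory."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # "shuffle": group values by key
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Job 1: count hits per (user, page)
def map_hits(line):
    user, page = line.split()
    yield (user, page), 1


def reduce_hits(key, values):
    return key, sum(values)


# Job 2, chained on job 1's output: count distinct pages per user
def map_user(record):
    (user, _page), _hits = record
    yield user, 1


def reduce_user(user, values):
    return user, sum(values)


hits = run_job(LOG_LINES, map_hits, reduce_hits)
pages_per_user = dict(run_job(hits, map_user, reduce_user))
print(pages_per_user)
```

In a real framework the shuffle step would move data between machines; here it is only a dictionary, which keeps the chaining pattern visible without any cluster setup.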

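Slides 6 and 7 mention columnar representations that extract select columns per row. As a rough illustration under stated assumptions, the sketch below pivots a few hypothetical row-oriented records into per-column lists and then totals one field both ways. The records, field names, and the to_columns helper are invented for this example, and the helper assumes every row has the same fields.

```python
# Hypothetical records in row-oriented form: one dict per row
rows = [
    {"user": "alice", "country": "IN", "spend": 120.0},
    {"user": "bob", "country": "US", "spend": 80.0},
    {"user": "carol", "country": "IN", "spend": 50.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column.

    Assumes every row carries the same set of fields.
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every full row is visited even though only "spend" is needed
total_from_rows = sum(row["spend"] for row in rows)

# Columnar layout: only the single needed column is read
total_from_columns = sum(columns["spend"])

print(total_from_rows, total_from_columns)
```

Both totals agree; the difference is in how much data has to be touched to compute them, which is the usual argument for columnar layouts in queries that read only a few columns.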