Big Data 2.0: Hadoop and Enterprises
Wuheng Luo
Ankur Gupta
06.2013
MetaScale is a subsidiary of Sears Holdings Corporation

Big Data 2.0: Hadoop as part of a Near-Real-Time Integrated Data Era

A new era of big data is coming, an era we would call "Big Data 2.0," with characteristics including:

1. The lines between data and metadata, storage and processing logic become further blurred
2. Data integration pattern is shifting from ETL (extract, transform and load) to the 3 T's in Hadoop (transfer, transform and translate)
3. Batch-oriented data pipeline is challenged, even surpassed by stream-based data flow
4. In-memory big data processing emerges as a new promising trend
5. Latency from raw data to business intelligence is dramatically shortened toward real-time or near real-time
6. Hadoop and other No-SQL solutions are further integrated into the same environment
7. Mapping and conversion between relational/row-based and column-based data becomes end-user friendly
8. More ad hoc, interactive, query-based analytics outgrow pure MapReduce
9. Hadoop evolves from data server-centric to client rich
10. Hadoop becomes the centerpiece of enterprise data systems, with roles of database, data warehouse, and data center storage, all in one, as integrated platform and solutions

This vision of Big Data 2.0 is based on Sears' research, development and production experience, and best practice in enterprise data solutions, which indicate that Hadoop is ready for its prime time in this new era.
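Characteristics 3 and 5 in the list above contrast batch pipelines with stream-based flow and tie the difference to latency. The following is a minimal sketch of that contrast in plain Python, not Hadoop code and not the authors' implementation. The record shape `(store_id, sale_amount)`, the sample `EVENTS`, and both function names are assumptions made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record shape: (store_id, sale_amount).
Event = Tuple[str, float]


def batch_totals(events: Iterable[Event]) -> Dict[str, float]:
    """Batch style: read the whole data set, then report one final result."""
    totals: Dict[str, float] = defaultdict(float)
    for store, amount in events:
        totals[store] += amount
    return dict(totals)


def streaming_totals(events: Iterable[Event]) -> Iterator[Dict[str, float]]:
    """Stream style: emit an updated snapshot after every single event."""
    totals: Dict[str, float] = defaultdict(float)
    for store, amount in events:
        totals[store] += amount
        # A copy is yielded so earlier snapshots are not mutated later.
        yield dict(totals)


# Made-up sample data for the usage example below.
EVENTS = [("s1", 10.0), ("s2", 5.0), ("s1", 2.5)]

if __name__ == "__main__":
    print("batch:", batch_totals(EVENTS))
    for snapshot in streaming_totals(EVENTS):
        print("stream:", snapshot)
```

Both functions reach the same final totals. The difference is when an answer becomes available: the batch version returns only after the last record, while the streaming version has a usable (partial) answer after the first one.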

Published in: Technology, Business

  1. Big Data 2.0: Hadoop and Enterprises
     Wuheng Luo, Ankur Gupta, 06.2013
     MetaScale is a subsidiary of Sears Holdings Corporation
  2. Topics
      Accomplishments of Big Data 1.0
      Characteristics of Big Data 2.0
      Future for Hadoop and enterprise big data
  3. Big Data Timelines: First Wave
     MapReduce, 2004; Hadoop, 2005; Bigtable, 2006; Dynamo, 2007; Hadoop in production, 2008
  4. Timelines: Big Data 1.0
     Landmark events in the first wave of big data movement:
      2004 – MapReduce paper published (Jeffrey Dean et al., Google)
      2005 – Hadoop created (Doug Cutting et al., Yahoo!)
      2006 – Bigtable paper published (Fay Chang et al., Google)
      2007 – Dynamo paper published (G. DeCandia et al., Amazon)
      2008 – First production Hadoop cluster launched (Yahoo!)
  5. Big Data Timelines: Second Wave
     Timeline, 2010 to 2013: Dremel and Enterprise Hadoop, 2010; PB Scale data and Enterprise Big Data Services, 2012; Hadoop Summit '13, Big Data 2.0 Year One, A.D. 2013
  6. Timelines: Big Data 2.0
      2010 – Dremel paper published (Sergey Melnik et al., Google)
      2010 – Enterprise Hadoop effort initiated (Sears et al.)
      2012 – Enterprise big data focused start-up MetaScale founded
      2012 – Data growth rate reached PB scale daily (Facebook)
      Present – Hadoop Summit 2013 (Year One A.D. for big data 2.0?)
  7. Big Data 1.0: Accomplishments
      Ecosystem: in healthy growth
      Technologies: emerging and maturing
      The Community: taking big data from hype into the real world
      Enterprises: testing the water in Hadoop
  8. Big Data 1.0 Pyramid
     Pyramid levels (in slide order): Enterprise features; Hadoop Technology Stack; Hadoop Platform/Ecosystem; Big Data
  9. Big Data 2.0 Is Here Now: Paradigm Shift vs Focus Shift
     1.0: Paradigm Shift
      New computational algorithm: MapReduce
      New data platform: Bigtable, Dynamo, Hadoop
      New technology: Hadoop tech stack
     2.0: Focus Shift
      Still technology and innovation driven, but focus is shifted to enterprise and its data
      Enterprise-centric, enterprise data problem focused
  10. Big Data 2.0 Is Here Now: Big Data Goes Mainstream
       Enterprises try to gain a distinct competitive capability from the use of big data tools
       Volume is no longer the big issue: focus is shifting away from storage
       Latency is becoming more and more of a concern
       Hadoop shows promising signs of becoming the platform of choice for enterprise data integration and analytics
  11. Big Data 2.0 Is Here Now: Big Data: 1.0 vs 2.0
      Big Data 1.0 | Big Data 2.0
      Started as a “web” phenomenon | Becomes an enterprise phenomenon
      Evangelical: Hadoop for big data as gospel | Pragmatic: big data as enterprise data reality
      Hadoop as Apache projects, or open source distributions | Hadoop as enterprise data platform and solutions
      Focus on big data technology itself | Focus on solving real enterprise data problems using big data technologies
      Technology-centric innovations | Enterprise data-centric innovations
      Volume is a big deal | Latency is a big deal
      Big data technology is avant-garde | Big Data solutions become mainstream
  12. Big Data 2.0 Is Here Now: Defining Big Data 2.0
       A new phase of the big data movement is enterprise-centric, with focus on enterprise big data, its flow and process
       Most energy and effort at this new stage will be invested in easier, better and faster approaches to converting enterprise source data to data products that support business decisions
       Big Data 2.0: problem-oriented (instead of technology-oriented) enterprise big data management
  13. Big Data 2.0 Is Here Now: Big Data 2.0 Pyramid
      Pyramid levels (in slide order): Enterprise Data; Features; Technology stack; Platform/Ecosystem
  14. Big Data 2.0 Is Here Now: Big Data 2.0 Characteristics: a Top 10 List
      1. The lines between data and metadata become further blurred
      2. Data integration pattern is shifting away from ETL to 3Ts
      3. Batch-oriented pipeline is challenged by stream-based flow
      4. In-memory approach emerges as a new promising trend
      5. Latency from raw data to intelligence is dramatically reduced
      6. Hadoop and No-SQL solutions are further integrated
      7. Ad hoc, interactive query outgrows hard-core MapReduce
      8. Hadoop evolves from data server-centric to client rich
      9. Hadoop becomes the centerpiece of enterprise data systems
      10. Young Elephant learns old tricks – relational and SQL-like features
  15. Big Data 2.0 Is Here Now: Enterprise Big Data: Better, Easier, Faster
      Better data (1)
       Data should endure change: schema in data (Avro, Dremel, Parquet), schema-less/open schema? Easy mapping between row and column-based data?
      Simpler processing (2, 6)
       Data process should be simplified: Hadoop will further integrate storage and warehouse
       ETL replaced by 3Ts? What about non-Hadoop solutions?
      Quicker results (3, 4, 5, 7, 8, 9, 10)
       Latency to turn raw data to business intelligence should be minimized: most energy and effort will focus on this one
       Real real-time analytics?
  16. Big Data 2.0 Is Here Now: Next-Gen Hadoop: What to Expect
       If Big Data 2.0 focuses on enterprise data, Hadoop is becoming the centerpiece of the picture: an integrated platform for all enterprise data
       If current Hadoop distribution is JDK …
       Next-gen Hadoop should be J2EE
  17. Big Data 2.0 Is Here Now: Hadoop as Enterprise Data Platform/Solution?
      Criticism: Hadoop is not a data integration platform/solution
       “Not only are many key data integration capabilities immature or missing from the stack, but many have not been addressed…”
       “Through 2016, the Apache Hadoop technology stack will not offer the functionality necessary for the creation and operation of a well-governed data integration regime.”
      2013 Gartner report, Merv Adrian and Ted Friedman
      How to make Hadoop a true platform/solution for all enterprise data, or, what is really missing now?
  18. Big Data 2.0 Is Here Now: Hadoop as It Is Now: What Is Really Missing?
      As probably the next OS for enterprise data, Hadoop currently has:
       Many distributions, but no open standards
       A possible but undesirable consequence of the current state of Hadoop: something similar to the Unix wars between BSD and System V
       Competition is good and necessary, but we need cooperation on some common ground
  19. Big Data 2.0 Is Here Now: Open Standard for Enterprise Big Data
      Need a consortium on open standards for enterprise big data
       A consortium and governance body of the Hadoop community, including vendors, service providers, enterprise big data practitioners, researchers and innovators to advocate and develop open standards for Hadoop-based enterprise data platforms and solutions
  20. Thank You!
      For further information, email: contact@metascale.com; visit: www.metascale.com
      MetaScale is a subsidiary of Sears Holdings Corporation
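
Slide 15 asks about "schema in data" and "easy mapping between row and column-based data". The sketch below shows one way to picture both ideas together: row records are pivoted into a column layout that carries its field list with it, and can be pivoted back. It is plain Python for illustration only. It does not use or reproduce the Avro, Dremel, or Parquet formats, and the function names and the `schema`/`columns` layout are assumptions, not anything defined in the slides.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def rows_to_columns(rows: List[Row]) -> Dict[str, Any]:
    """Pivot row records into a column layout that travels with its schema."""
    # Collect field names in order of first appearance across all rows.
    fields: List[str] = []
    for row in rows:
        for name in row:
            if name not in fields:
                fields.append(name)
    # One list per field; a row that lacks a field contributes None.
    columns = {name: [row.get(name) for row in rows] for name in fields}
    return {"schema": fields, "columns": columns}


def columns_to_rows(table: Dict[str, Any]) -> List[Row]:
    """Pivot the column layout back into row records using its own schema."""
    fields: List[str] = table["schema"]
    columns: Dict[str, List[Any]] = table["columns"]
    length = len(columns[fields[0]]) if fields else 0
    return [{name: columns[name][i] for name in fields} for i in range(length)]


# Made-up sample data: the second row adds a field the first row lacks.
SAMPLE_ROWS = [
    {"id": 1, "sku": "a"},
    {"id": 2, "sku": "b", "qty": 3},
]

if __name__ == "__main__":
    table = rows_to_columns(SAMPLE_ROWS)
    print(table)
    print(columns_to_rows(table))
```

Because the field list is stored next to the values, a new field such as `qty` can appear in later records without a separate schema migration step; older records simply read back as `None` for that field. That is the sense in which the data "endures change" in this toy layout.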
