Rob Anderson

Published in: Technology

  1. BIG DATA IS CHANGING THE WORLD
     © Copyright 2010 EMC Corporation. All rights reserved.
  2. IN THIS DECADE THE DIGITAL UNIVERSE WILL GROW 44X FROM 0.9 ZETTABYTES TO 35.2 ZETTABYTES
     Source: 2010 IDC Digital Universe Study
  3. 90% OF THE DIGITAL UNIVERSE IS UNSTRUCTURED
     Source: 2011 IDC Digital Universe Study
  4. Big Data Has Arrived
     Electronic Payments, Video Rendering, Video Surveillance, Mobile Sensors, Social Media, Medical Imaging, Gene Sequencing, Geophysical Exploration, Smart Grids
  5. Deliver Better Healthcare With Big Data
     Billion Dollar Specialty Care Service Provider
     • Legacy System & Traditional Data
       – Treatment Pathways On Summary Data
     • New System & Big Data
       – Treatment Pathways On All The Data
       – Individual Patient History
       – Social & Economic Factors
     • Results: International Quality Of Patient Care
  6. Increase Profit Margins With Big Data
     Retail Banking Firm Aligns Offers To Customers
     • Legacy System & Traditional Data
       – Agent “Best Guess”
     • New System & Big Data
       – Profit-Based Recommendations
       – Identify “At-Risk” Customers
       – User Based Recommendations
     • Customer Profit
  7. Classifying and segmenting Big Data
     • Rich content stores: original intellectual property or value-added
       – Media, VOD, content creation, special effects, satellite imagery, GIS data
     • Generated from workflow: must be managed/processed quickly & cheaply
       – Manufacturing, simulation, electronic design
     • Develop new intellectual property based on big data
       – Pharmaceutical companies doing customised drug development
     • Companies, public sector, utilities mining data for business advantage
     • Some mine consumer data: higher-volume and potentially higher-value
  8. Big Data is File & Unstructured Data
     [Chart: exabytes (0 to 90) by year, 2009 to 2014, file-based vs. block-based]
     • File Based: 60.7% CAGR
     • Block Based: 21.8% CAGR
     • By 2012, 80% of all storage capacity sold will be for file-based data
     Source: IDC
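To make the two growth rates concrete, here is a small sketch of the compound annual growth rate (CAGR) arithmetic. It assumes five annual compounding steps (2009 to 2014, as on the chart axis) and uses a starting value of 1.0 as a placeholder, since the slide text does not give the actual exabyte figures.

```python
def project(start, cagr, years):
    """Value after compounding `start` at rate `cagr` for `years` years."""
    return start * (1 + cagr) ** years


def cagr(start, end, years):
    """Compound annual growth rate implied by growing from `start` to `end`."""
    return (end / start) ** (1 / years) - 1


FILE_CAGR = 0.607   # from the slide
BLOCK_CAGR = 0.218  # from the slide
YEARS = 5           # assumption: 2009 -> 2014 is five annual steps

# Growth multiples relative to a placeholder 2009 baseline of 1.0
file_multiple = project(1.0, FILE_CAGR, YEARS)
block_multiple = project(1.0, BLOCK_CAGR, YEARS)

print(f"file-based grows about {file_multiple:.1f}x over {YEARS} years")
print(f"block-based grows about {block_multiple:.1f}x over {YEARS} years")
```

Under these assumptions, file-based capacity grows roughly four times as much as block-based capacity over the period.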
  9. Why is Big Data appearing now?
     Source: IDC
  10. Gartner’s 3 V’s of Big Data (Volume, Velocity, Variety)
  11. “The Internet of Things”
     • Massive explosion of smart devices, all sending, receiving, storing data
       – Human-oriented devices: handhelds, tablets, cameras
       – Non-human-oriented devices: sensors, embedded CPUs
     • Social networking messages & data grow exponentially
       – Twitter feeds, Facebook updates, LinkedIn messages
     • Increasingly, business is conducted digitally, or digitized
     • Big Data is global: any source to any target
  12. Source: GoGlobe
  13. Companies want to store big data. Why?
     • Google
       – Originally thought of as “search engine”
       – Now: Storing the Internet, storing every search query
     • Facebook, Twitter
       – Just social media?
       – Storing every message you send, monitoring every market trend
     • Amazon
       – Your every purchase, forever
     • Carriers
       – Storing location-based data on everyone
  14. Social Networking Analysis
     Courtesy of NSF Workshop on Social Modeling
  15. The race is on
     • Big Data leads to the Optimised Organisation
     • It takes a long time to build a functioning data warehouse and analytics tools, and to connect them to the business
     • Many companies have a head start
     • Every CIO needs to consider Big Data in their strategy to stay ahead
       – How to manage, how to leverage
  16. A little retailer I once knew
     • Why can Amazon beat everyone on price?
     • Purchase information is used to adjust the supply chain
     • Shipping and logistics are adjusted according to conditions on the ground and in the supply chain
     • Other customers’ information is used to provide recommendations and improve the experience
     • Not just Amazon: Tesco, Carrefour, Metro, etc. are all taking advantage
  17. How do we make decisions?
     • Good data is hard to get, so decisions are often based on no data at all
     • Often based on information from peers, colleagues, or reports, or made because it’s always been done that way
     • Many companies fail because they fail to detect shifts in consumer demand
     • The Internet has made customers more segmented, and causes customer choice to change faster
  18. Moving to a Data-Driven Model
     • Managing with the facts
     • Making a science out of data!
     • Experimental model, different from BI
     • Moving from “gut feel” to rational, scientific decisions
  19. Big-Data-based Decisions
     • Unlock value by making information transparent and useable at higher frequency
     • More accurate information (e.g. inventories, trends)
     • Tailor products more precisely
     • Sophisticated analytics makes for better decisions
     • Better products (via web feedback, sensors, etc.)
     Source: McKinsey
  20. What holds back big data?
     • Not ICT: compute & storage getting bigger, cheaper, easier
     • Not the quantity of data (see slide 1)
     • Not the value: large-scale Big Data projects generally have great ROI
     • Real problems are organisational change and talent acquisition
  22. How are people doing it?
     • Enterprises ingesting > 1 PB of data per day within 5 years
     • Big data is often largely unstructured
     • Hadoop is an open-source, Java-based framework for analyzing big data
     • Big data can mean billions to trillions of files
       – Each file can be gigabytes to terabytes in size
       – This means petabytes to exabytes of data
     • Examples of big data techniques: Directed Graph Analysis, Collaborative Filtering, A/B Testing, Association Rule Learning, Classification, Natural Language Processing, Data Mining, Pattern Matching, Sentiment Analysis, Comparative Effectiveness, Clinical Decision Support
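Hadoop's best-known programming model is MapReduce. The following is a minimal single-process sketch of that model in Python, using word counting as the example task. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a collection of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is unstructured"])
print(counts)
```

In a real cluster the map and reduce calls would run on many machines, each working on the part of the data stored locally; this sketch only shows the data flow.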
  23. How do you manage and design for Big Data?
     • Scale and parallelism are the keys
       – Big data is far too big to process sequentially
       – Too much coming in too quickly
       – Example: Banks seeking to process market data more quickly, reducing decision making time from days to minutes
     • Answer: Scale-out storage and scale-out processing
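As a rough illustration of processing partitions in parallel rather than sequentially, the sketch below splits a dataset into partitions, processes each in a worker, and combines the partial results. The function names and the summing workload are assumptions made for illustration. It uses threads from the Python standard library to keep the example self-contained; a real scale-out system would spread partitions across separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split(records, parts):
    """Divide records into `parts` roughly equal partitions (round-robin)."""
    return [records[i::parts] for i in range(parts)]


def process_partition(partition):
    """Stand-in for real per-partition work: here, just sum the values."""
    return sum(partition)


def scale_out_sum(records, workers=4):
    """Process each partition in its own worker, then combine the partials."""
    partitions = split(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_partition, partitions))
    return sum(partials)


# Hypothetical data: 1,000 numeric records
total = scale_out_sum(list(range(1000)), workers=4)
print(total)
```

The design point is that each partition can be handled independently, so adding workers (or nodes) adds capacity without changing the logic.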
  24. Cramming big data onto traditional models
     [Diagram labels: Server, Network, Storage; Scalability, Performance, Management, Availability, Cost]
  25. A different idea: scale-out
     [Diagram labels: Server, Network, Storage; Scalability, Performance, Management, Availability, Cost]
  26. Enterprise Hadoop: Greenplum & Isilon
     • Easier and more reliable
       – Packaged Hadoop distribution with Isilon storage
     • Purpose-built Hadoop infrastructure
       – Faster, less risk
     • Sharing expertise to address the talent gap
       – Architecture, data science, and roadmap services
     • Proven at scale with worldwide support
       – 24x7 one call Hadoop support from EMC
       – Key component of Greenplum UAP
       – Unstructured data processing
  27. Increasing Demand for Advanced Analytics
     • Complex
       – Deep, rich analysis of big data sets
       – Ad hoc, interactive analysis, not structured reports
     • Timely
       – On-going, frequent analysis (e.g. daily, weekly)
       – Insights delivered in minutes/seconds
     • Actionable
       – Forward looking, predictive insight
       – Create new business value
  28. EMC Greenplum: Purpose-built for Big Data
     • EMC Greenplum is a shared-nothing, massively parallel processing (MPP) data warehouse system
     • Core principle of data computing is to move the processing dramatically closer to the data and to the people
     [Diagram labels: Fast Data Loading; Extreme Performance & Elastic Scalability; Unified Data Access]
  29. MPP Shared-Nothing Architecture
     Greenplum’s Massively Parallel Processing (MPP) Database has extreme scalability on general purpose systems
     • Automatic parallelization
       – Load and query like any database
     • Scan and process in parallel
       – Extremely scalable and I/O optimized
     • Linear scalability by adding nodes
       – Each adds storage, query performance and loading performance
     [Diagram labels: MapReduce; Master Servers (query planning and dispatch); Network Interconnect; Segment Servers (storage and query processing); External Sources (MPP loading, streaming, etc.)]
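The master/segment split described on this slide can be sketched as a scatter-gather query: rows are distributed across segments by hashing a key, each segment computes a partial result over only its own rows, and the master combines the partials. This is a toy model under stated assumptions, not Greenplum's implementation; the `Segment` and `Master` classes and their methods are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One shared-nothing node: it owns its rows and shares no state."""
    rows: list = field(default_factory=list)

    def load(self, row):
        self.rows.append(row)

    def partial_avg(self, column):
        """Return (sum, count) over this segment's own rows only."""
        values = [row[column] for row in self.rows]
        return sum(values), len(values)


class Master:
    """Routes rows to segments and combines per-segment partial results."""

    def __init__(self, n_segments):
        self.segments = [Segment() for _ in range(n_segments)]

    def insert(self, row, key):
        # Hash distribution: the key's hash decides which segment stores the row
        index = hash(row[key]) % len(self.segments)
        self.segments[index].load(row)

    def average(self, column):
        # Scatter: every segment computes its partial; gather: combine them
        partials = [seg.partial_avg(column) for seg in self.segments]
        total = sum(s for s, _ in partials)
        count = sum(c for _, c in partials)
        return total / count if count else None


master = Master(n_segments=4)
for i in range(10):
    master.insert({"id": i, "amount": i * 10}, key="id")
print(master.average("amount"))
```

Averaging is done by shipping (sum, count) pairs rather than per-segment averages, because averages of unevenly sized partitions cannot simply be averaged again.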
  30. EMC Hadoop. Open Source. Fully Supported By EMC.
  31. The EMC Big Data “Stack”
     • Act: Documentum xCP (4: Collaborative)
     • Analyze: Greenplum, Hadoop (3: Real Time)
     • Store: Isilon and Atmos (2: Structured & Unstructured; 1: Petabyte Scale)
  32. THANK YOU. HAVE A GREAT CONFERENCE!