Rob anderson
- 1. BIG DATA IS CHANGING THE WORLD
© Copyright 2010 EMC Corporation. All rights reserved. 1
- 2. IN THIS DECADE THE DIGITAL UNIVERSE WILL GROW 44X
FROM 0.9 ZETTABYTES TO 35.2 ZETTABYTES
Source: 2010 IDC Digital Universe Study
- 3. 90% OF THE DIGITAL UNIVERSE IS UNSTRUCTURED
Source: 2011 IDC Digital Universe Study
- 4. Big Data Has Arrived
• Electronic Payments
• Video Rendering
• Video Surveillance
• Mobile Sensors
• Social Media
• Medical Imaging
• Gene Sequencing
• Geophysical Exploration
• Smart Grids
- 5. Deliver Better Healthcare With Big Data
Billion Dollar Specialty Care Service Provider
Legacy System & Traditional Data / New System & Big Data
Chart labels: Quality Of Patient Care; Treatment Pathways On Summary Data; Treatment Pathways On All The Data; Individual Patient History; Social & Economic Factors; International Results
- 6. Increase Profit Margins With Big Data
Retail Banking Firm Aligns Offers To Customers
Legacy System & Traditional Data / New System & Big Data
Chart labels: Customer Profit; Agent “Best Guess”; User Based Recommendations; Identify “At-Risk” Customers; Profit-Based Recommendations
- 7. Classifying and segmenting Big Data
• Rich content stores—original intellectual property or value-added
– Media, VOD, content creation, special effects, satellite imagery, GIS data
• Generated from workflow—must be managed/processed quickly & cheaply
– Manufacturing, simulation, electronic design
• Develop new intellectual property based on big data
– Pharmaceutical companies doing customised drug development
• Companies, public sector, utilities mining data for business advantage
• Some mine consumer data—higher-volume and potentially higher-value
- 8. Big Data is File & Unstructured Data
[Chart: capacity in exabytes (0 to 90) by year, 2009 to 2014]
File Based: 60.7% CAGR
Block Based: 21.8% CAGR
By 2012, 80% of all storage capacity sold will be for file-based data
Source: IDC
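The two growth rates on this slide compound very differently over the chart's five-year span. The following is a minimal sketch of that arithmetic, assuming a hypothetical starting capacity of 1 unit for both categories in 2009; the slide gives only the rates, not the absolute starting values, and the function name is illustrative.

```python
def compound_growth(start: float, cagr: float, years: int) -> float:
    """Return the value after `years` of growth at a constant annual rate."""
    return start * (1.0 + cagr) ** years

# Rates are taken from the slide. The starting value of 1.0 is a placeholder
# chosen for illustration, not a figure from IDC.
FILE_CAGR = 0.607
BLOCK_CAGR = 0.218
YEARS = 2014 - 2009

file_multiple = compound_growth(1.0, FILE_CAGR, YEARS)
block_multiple = compound_growth(1.0, BLOCK_CAGR, YEARS)

print(f"File-based capacity grows about {file_multiple:.1f}x over {YEARS} years")
print(f"Block-based capacity grows about {block_multiple:.1f}x over {YEARS} years")
```

Under these assumptions, file-based capacity grows roughly 10.7x while block-based capacity grows roughly 2.7x over the same period.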
- 9. Why is Big Data appearing now?
Source: IDC
- 10. Gartner’s 3 V’s of Big Data
- 11. “The Internet of Things”
• Massive explosion of smart devices, all sending, receiving, storing data
– Human-oriented devices: handhelds, tablets, cameras
– Non-human-oriented devices: sensors, embedded CPUs
• Social networking messages & data grow exponentially
– Twitter feeds, Facebook updates, LinkedIn messages
• Increasingly, business is conducted digitally – or digitized
• Big Data is global – any source to any target
- 12. Source: GoGlobe
- 13. Companies want to store big data—Why?
• Google – Originally thought of as “search engine”
– Now: Storing the Internet, storing every search query
• Facebook, Twitter – Just social media?
– Storing every message you send, monitoring every
market trend
• Amazon – your every purchase, forever
• Carriers – Storing location-based data on everyone
- 14. Social Networking Analysis
Courtesy of NSF Workshop on Social Modeling
- 15. The race is on
• Big Data leads to the Optimised Organisation
• Takes a long time to build a functioning data
warehouse, analytics tools, connect to business
• Many companies have a head start
• Every CIO needs to consider Big Data in their
strategy to stay ahead
– How to manage, how to leverage
- 16. A little retailer I once knew
• Why can Amazon beat everyone on price?
• Purchase information used to adjust supply chain
• Shipping and logistics adjusted according to conditions on
the ground and supply chain
• Other customers’ information used to provide
recommendations, improve experience
• Not just Amazon: Tesco, Carrefour, Metro, etc. are all taking
advantage
- 17. How do we make decisions?
• Good data is hard to get, so decisions are often based on no data at all
• Often based on information from peers, colleagues,
reports, or because it’s always been done that way
• Many companies fail because they fail to detect
shifts in consumer demand
• Internet has made customers more segmented, and
causes customer choice to change faster
- 18. Moving to a Data-Driven Model
• Managing with the facts
• Making a science out of data!
• Experimental model, different
from BI
• Moving from “gut feel” to
rational, scientific decisions
- 19. Big-Data-based Decisions
• Unlock value by making information transparent
and useable at higher frequency
• More accurate information (e.g. inventories, trends)
• Tailor products more precisely
• Sophisticated analytics makes for better decisions
• Better products (via web feedback, sensors, etc)
Source: McKinsey
- 20. What holds back big data?
• Not ICT—compute & storage getting
bigger, cheaper, easier
• Not the quantity of data (see slide 2)
• Not the value—large-scale Big Data
projects generally have great ROI
• Real problems are organisational
change and talent acquisition
- 22. How are people doing it?
• Enterprises expected to ingest more than 1 PB of data per day within 5 years
• Big data is often largely unstructured
• Hadoop is a framework for storing and analyzing big data
– open source, Java-based
• Big data can mean billions to trillions of files
– Each file can be gigabytes to terabytes in size
– This means petabytes to exabytes of data
• Directed graph analysis, Collaborative Filtering, A/B testing, Associative Rule Learning, Classification, Natural Language Processing, Data Mining, Pattern Matching, Sentiment Analysis, Comparative Effectiveness, and Clinical Decision Support are examples of big data techniques
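Collaborative filtering, one of the techniques named on this slide, can be illustrated at toy scale. The sketch below is one simple variant (item co-occurrence counting), not the method used by any particular vendor; the basket data, function names, and the `top_n` parameter are all invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, invented for illustration only.
baskets = {
    "alice": {"book", "lamp", "pen"},
    "bob": {"book", "pen"},
    "carol": {"lamp", "desk"},
    "dave": {"book", "pen", "desk"},
}

def build_cooccurrence(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    counts = Counter()
    for items in baskets.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(user, baskets, counts, top_n=2):
    """Score items the user does not own by co-occurrence with items they do own."""
    owned = baskets[user]
    scores = Counter()
    for (a, b), n in counts.items():
        if a in owned and b not in owned:
            scores[b] += n
    # Sort by score (descending), then by name, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

counts = build_cooccurrence(baskets)
print(recommend("bob", baskets, counts))
```

At real scale the co-occurrence counts would be computed across many machines rather than in a single loop, which is where the scale-out approach discussed in this deck comes in.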
- 23. How do you manage and design for Big Data?
• Scale and parallelism are the keys
– Big data is far too big to process sequentially
– Too much coming in too quickly
– Example: Banks seeking to process market data
more quickly, reducing decision making time from
days to minutes
• Answer: Scale-out storage and scale-out processing
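The idea of replacing sequential processing with parallel work on partitions can be sketched in a few lines. This is a minimal illustration, assuming a word-count workload and using threads as a stand-in for separate nodes; a real scale-out system would spread partitions across machines, and the function names and sample input here are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(partition):
    """Map step: count words within one partition, independently of the others."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts

def parallel_word_count(lines, workers=4):
    """Split the input into partitions, count each concurrently, then merge."""
    # Round-robin partitioning: worker i gets lines i, i + workers, ...
    partitions = [lines[i::workers] for i in range(workers)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Reduce step: merge the partial counts from every partition.
        for partial in pool.map(count_words, partitions):
            total.update(partial)
    return total

# Hypothetical input, invented for illustration.
lines = ["big data is big", "data moves fast", "big data needs scale"]
result = parallel_word_count(lines)
print(result)
```

Because each partition is counted without reference to the others, adding workers (or nodes) does not change the result, only how the work is divided.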
- 24. Cramming big data onto traditional models
Server
Scalability
Network
Performance
Management
Availability
Cost
Storage
- 25. A different idea – scale-out
Server
Scalability
Network
Performance
Management
Availability
Cost
Storage
- 26. Enterprise Hadoop: Greenplum & Isilon
• Easier and more reliable
– Packaged Hadoop distribution with Isilon storage
• Purpose-built Hadoop infrastructure
– Faster, less risk
• Sharing expertise to address the talent gap
– Architecture, data science, and roadmap services
• Proven at scale with worldwide support
– 24x7 one call Hadoop support from EMC
– Key component of Greenplum UAP
– Unstructured data processing
- 27. Increasing Demand for Advanced Analytics
• Complex
– Deep, rich analysis of big data sets
– Ad hoc, interactive analysis, not structured reports
• Timely
– On-going, frequent analysis (e.g. daily, weekly)
– Insights delivered in minutes/seconds
• Actionable
– Forward looking, predictive insight
– Create new business value
- 28. EMC Greenplum: Purpose-built for Big Data
• EMC Greenplum is a shared nothing, massively parallel
processing (MPP) data warehouse system
• Core principle of data computing is to move the processing
dramatically closer to the data and to the people
• Fast Data Loading
• Extreme Performance & Elastic Scalability
• Unified Data Access
- 29. MPP Shared-Nothing Architecture
Greenplum’s Massively Parallel Processing (MPP) Database has extreme scalability on general purpose systems
• Automatic parallelization
– Load and query like any database
• Scan and process in parallel
– Extremely scalable and I/O optimized
• Linear scalability by adding nodes
– Each adds storage, query performance and loading performance
Diagram labels:
• MapReduce
• Master Servers: Query planning and dispatch
• Network Interconnect
• Segment Servers: Storage and query processing
• External Sources: MPP loading, streaming, etc.
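The master/segment split described on this slide follows a general scatter-gather pattern: rows are distributed across nodes that share nothing, each node scans only its own rows, and a coordinator merges the partial results. The sketch below illustrates that pattern only; it is not Greenplum's implementation or API, and the `Master` and `Segment` classes, the modulo distribution rule, and the sample rows are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

class Segment:
    """One shared-nothing node: it owns its rows and scans only those."""

    def __init__(self):
        self.rows = []

    def partial_sum_count(self, column):
        values = [row[column] for row in self.rows]
        return sum(values), len(values)

class Master:
    """Routes rows to segments on load and merges partial results on query."""

    def __init__(self, num_segments):
        self.segments = [Segment() for _ in range(num_segments)]

    def load(self, rows, key):
        for row in rows:
            # Distribute by integer key modulo segment count
            # (a simplified, deterministic stand-in for hash distribution).
            self.segments[row[key] % len(self.segments)].rows.append(row)

    def average(self, column):
        # Dispatch the same scan to every segment (threads stand in for nodes).
        with ThreadPoolExecutor(max_workers=len(self.segments)) as pool:
            partials = list(
                pool.map(lambda seg: seg.partial_sum_count(column), self.segments)
            )
        total = sum(p[0] for p in partials)
        count = sum(p[1] for p in partials)
        return total / count if count else None

# Hypothetical rows, invented for illustration.
rows = [{"id": i, "amount": 10 * i} for i in range(1, 9)]
master = Master(num_segments=4)
master.load(rows, key="id")
print(master.average("amount"))
```

Each segment returns a (sum, count) pair rather than its own average, so the merged result stays correct even when segments hold different numbers of rows.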
© Copyright 2011 EMC Corporation. All rights reserved. EMC Confidential – NDA Required 29
- 30. EMC Hadoop. Open Source. Fully Supported By EMC.
- 31. The EMC Big Data “Stack”
4. Collaborative: Act (Documentum xCP)
3. Real Time: Analyze (Greenplum, Hadoop)
2. Structured & Unstructured: Store (Isilon and Atmos)
1. Petabyte Scale: Store (Isilon and Atmos)
- 32. THANK YOU
HAVE A GREAT CONFERENCE!