1
Big Data
Past, Present & Future
Where are We Headed?
Rob Peglar
CTO Americas
Isilon Storage Division
EMC Corporation
rob.peglar@emc.com
@peglarr
2
• In order to understand what’s coming, we must
understand our past
• We must also understand that Big Data is fundamentally different from what we’re used to
• Consider the difference between a still photograph
and a movie – and our human perception of them
– A movie is more than a collection of still photographs – why?
Prediction is Very Difficult -
Especially About the Future
- Niels Bohr
3
The Past –
and I Mean the Past
• Consider the census…
• From the Latin “censere”
– meaning “to estimate”
• “In those days a decree went out from Emperor Augustus that all
the world should be registered.” Luke 2:1
• The Domesday Book of 1086 – England
– Comprehensive tally of people, their land, and property
• The US Constitution mandates a decennial census
– The 1880 census took eight years (!) to complete
• This led to Hollerith’s punched card tabulator in 1890
– The beginning of automated data processing
– Reduced the census time to one year
4
Sampling – Good or Bad?
• Sampling precision improves most with randomness
– Not with sample size
– Jerzy Neyman (Poland, 1934) proved this
• Neyman, J. (1934) "On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection", Journal of the Royal Statistical Society, 97 (4), 557–625
• Good - Sampling was a solution to information overload
• Bad - Systematic bias in sampling gives wrong conclusions
• A seismic shift is occurring – from
– Sampling, keeping datasets small on purpose, using them once…to
– N=all, keeping datasets large on purpose, using them many times
• Why? The outliers are the most interesting!
– Examples – credit card fraud, language translation, insurability
– Don’t just follow the rules, look for the exceptions
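The point about randomness versus sample size can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population, its distribution, the cutoff of 40, and the function name `simulate` are all hypothetical and do not come from the talk. It compares a small random sample against a much larger sample drawn with systematic bias.

```python
import random
import statistics


def simulate(seed=0):
    """Compare a small random sample with a large but biased one."""
    rng = random.Random(seed)

    # Hypothetical population of 100,000 measurements (assumed normal).
    population = [rng.gauss(50, 15) for _ in range(100_000)]
    true_mean = statistics.fmean(population)

    # Small, truly random sample.
    random_sample = rng.sample(population, 1_000)

    # Large sample with systematic bias: values of 40 or below are never observed.
    observable = [x for x in population if x > 40]
    biased_sample = rng.sample(observable, 20_000)

    return (
        true_mean,
        statistics.fmean(random_sample),
        statistics.fmean(biased_sample),
    )


true_mean, random_est, biased_est = simulate()
print(f"true mean:            {true_mean:.2f}")
print(f"random sample (1k):   {random_est:.2f}")
print(f"biased sample (20k):  {biased_est:.2f}")
```

Under these assumptions the 1,000-item random sample lands much closer to the true mean than the 20,000-item biased sample, because the bias does not shrink as the sample grows.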
[Image: Williams Tube, 1946, 1024 bits]
5
The Journey from
Clean to Messy
• 1998 – Linden et al., collaborative filtering patent, working at a Seattle startup selling books online
– G. Linden, J. Jacobi, and E. Benson, Collaborative Recommendations Using Item-to-Item Similarity Mappings, US Patent 6,266,649 (to Amazon.com), Patent and Trademark Office, Washington, D.C., 2001
• “If it works perfectly, Amazon should show you just one
book – the next one you will buy.” (Linden)
• Hypothesis-driven approach becomes data-driven
– “Proving” something (causation) → correlation
• McGregor et al – using big data to improve the NICU
– 16 data streams, 1,260 data points/sec
– Valid improvement of premature infant adverse outcomes
– No “proof” – it helps doctors make better diagnostic decisions
– Carolyn McGregor, "Big Data in Neonatal Intensive Care," Computer, vol. 46, no. 6, pp. 54-59, June 2013, doi:10.1109/MC.2013.157
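As a rough illustration of the item-to-item idea named in the Linden patent title above, the sketch below scores pairs of books by how many customers bought both. It is not the patented method: the purchase data, the cosine-style score over binary purchase sets, and the names `item_similarity` and `recommend` are all assumptions made for this example.

```python
from math import sqrt

# Hypothetical purchase history: customer -> set of book IDs.
purchases = {
    "ann": {"A", "B", "C"},
    "bob": {"A", "B"},
    "cat": {"B", "C"},
    "dan": {"A", "D"},
}


def item_similarity(purchases):
    """Cosine similarity between items, treating each item as a set of buyers."""
    buyers = {}
    for customer, items in purchases.items():
        for item in items:
            buyers.setdefault(item, set()).add(customer)

    sims = {}
    for a in buyers:
        for b in buyers:
            if a != b:
                overlap = len(buyers[a] & buyers[b])
                sims[(a, b)] = overlap / sqrt(len(buyers[a]) * len(buyers[b]))
    return sims


def recommend(item, sims):
    """Return the single most similar other item, or None if nothing overlaps."""
    candidates = {b: s for (a, b), s in sims.items() if a == item and s > 0}
    return max(candidates, key=candidates.get) if candidates else None


sims = item_similarity(purchases)
print(recommend("A", sims))  # the one book to show next to "A"
```

Note that nothing here explains why two books go together. The score is pure correlation in purchase behavior, which is the shift from hypothesis-driven to data-driven that the slide describes.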
6
Manholes and Raw Data - Correlations
• 94,000 miles of underground cable in NYC, 51,000 manholes in
just Manhattan w/service boxes below
• 1 in 20 cables laid before 1930; some Edison-era
• Records kept since the 1880s – 38 different terms
– All hand-written, paper, cards, ledgers, etc.
• 2008 - How to prevent fires, exploding manholes?
• Machine-correlate 106 predictors of imminent disaster
– The top 10% of manholes ranked by predicted risk accounted for 44% of total failures
• Chris Anderson – “data deluge makes scientific method obsolete”
– http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
• “Datafication” – everything is data
– Numbers to words to images to locations to relationships to feelings …
– Graph theory & graph analysis changes the way we perceive the world
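The "top 10% … 44% of total failures" figure in the manhole example is a capture rate: rank everything by predicted risk, then ask what share of the real failures falls in the top slice. A minimal sketch of that calculation follows. The function name `capture_rate` and the toy scores and outcomes are hypothetical, not the New York data.

```python
def capture_rate(scores, failed, top_fraction=0.10):
    """Share of all failures that fall within the top-scoring fraction."""
    ranked = sorted(zip(scores, failed), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, int(len(ranked) * top_fraction))
    total_failures = sum(failed)
    if total_failures == 0:
        return 0.0
    caught = sum(f for _, f in ranked[:cutoff])
    return caught / total_failures


# Toy data: 20 manholes with risk scores 20..1 and 5 actual failures.
scores = list(range(20, 0, -1))
failed = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]

print(capture_rate(scores, failed))  # share of failures in the top 10%
```

A model with no predictive power would capture about 10% of failures in its top 10%, so the further the result sits above that baseline, the more useful the ranking is for deciding where to inspect first.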
7
© Copyright 2014 EMC Corporation. All rights reserved.
The Present - Architecture
[Diagram: big data architecture]
• Stages: Business Process, Info Processing, Data Acquisition, Data Creation
• Roles: End Users, Analysts / Scientists, Architects / Engineers, Producers
• Systems Integration
– Shared Nothing; Scale-out Storage + SSD; MPP + In-Memory Compute; Hadoop
– Hi-Speed / -Resiliency Networking; Converged Infrastructure; Cloud; Non-relational DWH
• Volume, Velocity, Variety
• Objectives
– Stream Processing; Event Management; Data Exploration
– Contextualized Data; Modeling / Scenarios; Forecasting
• Delivery Models
– On-Demand: Access-Anywhere Analytics Services; Context-Aware Business Applications
– Push: Location-Based Services; Alert and Respond
– Embedded: Workflow and Interaction Automation; Smart devices and systems
• Data sources
– Email and Messaging; Mobile Apps Data; Transaction and Usage Logs
– Machine and Sensors; Geolocation; Relationships and Social Influence
• Value: Real-time Events; Deep Insights
8
The Present – Business Value of Data
• Data is valuable – re-use of data even more so
– Not ephemeral value – can be re-consumed ad infinitum
– Economists call this a “non-rivalrous” good
• Cost/benefit of storage ~ 0 – so keep everything
– Ewan Birney, European Bioinformatics Institute, “Hidden Treasures in Junk DNA” http://www.scientificamerican.com/article.cfm?id=hidden-treasures-in-junk-dna
– Over the last 50 years, cost per byte has roughly halved every 2 years
– Density has increased ~50 million times since 1956
• Consider electric cars:
– Battery level indicates when to “fill up” from the power grid
– Power utility monitors grid usage over time
– Correlate both data sets together
• Determine when/where to build recharge stations on which roads
• Recombinant data
– “Old” data combined into new forms for new insights
– “Noisy” datasets enable feedback loops – e.g. better/faster search/index
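The storage cost figure above compounds quickly. As a back-of-the-envelope check, assuming the halving is exactly every 2 years over exactly 50 years (the slide gives both only approximately), the sketch below works out the cumulative factor. The function name `reduction_factor` is just for illustration.

```python
def reduction_factor(years, halving_period_years=2):
    """Cumulative factor after repeated halving over the given span."""
    return 2 ** (years / halving_period_years)


halvings = 50 / 2
factor = reduction_factor(50)

print(f"{halvings:.0f} halvings -> cost per byte falls by a factor of {factor:,.0f}")
```

That is 25 halvings, or a factor of about 33.5 million, which is the same order of magnitude as the ~50 million density increase quoted on the slide.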
9
The Future 1 – Wild, Wild West?
• Can we treat data as a corporate asset?
– A ledger entry, like “brand value” (intangible)
– Or is data a tangible asset to be kept on the books?
– Does data have “cash value”? Asset amortization?
– Can a business be legally “liable” for its data collection?
• Facebook book-valued at $6.3B. IPO value: $104B
– Why the difference? Facebook is essentially data
– Or, every FB user is worth ~ $100 (~1B subscribers)
• We will see much more “data value chain” ahead
– Ingest, analyze, sell results, analyze, sell results …downstreaming
– Licensing of data in its infancy – much more to come
– Think about the data just from your car – 40 µPs (microprocessors)
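The Facebook comparison on this slide is simple arithmetic, sketched below using the slide's own rounded figures. The variable names are illustrative, and the subscriber count of 1 billion is the slide's approximation.

```python
book_value = 6.3e9   # book value from the slide, in USD
ipo_value = 104e9    # IPO valuation from the slide, in USD
users = 1e9          # approximate subscriber count from the slide

gap = ipo_value - book_value        # value not explained by the books
value_per_user = ipo_value / users  # implied value of each user

print(f"gap between IPO and book value: ${gap / 1e9:.1f}B")
print(f"implied value per user:         ${value_per_user:.0f}")
```

The gap is $97.7B and the implied value per user is $104, which the slide rounds to ~$100.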
10
The Future 2 – Data as Policy -
Can Data save Us from Us?
• “In God We Trust – all others bring data”
– Commonly attributed to W. Edwards Deming
• New jobs/titles coming out of the woodwork
– CAO (Chief Analytics Officer), CDO (Data)
– Data Scientist, Data Correlationist, Data Ethicist
• Knowing “what” not “why” is good enough. Is it?
• Remember Bayes’ “inductive probability” (250 yrs!)
– We update our beliefs about something as new data arrives
– Bayes T. (1763) "An Essay towards solving a Problem in the Doctrine of Chances". Phil. Trans., 53, 370–418.
• Data Policy in the immortal words of Yogi Berra:
– “We make too many wrong mistakes”
– “You can observe a lot just by watching.”
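The Bayesian idea of updating beliefs as new data arrives can be shown in a few lines. This is a minimal sketch: the prior of 1% and the likelihoods of 0.9 and 0.1 are made-up numbers, and `bayes_update` is a hypothetical helper name.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior probability of a hypothesis after one observation (Bayes' rule)."""
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence


# Start out skeptical (1%), then see three observations, each of which is
# nine times more likely if the hypothesis is true than if it is false.
belief = 0.01
history = [belief]
for _ in range(3):
    belief = bayes_update(belief, likelihood_if_true=0.9, likelihood_if_false=0.1)
    history.append(belief)

print([round(b, 3) for b in history])
```

With these assumed numbers the belief rises from 1% to roughly 8%, then 45%, then 88%. Each posterior becomes the prior for the next observation, so no single data point settles the question.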
11
The Future 3 – N=all?
Keep Everything? Seriously?
• Data Silos or the Data Lake?
– HDFS presents a crisis: i.e. 危機, weiji
• dangerous ‘critical point’ (not crisis; mis-translation)
– Write-once, read-many, modify-never; delete-never?
– Time is not your friend when moving data
• (So, don’t move it between repositories; move it to the CPU)
• One 40GE NIC yields same rate on bus as 28 disks @ 140MB/s
• One million seconds is 277.8 hours (~ 11.6 days)
• 1 PB @ 1 GB/sec is … 1 EB @ 1 TB/sec is …
• Non-shared (1 protocol) or shared (N protocols)?
• Time versus Space – the Essential Judgment
• Cost of Having Data vs. Cost of Not Having Data
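The transfer-time questions on this slide can be worked out directly. The sketch below assumes decimal units (1 GB = 10^9 bytes, and so on) and ignores protocol overhead, so the 40GE figure is the raw line rate. The helper name `transfer_seconds` is illustrative.

```python
GB, TB, PB, EB = 10**9, 10**12, 10**15, 10**18  # decimal units (assumed)


def transfer_seconds(size_bytes, rate_bytes_per_sec):
    """Time to move a dataset at a sustained rate, ignoring all overhead."""
    return size_bytes / rate_bytes_per_sec


pb_time = transfer_seconds(PB, GB)  # 1 PB at 1 GB/sec
eb_time = transfer_seconds(EB, TB)  # 1 EB at 1 TB/sec

print(f"1 PB @ 1 GB/sec: {pb_time:,.0f} s = {pb_time / 86_400:.2f} days")
print(f"1 EB @ 1 TB/sec: {eb_time:,.0f} s = {eb_time / 86_400:.2f} days")

# Disk array versus NIC, both in bytes per second.
disks_rate = 28 * 140e6  # 28 disks at 140 MB/s each
nic_rate = 40e9 / 8      # 40 Gbit/s raw line rate
print(f"28 disks: {disks_rate / 1e9:.2f} GB/s, 40GE NIC: {nic_rate / 1e9:.2f} GB/s")
```

Both transfers take one million seconds, about 11.6 days, which is why the slide argues for moving compute to the data rather than moving data between repositories.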
12
THANK YOU

More Related Content

What's hot

Creating Value in Health through Big Data
Creating Value in Health through Big DataCreating Value in Health through Big Data
Creating Value in Health through Big Data
Booz Allen Hamilton
 
Big data
Big dataBig data
Big data
Claire Choong
 
Asking More - Jon Iwata, IBM
Asking More - Jon Iwata, IBMAsking More - Jon Iwata, IBM
Asking More - Jon Iwata, IBM
paulp-mc2
 
Qu'est ce que le Big Data ? Avec Victoria Galano Data Scientist chez Air France
Qu'est ce que le Big Data ? Avec Victoria Galano Data Scientist chez Air FranceQu'est ce que le Big Data ? Avec Victoria Galano Data Scientist chez Air France
Qu'est ce que le Big Data ? Avec Victoria Galano Data Scientist chez Air France
Jedha Bootcamp
 
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-ShapiroKeynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Data ScienceTech Institute
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Dr. Sunil Kr. Pandey
 
Data science
Data scienceData science
Data science
SwapnilDahake2
 
U4 l01 What is big data?
U4 l01 What is big data?U4 l01 What is big data?
U4 l01 What is big data?
Chapelgate Christian Academy
 
Big Data & Machine Learning
Big Data & Machine LearningBig Data & Machine Learning
Big Data & Machine Learning
Angelo Mariano
 
The promise and challenge of Big Data
The promise and challenge of Big DataThe promise and challenge of Big Data
The promise and challenge of Big Data
The Marketing Distillery
 
NewMR 2016 presents: 9 Big Applications of Big Data
NewMR 2016 presents: 9 Big Applications of Big DataNewMR 2016 presents: 9 Big Applications of Big Data
NewMR 2016 presents: 9 Big Applications of Big Data
Annie Pettit, Research Methodologist
 
Big data, big opportunities
Big data, big opportunitiesBig data, big opportunities
Big data, big opportunities
Chouaieb NEMRI
 
Data Science and Culture
Data Science and CultureData Science and Culture
Data Science and Culture
Ícaro Medeiros
 
Business analytics
Business analyticsBusiness analytics
Business analytics
SwarnaLatha177
 
Big Data Analytics for Dodd-Frank
Big Data Analytics for Dodd-FrankBig Data Analytics for Dodd-Frank
Big Data Analytics for Dodd-Frank
DataWorks Summit
 
The Field Guide to Data Science
The Field Guide to Data ScienceThe Field Guide to Data Science
The Field Guide to Data Science
Booz Allen Hamilton
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science
James Hendler
 
Applications of Big Data
Applications of Big DataApplications of Big Data
Applications of Big Data
Prashant Kumar Jadia
 
Data science and_analytics_for_ordinary_people_ebook
Data science and_analytics_for_ordinary_people_ebookData science and_analytics_for_ordinary_people_ebook
Data science and_analytics_for_ordinary_people_ebook
Jeffrey Strickland, Ph.D., CMSP
 
Lecture #01
Lecture #01Lecture #01
Lecture #01
Konpal Darakshan
 

What's hot (20)

Creating Value in Health through Big Data
Creating Value in Health through Big DataCreating Value in Health through Big Data
Creating Value in Health through Big Data
 
Big data
Big dataBig data
Big data
 
Asking More - Jon Iwata, IBM
Asking More - Jon Iwata, IBMAsking More - Jon Iwata, IBM
Asking More - Jon Iwata, IBM
 
Qu'est ce que le Big Data ? Avec Victoria Galano Data Scientist chez Air France
Qu'est ce que le Big Data ? Avec Victoria Galano Data Scientist chez Air FranceQu'est ce que le Big Data ? Avec Victoria Galano Data Scientist chez Air France
Qu'est ce que le Big Data ? Avec Victoria Galano Data Scientist chez Air France
 
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-ShapiroKeynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
Data science
Data scienceData science
Data science
 
U4 l01 What is big data?
U4 l01 What is big data?U4 l01 What is big data?
U4 l01 What is big data?
 
Big Data & Machine Learning
Big Data & Machine LearningBig Data & Machine Learning
Big Data & Machine Learning
 
The promise and challenge of Big Data
The promise and challenge of Big DataThe promise and challenge of Big Data
The promise and challenge of Big Data
 
NewMR 2016 presents: 9 Big Applications of Big Data
NewMR 2016 presents: 9 Big Applications of Big DataNewMR 2016 presents: 9 Big Applications of Big Data
NewMR 2016 presents: 9 Big Applications of Big Data
 
Big data, big opportunities
Big data, big opportunitiesBig data, big opportunities
Big data, big opportunities
 
Data Science and Culture
Data Science and CultureData Science and Culture
Data Science and Culture
 
Business analytics
Business analyticsBusiness analytics
Business analytics
 
Big Data Analytics for Dodd-Frank
Big Data Analytics for Dodd-FrankBig Data Analytics for Dodd-Frank
Big Data Analytics for Dodd-Frank
 
The Field Guide to Data Science
The Field Guide to Data ScienceThe Field Guide to Data Science
The Field Guide to Data Science
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science
 
Applications of Big Data
Applications of Big DataApplications of Big Data
Applications of Big Data
 
Data science and_analytics_for_ordinary_people_ebook
Data science and_analytics_for_ordinary_people_ebookData science and_analytics_for_ordinary_people_ebook
Data science and_analytics_for_ordinary_people_ebook
 
Lecture #01
Lecture #01Lecture #01
Lecture #01
 

Similar to Big Data Past, Present and Future – Where are we Headed? - StampedeCon 2014

DBMS
DBMSDBMS
Big Data and the Art of Data Science
Big Data and the Art of Data ScienceBig Data and the Art of Data Science
Big Data and the Art of Data Science
Andrew Gardner
 
Big data
Big dataBig data
Big data
Prince Barai
 
Data Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesData Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research Opportunities
Kathirvel Ayyaswamy
 
DataEd Slides: Getting Data Quality Right – Success Stories
DataEd Slides: Getting Data Quality Right – Success StoriesDataEd Slides: Getting Data Quality Right – Success Stories
DataEd Slides: Getting Data Quality Right – Success Stories
DATAVERSITY
 
Big Data World
Big Data WorldBig Data World
Big Data World
Hossein Zahed
 
Data science and its potential to change business as we know it. The Roadmap ...
Data science and its potential to change business as we know it. The Roadmap ...Data science and its potential to change business as we know it. The Roadmap ...
Data science and its potential to change business as we know it. The Roadmap ...
InnoTech
 
Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...
Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...
Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...
IT Network marcus evans
 
Why quality control and quality assurance is important for the legacy of GEOT...
Why quality control and quality assurance is important for the legacy of GEOT...Why quality control and quality assurance is important for the legacy of GEOT...
Why quality control and quality assurance is important for the legacy of GEOT...
Adam Leadbetter
 
Big Data et eGovernment
Big Data et eGovernmentBig Data et eGovernment
Big Data et eGovernment
eGov Innovation Center
 
Spark
SparkSpark
Spark Social Media
Spark Social Media Spark Social Media
Spark Social Media
suresh sood
 
Ictam big data
Ictam big dataIctam big data
Ictam big data
Terry Bunio
 
Big Data – Are You Ready?
Big Data – Are You Ready?Big Data – Are You Ready?
Big Data – Are You Ready?
Talentica Software
 
Big Data: What's it Really About?
Big Data: What's it Really About?Big Data: What's it Really About?
Big Data: What's it Really About?
inside-BigData.com
 
Level Seven - Expedient Big Data presentation
Level Seven - Expedient Big Data presentationLevel Seven - Expedient Big Data presentation
Level Seven - Expedient Big Data presentation
Doug Denton
 
Big Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalBig Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar Semwal
IIIT Allahabad
 
1. Data Science overview - part1.pptx
1. Data Science overview - part1.pptx1. Data Science overview - part1.pptx
1. Data Science overview - part1.pptx
RahulTr22
 
Data warehouse Vs Big Data
Data warehouse Vs Big Data Data warehouse Vs Big Data
Data warehouse Vs Big Data
Lisette ZOUNON
 
BrightTALK - Semantic AI
BrightTALK - Semantic AI BrightTALK - Semantic AI
BrightTALK - Semantic AI
Semantic Web Company
 

Similar to Big Data Past, Present and Future – Where are we Headed? - StampedeCon 2014 (20)

DBMS
DBMSDBMS
DBMS
 
Big Data and the Art of Data Science
Big Data and the Art of Data ScienceBig Data and the Art of Data Science
Big Data and the Art of Data Science
 
Big data
Big dataBig data
Big data
 
Data Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesData Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research Opportunities
 
DataEd Slides: Getting Data Quality Right – Success Stories
DataEd Slides: Getting Data Quality Right – Success StoriesDataEd Slides: Getting Data Quality Right – Success Stories
DataEd Slides: Getting Data Quality Right – Success Stories
 
Big Data World
Big Data WorldBig Data World
Big Data World
 
Data science and its potential to change business as we know it. The Roadmap ...
Data science and its potential to change business as we know it. The Roadmap ...Data science and its potential to change business as we know it. The Roadmap ...
Data science and its potential to change business as we know it. The Roadmap ...
 
Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...
Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...
Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...
 
Why quality control and quality assurance is important for the legacy of GEOT...
Why quality control and quality assurance is important for the legacy of GEOT...Why quality control and quality assurance is important for the legacy of GEOT...
Why quality control and quality assurance is important for the legacy of GEOT...
 
Big Data et eGovernment
Big Data et eGovernmentBig Data et eGovernment
Big Data et eGovernment
 
Spark
SparkSpark
Spark
 
Spark Social Media
Spark Social Media Spark Social Media
Spark Social Media
 
Ictam big data
Ictam big dataIctam big data
Ictam big data
 
Big Data – Are You Ready?
Big Data – Are You Ready?Big Data – Are You Ready?
Big Data – Are You Ready?
 
Big Data: What's it Really About?
Big Data: What's it Really About?Big Data: What's it Really About?
Big Data: What's it Really About?
 
Level Seven - Expedient Big Data presentation
Level Seven - Expedient Big Data presentationLevel Seven - Expedient Big Data presentation
Level Seven - Expedient Big Data presentation
 
Big Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalBig Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar Semwal
 
1. Data Science overview - part1.pptx
1. Data Science overview - part1.pptx1. Data Science overview - part1.pptx
1. Data Science overview - part1.pptx
 
Data warehouse Vs Big Data
Data warehouse Vs Big Data Data warehouse Vs Big Data
Data warehouse Vs Big Data
 
BrightTALK - Semantic AI
BrightTALK - Semantic AI BrightTALK - Semantic AI
BrightTALK - Semantic AI
 

More from StampedeCon

Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
StampedeCon
 
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
StampedeCon
 
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
StampedeCon
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
StampedeCon
 
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
StampedeCon
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
StampedeCon
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017
StampedeCon
 
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
StampedeCon
 
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
StampedeCon
 
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
StampedeCon
 
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
StampedeCon
 
A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017
StampedeCon
 
Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017
StampedeCon
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
StampedeCon
 
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
StampedeCon
 
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
StampedeCon
 
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
StampedeCon
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
StampedeCon
 
Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016
StampedeCon
 
Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016
StampedeCon
 

More from StampedeCon (20)

Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
 
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
 
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
 
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017
 
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
 
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
 
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
 
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
 
A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017
 
Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017Graph in Customer 360 - StampedeCon Big Data Conference 2017
Big Data Past, Present and Future – Where are we Headed? - StampedeCon 2014

  • 1. 1 Big Data Past, Present & Future Where are We Headed? Rob Peglar CTO Americas Isilon Storage Division EMC Corporation rob.peglar@emc.com @peglarr
  • 2. 2 • In order to understand what’s coming, we must understand our past • We must also understand that Big Data is fundamentally different than what we’re used to • Consider the difference between a still photograph and a movie – and our human perception of them – More than a collection of still photographs – why? Prediction is Very Difficult - Especially About the Future - Niels Bohr
  • 3. 3 The Past – and I Mean the Past • Consider the census… • From the Latin “censere” – meaning “to estimate” • “In those days a decree went out from Emperor Augustus that all the world should be registered.” Luke 2:1 • The Domesday Book of 1086 – England – Comprehensive tally of people, their land, and property • The US Constitution mandates a decennial census – The 1880 census took eight years (!) to complete • This led to Hollerith’s punched card tabulator in 1890 – The beginning of automated data processing – Reduced the census time to one year
  • 4. 4 Sampling – Good or Bad? • Sampling precision improves optimally with randomness – Not sample size – Jerzy Neyman (Poland, 1934) proved this • Neyman, J.(1934) "On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection", Journal of the Royal Statistical Society, 97 (4), 557–625 • Good - Sampling was a solution to information overload • Bad - Systematic bias in sampling gives wrong conclusions • A seismic shift is occurring – from – Sampling, keeping datasets small on purpose, using them once…to – N=all, keeping datasets large on purpose, using them many times • Why? The outliers are the most interesting! – Examples – credit card fraud, language translation, insurability – Don’t just follow the rules, look for the exceptions Williams Tube 1946 1024 bits
  • 5. 5 The Journey from Clean to Messy • 1998 – Linden et al, collaborative filtering patent, working at a Seattle startup selling books online – G. Linden, J. Jacobi and E. Benson, Collaborative Recommendations Using Item-to-Item Similarity Mappings, US Patent 6,266,649 (to Amazon.com), Patent and Trademark Office, Washington, D.C., 2001 • “If it works perfectly, Amazon should show you just one book – the next one you will buy.” (Linden) • Hypothesis-driven approach becomes data-driven – “Proving” something (causation) → correlation • McGregor et al – using big data to improve the NICU – 16 data streams, 1,260 data points/sec – Valid improvement of premature infant adverse outcomes – No “proof” – it helps doctors make better diagnostic decisions – Carolyn McGregor, "Big Data in Neonatal Intensive Care," Computer, vol. 46, no. 6, pp. 54-59, June 2013, doi:10.1109/MC.2013.157
  • 6. 6 Manholes and Raw Data - Correlations • 94,000 miles of underground cable in NYC, 51,000 manholes in just Manhattan w/service boxes below • 1 in 20 cables laid before 1930; some Edison-era • Records kept since 1880’s – 38 different terms – All hand-written, paper, cards, ledgers, etc. • 2008 - How to prevent fires, exploding manholes? • Machine-correlate 106 predictors of imminent disaster – Top 10% predicted were 44% of total failures • Chris Anderson – “data deluge makes scientific method obsolete” – http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory • “Datafication” – everything is data – Numbers to words to images to locations to relationships to feelings … – Graph theory & graph analysis changes the way we perceive the world
  • 7. 7 © Copyright 2014 EMC Corporation. All rights reserved. The Present - Architecture [diagram labels] BUSINESS PROCESS, INFO PROCESSING, DATA ACQUISITION, DATA CREATION; END USERS, ANALYSTS / SCIENTISTS, ARCHITECTS / ENGINEERS, PRODUCERS; Shared Nothing Scale-out Storage + SSD MPP + In-Memory Compute Hadoop Hi-Speed / - Resiliency Networking Converged Infrastructure Cloud Non-relational DWH SYSTEMS INTEGRATION; VOLUME, VELOCITY, VARIETY; OBJECTIVES Stream Processing Event Management Data Exploration Contextualized Data Modeling / Scenarios Forecasting DELIVERY MODELS Access-Anywhere Analytics Services Context-Aware Business Applications ON-DEMAND Location-Based Services Alert and Respond PUSH Workflow and Interaction Automation Smart devices and systems EMBEDDED Email and Messaging Mobile Apps Data Transaction and Usage Logs Machine and Sensors Geolocation Relationships and Social Influence Real-time Events Deep Insights VALUE
  • 8. 8 The Present – Business Value of Data • Data is valuable – re-use of data even more so – Not ephemeral value – can be re-consumed ad infinitum – Economists call this a “non-rivalrous” good • Cost/benefit of storage ~ 0 – so keep everything – Ewan Birney, European Bioinformatics Institute, “Hidden Treasures In Junk DNA” http://www.scientificamerican.com/article.cfm?id=hidden-treasures-in-junk-dna – Last 50 years, cost/byte ~1/2x every 2 years – Density has increased ~50 million times since 1956 • Consider electric cars: – Battery level indicates when to “fill up” from the power grid – Power utility monitors grid usage over time – Correlate both data sets together • Determine when/where to build recharge stations on which roads • Recombinant data – “Old” data combined into new forms for new insights – “Noisy” datasets enable feedback loops – e.g. better/faster search/index
  • 9. 9 The Future 1 – Wild, Wild West? • Can we treat data as a corporate asset? – A ledger entry, like “brand value” (intangible) – Or is data a tangible asset to be kept on the books? – Does data have “cash value”? Asset amortization? – Can a business be legally “liable” for its data collection? • Facebook book-valued at $6.3B. IPO value: $104B – Why the difference? Facebook is essentially data – Or, every FB user is worth ~ $100 (~1B subscribers) • We will see much more “data value chain” ahead – Ingest, analyze, sell results, analyze, sell results …downstreaming – Licensing of data in its infancy – much more to come – Think about the data just from your car – 40 microprocessors (µPs)
  • 10. 10 The Future 2 – Data as Policy - Can Data save Us from Us? • “In God We Trust – all others bring data” – Commonly attributed to W. Edwards Deming • New jobs/titles coming out of the woodwork – CAO (Chief Analytics Officer), CDO (Data) – Data Scientist, Data Correlationist, Data Ethicist • Knowing “what” not “why” is good enough. Is it? • Remember Bayes’ “inductive probability” (250 yrs!) – We update our beliefs about something as new data arrives – Bayes T. (1763) "An Essay towards solving a Problem in the Doctrine of Chances". Phil. Trans., 53, 370–418. • Data Policy in the immortal words of Yogi Berra: – “We make too many wrong mistakes” – “You can observe a lot just by watching.”
  • 11. 11 The Future 3 – N=all? Keep Everything? Seriously? • Data Silos or the Data Lake? – HDFS presents a crisis: i.e. 危機, weiji • dangerous ‘critical point’ (not crisis; mis-translation) – Write-once, read-many, modify-never; delete-never? – Time is not your friend when moving data • (So, don’t move it between repositories; move it to the CPU) • One 40GE NIC yields same rate on bus as 28 disks @ 140MB/s • One million seconds is 277.8 hours (~ 11.6 days) • 1 PB @ 1 GB/sec is … 1 EB @ 1 TB/sec is … • Non-shared (1 protocol) or shared (N protocols)? • Time versus Space – the Essential Judgment • Cost of Having Data vs. Cost of Not Having Data
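
The "time is not your friend" arithmetic on slide 11 can be sketched in a few lines of Python. This is a rough illustration, not part of the original deck: it assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes, and so on), a sustained transfer rate with no protocol overhead, and the helper name `transfer_seconds` is made up for the example.

```python
# Assumption: decimal (SI) units and a perfectly sustained rate, no overhead.
GB = 10**9
TB = 10**12
PB = 10**15
EB = 10**18


def transfer_seconds(size_bytes: int, rate_bytes_per_sec: int) -> float:
    """Time to move size_bytes at a constant rate (hypothetical helper)."""
    return size_bytes / rate_bytes_per_sec


pb_seconds = transfer_seconds(PB, GB)   # 1 PB at 1 GB/sec
eb_seconds = transfer_seconds(EB, TB)   # 1 EB at 1 TB/sec

hours = pb_seconds / 3600
days = pb_seconds / 86400

# Aggregate rate of 28 disks at 140 MB/s each, versus the raw line rate
# of a 40 Gb/s NIC (before any encoding or protocol overhead).
disk_aggregate = 28 * 140 * 10**6       # bytes per second
nic_raw = 40 * 10**9 // 8               # bytes per second

print(f"1 PB @ 1 GB/s: {pb_seconds:,.0f} s = {hours:.1f} h = {days:.2f} days")
print(f"1 EB @ 1 TB/s: {eb_seconds:,.0f} s")
print(f"28 disks: {disk_aggregate / GB:.2f} GB/s, 40GE raw: {nic_raw / GB:.2f} GB/s")
```

Under these assumptions both transfers take the same one million seconds, which is why the slide's advice is to move the computation to the data rather than the data between repositories.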