Thinking Big
An Introduction to Big Data
About Me
Shawn Hermans
● Data Engineer/Scientist
● Technology consultant
● Physics, math, data geek
About this Talk
● Non-technical introduction to Big Data
● Not focused on any technology or platform
● Focus on concepts
Should you believe the hype?
Big Data Promises
● No need for the scientific method
● Predict disease outbreaks before the CDC
● Cure cancer
● Innovate healthcare
● Solve world hunger
● Bring about world peace
Big Data Criticism
● Garbage in, garbage out
● Ignores the role of the scientific method
● Many questions don’t require large amounts of data to get good statistics
● Privacy issues
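The point that many questions don’t need huge data sets can be made concrete with the usual normal-approximation margin of error for a proportion, z·sqrt(p(1−p)/n). This sketch is an illustration added here, not something from the talk, and it assumes a simple random sample: exactly the assumption that "garbage in, garbage out" warns can fail.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a proportion.

    Uses the normal approximation z * sqrt(p * (1 - p) / n), which assumes a
    simple random sample and a reasonably large n.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    # p = 0.5 is the worst case (widest interval) for a proportion.
    for n in (1_000, 10_000, 1_000_000):
        print(f"n={n:>9,}: +/- {margin_of_error(0.5, n):.2%}")
```

Going from a thousand to a million samples narrows the interval from about ±3.1% to about ±0.1%. Whether that extra precision matters depends on the question, and no sample size fixes a biased collection process.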
Big Data is just another way to think about data
Mental Models
“A mental model is simply a representation of
an external reality inside your head. Mental
models are concerned with understanding
knowledge about the world.”
- Farnam Street Blog
Examples
● Occam's razor
● Mind maps
● Law of supply and demand
● Never get in a land war in Asia
All models are wrong, but some are useful
Relational Resistance
Resistance to big data concepts, technologies, and techniques based on the belief that the relational model is the only way to think about data.
See also: theory-induced blindness
Data Mental Models
● Relational
● Linked
● Object Oriented
● Geospatial
● Temporal
● Semantic
● Event Based
● Data as Code
● Bayesian
● Unstructured
What is Big Data?
According to Gartner
“Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”
According to Me
Big data is the Bazaar to traditional data’s Cathedral
Cathedral and Bazaar
Traditional Data
● Clean
● Top down
● Carefully collected
● Scales vertically
● One true way
Big Data
● Disorderly
● Bottom up
● Randomly collected
● Scales horizontally
● More than one way
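Scaling horizontally means spreading data across many machines instead of buying one bigger machine. One common way to do that, sketched here as an assumption rather than as how any particular platform works, is hash partitioning: each record key is hashed and the hash picks a node. The node count and keys below are hypothetical.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, num_nodes: int) -> int:
    """Map a key to a node index using a stable hash.

    hashlib is used instead of the built-in hash(), because hash() on strings
    is randomized per process and would send the same key to different nodes
    on different runs.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def distribute(keys, num_nodes):
    """Group keys by the node responsible for storing them."""
    nodes = defaultdict(list)
    for key in keys:
        nodes[partition_for(key, num_nodes)].append(key)
    return dict(nodes)


if __name__ == "__main__":
    # Hypothetical customer IDs spread over a hypothetical 3-node cluster.
    customer_ids = [f"customer-{i}" for i in range(10)]
    for node, keys in sorted(distribute(customer_ids, 3).items()):
        print(node, keys)
```

Adding capacity means raising `num_nodes`, but with plain modulo hashing most keys move when the node count changes; production systems often use consistent hashing to limit that reshuffling.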
Big Data Differences
Relational
● Normalization
● ACID
● SQL/Query
● Structured/Schema
Big Data
● Denormalization
● BASE
● MapReduce/Other
● Loosely Structured
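MapReduce, listed above as the Big Data counterpart to SQL, splits a job into a map step that emits key/value pairs and a reduce step that combines all values sharing a key. The single-process word count below only imitates that model; a framework such as Hadoop runs many mappers and reducers in parallel and performs the grouping ("shuffle") across the network. The function names are illustrative, not a Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(documents: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine all values for each key; for word count, sum them."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(documents)))


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat down"]))
    # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1, 'down': 1}
```

Because each reduce call only needs the values for one key, different keys can be reduced on different machines, which is what lets the model scale horizontally.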
Integrating all available data is the promise of Big Data
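Integrating data from separate systems often ends in a denormalized record: the join is done once, up front, and the combined result is stored so later reads need no joins. The sketch below assumes two hypothetical sources, a customer table from a CRM and an order log, and merges them in plain Python.

```python
from typing import Dict, List

# Hypothetical source 1: normalized customer records keyed by ID.
customers: Dict[int, Dict[str, str]] = {
    1: {"name": "Ada", "city": "Omaha"},
    2: {"name": "Grace", "city": "Lincoln"},
}

# Hypothetical source 2: an order log that only stores customer IDs.
orders: List[Dict[str, object]] = [
    {"order_id": 100, "customer_id": 1, "total": 25.0},
    {"order_id": 101, "customer_id": 2, "total": 40.0},
    {"order_id": 102, "customer_id": 1, "total": 15.5},
    {"order_id": 103, "customer_id": 3, "total": 9.99},  # no matching customer
]


def denormalize(customers, orders):
    """Copy customer fields onto each order so every record is self-contained.

    Orders without a matching customer are kept with None for the missing
    fields (a left join) rather than silently dropped.
    """
    records = []
    for order in orders:
        customer = customers.get(order["customer_id"], {})
        records.append({
            **order,
            "name": customer.get("name"),
            "city": customer.get("city"),
        })
    return records


if __name__ == "__main__":
    for record in denormalize(customers, orders):
        print(record)
```

The trade-off is duplication: a customer’s city is now copied into every order, so a change of address must be rewritten everywhere, which is the problem normalization was designed to avoid.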
Why should you care?
Information as an Asset
● Target individual customers’ needs rather than broad segments
● Just-in-time inventory management
● Evaluate demand for products
● Predict and track traffic patterns
Big Data and You
● What information do you have that no one else has?
● Can you easily integrate your data, or is it locked in silos?
● What data don’t you collect?
● What data don’t you archive?
Big Data Technology
Big Data Platforms
Cloud
● AWS
● Google
● Microsoft
Hadoop
● Cloudera
● MapR
● Hortonworks
This isn’t an all-inclusive list, just a sample of the big players in the space.
Big Data Stack
● Batch Processing
● Data Collection
● SQL/Query
● Search
● Machine Learning
● Serialization
● Security
● Stream Processing
● File Storage
● Resource management
● Online NoSQL
● Data Pipeline
What about data science?
What IS Data Science?
● Data science is statistics on a Mac
● A data scientist is a statistician who lives in San Francisco
● A person who is better at statistics than any software engineer and better at software engineering than any statistician
The need for Data Science
● There is a LOT of data
● Too much data for people to look at all of it
● Probabilistic models help extract signal from the noise
● Need to automate the analysis and exploitation of data
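As one example of a probabilistic model (and of the Bayesian mental model listed earlier), a Beta-Binomial model turns noisy success/failure counts into an estimate plus its uncertainty. This sketch assumes a uniform Beta(1, 1) prior and hypothetical click counts; it is an illustration, not a method from the talk.

```python
import math
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        """Posterior mean of the underlying rate."""
        return self.alpha / (self.alpha + self.beta)

    @property
    def std(self) -> float:
        """Posterior standard deviation: sqrt(ab / ((a + b)^2 (a + b + 1)))."""
        a, b = self.alpha, self.beta
        return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def update(successes: int, failures: int,
           prior_alpha: float = 1.0, prior_beta: float = 1.0) -> BetaPosterior:
    """Conjugate update: add observed successes and failures to the prior counts."""
    return BetaPosterior(prior_alpha + successes, prior_beta + failures)


if __name__ == "__main__":
    # Hypothetical click data: the same 30% rate observed at two sample sizes.
    small = update(successes=3, failures=7)
    large = update(successes=300, failures=700)
    print(f"10 trials:   mean={small.mean:.3f}, std={small.std:.3f}")
    print(f"1000 trials: mean={large.mean:.3f}, std={large.std:.3f}")
```

Both samples show the same observed rate, but the posterior spread shrinks by roughly a factor of nine with 100 times more data, which is one way a model separates a stable signal from noise.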
Big Data has its limits
Black Swans and Big Data
● There are fundamental limits to prediction
● Rare events with no prior data (i.e., Black Swans) are hard to predict
● Complex systems often have feedback loops (e.g., the stock market)
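A small illustration of the Black Swan problem: a model that estimates probabilities from historical frequencies assigns exactly zero to anything it has never seen. Additive (Laplace) smoothing, shown for contrast, avoids the zero, but the number it produces comes from the assumed list of possible outcomes rather than from the data. The outcome labels and counts are hypothetical.

```python
from collections import Counter
from typing import Sequence


def empirical_probability(history: Sequence[str], event: str) -> float:
    """Relative frequency of an event in past observations."""
    return Counter(history)[event] / len(history)


def laplace_probability(history: Sequence[str], event: str,
                        outcomes: Sequence[str]) -> float:
    """Add-one smoothing over an assumed, fixed set of possible outcomes."""
    return (Counter(history)[event] + 1) / (len(history) + len(outcomes))


if __name__ == "__main__":
    # Hypothetical record of 100 trading days in which no crash was observed.
    history = ["up"] * 60 + ["down"] * 40
    print(empirical_probability(history, "crash"))  # 0.0
    print(laplace_probability(history, "crash", ["up", "down", "crash"]))
```

Adding a fourth imagined outcome to the list would change the smoothed estimate even though the data did not change, which is the sense in which unprecedented events sit outside what historical data can tell you.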
What’s next?
Getting Started
Business
● Identify some unresolved questions
● Figure out what data could answer those questions
● Pick the easiest question and test your hypothesis
Technology
● Pick a technology you know or want to learn
● Pick a platform
● Pick a data set and identify some basic problems to solve
My Info
Twitter: @shawnhermans
Github: github.com/shawnhermans
Blog: http://shawnhermans.github.io/ (In Progress)
Slideshare: www.slideshare.net/shawnhermans/
Quora: http://www.quora.com/Shawn-Hermans
Backup Slides
The Fourth Quadrant and the Failure of Statistics
Soothsayer
● Simple HTTP/JSON API for training/classifying data
● Lots of built-in classifier statistics
https://github.com/shawnhermans/soothsayer
Thinking Big with Big Data


Editor's Notes

  1. https://twitter.com/BigDataBorat/status/349293502498213888
  2. Quote by George E. P. Box: http://en.wikiquote.org/wiki/George_E._P._Box
  3. See http://www.bloomberg.com/news/2011-10-25/bias-blindness-and-how-we-truly-think-part-2-daniel-kahneman.html
  4. Inspired by Eric Raymond’s The Cathedral and the Bazaar: http://www.catb.org/esr/writings/cathedral-bazaar/introduction/
  5. BASE (Basically Available, Soft state, Eventual consistency). See the CAP theorem for more details: http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
  6. Big data might not save the world, but it could entertain us http://www.fastcodesign.com/1671893/the-secret-sauce-behind-netflixs-hit-house-of-cards-big-data
  7. http://blogs.wsj.com/digits/2014/01/17/amazon-wants-to-ship-your-package-before-you-buy-it/ http://en.wikipedia.org/wiki/Google_Traffic#Crowdsourced_traffic_data
  8. “Big Data and You” sounds like a good children’s book title.
  9. This is the admin console for Amazon Web Services. Not all of these services are Big Data services, but together they give a good idea of what an integrated Big Data platform looks like.
  10. https://twitter.com/cdixon/status/428914681911070720 https://twitter.com/BigDataBorat/status/372350993255518208 https://twitter.com/josh_wills/status/198093512149958656 Although use of the term data science has exploded in business environments, many academics and journalists see no distinction between data science and statistics. Writing in Forbes, Gil Press argues that data science is a buzzword without a clear definition and has simply replaced “business analytics” in contexts such as graduate degree programs. In the question-and-answer section of his keynote address at the Joint Statistical Meetings of the American Statistical Association, noted applied statistician Nate Silver said, “I think data-scientist is a sexed up term for a statistician....Statistics is a branch of science. Data scientist is slightly redundant in some way and people shouldn’t berate the term statistician.”
  11. Drew Conway’s Data Science Venn Diagram: http://en.wikipedia.org/wiki/Data_science#mediaviewer/File:Data_Science_Venn_Diagram.png
  12. See Nassim Taleb’s excellent essay The Fourth Quadrant - http://edge.org/conversation/the-fourth-quadrant-a-map-of-the-limits-of-statistics
  13. See http://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public for datasets
  14. http://jameskinley.tumblr.com/post/37398560534/the-lambda-architecture-principles-for-architecting