Jojo Anonuevo for J&J, Acacia Hotel 10 Jan 2019
Data Science and the Data Revolution
Definition - What does Data Science mean?
Data science is a broad field that refers to the collective
processes, theories, concepts, tools and technologies
that enable the review, analysis and extraction of
valuable knowledge and information from raw data. It is
geared toward helping individuals and organizations
make better decisions from stored, consumed and
managed data.
Source:
Data to Insight to Decision
Information Technologists
Data Engineers
Data and Business Analysts
Data Scientists, Analysts
Decision-Makers
What types of decisions?
● Prediction (predict a value based on inputs)
● Classification (e.g., spam or not spam)
● Recommendations (e.g., Amazon and Netflix recommendations)
● Pattern detection and grouping (e.g., clustering, where the classes are not
known in advance)
● Anomaly detection (e.g., fraud detection)
● Recognition (image, text, audio, video, facial, …)
● Actionable insights (via dashboards, reports, visualizations, …)
● Automated processes and decision-making (e.g., credit card
approval)
● Scoring and ranking (e.g., FICO/Credit score)
● Segmentation (e.g., demographic-based marketing)
● Optimization (e.g., risk management)
● Forecasts (e.g., sales and revenue)
Data to Decision Frameworks
Where’s the Data?
Diverse Data Sources converging into Data Warehouse
GeoViz
Latency
Examples of Data to Decision Frameworks
Information Technologists
Data Engineers
Data and Business Analysts
Data Scientists, Analysts
Decision-Makers
Data to Decision frameworks
Gartner - 4 Analytics Capabilities
Data, Analytics, Decision, Action
● Descriptive: What happened
● Diagnostic: Why did it happen
● Predictive: What will happen
● Prescriptive: What should I do
Decision Support, Decision Automation
Human Centered, Machine Centered
Exponential Growth of Data
Diverse Streams of Data
ANALYTICS PLATFORM
Terabytes of Data / Day
Mobile and Data First
Moderate Images
Extract Text, Images
Measure Upload Speed
Exponential Data Streams with IoT
Data Consumption
DIVERSITY OF INSIGHTS
Variance
Stage 1
Stage 2
Stage 3
Geovisualization
ANALYTICS PLATFORM
Average of 1.9 Hours
Beyond SLA of 3 Days: 8.43%
Tyranny of the 1 Number
The tyranny of averages is a phrase used in
applied statistics to describe the often
overlooked fact that the mean does not provide
any information about the shape of the
probability distribution of a data set or skewness,
and that decisions or analysis based on only the
mean—as opposed to median and standard
deviation—may be faulty.
From Wikipedia, the free encyclopedia
Is an average of 1.9 hours ok if you’re targeting 95% to be under 3 hours?
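The question above can be checked with a few lines of Python. This is a minimal sketch on synthetic data: the `delivery_hours` list is invented for illustration (it is not the data behind the slide) and is chosen so that the mean is 1.9 hours. The helper `nearest_rank_percentile` is a hypothetical name, and the 3-hour target is taken from the question on the slide.

```python
import statistics

def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # ceiling division, 1-based rank
    return ordered[rank - 1]

# Synthetic delivery times in hours (assumed, not from the deck):
# most deliveries are fast, a small tail is very slow.
delivery_hours = [1.0] * 80 + [2.0] * 10 + [4.0] * 5 + [14.0] * 5

mean_hours = statistics.mean(delivery_hours)
median_hours = statistics.median(delivery_hours)
p95_hours = nearest_rank_percentile(delivery_hours, 95)
share_within_3h = sum(h <= 3.0 for h in delivery_hours) / len(delivery_hours)

print(f"mean={mean_hours:.1f}h median={median_hours:.1f}h "
      f"p95={p95_hours:.1f}h within 3h={share_within_3h:.0%}")
```

In this made-up sample the average is 1.9 hours, yet only 90% of deliveries finish within 3 hours and the 95th percentile is 4 hours, so the 95% target is missed even though the average looks healthy.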
Eliminate the noise
Pareto Principle
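One way to apply the Pareto Principle is to rank categories by volume and keep the few that account for most of the total. The sketch below assumes an 80% threshold; the function name `pareto_head` and the issue counts are hypothetical and only illustrate the idea.

```python
def pareto_head(counts, threshold=0.8):
    """Return the largest categories that together reach `threshold` of the total."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    head, running = [], 0
    for name, count in ranked:
        head.append(name)
        running += count
        if running / total >= threshold:
            break
    return head

# Hypothetical counts of delivery issues (invented for illustration).
issue_counts = {
    "wrong address": 52,
    "customer absent": 30,
    "vehicle breakdown": 8,
    "weather": 5,
    "lost parcel": 3,
    "other": 2,
}

vital_few = pareto_head(issue_counts)
print(vital_few)
```

With these invented numbers, two of the six categories cover 82% of all issues, which is where attention would go first.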
Pre-Processing and Transformation
(Extract Transform Load or ETL)
Challenges with Inconsistencies
● Region name instead of Province
● Missing Province
● Invalid City Names
Challenges with Inconsistencies
Manila
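The three inconsistencies named above (region name in the province field, missing province, invalid city name) can be caught with simple lookups during pre-processing. This is a minimal sketch, not the deck's actual ETL: the reference tables `REGIONS` and `CITIES_BY_PROVINCE` are tiny hypothetical samples, and `check_address` is an illustrative name.

```python
# Tiny hypothetical reference tables; a real pipeline would load a full
# standard list instead of hard-coding a sample.
REGIONS = {"Calabarzon", "Central Luzon"}
CITIES_BY_PROVINCE = {
    "Cavite": {"Bacoor", "Imus"},
    "Laguna": {"Calamba", "Santa Rosa"},
}

def check_address(record):
    """Return a list of data-quality issues found in one address record."""
    issues = []
    province = (record.get("province") or "").strip()
    city = (record.get("city") or "").strip()

    if not province:
        issues.append("missing province")
    elif province in REGIONS:
        issues.append("region name instead of province")
    elif province not in CITIES_BY_PROVINCE:
        issues.append("unknown province")
    elif city not in CITIES_BY_PROVINCE[province]:
        # The city can only be validated once the province is known.
        issues.append("invalid city name")
    return issues

records = [
    {"province": "Calabarzon", "city": "Imus"},
    {"province": "", "city": "Imus"},
    {"province": "Cavite", "city": "Imuss"},
    {"province": "Laguna", "city": "Calamba"},
]
for record in records:
    print(record, check_address(record))
```

Records that come back with an empty list pass these checks; the others can be routed to a correction step before loading.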
Importance of Standards in Data Quality
Street / Apt / Condo
XCode: MM 01 001
● MM: 2 Char Province (Metro Manila)
● 01: 2 Digit City (Caloocan City)
● 001: 3 Digit Barangay (Barangay 1)
Latitude Longitude: XXXXX
Extended XCode for higher granularity using LatLong
http://nap.psa.gov.ph/activestats/psgc/default.asp
Granular visibility with XCode (new Postal Code)
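The XCode layout described on the slide (2-character province, 2-digit city, 3-digit barangay) can be sketched as a pair of helper functions. The concatenated 7-character form, the validation rules, and the names `build_xcode` and `parse_xcode` are assumptions made for illustration; the slides do not specify the exact encoding, and the latitude/longitude extension is left out.

```python
def build_xcode(province, city, barangay):
    """Compose a 7-character code: 2 letters + 2-digit city + 3-digit barangay."""
    if len(province) != 2 or not province.isalpha():
        raise ValueError("province must be exactly 2 letters")
    if not 0 <= city <= 99:
        raise ValueError("city must fit in 2 digits")
    if not 0 <= barangay <= 999:
        raise ValueError("barangay must fit in 3 digits")
    return f"{province.upper()}{city:02d}{barangay:03d}"

def parse_xcode(code):
    """Split a 7-character code back into (province, city, barangay)."""
    if len(code) != 7 or not code[:2].isalpha() or not code[2:].isdigit():
        raise ValueError("expected 2 letters followed by 5 digits")
    return code[:2], int(code[2:4]), int(code[4:])

# Example from the slide: Metro Manila (MM), Caloocan City (01), Barangay 1 (001).
code = build_xcode("MM", 1, 1)
print(code, parse_xcode(code))
```

Fixed-width fields keep the code sortable and make each level of the hierarchy recoverable by position alone.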
Data Modeling
Modeling Disciplines
Modeling and Data Science Disciplines
Economics
AI
Machine Learning
Python
R
executive.berkeley.edu/programs/berkeley-program-data-science-analytics
Python + Libraries and Simple Regression Model
Building a Multiple Regression Model:
Case Study: Predict Demand for CapitalBikeshare
𝜇(rider count | temp, humidity, ..., month) = 𝛽0 + 𝛽1 ∗ (temp) + 𝛽2 ∗ (humidity) + ... + 𝛽𝑘 ∗ (month)
www.capitalbikeshare.com/system-data
Response: rider count
Explanatory Variables: temp, humidity, ..., month
Case Study: Predict Demand for CapitalBikeshare
𝜇(rider count | temp, humidity, ..., month) = 𝛽0 + 𝛽1 ∗ (temp) + 𝛽2 ∗ (humidity) + ... + 𝛽𝑘 ∗ (month)
www.capitalbikeshare.com/system-data
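A multiple regression of this form can be fitted with ordinary least squares. The sketch below uses NumPy on synthetic data rather than the Capital Bikeshare files: the coefficients in `true_beta`, the value ranges, and the noise level are all invented so the result can be checked, and `month` is treated as a plain number to mirror the formula (a real model might encode it as a category).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic explanatory variables (assumed ranges, not real bikeshare data).
temp = rng.uniform(0, 35, n)        # degrees Celsius
humidity = rng.uniform(20, 100, n)  # percent
month = rng.integers(1, 13, n)      # 1..12, treated as numeric here

# Invented "true" coefficients: intercept, temp, humidity, month.
true_beta = np.array([50.0, 8.0, -1.5, 3.0])

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), temp, humidity, month])
riders = X @ true_beta + rng.normal(0, 5.0, n)  # response with noise

# Least-squares estimate of the coefficients.
beta_hat, *_ = np.linalg.lstsq(X, riders, rcond=None)

def predict(temp_c, humidity_pct, month_no):
    """Predicted mean rider count for one set of inputs."""
    return float(np.array([1.0, temp_c, humidity_pct, month_no]) @ beta_hat)

print("estimated coefficients:", np.round(beta_hat, 2))
print("predicted riders at 25C, 60% humidity, month 7:", round(predict(25, 60, 7), 1))
```

Because the data were generated from known coefficients, the estimates should land close to `true_beta`; with real data there is no such ground truth, and model fit would need to be judged on held-out observations.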
Data Science - Classroom and Corporate Tours
Netflix HQ Tour
Los Gatos, California
Key Take-away:
Data Science is a key pillar in product development ....
Data Science Disciplines and Data Scientist
Uber Data Scientist
Pandora Data Scientist
How we leverage Machine Learning - Improve Customer Service
ERA OF INNOVATE OR BE DISRUPTED
The Industries That Are Being Disrupted the Most by Digital
The Science of Fail: Why the New Gap Logo Made Our Brains Angry
Thank you...

Editor's Notes

  1. 7 Steps From Raw Data to Insight; What is the data science process?
  2. Uber’s Big Data Platform: 100+ Petabytes with Minute Latency (17 Oct 2018); Google Cloud SQL vs Cloud DataStore vs BigTable vs BigQuery vs Spanner (10 Jun 2017); 7 Steps From Raw Data to Insight; What is the data science process?
  3. 7 Steps From Raw Data to Insight; What is the data science process?
  4. The e-car: Electronics technology applied to sustainable mobility, Part 1; Best Wearable Tech for 2019
  5. Benchmarking Google Vision, Amazon Rekognition, Microsoft Azure on Image Moderation
  6. How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats
  7. AI Is Going to Change the 80/20 Rule; Only 0.15 percent of mobile gamers account for 50 percent of all in-game revenue (exclusive) (26 Feb 2014)
  8. Data Science: An Introduction/A Mash-up of Disciplines; Knowledge Discovery in Databases
  9. What’s the Difference Between Artificial Intelligence, Machine Learning, and Deep Learning?
  10. This is how Netflix's top-secret recommendation system works (22 Aug 2017)
  11. What’s the Difference Between Artificial Intelligence, Machine Learning, and Deep Learning? (29 Jul 2016); DEVOPEDIA (for developers. by developers.): Data Science; What Is Data Science, and What Does a Data Scientist Do? Excerpt: "The Pillars of Data Science Expertise. While data scientists often come from many different educational and work experience backgrounds, most should be strong in, or in an ideal case be experts in four fundamental areas. In no particular order of priority or importance, these are: Business domain, Statistics and probability, Computer science and software programming, Written and verbal communication."
  12. An Introduction to Google Machine Learning APIs (13 Jan 2017)
  13. Alibaba's Jack Ma shares insight into innovation, entrepreneurship - chinaplus.cri.cn | Updated: 18Sep2017 10:19; The Science of Fail: Why the New Gap Logo Made Our Brains Angry; The Industries That Are Being Disrupted the Most by Digital