• Data vs. Information
• Data science
• Big Data
• Big data vs. conventional/small data
• Data warehousing vs. data mining vs. big data
analytics
• Data science vs. data analytics
• Steps of data analytics
• Concluding remarks
• Data is:
• A collection of facts
• Statistics used for reference or analysis
• A series of observations
• Measurements
• Things known as facts, making the basis of
reasoning or calculation.
• Information is:
• Processed data
• Meaning given to data by the way it is interpreted.
• How do we use it?
• Decision making
• Thing/artifact:
• Information is what’s captured in a book, web page, or other resource.
• More information is digital
• Data on its own has no meaning; only when interpreted by some kind of data processing system does it take on meaning and become information.
Raw data: Yes, Yes, No, Yes, No, Yes, No, Yes, No, Yes, Yes
Context: responses to the market research question “Would you buy brand x at price y?”
Raw data + context + processing → Information ???
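As a minimal sketch of the "processing" step, the raw Yes/No responses can be counted and expressed as a share. The variable names are illustrative, and treating the share of "Yes" answers as the resulting information is an assumption, not something stated on the slide.

```python
from collections import Counter

# Raw data from the slide: responses to "Would you buy brand x at price y?"
responses = ["Yes", "Yes", "No", "Yes", "No", "Yes",
             "No", "Yes", "No", "Yes", "Yes"]

# Processing: count each answer and compute the share of "Yes".
counts = Counter(responses)
share_yes = counts["Yes"] / len(responses)

# Information: a statement a decision maker can act on.
summary = f"{counts['Yes']} of {len(responses)} respondents would buy ({share_yes:.0%})"
print(summary)
```

The same list of words only becomes useful once the question (the context) is known and the answers are aggregated.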
14082018
Simply a number, no meaning
14/08/2018
Now it becomes a date, just by adding slashes
Formatting makes it meaningful
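A small sketch of the same idea in code, assuming the digits are meant as day, month, year (the slide does not state the order explicitly):

```python
from datetime import datetime

raw = "14082018"  # on its own, just a string of digits

# Assumption: the digits encode day, month, year (DDMMYYYY).
parsed = datetime.strptime(raw, "%d%m%Y").date()

# Formatting with slashes gives the value from the slide.
formatted = parsed.strftime("%d/%m/%Y")
print(formatted)
```

Once parsed, the value also supports date operations (for example, finding the weekday), which the bare number does not.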
What is Data Science?
[Diagram: Data Science, Business, Technology]
An interdisciplinary field that employs sophisticated tools
and techniques to extract knowledge and actionable
insights from structured or unstructured data in order to
optimize business objectives.
Data – The 4 V's of Big Data
The 4 V's of Big Data - Volume
• Most of the world's current data is unstructured: natural text, images, videos, raw sensory-motor data.
• An autonomous car can generate as much as 4 terabytes of data per day.
• Facebook's last analysis: 60 petabytes of data, processed with Spark.
• Most businesses have data in classical formats and of relatively small size.
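To get a feel for the volume figure, the sketch below converts a daily volume into an average data rate. It assumes decimal units (1 TB = 10^12 bytes) and data produced evenly over 24 hours; both are simplifying assumptions, and the function name is illustrative.

```python
# Assumption: 1 TB = 10**12 bytes, produced evenly over a full day.
BYTES_PER_TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60

def average_rate_mb_per_s(terabytes_per_day: float) -> float:
    """Average data rate in megabytes per second for a given daily volume."""
    bytes_per_second = terabytes_per_day * BYTES_PER_TB / SECONDS_PER_DAY
    return bytes_per_second / 10**6

car_rate = average_rate_mb_per_s(4)  # the 4 TB/day figure from the slide
print(f"{car_rate:.1f} MB/s")
```

Under these assumptions, 4 TB per day corresponds to roughly 46 MB every second, around the clock.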
The 4 V's of Big Data - Velocity
• A car can generate data at a 2 ms scale.
• A real-time bidding
engine such as Google
RTB has to complete a
cycle within 100 ms.
• An increasing trend is
to control the data at
the edge device –
Edge Computing.
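The velocity figures can be turned into simple arithmetic. This is a sketch only: it assumes "2 ms scale" means one reading every 2 ms, and the helper names are hypothetical.

```python
def readings_per_second(interval_ms: float) -> float:
    """Number of readings per second for a given sampling interval."""
    return 1000.0 / interval_ms

def within_budget(elapsed_ms: float, budget_ms: float = 100.0) -> bool:
    """True if a processing cycle finished inside its time budget."""
    return elapsed_ms <= budget_ms

# Assumption: one reading every 2 ms.
rate = readings_per_second(2)
per_day = rate * 24 * 60 * 60

print(rate, per_day)
print(within_budget(85.0), within_budget(120.0))
```

A 100 ms default budget mirrors the bidding-cycle limit mentioned above; the elapsed times passed in are made-up values.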
The 4 V's of Big Data - Variety
The 4 V's – Variety (II)
Unstructured
• Free text
• Images
• Videos
• Audio
• Sensory-motor data
Semi-structured
• Partially modeled data
• XML, JSON, MongoDB
• Variable-length time series, e.g. sensor readings
• Google Protocol Buffers (serializing structured data)
Structured
• Has a data model; ends up in tabular form
• Relational databases / data warehouses
• CSVs / Excel files
• In advanced AI fields, unstructured data becomes vital.
• In classical business, most of the value still lies in structured data.
• By any means, include human experience in the data.
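The three categories can be illustrated with one made-up sensor reading stored in three ways. The record, field names, and text are hypothetical and exist only for this sketch.

```python
import csv
import io
import json

# Structured: fixed columns, tabular form (CSV).
structured_csv = "sensor_id,timestamp,temperature_c\ns1,2018-08-14T10:00:00,21.5\n"

# Semi-structured: partially modeled, nested, optional fields (JSON).
semi_structured_json = (
    '{"sensor_id": "s1", "note": "recalibrated", '
    '"readings": [{"t": "2018-08-14T10:00:00", "temperature_c": 21.5}]}'
)

# Unstructured: free text with no data model.
unstructured_text = "Sensor s1 showed about 21.5 degrees at ten in the morning."

rows = list(csv.DictReader(io.StringIO(structured_csv)))
record = json.loads(semi_structured_json)

print(rows[0]["temperature_c"])                 # column lookup, value is a string
print(record["readings"][0]["temperature_c"])   # path lookup, value is a number
print(unstructured_text.split()[4])             # position lookup, fragile
```

The further the data is from a fixed model, the more work is needed before the same value can be extracted reliably.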
The 4 V's - Veracity
• When does a data scientist look stupid?
• What could be sources of unreliability in the data?
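A few possible sources of unreliability can be checked mechanically. The sketch below looks for duplicate ids, missing values, and implausible values in a made-up record set; the records, the field names, and the 0 to 120 age range are assumptions chosen for illustration, not a list taken from the slides.

```python
# Hypothetical records with deliberately planted problems.
records = [
    {"id": 1, "age": 34, "country": "DE"},
    {"id": 2, "age": None, "country": "DE"},  # missing value
    {"id": 3, "age": 212, "country": "FR"},   # implausible value
    {"id": 1, "age": 34, "country": "DE"},    # duplicate id
]

def quality_report(rows, age_range=(0, 120)):
    """Return ids of rows that are duplicated, incomplete, or out of range."""
    low, high = age_range
    seen = set()
    report = {"duplicates": [], "missing": [], "out_of_range": []}
    for row in rows:
        if row["id"] in seen:
            report["duplicates"].append(row["id"])
        seen.add(row["id"])
        if row["age"] is None:
            report["missing"].append(row["id"])
        elif not low <= row["age"] <= high:
            report["out_of_range"].append(row["id"])
    return report

report = quality_report(records)
print(report)
```

Checks like these do not prove the data is right, but they catch some problems before conclusions are drawn from it.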
The 4 V's – Veracity (II)
Common Challenges in Data
The 4 V's – Veracity (III)
Best practices to increase the reliability of what you see
The 4 V's – The 5th V
Data Science – The Science
The Science – Artificial Intelligence
The Science – Algorithms
How Data Science algorithms/software can help in
decision making
The Science – The Maths Behind
• Data Structures
• Statistics
• Probability
• Linear Algebra
• Calculus
• Algorithms
• Optimization
The Science – Machine Learning
• Machine learning is an algorithmic implementation of statistical methods for approximation and prediction.
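One of the simplest instances of this definition is fitting a straight line by ordinary least squares and using it to predict. The sketch below uses made-up data points and plain Python; it is one example of the idea, not the method the slides refer to.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)   # approximation: learn from the data

def predict(x):
    return slope * x + intercept      # prediction: apply to an unseen x

print(round(slope, 2), round(intercept, 2), round(predict(6), 2))
```

The statistics (means, covariance, variance) supply the model; the algorithm is the procedure that computes it from data.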
Machine Learning - Workflow
• Data Preprocessing
• Feature Engineering
• Dimensionality Reduction + Feature Selection
• Model Building
• Model Evaluation
• Hyperparameter Optimization
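The workflow can be sketched end to end on a toy problem. The example below covers preprocessing (standardization), feature selection (dropping a constant column), model building (k-nearest neighbours), evaluation (leave-one-out accuracy), and hyperparameter optimization (choosing k). Feature engineering and dimensionality reduction are left out for brevity. The data, labels, and function names are all hypothetical.

```python
import math

# Hypothetical samples: (height_cm, weight_kg, constant_column) -> label
data = [
    ((150.0, 50.0, 1.0), "small"), ((155.0, 55.0, 1.0), "small"),
    ((160.0, 58.0, 1.0), "small"), ((175.0, 80.0, 1.0), "large"),
    ((180.0, 85.0, 1.0), "large"), ((185.0, 90.0, 1.0), "large"),
]

def column_stats(rows):
    """Mean and standard deviation of every column."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(columns, means)]
    return means, stds

def knn_predict(train, x, k):
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def leave_one_out_accuracy(samples, k):
    """Hold out each sample once and predict it from the others."""
    hits = 0
    for i, (x, y) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += knn_predict(rest, x, k) == y
    return hits / len(samples)

# Preprocessing and feature selection. For simplicity the statistics use
# all samples; a careful evaluation would compute them per training fold.
means, stds = column_stats([x for x, _ in data])
keep = [i for i, s in enumerate(stds) if s > 0]   # drop zero-variance columns
samples = [([(x[i] - means[i]) / stds[i] for i in keep], y) for x, y in data]

# Model building, evaluation, and hyperparameter optimization over k.
scores = {k: leave_one_out_accuracy(samples, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(keep, scores, best_k)
```

With only six samples the numbers mean little; the point is the order of the steps and that the hyperparameter is chosen by evaluation rather than by guesswork.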
The Science – Deep Learning
• Cognition as a Service
• GPU / Scalable Computing
• Deep Learning / Neural Network
• Vast Training Data
Deep learning is a compute-intensive and adaptive machine learning
method that encapsulates feature engineering and model building.
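The idea that a network builds its own features can be shown with a tiny two-layer network that computes XOR, which no single linear model on the raw inputs can do. In this sketch the weights are set by hand rather than learned, purely to keep the example deterministic; real deep learning would find them by training.

```python
def relu(v: float) -> float:
    """Rectified linear unit: negative values become zero."""
    return max(0.0, float(v))

def tiny_network(x1: float, x2: float) -> float:
    # Hidden layer: two intermediate features built from the raw inputs.
    # Weights are hand-set for illustration, not trained.
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    # Output layer: a linear model on the hidden features.
    return h1 - 2.0 * h2

outputs = {(a, b): tiny_network(a, b) for a in (0, 1) for b in (0, 1)}
print(outputs)
```

The hidden layer plays the role of feature engineering and the output layer the role of the model, both inside one function.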
The Technology
• The Tools that put life into a logical plan.
The Technology – it's a lot
The Technology – A Subset
• Programming Tools
• Cloud Computing
• Data Storage and Processing
• Hardware
Applications/Business – What matters at
the end
The Applications – Some Areas
Logistics, Banking, Insurance, E-commerce, Retail, Energy, Marketing, Manufacturing, Healthcare, Automotive, Electronics, Defense
USE CASES: Data Science in Retail
USE CASES: Data Science in Manufacturing
• Reduction of Supply Chain Risk
• Optimization of Operations to a Higher Degree than Ever
• Perfecting Quality as a Competitive Advantage
• Predictive Maintenance to Reduce Cost
• After-Sales Improvements
• Mass and Individual Customization
• New Data-Driven Revenue Sources and Business Models
• From Local to Enterprise-Level Data Analytics
USE CASE: Data Science in Insurance Industry
• Risk Assessment
• Fraud Detection
• Customer Insights
• Marketing
• Customer Experience
• Automation
Big data vs. Data mining
Big data vs. Data warehousing
Data warehousing vs. Data mining
Data warehousing is the process of compiling and organizing data into one common database.
Data mining is the process of extracting meaningful data from that database. It relies on the data compiled in the data warehousing phase to detect meaningful patterns.
Data analytics is the process of examining data sets in order to draw conclusions about the information they contain, with the aid of data mining approaches.
Big data is the term used for extremely large amounts of data; it also refers to the effective classification, storage, retrieval, and analysis of such data.
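The relationship between the first three terms can be sketched with Python's built-in sqlite3 module. The sales records, table name, and queries are hypothetical; a real warehouse and real mining algorithms are far larger, so this only mirrors the division of roles described above.

```python
import sqlite3

# Hypothetical sales records from two separate source systems.
store_sales = [("2018-08-14", "bread", 2.5), ("2018-08-14", "butter", 3.0)]
web_sales = [("2018-08-15", "bread", 2.5), ("2018-08-15", "butter", 3.0),
             ("2018-08-15", "jam", 4.0)]

# Data warehousing: compile both sources into one common database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, item TEXT, price REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", store_sales + web_sales)

# Data mining: detect a pattern, here which item pairs sell on the same day.
pairs = conn.execute(
    """
    SELECT a.item, b.item, COUNT(*) AS days_together
    FROM sales a JOIN sales b ON a.day = b.day AND a.item < b.item
    GROUP BY a.item, b.item
    ORDER BY days_together DESC, a.item, b.item
    """
).fetchall()

# Data analytics: draw a conclusion, here revenue per item.
revenue = dict(
    conn.execute("SELECT item, SUM(price) FROM sales GROUP BY item").fetchall()
)
conn.close()

print(pairs)
print(revenue)
```

The mining and analytics queries both depend on the compiled table, which is the dependency the definitions above describe.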
• Data science is the art of:
• Data analysis
• Utilizing statistics, mathematics and probabilistic theories
• Utilizing some programming (usually scripting languages)
• Discovering hidden patterns inside the data
• Enabling the algorithms to "learn themselves" based on the
patterns they unravel
• Data science is an umbrella term that encompasses
• Data analytics
• Data mining
• Machine learning
• Several other related disciplines
[Diagram: Data warehousing, Data mining, Data analytics, Data Science]
• This is the era of big data.
• Big data analytics is a fundamental tool of business competition.
• The new benefits that big data analytics brings are:
• Speed
• Efficiency
• A few years ago, a business would have gathered information, run analytics, and unearthed information that could be used for future decisions.
• Today that business can identify insights for immediate decisions. The ability to work faster – and stay agile – gives organizations a competitive edge they didn’t have before.

DataScienceIntroduction.pptx
