The document discusses key concepts related to data science including:
- The difference between data and information
- An overview of the volume, velocity, variety, and veracity (4 V's) of big data
- The steps involved in data analytics from data collection to model building
- Applications of data science in various industries like retail, manufacturing, and insurance
2.
• Data vs. Information
• Data science
• Big Data
• Big data vs. conventional/small data
• Data warehousing vs. data mining vs. big data analytics
• Data science vs. data analytics
• Steps of data analytics
• Concluding remarks
3. • Data is:
• A collection of facts
• Statistics used for reference or analysis
• A series of observations
• Measurements
• Things known as facts, forming the basis of reasoning or calculation.
4.
• Information is:
• Processed data
• Meaning given to data by the way it is interpreted.
• How do we use it?
• Decision making
• Thing/artifact:
• Information is what’s captured in a book, web page, or other
resource.
• More information is digital
• Data on its own has no meaning. Only when it is interpreted by some kind of data processing system does it take on meaning and become information.
5.
• Raw data: Yes, Yes, No, Yes, No, Yes, No, Yes, No, Yes, Yes
• Context: responses to the market research question “Would you buy brand x at price y?”
• Processing
• Information: ???
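As a rough illustration of the "processing" step on this slide, the sketch below turns the raw Yes/No responses into a piece of information by combining them with their context. The function name `summarize` and the choice of a simple count and share are assumptions made for illustration, not something the slide prescribes.

```python
from collections import Counter

# Raw data exactly as listed on the slide: meaningless without context.
raw_data = ["Yes", "Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes", "Yes"]

# Context from the slide: the question the responses belong to.
context = "Would you buy brand x at price y?"

def summarize(responses):
    """Hypothetical helper: map each answer to (count, share of all responses)."""
    counts = Counter(responses)
    total = len(responses)
    return {answer: (n, n / total) for answer, n in counts.items()}

summary = summarize(raw_data)
yes_count, yes_share = summary["Yes"]

# Raw data + context + processing = information.
print(f"{context} -> {yes_count} of {len(raw_data)} said Yes ({yes_share:.0%})")
# prints: Would you buy brand x at price y? -> 7 of 11 said Yes (64%)
```

The same list of answers would mean something entirely different under another question, which is why the context travels with the result.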
6.
14082018
Simply a number, no meaning
14/08/2018
Now it becomes a date, just by adding forward slashes
Formatting makes it meaningful
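The same idea can be sketched in code: the digits stay the same, and only the interpretation changes. The example assumes a day-month-year layout, which the slide implies because 14 cannot be a month.

```python
from datetime import datetime

raw = "14082018"  # just eight digits, no meaning on their own

# Assumption: the digits are laid out as day, month, year (DDMMYYYY).
parsed = datetime.strptime(raw, "%d%m%Y").date()

# Formatting the interpreted value gives the date shown on the slide.
formatted = parsed.strftime("%d/%m/%Y")
print(formatted)           # 14/08/2018
print(parsed.isoformat())  # 2018-08-14
```

Once the value is a date rather than a string of digits, it supports date operations such as comparison or finding the weekday.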
8. What is Data Science?
[Diagram: Data Science, Business, Technology]
An interdisciplinary field that employs sophisticated tools
and techniques to extract knowledge and actionable
insights from structured or unstructured data in order to
optimize business objectives.
11. The 4 V's of Big Data - Volume
• Most of the world's current data is unstructured: natural text, images, videos, raw sensory-motor data.
• An autonomous car can generate as much as 4 terabytes of data per day.
• Facebook's last reported analysis covered 60 petabytes of data, run on Spark.
• Most businesses have data in classical formats and of relatively small size.
12. The 4 V's of Big Data - Velocity
• A car can generate data at a 2 ms time scale.
• A real-time bidding engine such as Google RTB has to complete a cycle within 100 ms.
• An increasing trend is to control the data at the edge device (edge computing).
15. The 4 V's – Variety (II)
Unstructured
• Free text
• Images
• Videos
• Audio
• Sensory-motor data
Semi-structured
• Partially modeled data
• XML, JSON, MongoDB
• Variable-length time series, e.g. sensor readings
• Google Protocol Buffers (serializing structured data)
Structured
• Has a data model and ends up in tabular form
• Relational databases / data warehouses
• CSVs / Excel files

• In advanced AI fields, unstructured data becomes vital.
• In classical business, most of the value still lies in structured data.
• By any means, include human experience in the data.
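To make the difference between semi-structured and structured data concrete, here is a minimal sketch that flattens JSON records with optional fields and variable-length sensor readings into a fixed-schema table (CSV). The records, field names, and the `flatten` helper are invented for illustration.

```python
import csv
import io
import json

# Hypothetical semi-structured input: fields vary and readings have variable length.
raw = """[
  {"sensor": "a1", "unit": "C", "readings": [21.5, 21.7, 22.0]},
  {"sensor": "b2", "readings": [0.4]}
]"""

def flatten(records):
    """Yield one fixed-schema row per reading; a missing unit becomes an empty string."""
    for rec in records:
        for i, value in enumerate(rec.get("readings", [])):
            yield {
                "sensor": rec["sensor"],
                "unit": rec.get("unit", ""),
                "index": i,
                "value": value,
            }

rows = list(flatten(json.loads(raw)))

# Structured form: every row has the same columns, so it fits a table.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["sensor", "unit", "index", "value"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

The flattening step is where a data model gets imposed: the choice of one row per reading is a design decision, and a different schema (for example one row per sensor with aggregated values) would be equally valid.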
16. The 4 V's - Veracity
• When does a data scientist look stupid?
• What could be sources of unreliability in the data?
17. The 4 V's – Veracity (II)
Common challenges in data
18. The 4 V's – Veracity (III)
Best practices to increase the reliability of what you see
28. The Science – Algorithms
How Data Science algorithms/software can help in
decision making
29. The Science – The Maths Behind
• Data structures
• Statistics
• Probability
• Linear algebra
• Calculus
• Algorithms
• Optimization
30. The Science – Machine Learning
• Machine learning is an algorithmic implementation of statistical methods for approximation and prediction.
31. Machine Learning - Workflow
• Data preprocessing
• Feature engineering
• Dimensionality reduction + feature selection
• Model building
• Model evaluation
• Hyperparameter optimization
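The workflow stages above can be sketched end to end with only the Python standard library. Everything specific here is an assumption chosen for illustration: the synthetic two-class data, z-score scaling, the derived sum feature, and the k-nearest-neighbours model. Dimensionality reduction is represented only by dropping a zero-variance column, which is far simpler than what a real project would use.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical toy data: two informative features plus one constant (useless) column.
def make_point(label):
    center = 0.0 if label == 0 else 3.0
    return [random.gauss(center, 1.0), random.gauss(center, 1.0), 1.0], label

data = [make_point(i % 2) for i in range(60)]
random.shuffle(data)
train, valid, test = data[:36], data[36:48], data[48:]

# 1. Data preprocessing: z-score scaling with statistics from the training set only.
def fit_scaler(rows):
    return [(statistics.mean(c), statistics.pstdev(c)) for c in zip(*rows)]

def scale(row, params):
    return [(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, params)]

# 2. Feature engineering: append one derived feature (sum of the first two columns).
def engineer(row):
    return row + [row[0] + row[1]]

# 3. Feature selection: keep only columns whose training variance is non-zero.
def select_columns(params):
    return [i for i, (_, s) in enumerate(params) if s > 0]

# 4. Model building: k-nearest-neighbours classifier with majority vote.
def predict(train_rows, train_labels, row, k):
    pairs = sorted(zip(train_rows, train_labels), key=lambda p: math.dist(p[0], row))
    votes = [label for _, label in pairs[:k]]
    return max(set(votes), key=votes.count)

# 5. Model evaluation: plain accuracy on a held-out split.
def accuracy(train_rows, train_labels, rows, labels, k):
    hits = sum(predict(train_rows, train_labels, r, k) == y for r, y in zip(rows, labels))
    return hits / len(rows)

def prepare(split, params, keep):
    rows = [scale(engineer(x), params) for x, _ in split]
    return [[r[i] for i in keep] for r in rows], [y for _, y in split]

params = fit_scaler([engineer(x) for x, _ in train])
keep = select_columns(params)
X_train, y_train = prepare(train, params, keep)
X_valid, y_valid = prepare(valid, params, keep)
X_test, y_test = prepare(test, params, keep)

# 6. Hyperparameter optimization: choose k by validation accuracy (odd k avoids ties).
best_k = max([1, 3, 5], key=lambda k: accuracy(X_train, y_train, X_valid, y_valid, k))
test_acc = accuracy(X_train, y_train, X_test, y_test, best_k)
print("best k:", best_k, "test accuracy:", test_acc)
```

Note that the scaler and the column selection are fitted on the training split and then reused for validation and test data, so no information from the held-out splits leaks into the model.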
32. The Science – Deep Learning
[Diagram labels: Cognition as a Service; GPU / Scalable Computing; Deep Learning / Neural Network; Vast Training Data]
Deep learning is a compute-intensive and adaptive machine learning
method that encapsulates feature engineering and model building.
33. The Technology
• The tools that bring a logical plan to life.
[Diagram: Data Science, Business, Technology]
37. The Applications – Some Areas
• Logistics
• Banking
• Insurance
• E-commerce
• Retail
• Energy
• Marketing
• Manufacturing
• Healthcare
• Automotive
• Electronics
• Defense
39. USE CASES: Data Science in Manufacturing
• Reduction of supply chain risk
• Optimization of operations to a higher degree than ever
• Perfecting quality as a competitive advantage
• Predictive maintenance to reduce cost
• After-sales improvements
• Mass and individual customization
• New data-driven revenue sources and business models
• From local to enterprise-level data analytics
40. USE CASE: Data Science in the Insurance Industry
• Risk assessment
• Fraud detection
• Customer insights
• Marketing
• Customer experience
• Automation
41.
Big data vs. Data mining
Big data vs. Data warehousing
Data warehousing vs. Data mining
42.
Data warehousing is the process of compiling and organizing data into one common database.
Data mining is the process of extracting meaningful data from that database. It relies on the data compiled in the data warehousing phase in order to detect meaningful patterns.
Data analytics is the process of examining data sets in order to draw conclusions about the information they contain, with the aid of data mining approaches.
Big data is the term used for extremely large amounts of data. It also refers to the effective classification and storage, as well as the retrieval and analysis, of large amounts of data.
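The relationship between warehousing and mining can be illustrated with a very small sketch using Python's built-in `sqlite3` module. The two source systems, the `sales` table, and its columns are hypothetical, and a simple aggregate query stands in for real pattern detection, which would normally involve far richer techniques.

```python
import sqlite3

# Hypothetical source systems, each holding its own sales records (day, product, qty).
store_sales = [("2018-08-14", "bread", 3), ("2018-08-14", "milk", 2)]
web_sales = [("2018-08-14", "milk", 5), ("2018-08-15", "bread", 1)]

# Warehousing step: compile both sources into one common database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, product TEXT, qty INTEGER, channel TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, 'store')", store_sales)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, 'web')", web_sales)

# Mining/analytics step (greatly simplified): look for a pattern in the compiled
# data, here the total quantity sold per product across all channels.
totals = conn.execute(
    "SELECT product, SUM(qty) FROM sales GROUP BY product ORDER BY SUM(qty) DESC"
).fetchall()
print(totals)  # [('milk', 7), ('bread', 4)]
conn.close()
```

The query only works because the warehousing step first brought both channels into one schema, which mirrors the dependency described in the definitions.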
43.
• Data science is the art of:
• Data analysis
• Utilizing statistics, mathematics and probabilistic theories
• Utilizing some programming (usually scripting languages)
• Discovering hidden patterns inside the data
• Enabling the algorithms to "learn themselves" based on the
patterns they unravel
• Data science is an umbrella term that encompasses:
• Data analytics
• Data mining
• Machine learning
• Several other related disciplines
60.
• This is the era of big data.
• Big data analytics is a fundamental tool of business competition.
• The new benefits that big data analytics brings are:
• Speed
• Efficiency
• A few years ago, a business would have gathered information, run analytics, and unearthed information that could be used for future decisions.
• Today, that business can identify insights for immediate decisions. The ability to work faster and stay agile gives organizations a competitive edge they didn’t have before.