@FernanOrtega
Trends in data science
Fernando O. Gallego
fogallego@us.es
About me
• Co-Founder of Opileak
• CRO of Opileak
• PhD candidate at the University of Seville (US)
• Advisor: R. Corchuelo
• Member of TDG-Group
• Lecturer in the D&T subject
What people think about science
The truth is…
What people think about data science
Why data science is important
Worldwide searches for data science
Searches for data science in Spain
Data science courses
Data science jobs
But…what is data science about?
Lighten up my friend!
Roadmap
Introduction
Background
Real-world applications
Opileak
Conclusions
Brief history
• 1974 – Peter Naur – Datalogy & Data science
• 2002 – Committee on Data for Science & Technology
• 2003 – Journal of Data Science
• 2010 – Drew Conway – The data science Venn diagram
• 2010 – Mike Loukides – What is data science?
• 2013 – Irizarry, Peng & Leek – The key word in “Data Science”
• 2013 – Vasant Dhar – “Data Science and Prediction”
Drew Conway, 2010
Mike Loukides, 2010
“Data science enables the creation of data products”
Irizarry, Peng & Leek, 2013
“The key word in data science is not ‘data’; but ‘science’. Data science is only useful when the data are used to answer a question”
Vasant Dhar, 2013
“Data science is the study of the generalizable extraction of knowledge from data”
The general idea
Data → Model → Knowledge
Knowledge is power!
1. Answer questions
2. Questions to products
3. Products to profit
And…big data
Data science / Big data
• Volume
– Log
– Event
– Social media
– Click stream
• Variety
– Unstructured
– Semi-structured
– Structured
• Veracity
– Untrusted
– Uncleansed
• Velocity
– Speed of generation
– Rate of analysis
And…business intelligence
Data science / Business intelligence
1. Ask a question
2. Get the data
3. Explore the data
4. Model the data
5. Communicate
6. Implement product
And…data mining
Data science / Data mining
Remember Drew Conway
Multidisciplinary
• Maths
• Statistics
• Machine learning
• Software engineering
Everything is about hype
• Cloud computing
• IoT
• Big data
• Data mining
• Data science
• Business intelligence
• Opinion mining
• Social media analysis
Data science general process
1. Ask a question
2. Get the data
3. Explore the data
4. Model the data
5. Communicate
6. Implement product
Ask a question
• Skills:
– Science
– Domain expertise
– Curiosity
• Tools:
– Your brain
– Talking to experts
– Experience
Get the data
• Skills:
– Web scraping
– Data cleaning
– Querying databases
• Tools:
– Web parsers
– SQL
– Python (pandas)
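A minimal sketch of the querying and cleaning skills listed above. Everything specific here is an assumption made for illustration: the `reviews` table, its columns, and the sample rows are hypothetical, and the standard-library `sqlite3` module stands in for whatever database and pandas workflow a real project would use.

```python
import sqlite3


def load_reviews(conn):
    """Query raw reviews with SQL, then clean them in Python."""
    rows = conn.execute(
        "SELECT product, rating FROM reviews ORDER BY rowid"
    ).fetchall()
    cleaned = []
    for product, rating in rows:
        # Data cleaning: drop rows with a missing product or rating
        if product is None or rating is None:
            continue
        # Normalise whitespace and case so " Camera " and "CAMERA" match
        cleaned.append((product.strip().lower(), float(rating)))
    return cleaned


# Hypothetical in-memory database standing in for a real data source
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (product TEXT, rating REAL)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [(" Camera ", 4.0), ("phone", None), ("CAMERA", 5.0), (None, 3.0)],
)

reviews = load_reviews(conn)
print(reviews)
```

Only the two complete camera rows survive the cleaning step; the rows with missing values are discarded before any analysis happens.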
Explore the data
• Skills:
– Get to know data
– Develop hypotheses
– Detect patterns or anomalies
• Tools:
– D3.js
– Matplotlib
– Excel!
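One possible way to "detect patterns or anomalies" before plotting anything is a simple z-score check. This is a sketch under stated assumptions: the `daily_mentions` series is invented, the threshold of 2.0 is an arbitrary choice, and the standard-library `statistics` module is used instead of the charting tools listed above.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical daily counts of brand mentions, with one obvious spike
daily_mentions = [12, 15, 11, 14, 13, 12, 95, 14, 13, 12]

print("mean:", statistics.mean(daily_mentions))
print("median:", statistics.median(daily_mentions))
anomalies = find_anomalies(daily_mentions)
print("anomalies:", anomalies)
```

Comparing the mean with the median already hints at the spike; the z-score check then isolates it, which is the kind of observation that leads to a hypothesis worth modelling.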
Model the data
• Skills:
– Regression
– Machine learning
– Big data
• Tools:
– Spark (MLlib)
– Hadoop
– Mrjob
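As a small illustration of the regression skill, the sketch below fits a straight line by ordinary least squares in plain Python. It does not use Spark, Hadoop, or mrjob, which matter once the data no longer fits on one machine; the `ad_spend` and `sales` figures are invented and chosen to lie exactly on a line.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that follows sales = 2 * ad_spend + 1 exactly
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(ad_spend, sales)
prediction = slope * 6.0 + intercept  # forecast for an unseen input
print(slope, intercept, prediction)
```

The fitted model is the "knowledge" extracted from the data: it can answer a question about an input that was never observed.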
Communicate
• Skills:
– Presentation
– Speaking
– Writing
• Tools:
– Chart tools
– LaTeX
– PowerPoint
Implement product
• Skills:
– Product management
– Communication
– Programming
• Tools:
– Programming languages
– Servers
– Project management
Marketing
• Product recommendation
• Channel optimisation
• Discount targeting
Bank
• Financial product recommendation
• Credit risk
• Fraud detection
Customer support
• Call routing
• Message optimisation
• Customer satisfaction
Human resources
• Ranking candidates
• Predicting employee attrition
• Training recommendation
Health
• Predict flu activity
• Fighting cancer
• Early detection of Alzheimer’s disease
Social media
• Face recognition
• Contact suggestion
• Topic opinion
The team
• CEO
• Team Leader
• CRO
• CCO
• Comm. Technician
• Systems Admin
• Front-end
• Back-end
• DevOps
At the beginning
The right way
The inspiration
The first approach
But the problem is
Solution
Retail
Press
Town councils
Opileak technology features
• Volumetric measures
• Topics and summary
• Opinion mining
My PhD thesis
Polarity analysis
Attribute          Polarity
“zoom quality”     Positive
“resolution”       Neutral
“build quality”    Negative
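To make the attribute/polarity pairs above concrete, here is a toy lexicon-based sketch. It is not the method used by Opileak or in the thesis: the word lists `POSITIVE` and `NEGATIVE` and the three review snippets are hypothetical, chosen only so that the output reproduces the three rows shown on the slide.

```python
# Hypothetical, deliberately tiny sentiment lexicons
POSITIVE = {"great", "excellent", "sharp"}
NEGATIVE = {"poor", "flimsy", "bad"}


def polarity(words):
    """Label a list of words by counting lexicon hits."""
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


def summarise(opinions):
    """Map each attribute to the polarity of the text written about it."""
    return {attr: polarity(text.lower().split()) for attr, text in opinions}


# Invented (attribute, review snippet) pairs for a camera
opinions = [
    ("zoom quality", "the zoom is excellent"),
    ("resolution", "the resolution is average"),
    ("build quality", "the build feels flimsy"),
]

summary = summarise(opinions)
for attribute, label in summary.items():
    print(f"{attribute}: {label}")
```

A lexicon count like this ignores context entirely, which is exactly the weakness the following slides point at.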
The problem with polarity analysis
Conditionals definition
Wait! The opinion is only true in a certain situation
“This camera doesn’t work well at night”
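A rough sketch of what separating an opinion from its condition could look like for the example sentence. This is an illustration only and makes no claim about how Torii works: the cue patterns in `CONDITION_PATTERN` are a hypothetical, hand-written list, and real conditions are far more varied than a regular expression can capture.

```python
import re

# Hypothetical cue patterns for conditions; assumed for illustration only
CONDITION_PATTERN = re.compile(
    r"\b(?:at night|in low light|when [^,.]+|if [^,.]+)", re.IGNORECASE
)


def split_condition(sentence):
    """Return (opinion, condition); condition is None if no cue matches."""
    match = CONDITION_PATTERN.search(sentence)
    if match is None:
        return sentence.strip(), None
    condition = match.group(0).strip()
    # Remove the matched condition and tidy leftover punctuation
    remainder = sentence[:match.start()] + sentence[match.end():]
    return remainder.strip(" ,."), condition


opinion, condition = split_condition("This camera doesn't work well at night")
print("opinion:", opinion)
print("condition:", condition)
```

Plain polarity analysis would file this sentence as a negative opinion about the camera; keeping the condition alongside it records that the complaint only applies in one situation.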
Our proposal
Torii!
Opinion-mining process that computes overviews making the conditions of opinions and factual information explicit.
Data science
1. Ask a question
2. Get the data
3. Explore the data
4. Model the data
5. Communicate
6. Implement product
Lessons learned today
• Multidisciplinary
• Many applications
• Our technology
Opileak rocks!
Thanks
Fernando O. Gallego
fogallego@us.es