Anastasiia Kornilova
WHO AM I?
• 3+ years in Data Science
• MS in Applied Mathematics
• Professional interests: recommender systems, natural language
processing, scalable data science solutions
• Author of two blogs: energyfirefox.blogspot.com,
datascientistdiary.blogspot.com
• Fan of online education (20+ finished MOOCs)
AGENDA
• What is Data Science and why do we need it?
• Data Scientists: who are they and what do they do?
• How to start?
• Practical case
DATA IS EVERYWHERE
TONS OF DATA
DATA EXCHANGE
MAKING SENSE OF DATA
USER FEEDBACK
DATA IS THE NEW OIL
WHAT IS DATA SCIENCE?
Andrew Conway
FRAUD DETECTION
RECOMMENDATIONS
OPTIMISATION
SENTIMENT ANALYSIS
IMAGE/SPEECH/TEXT RECOGNITION
FUTURE PREDICTION
FIND PATTERNS IN USER BEHAVIOUR/ACTIONS
WHO USES DATA SCIENCE?
DEMAND FOR DATA SCIENTISTS
WHO ARE DATA SCIENTISTS AND WHAT DO THEY DO?
TYPES OF DATA SCIENTISTS
A - Analysis
B - Building
Robert Chang
DS TYPE “A” - ANALYSIS
• making sense of data or working with it in a fairly static way.
• very similar to a statistician (and may be one)
• knows all the practical details of working with data that
aren’t taught in the statistics curriculum: data cleaning,
methods for dealing with very large data sets, visualization,
deep knowledge of a particular domain, writing well
about data
DS TYPE “B” - BUILDING
• shares some statistical background with Type A
• very strong coders and may be trained software
engineers
• mainly interested in using data “in production.”
• build models which interact with users, often serving
recommendations (products, people you may know, ads,
movies, search results).
WHAT DO THEY DO?
Understand
Collect
Data exploration
Clean and transform
Model
Validate
Communicating results
Deploy
TYPICAL DATA SCIENCE WORKFLOW
• Preparing to run a model (gathering, cleaning,
transformation)
• Running the model
• Interpreting the results
“80% of work” - Aaron Kimball
“Other 80% of the work”
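The three workflow stages above can be sketched in a few lines of Python. This is only an illustration, not a method from the talk: the records in RAW, the function names prepare, fit_line and interpret, and the choice of a least-squares line as "the model" are all assumptions made for the example. It uses only the standard library.

```python
from statistics import mean

# Hypothetical raw records (hours_studied, exam_score), deliberately messy.
RAW = [("1", "52"), ("2", "61"), (" 3 ", "70"), ("4", ""), ("5", "88"), ("n/a", "90")]


def prepare(rows):
    """Stage 1: gather, clean, transform. Rows that fail to parse are dropped."""
    clean = []
    for x, y in rows:
        try:
            clean.append((float(x.strip()), float(y.strip())))
        except ValueError:
            continue  # empty or non-numeric field
    return clean


def fit_line(points):
    """Stage 2: run the model (ordinary least squares for y = a + b * x)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def interpret(intercept, slope):
    """Stage 3: turn the fitted numbers into a sentence for the audience."""
    return f"Each extra hour adds about {slope:.1f} points (baseline {intercept:.1f})."


points = prepare(RAW)
intercept, slope = fit_line(points)
print(interpret(intercept, slope))
```

Even in this toy case, most of the code deals with preparing the data and explaining the result; the model itself is a handful of lines.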
REQUIRED SKILLS
DOMAIN KNOWLEDGE AND SOFT SKILLS
• Passionate about the business
• Curious about data
• Influence without authority
• Hacker mindset
• Problem solver
• Strategic, proactive, creative, innovative and collaborative
MATH AND STATISTICS
• Machine learning
• Statistical modelling
• Experiment design
• Supervised learning
• Unsupervised learning
• Optimisation
PROGRAMMING AND DATABASES
• Computer science fundamentals
• Scripting language
• Statistical computing language
• Databases
• Relational algebra
• Distributed computations
COMMUNICATION AND VISUALIZATION
• Ability to engage with senior management
• Storytelling skills
• Visual art design
• Knowledge of a visualization tool
• Translate data-driven insights into decisions and actions
WHERE TO OBTAIN SKILLS?
HARD WAY
TRADITIONAL WAY
• LITS - Machine Learning
• UCU - CS Master's Degree
• Data Science Degree
• Kyivstar - Big Data University
EASY WAY
MACHINE LEARNING ≠ DATA SCIENCE
AND NOW WHAT?
ARE YOU GOOD ENOUGH?
KAGGLE STORY
Problem owners
Problem solvers
WHAT CAN YOU FIND ON KAGGLE?
• Knowledge
• Money
• Job
• Reputation
1. Understand
2. Collect
3. Data exploration
4. Clean and transform
5. Model
6. Validate
7. Communicating results
8. Deploy
PASSION AND PERSISTENCE
TIME FOR FUN
San Francisco crimes analysis
THANK YOU!
LINKS
• https://medium.com/@rchang/my-two-year-journey-as-a-data-scientist-at-twitter-f0c13298aee6#.49jdojamn
• https://blog.kissmetrics.com/how-netflix-uses-analytics/
• http://recode.net/2015/10/07/jawbone-isnt-a-hardware-company-anymore-says-ceo-hosain-rahman/
• https://jawbone.com/blog/napa-earthquake-effect-on-sleep/
• http://cs.ucu.edu.ua/
• http://lits.com.ua/course/machine-learning/
• http://bigdata.kyivstar.ua/
• https://www.kaggle.com/
• http://inversquare.github.io/moon/mooncrime.html#part-2-crimes-of-passion
• http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0059030#abstract0
• http://blog.babajob.com/wp-content/uploads/2015/08/manyhands-300x161.jpg
• http://www.criminalelement.com/images/stories/-2015-Jul-Sep/sherlock-holmes-benedict-cumberbatch.jpg

Introduction to Data Science