Data Science
the art of foul play

Sergey Shelpuk
SoftServe, Inc.
sshel@softserveinc.com
September, 2013
“Your goal should not be to buy players, it
should be to buy wins. In order to buy
wins you should buy runs” (c)
More data is
available for
companies

Storage
technologies
allow storing
and operating it

Advanced
analytics could
be app...
Data Scientist: The Sexiest
Job of the 21st Century
For Today’s Graduate,
Just One Word: Statistics

Hal Varian
chief econ...
Top 10 Strategic Technology Trends for 2013
• Mobile Device Battles
• Mobile Applications and
HTML5
• Personal Cloud
• Ent...
McKinsey Global Institute projects
approximately 140,000 to 190,000 unfilled
positions of data analytics experts in the U....
Business
Tasks

• Define prospective
customers
• Define traffic jams in
the city
• Recommend
restaurants and
menus
• Adjus...
Cross Industry Standard Process
for Data Mining
SoftServe Data Science Group
Knowledge Model
Business

Logic

Technology

Level

Level

Level

• Basics of Business
Analys...
Deep Learning Neural Networks
Feature extractor

Task: recognize a motorcycle

Learning
algorithm
The concept of Autoencoder
The concept of Autoencoder

…
…

…

…
…

© Andrew Y. Ng
Large scale deep learning networks

See more: Building high-level features using large scale unsupervised learning
Deep learning neural networks
Pre-trained as Autoencoder

Typical classification
neural network
Few results
Images

Video

Text/NLP

© Andrew Y. Ng
Deep Learning in SoftServe
Phase 1 results
(old-fashion anomaly detection)

Phase 2 prototype
(deep learning approach)
Useful Resources
• Introduction to Statistics
• Introduction to Artificial Intelligence

• Machine Learning
• Probabilisti...
Thank you!
Data Science: The Art of Foul Play by Serhiy Shelpuk
Data Science: The Art of Foul Play by Serhiy Shelpuk
Data Science: The Art of Foul Play by Serhiy Shelpuk
Data Science: The Art of Foul Play by Serhiy Shelpuk
Upcoming SlideShare
Loading in …5
×

Data Science: The Art of Foul Play by Serhiy Shelpuk

1,103 views
957 views

Published on

Serhiy Shelpuk, Lead Data Scientist, Competence Manager at SoftServe, Inc., delivered an insightful presentation on Data Science and SoftServe`s Data Science Group Knowledge Model at the 2013 IT Weekend Ukraine conference that took place on September 14, 2013, in Kyiv, Ukraine. Here`s his presentation.

Published in: Technology, Education
0 Comments
5 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,103
On SlideShare
0
From Embeds
0
Number of Embeds
54
Actions
Shares
0
Downloads
16
Comments
0
Likes
5
Embeds 0
No embeds

No notes for slide

Data Science: The Art of Foul Play by Serhiy Shelpuk

  1. 1. Data Science the art of foul play Sergey Shelpuk SoftServe, Inc. sshel@softserveinc.com September, 2013
  2. 2. “Your goal should not be to buy players, it should be to buy wins. In order to buy wins you should buy runs” (c)
  3. 3. More data is available for companies Storage technologies allow storing and operating it Advanced analytics could be applied to this new data to achieve competitive advantage
  4. 4. Data Scientist: The Sexiest Job of the 21st Century For Today’s Graduate, Just One Word: Statistics Hal Varian chief economist at Google “I keep saying that the sexy job in the next 10 years will be statisticians, and I’m not kidding.” Data Scientist: The Hottest Job You Haven't Heard Of
  5. 5. Top 10 Strategic Technology Trends for 2013 • Mobile Device Battles • Mobile Applications and HTML5 • Personal Cloud • Enterprise App Stores • The Internet of Things • Hybrid IT and Cloud Computing • Strategic Big Data • Actionable Analytics • In Memory Computing • Integrated Ecosystems
  6. 6. McKinsey Global Institute projects approximately 140,000 to 190,000 unfilled positions of data analytics experts in the U.S. by 2018 and a shortage of 1.5 million managers and analysts who have the ability to understand and make decisions using big data.
  7. 7. Business Tasks • Define prospective customers • Define traffic jams in the city • Recommend restaurants and menus • Adjust UI to the particular user • Classify body part on X-Ray image • Define market niche • Define influencers in the social networks • Define similar customers or projects in portfolio • Define informal groups in the organization • Define fraud bank transaction • Define network intrusion attempts • Provide automatic aircraft engine testing • Provide automatic IT infrastructure monitoring • Provide clinical test analysis • Define the best price for the goods or services to maximize profits • Define best working schedule for the store • Define best amount of production • Define best business rules Model Family Classification Clustering Anomaly Detection Optimization • Naïve Bayes • Logistic regression • Support Vector Machines • Neural Networks • K-Means • K nearest neighbor • Self-organized maps • Mixture of Gaussians Algorithms • Mixture of Gaussians • Self-learning anomaly detection • • • • • Gradient descent Simplex method Newton’s method Normal equations Genetic algorithms
  8. 8. Cross Industry Standard Process for Data Mining
  9. 9. SoftServe Data Science Group Knowledge Model Business Logic Technology Level Level Level • Basics of Business Analysis • Basics of Economics • Basics of Product Management • Basics of Organizational Behavior • Statistics/Probability • Machine Learning • Data Mining • Artificial Intelligence • Matlab/Octave •R • SQL • Parallel Computing
  10. 10. Deep Learning Neural Networks
  11. 11. Feature extractor Task: recognize a motorcycle Learning algorithm
  12. 12. The concept of Autoencoder
  13. 13. The concept of Autoencoder … … … … … © Andrew Y. Ng
  14. 14. Large scale deep learning networks See more: Building high-level features using large scale unsupervised learning
  15. 15. Deep learning neural networks Pre-trained as Autoencoder Typical classification neural network
  16. 16. Few results Images Video Text/NLP © Andrew Y. Ng
  17. 17. Deep Learning in SoftServe Phase 1 results (old-fashion anomaly detection) Phase 2 prototype (deep learning approach)
  18. 18. Useful Resources • Introduction to Statistics • Introduction to Artificial Intelligence • Machine Learning • Probabilistic Graphical Models • Statistics One
  19. 19. Thank you!

×