An overview of SoftServe's Data Science service line.
- Data Science Group
- Data Science Offerings for Business
- Machine Learning Overview
- AI & Deep Learning Case Studies
- Big Data & Analytics Case Studies
Visit our website to learn more: http://www.softserveinc.com/en-us/
3. Today, SoftServe is a leading technology solutions company with 4,000 employees,
specializing in software product and application development and services.
6. Data Science Group
Iurii Milovanov – Lead Data Scientist
Tetiana Gladkikh – Data Scientist, Competency Manager
Roman Grubnyk – Data Scientist
Ihor Kostiuk – Data Scientist
Taras Hnot – Data Analyst
Volodymyr Solskyy – Data Analyst
Pavlo Kramarenko – Data Analyst, BI Consultant
7. Core Competency
Artificial Intelligence
• State-of-the-art Machine Learning
• Deep human-level Insight
• Unstructured and High-dimensional data
• High Performance Computing
Big Data
• Apache Hadoop Ecosystem
• Data Collection and Augmentation
• Big Analytics
• Real-time & Batch Data Processing
Predictive Analytics
• Forecasting
• Risk Analysis
• Cluster Analysis
• Decision Support Systems
Data Analysis
• Data Exploration
• Statistical Inference
• Visualization
• Business Intelligence
8. Domain-Specific Expertise
• Computer Vision – deep image and video understanding
• Natural Language Processing – human language
processing and understanding
• Speech Recognition – spoken language processing
(speech-to-text and text-to-speech)
• Social Media Analysis – web mining, behavioral analytics,
and social network analysis
• Recommender Engines – help users find content they
might like by making automatic personalized
recommendations
10. Methodology & Typical Roadmap
Initial Stage
Research Phase
Prototyping
Data Collection
Data Exploration
Data Modelling
Result Communication
Performance Tuning
Model Integration
Deployment Phase
Evaluation
Inputs:
• Problem definition
• Initial requirements
Outputs:
• Data processing model
• Final requirements
11. Tiny Neural Network Framework
TNNF – an open source GPU-friendly Deep Learning library developed by Data Science Group @ SoftServe
13. Data Science in Retail
Business Area:
• Customer 360 view
• Product recommendation
• Direct marketing
• Opinion mining
• Sales analytics
• Logistics optimization
Improves customer and business insights and provides a deep understanding of
customer profiles and behavior.
14. Data Science in Healthcare
Business Area:
• EMR processing
• Patient monitoring
• Biometric data analytics
• Decision support systems
• Computer-aided diagnosis
• Precision medicine
Helps physicians make better decisions across the board – from personalized
treatments to preventive care.
15. Data Science in Telecom
Business Area:
• CDR analytics
• Geospatial analysis
• Anomaly and fraud detection
• Network optimization
Applies real-time and batch predictive analytics to analyze subscriber behavior and
create individual network usage policies.
16. Data Science in HR
Business Area:
• Workforce analysis
• Capacity management
• Employee retention
• Talent analytics
• Resume screening
Provides deep insight into a company's employee profile to help the HR
department solve employee-focused challenges.
17. Data Science in Social Media Marketing
Business Area:
• Social profiling
• Information flow analysis
• Promotion optimization
• Community detection
• Behavioral analytics
Discovers hidden trends, patterns and relationships in social media in order to
enable micro-market campaign management, maximize engagement and optimize
social promotion strategy.
18. Data Science in IT & Security
Leverages ultra-large volumes of data from IT Infrastructure, improves overall
service availability and reduces time required for root cause analysis.
Business Area:
• Operations analytics
• Network log analysis
• Anomaly and intruder detection
• Cloud optimization
19. Data Science in Finance
Gives a significant competitive advantage by incorporating new types of
unstructured and semi-structured data into financial decision-making, building
predictive models and live market simulations.
Business Area:
• Financial forecasting
• Price optimization
• Risk management
• Fraud detection
• Bitcoin analytics
21. Premise of Machine Learning
Complex problems (such as image, text or
speech processing) are usually:
• High-dimensional (1000+ dimensions)
• Poorly defined, since we still don’t know how it is
done in our brain
Therefore, hand-coding solutions to such problems
suffers a 'complexity collapse' and is not really
feasible
22. Basic idea of Machine Learning
Training Data
Testing Data
Learning Algorithm
Model
Prediction Engine
New Data
Predictions
Instead of writing a program by hand, we use a set of observations to uncover an underlying
process that can be generalized to new data
CAVEAT: Although Machine Learning has already been proven to be theoretically feasible,
we need efficient algorithms to uncover complex patterns and relationships in data
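The flow on this slide can be illustrated with a minimal sketch. It assumes the simplest possible learning algorithm (a least-squares line fit) and made-up observations; the function names and the data are hypothetical and chosen only for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Learning algorithm: ordinary least squares for a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar  # the learned model

def predict(model, x):
    """Prediction engine: apply the learned model to an input."""
    slope, intercept = model
    return slope * x + intercept

def mean_abs_error(model, xs, ys):
    """Average absolute difference between predictions and known answers."""
    return mean(abs(predict(model, x) - y) for x, y in zip(xs, ys))

# Hypothetical observations of an unknown process (roughly y = 2x + 1 plus noise).
train_x, train_y = [0, 1, 2, 3, 4], [1.1, 2.9, 5.2, 6.8, 9.1]
test_x, test_y = [5, 6], [11.0, 13.1]

model = fit_line(train_x, train_y)                  # training data -> model
test_error = mean_abs_error(model, test_x, test_y)  # testing data -> quality check
new_prediction = predict(model, 10)                 # new data -> prediction
```

The testing data is held out from training, so a small test error suggests the model generalizes rather than memorizes.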
23. AI & Deep Learning
Application Domains:
• Image Classification
• Object Recognition
• Motion Detection
• Speech-to-Text
• Emotion Recognition
• Robotics
Deep Learning – a family of Machine Learning
techniques inspired by cognitive science and neuroscience,
the current state of the art in Artificial Intelligence
24. Successful applications of Deep Learning
• Apple, Google and Baidu use Deep Learning for speech
recognition
• Content recommendation engines at Amazon, Netflix
and Google rely heavily on Deep Learning
• Facebook applies Deep Learning to facial detection and
recognition
• Twitter analyzes its tweet database using Deep Learning techniques
• Deep Learning plays an important role in fraud
detection at PayPal
25. Biggest challenges in Machine Learning
• Training data
• Noisy and missing values
• Model generalization
• Non-convex optimization
• Hyperparameter tuning
• Result interpretation
• Computational resources
26. GPU-accelerated Computing
Well suited to iterative Machine Learning algorithms
Gives up to approximately 40x speedup in training time
Inherently more energy efficient than other forms of
computation
CUDA – general-purpose parallel computing platform developed
by NVIDIA
Where GPUs are deployed:
28. Case Study: X-Ray Image Recognition
Technologies:
Matlab/Octave
Python
Deep Learning
Probabilistic modeling
Business Area:
Healthcare. Computer-aided detection system
(CADe) that can recognize a human body part
in an X-Ray image and detect broken or
fractured bones
Diagram: X-Ray Image → Analytical Engine → "This is a hand. Broken bone detected"
29. Case Study: Image Object Recognition
Business Area:
Retail. Software solution to analyze and
recommend optimal product placement on store
shelves
Key Steps:
Preprocessing – scaling, normalization etc.
Segmentation – define areas of interest
Recognition – where is the product located
Classification – what kind of product we can see
30. Case Study: Smart Agents, DRLearner.org
Business Area:
DRLearner is SoftServe’s open source implementation of the
deep reinforcement learning algorithm for game playing,
invented by Google DeepMind. The approach mimics aspects
of the human brain to solve complex problems such
as autonomous car control
Techniques:
Convolutional Neural Networks
Reinforcement Learning
Python
TNNF/Theano
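As a rough illustration of the Reinforcement Learning technique listed above, the sketch below shows the classic tabular Q-learning update, which deep reinforcement learning approaches approximate with a neural network instead of a table. This is not DRLearner's code: the corridor environment, the function name `q_update`, and the values of `alpha` and `gamma` are assumptions made for this example.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(state, action) toward
    reward + gamma * max over a' of Q(next_state, a')."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

# Hypothetical environment: a corridor with states 0..3, where reaching
# state 3 (terminal) yields a reward of 1 and every other step yields 0.
actions = ["left", "right"]
q = {}
for _ in range(50):          # repeated sweeps over the same experience
    for s in range(3):       # always step right: s -> s + 1
        reward = 1.0 if s + 1 == 3 else 0.0
        q_update(q, s, "right", reward, s + 1, actions)
```

After enough sweeps the value of stepping right approaches 1 next to the goal and decays by a factor of `gamma` for each additional step away from it.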
32. Case Study: Social Trends Analysis
Business Area:
Distributed solution to monitor and analyze
customer opinion on the Ukrainian banking industry
Key Steps:
Web Crawling
Data Transformation
Sentiment Analysis
Social Network Analysis (SNA)
Time-series Analysis
Data Visualization
33. Case Study: Social Trends Analysis
Learning-based Sentiment analysis:
• Collect a training set of positive and negative
examples
• Perform data cleaning and normalization on
unstructured textual data
• Build a model that generalizes to different domains
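The three steps above can be sketched with a small bag-of-words Naive Bayes classifier. The choice of Naive Bayes is an assumption for illustration only (the slides do not name the model used), and the training sentences are invented.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Cleaning and normalization: lowercase, keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", text.lower())

def train(examples):
    """examples: list of (text, label). Returns word counts and document counts per label."""
    word_counts, doc_counts = {}, Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, doc_counts

def classify(model, text):
    """Pick the label with the highest log-probability under Naive Bayes."""
    word_counts, doc_counts = model
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for token in tokenize(text):
            # Laplace smoothing so unseen words do not zero out the score
            score += math.log((counts[token] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training set of positive and negative examples.
examples = [
    ("The bank's service was great and fast", "positive"),
    ("I love the new mobile app", "positive"),
    ("Terrible support and hidden fees", "negative"),
    ("The app is slow and I hate the fees", "negative"),
]
model = train(examples)
label_a = classify(model, "great app, love it")
label_b = classify(model, "hidden fees are terrible")
```

A model trained on four sentences will not generalize across domains; in practice that last step depends mostly on the size and variety of the training set.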
Social Network Analysis:
• Discover hidden social communities
• Perform bot-detection
• Discover social information flow
Time-series analysis:
• Calculate basic time-series statistics
• Discover hidden trends and fluctuations in time-series
• Compare time-series sequences
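A minimal sketch of the three time-series steps, assuming a moving average as the trend estimator and Pearson correlation as the comparison measure (both are illustrative choices, not necessarily those used in the project). The daily mention counts are invented.

```python
from statistics import mean, pstdev

def moving_average(series, window):
    """Smooth short-term fluctuations to expose the underlying trend."""
    return [mean(series[i - window + 1 : i + 1]) for i in range(window - 1, len(series))]

def pearson(a, b):
    """Pearson correlation between two equally long series."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    norm = (sum((x - a_bar) ** 2 for x in a) * sum((y - b_bar) ** 2 for y in b)) ** 0.5
    return cov / norm

# Hypothetical daily counts of social media mentions for two banks.
mentions_a = [10, 12, 11, 15, 18, 17, 21]
mentions_b = [5, 6, 6, 8, 9, 9, 11]

basic_stats = {"mean": mean(mentions_a), "std": pstdev(mentions_a)}  # basic statistics
trend = moving_average(mentions_a, window=3)                         # hidden trend
similarity = pearson(mentions_a, mentions_b)                         # sequence comparison
```

A correlation close to 1 indicates that the two series rise and fall together, which here would suggest both banks are being discussed in response to the same events.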
34. Case Study: Recommender Systems & SmartTraveler
Business Area:
Helps users find content they might like by
making automatic personalized
recommendations
Application Domains:
E-commerce
News
Entertainment
Social Networks
Tourism and visitor guides
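One common way to produce such personalized recommendations is user-based collaborative filtering. The sketch below assumes that technique and uses cosine similarity between users; the ratings, user names and attraction names are hypothetical and do not come from SmartTraveler.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)

def recommend(ratings, user, top_n=2):
    """Rank items the user has not rated by similarity-weighted ratings of other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]

# Hypothetical ratings (1 to 5) of tourist attractions.
ratings = {
    "anna": {"museum": 5, "castle": 4, "opera": 1},
    "bohdan": {"museum": 5, "castle": 5, "park": 4, "brewery": 2},
    "olena": {"opera": 5, "museum": 1, "brewery": 5, "park": 2},
}
suggestions = recommend(ratings, "anna")
```

Because anna's tastes are much closer to bohdan's than to olena's, bohdan's ratings dominate the ranking of the places she has not yet visited.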
36. Case Study: Log Analytics and Anomaly Detection
Business case:
• Discover hidden patterns and relationships in
Netflow logs in order to identify unusual
activity in the corporate network infrastructure
Problem Statement:
Identify the items, events or observations which
do not conform to an expected pattern or
behavior
37. Case Study: Log Analytics and Anomaly Detection
Netflow Data:
• Timestamp
• Number of packets
• Volume of packets (in bytes)
• Source IP
• Destination IP
• Source port
• Destination port
• Protocol
38. Case Study: Log Analytics and Anomaly Detection
Time-Series Segmentation
Dynamic Thresholds
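The slide does not say how the dynamic thresholds are computed. One simple interpretation, shown here as a sketch under that assumption, recomputes the threshold for every point from the mean and standard deviation of the preceding window. The function name, the window size, the multiplier `k` and the packet counts are all hypothetical.

```python
from statistics import mean, pstdev

def dynamic_threshold_anomalies(values, window=5, k=3.0):
    """Return indices of points exceeding mean + k * std of the preceding `window` points."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window : i]
        threshold = mean(history) + k * pstdev(history)  # adapts as traffic changes
        if values[i] > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical number of packets per minute aggregated from Netflow records.
packets_per_minute = [100, 104, 98, 101, 103, 99, 102, 480, 101, 97, 100]
anomalies = dynamic_threshold_anomalies(packets_per_minute)
```

This version flags only upward spikes, and a spike inflates the threshold for the next few points; a production system would likely handle both effects, for example by excluding flagged points from the history.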
Check out our Data Science and Big Analytics web pages
for more details on our Advanced Analytics service line