A Day of Empowerment

Building Predictive Analytics on Big Data Platforms

SoftServe Innovation Conference in Austin, Texas 2013
Building Predictive Analytics on Big Data Platforms presented by Olha Hrytsay (BI Consultant) and Serhiy Shelpuk (Lead Data Scientist)

Published in: Technology
Building Predictive Analytics on Big Data Platforms

  1. A Day of Empowerment: Building Predictive Analytics on Big Data Platforms
  2. 1. Opportunity: Big Data; 2. Demystifying Predictive Analytics; 3. Taking advantage of combined power
  3. Striving for an “unfair” competitive advantage
  4. Old Days
  5. New Days
  6. Big Data can look like rubbish
  7. Until you find a use for it
  8. “Data are becoming the new raw material of business” - Craig Mundie, head of research and strategy, Microsoft
  9. Modeling true risk; network data analysis to predict failure; customer churn analysis; threat analysis; recommendations; feature usage analysis; ad targeting; …
  10. Collect and Store
      • Complex data (text files, audio, video, images, …)
      • Multiple sources
      • Lots of data
      Process
      • Batch processing
      • Parallel execution
      • Cluster solution
      Analyze
      • Simple visualization (reports, dashboard)
      • Text mining
      • Sentiment analysis
      • Prediction models
      • Collaborative filtering
  11. Architecture diagram components: Event sources (log files, Windows Event Log, WMI, SNMP, database, etc.); Event Ingestion; Event Transport; Event Aggregation and Transformation; Event Serialization and Archiving; Event Storage; Event DB; Full-text Index; Event Processing and Analytics; Query Engine; Full-text Search engine; Rules Engine; Predictive Analytics; Presentation; Interactive Search; Reports and Dashboards; Visualization; Alerts (e-mail, SMS, SNMP, etc.); Operational Management Tools; User
  12. The same architecture with candidate technologies:
      • Event sources: log files, Windows Event Log, WMI, SNMP, database, etc.
      • Event Transport; Event Aggregation and Transformation: Apache Flume
      • Event Serialization and Archiving: Protobuf, Avro, Thrift, MessagePack
      • Event DB: HDFS, HBase, Cassandra
      • Full-text Search engine: Solr, ElasticSearch
      • Query Engine: Impala
      • Rules Engine: Drools
      • Predictive Analytics: R
      • Interactive Search: custom
      • Reports and Dashboards: JasperSoft, Tableau
      • Visualization: custom
      • Alerts: e-mail, SMS, SNMP, etc.
      • Operational Management Tools: Cloudera Manager, Apache Ambari
  13. “The idea that the future is unpredictable is undermined every day by the ease with which the past is explained” - Daniel Kahneman, Thinking, Fast and Slow
  14. More data is available to companies. Storage technologies make it possible to store and process it. Advanced analytics can be applied to this new data to achieve competitive advantage.
  15. Types of analytics, on a scale from hindsight through insight to foresight:
      • Descriptive: What happened?
      • Diagnostic: Why did it happen?
      • Predictive: What is going to happen?
      • Prescriptive: What should we do about that?
  16. Management levels: Senior (Executive) Management; Middle Management; Junior (Line) Management
      Decision conditions:
      • Ambiguity: The goals to be achieved or the problem to be solved are unclear, alternatives are difficult to define, and information about outcomes is unavailable.
      • Uncertainty: Managers know which goals they wish to achieve, but information about alternatives and future events is incomplete.
      • Risk: A decision has clear goals and good information is available, but the future outcomes associated with each alternative are subject to chance.
      • Certainty: All of the information the decision maker needs is fully available.
  17. Define objective
      • Increase customer satisfaction level
      • Identify prospective customers
      • Identify cross-selling opportunities
      • Decrease time to market
      • Decrease costs of marketing campaigns
      Identify data sets
      • Historical data on customers from CRM system
      • Geographical location data
      • Smartphone data
      • Social network data
      • Text data from the Internet pages
      • Image data from the medical sources
      Design the model
      • Classification model for Internet users defining what one is interested in
      • Adaptive control models for managing IT and network infrastructure
      • Probabilistic model for defining credit worthiness
      Design the solution
      • Data storage type
      • Logical database design
      • Availability and scalability of the solution
      • Integration into corporate information environment
      • Solution deployment model
      Implement the solution
      • Add new functionality to the existing corporate BI platform
      • Implement new BI solution
      • Enrich existing business system (CRM, ERP) with the predictive analytics functionality
  18. Business Tasks, Model Family, Algorithms
      Classification
      • Business tasks: Define prospective customers; Define traffic jams in the city; Recommend restaurants and menus; Adjust UI to the particular user; Classify body part on X-Ray image
      • Algorithms: Naïve Bayes; Logistic regression; Support Vector Machines; Neural Networks
      Clustering
      • Business tasks: Define market niche; Define influencers in the social networks; Define similar customers or projects in portfolio; Define informal groups in the organization
      • Algorithms: K-Means; K nearest neighbor; Self-organizing maps; Mixture of Gaussians
      Anomaly Detection
      • Business tasks: Define fraud bank transaction; Define network intrusion attempts; Provide automatic aircraft engine testing; Provide automatic IT infrastructure monitoring; Provide clinical test analysis
      • Algorithms: Mixture of Gaussians; Self-learning anomaly detection
      Optimization
      • Business tasks: Define the best price for the goods or services to maximize profits; Define best working schedule for the store; Define best amount of production; Define best business rules
      • Algorithms: Gradient descent; Simplex method; Newton’s method; Normal equations; Genetic algorithms
  19. Headlines: “Google to Buy Waze for $1.3 Billion”; “Xerox plans to clear traffic on I-10”; “The promise of better data has MetLife investing $300M in new tech”; “Gracenote did a whole business on recommending music”; “Obama’s data scientists built a volunteer army on Facebook”
  20. Credit Score
      Description: Cloud-based service that provides more accurate estimates of credit worthiness (loan scoring) using publicly available data from social networks. The service is intended for use by banks.
      Technologies: Amazon EC2, MySQL, SAP HANA, R, Java
  21. Credit Score diagram components: Facebook, Twitter, LinkedIn; API; Processing; Preprocessing (data filtering, data cleansing); MySQL; SAP HANA; Credit scoring API (scoring model)
  22. Description: Computer-aided diagnostic system that can recognize a human body part on an X-ray image and detect broken or fractured bones.
      Technologies: Matlab/Octave, Python, PyBrain, NumPy, SciPy
      Diagram: X-Ray Image; Analytical Engine; result: “This is a hand. Broken bone detected”
  23. Technology Expertise; Services
  24. Big Data and NoSQL; Data Warehouse; Data Integration; BI Platforms
  25. Big Data Analytics; Predictive Analytics; Data Science Service; Data Integration; Data Warehousing; Data Visualization and Analysis
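
Slide 18 lists logistic regression among the classification algorithms and gradient descent among the optimization algorithms, with "define prospective customers" as one classification task. The sketch below shows one way the two could fit together. It is an illustration only, not the presenters' implementation: the two features, the six data rows, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, x):
    """Probability that example x belongs to the positive class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(features, labels, learning_rate=0.1, epochs=5000):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict_probability(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        # Step against the average gradient.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical data: [site visits per week, past purchases]; 1 = became a customer.
features = [[1, 0], [2, 0], [2, 1], [6, 2], [7, 3], [8, 2]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(features, labels)
print(predict_probability(weights, bias, [1, 0]))  # low-activity visitor
print(predict_probability(weights, bias, [8, 2]))  # high-activity visitor
```

On a real platform the training data would come from the stores named on slide 12 rather than an in-memory list, and a library implementation would normally replace the hand-written loop.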

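For the clustering family on slide 18, K-Means is listed next to the task "define similar customers". A minimal sketch of the algorithm follows, assuming two hypothetical features per customer and hand-picked starting centroids. The data and function names are invented for illustration and are not taken from the talk.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        distances = [squared_distance(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def k_means(points, centroids, iterations=10):
    """Alternate assignment and centroid update for a fixed number of rounds."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        labels = assign_clusters(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                # New centroid is the per-dimension mean of the members.
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was.
                updated.append(old)
        centroids = updated
    return centroids, assign_clusters(points, centroids)


# Hypothetical customers: (monthly spend, orders per month).
customers = [(10, 1), (12, 2), (11, 1), (95, 9), (100, 10), (105, 11)]
initial_centroids = [customers[0], customers[3]]

centroids, labels = k_means(customers, initial_centroids)
print(labels)
print(centroids)
```

The fixed starting centroids keep the run deterministic. Production use would typically rely on random restarts or a smarter seeding strategy.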