MapR & Skytree:

Predicting failure in power networks, detecting fraudulent activities in payment card transactions, and identifying next logical products targeted at the right customer at the right time all require machine learning around massive data sets. This form of artificial intelligence requires complex self-learning algorithms, rapid data iteration for advanced analytics and a robust big data architecture that’s up to the task.

Learn how you can quickly exploit your existing IT infrastructure and scale operations in line with your budget to enjoy advanced data modeling, without having to invest in a large data science team.

Transcript of "MapR & Skytree: "

  1. © 2014 MapR Technologies, July 23, 2014
  2. Our Speakers: Jin Kim, VP, Marketing, Skytree • Nitin Bandugula, Product Marketing, MapR
  3. Agenda • Introduction to Hadoop • Machine Learning on Hadoop • Advanced Machine Learning • Customer Case Studies
  4. Big Data is Overwhelming Traditional Systems. Enterprise Data Architecture: enterprise users, operational systems, analytical systems, outside sources. Production requirements (operational systems): • Mission-critical reliability • Transaction guarantees • Deep security • Real-time performance • Backup and recovery. Production requirements (analytical systems): • Interactive SQL • Rich analytics • Workload management • Data governance • Backup and recovery
  5. Hadoop: The Disruptive Technology at the Core of Big Data. [Chart: job trends from Indeed.com, Jan ’06 to Jan ’14]
  6. Hadoop Relieves the Pressure from Enterprise Systems (operational systems, analytical systems, enterprise users) • Data staging • Archive • Data transformation • Data exploration • Streaming, interactions. Keys for Production Success: 1. Reliability and DR 2. Interoperability 3. High performance 4. Supports operations and analytics
  7. MapR: Best Hadoop Distribution for Customer Success • Top Ranked • Exponential Growth • 500+ Customers • Premier Investors • 3X bookings Q1 ’13 – Q1 ’14 • 80% of accounts expand 3X • 90% software licenses • <1% lifetime churn • >$1B in incremental revenue generated by 1 customer
  8. The Power of the Open Source Community. [Diagram labels] Management; MapR Data Platform; APACHE HADOOP AND OSS ECOSYSTEM; Security; YARN; Pig; Cascading; Spark; Batch; Spark Streaming; Storm*; Streaming; HBase; Solr; NoSQL & Search; Juju; Provisioning & coordination; Savannah*; Mahout; MLLib; ML, Graph; GraphX; MapReduce v1 & v2; EXECUTION ENGINES; DATA GOVERNANCE AND OPERATIONS; Workflow & Data Governance; Tez*; Accumulo*; Hive; Impala; Shark; Drill*; SQL; Sentry*; Oozie; ZooKeeper; Sqoop; Knox*; Whirr; Falcon*; Flume; Data Integration & Access; HttpFS; Hue. * Certification/support planned for 2014
  9. Machine Learning Stack. [Diagram labels] Management; MapR Data Platform; APACHE HADOOP AND OSS ECOSYSTEM; Security; YARN; Pig; Cascading; Spark; Batch; Spark Streaming; Storm*; Streaming; HBase; Solr; NoSQL & Search; Juju; Provisioning & coordination; Savannah*; Mahout; MLLib; ML, Graph; GraphX; MapReduce v1 & v2; EXECUTION ENGINES; DATA GOVERNANCE AND OPERATIONS; Workflow & Data Governance; Tez*; Accumulo*; Hive; Impala; Shark; Drill*; SQL; Sentry*; Oozie; ZooKeeper; Sqoop; Knox*; Whirr; Falcon*; Flume; Data Integration & Access; HttpFS; Hue. * Certification/support planned for 2014
  10. Machine Learning Cuts Across All Use Cases. ENTERPRISE DATA HUB: • Multi-structured data staging & archive • ETL / DW optimization • Mainframe optimization • Data exploration. MARKETING OPTIMIZATION: • Recommendation engines & targeting • Customer 360 • Click-stream analysis • Social media analysis • Ad optimization. RISK & SECURITY OPTIMIZATION: • Network security monitoring • Security information & event management • Fraudulent behavioral analysis. OPERATIONS INTELLIGENCE: • Supply chain & logistics • System log analysis • Manufacturing quality assurance • Preventative maintenance • Smart meter analysis
  11. How Does Big Data Help Machine Learning? Big Data => Better Models • A machine that has played 1 million checkers games will be smarter than one that has played just 100 games • Improves the accuracy of the model, especially for unsupervised learning • Unlikely to overfit because of the variety of data. [Diagram labels: Past Data, Model, New Data, Results]
  12. Common Machine Learning Use Cases on Hadoop • Linear/Polynomial Regression: fit to an equation, e.g. predict prices • Logistic Regression: probability of occurrence, e.g. classify spam • K-means Clustering: group things together, e.g. customer segmentation • Recommender Systems and Collaborative Filtering: product recommendation • Anomaly Detection: credit card fraud. The data scientist decides what works best
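
As an illustration of the first use case above ("fit to an equation, predict prices"), here is a minimal sketch of ordinary least-squares fitting of a straight line in plain Python. The function names and the sample numbers are assumptions made for this example; the slides do not prescribe an implementation, and on a cluster this would normally be done with a library such as MLlib or Mahout.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: size of an item vs. its price.
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
price = predict(model, 5)
```

The same pattern (fit on past data, then apply the model to new data) is what the "Past Data, Model, New Data, Results" diagram on the previous slide describes.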
  13. Machine Learning on Hadoop
  14. Modeling Process: Constant Iterations / Free to Fail • Modeling data set + validation data set • Constant iterations and plotting – Underfit vs. overfit – Feature manipulation – Adjusting learning rates – False positives vs. false negatives (precision levels) – Measuring error, etc. • Legacy applications, libraries, code used to manipulate data
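
The "false positives vs. false negatives" trade-off above is usually tracked with precision and recall on the validation set. The following is a minimal sketch in plain Python, assuming binary labels encoded as 0 and 1; the function name is hypothetical and not taken from the slides.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against division by zero when nothing is predicted or present.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical validation labels and model predictions.
precision, recall = precision_recall([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
```

Fewer false positives raise precision; fewer false negatives raise recall. Each modeling iteration typically moves one at the expense of the other.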
  15. Development and Deployment Process (players and their activity) • Mathematicians: need newer data sets from production for model building and validation; need complete autonomy for inventions • Developers: develop the final solution based on models, then test and deploy working with Ops; need to coordinate heavily • Operations staff: need to provide data and deploy apps while ensuring data consistency, data compliance, HA, DR, etc. Lots of Operational Issues
  16. Volumes and Mirroring. The Conflict: the experimental, free-to-fail modeling process needs production data. Solutions: 1. Same cluster: separate volumes, multi-tenancy, labels, queues, data placement control, etc. 2. Different cluster for R&D purposes: mirroring (efficient, less network bandwidth, across the globe, easy to deploy and maintain)
  17. Snapshots. The Idea: version control of data as well as models. Data version control: How does my model work against new validation sets? How did it change across many validation sets? Model version control: How can I go back and check my new model against old datasets? How do I prove that what I came up with worked for the data we had at the time (replicate scenarios)?
  18. Read Write NFS Access • Existing applications and custom libraries all work out of the box • Browsers, modeling languages, and scripts work out of the box • Data ingestion is easy – Quickly move data in and out without having to wait for developers and administrators to build and maintain a Flume cluster
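
To make the "works out of the box" point concrete: when the cluster is mounted over NFS, a script can read and write cluster data with ordinary file APIs. The sketch below uses only the Python standard library. The mount point and file names in the trailing comments are hypothetical, and the helper names are assumptions for this example.

```python
import csv
from pathlib import Path


def load_rows(csv_path):
    """Read a CSV file into a list of dicts using ordinary file I/O."""
    with Path(csv_path).open(newline="") as handle:
        return list(csv.DictReader(handle))


def save_rows(rows, csv_path, fieldnames):
    """Write dict rows back out as CSV, again with plain file I/O."""
    with Path(csv_path).open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical NFS mount point and file names, for illustration only:
# rows = load_rows("/mapr/my.cluster.com/projects/churn/training.csv")
# save_rows(rows, "/mapr/my.cluster.com/projects/churn/copy.csv",
#           ["customer_id", "churned"])
```

Nothing in the code is Hadoop-specific, which is the point of the slide: existing tools see the cluster as a regular file system.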
  19. Machine Learning Options
  20. Apache Spark • Spark: in-memory processing framework • Works well with iterative machine learning algorithms, because the matrices can be pulled into memory • Up to 100x better performance (in-memory) compared to MapReduce. MLlib • Built-in libraries for a variety of algorithms • Python and NumPy support. GraphX • Libraries to model relationships between entities, e.g. social media
  21. Apache Mahout • Built-in algorithms for popular techniques such as recommenders, classification, collaborative filtering, etc. • Moving towards running on Spark
  22. Advanced Machine Learning with Skytree. [Diagram labels] Sources: relational, SaaS, mainframe; documents, emails; log files, clickstreams; sensors; blogs, tweets, link data. MapR Data Platform (MapR-DB, MapR-FS). MapR Distribution for Hadoop: Batch (MR, Spark, Hive, Pig, …); Interactive (Impala, Drill, …); Streaming (Spark Streaming, Storm, …). Data marts and data warehouse: offload, re-load. Adv. Modeling – Exploration – Analytics
  23. Skytree
  24. Q&A: Engage with us! 1. Download the MapR Sandbox for Hadoop: www.mapr.com/sandbox 2. Download machine learning e-books from Ted Dunning: http://www.mapr.com/resources/white-papers#e-books 3. Visit Skytree at www.skytree.net 4. Learn best practices for Hadoop ETL: www.mapr.com/EDH
  25. Skytree: THE MACHINE LEARNING COMPANY ®. SAME DATA. BETTER RESULTS. Jin H. Kim, VP of Marketing, jin@skytree.net
  26. Machine learning: the modern science of finding patterns and making predictions from data (multivariate statistics, data mining, pattern recognition, advanced/predictive analytics). Our Vision: THE DATA DRIVEN ENTERPRISE POWERED BY MACHINE LEARNING
  27. Machine Learning has finally arrived • 1st Wave (50’s-70’s): technology: artificial intelligence, pattern recognition; applications: universities • 2nd Wave (80’s-90’s): technology: neural networks, data mining; applications: science, credit scoring, OCR • 3rd Wave (mid 90’s - today): technology: machine learning, convergence; applications: sales/marketing, finance, biotech, retail, telco, government • Now: Machine Learning on Big Data
  28. Skytree: Machine Learning for High-Value, High-Complexity Problems • Predictive optimal decision-making – High-frequency algorithmic trading – Online advertising exchanges – Fast customer targeting and churn analysis • Predictive monitoring/discovery assistance – Point-of-compromise fraud tips/cues – Network fault monitoring/diagnosis – Predictive maintenance of network of devices – Fraud analysis in claims – Insider threat/DLP and cyber security
  29. High-Value, High-Complexity Problems: Critical Elements in Common 1. High accuracy needed (needle-finding) – Small number of known examples – Identify anomalies with no prior examples 2. Complex data fusion needed (unified objects) – Spatial-temporal behavior/event pattern-finding and tracking – Inference of activities, entities/identities, relations 3. Automation needed (augment human analysts) – Value-based attention-focusing, recommendation of relevant content – Real-time interactivity without waiting – Fast construction of new reports for agility
  30. Use Case Examples: Financial Services Fraud Analysis Credit Scoring Pricing Churn Analysis SDN/SON Government Fraud Analysis Scoring Anomaly Detection Fault Analysis SDN/SON Retail Segmentation Recommendation Churn Analysis Lead Scoring Pricing Asset Intensive Preventative Maintenance Defect/Fault Detection Supply Chain Management Cost Forecasting Failure Analysis
  31. Global Leaders Select Skytree. WORLD’S LEADING: • Logistics & Shipping: anomaly detection • Consumer Electronics: content recommendation • Automobile: on-board destination recommendation • Web Portal: ad targeting • Financial Services & Credit Card: customer lead scoring, fraud, credit risk scoring
  32. Who’s Who of Advanced Analytics • “10 Hot Big Data Startups to Watch” • “Skytree Looms in Big Data Forest with New Funding” • “Skytree Uses Machine Learning To Crunch Big Data” • Skytree named “Big Data Analytics Vendor to Watch” • “The Ten Coolest Big Data Startups in 2013” • “One giant leap for machinekind” • Skytree among “10 Emerging Technologies for Big Data” • “…could change the face of Big Data”
  33. Insurance: Targeted Auto Policies with Telemetric Data • Business challenge – Inaccurate policy pricing based on demographics and actuarial data (example: many teens are good drivers but they often incur higher premiums) – Availability of new data sources including telemetry data • Machine learning solution – Use telematics to price insurance based on near-real-time driving habits – Base rates on an individual’s actual driving history – Data fusion to personalize and increase objectivity and accuracy in pricing and claims processing • Business benefit – Targeted customer pricing and policies – Improved customer retention – Higher customer satisfaction and margins
  34. Customers’ Use of Skytree: Targeting – Find New Customers • Global 100 Financial Institution • Major pain points: speed & accuracy of current approach • Current solution: SAS, Hadoop, homegrown. “I want our analysts to create models rather than writing software” - Skytree Customer. Runtime (minutes): CURRENT: 1,200 cores @ 100-node Hadoop cluster, runtime 100 minutes, accuracy (Gini) 57%. SKYTREE SERVER: 12 cores @ 1 node, runtime 8 minutes, accuracy (Gini) 60%. 1250x speedup (12.5x shorter runtime on 100x fewer cores)
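
The 1250x figure on this slide is consistent with a per-core comparison: the runtime ratio multiplied by the core-count ratio. The short sketch below reproduces that arithmetic from the numbers on the slide. Reading the figure as per-core speedup is an interpretation rather than something the slide states, and the function name is hypothetical.

```python
def per_core_speedup(base_minutes, base_cores, new_minutes, new_cores):
    """Return (runtime ratio, core ratio, combined per-core speedup)."""
    runtime_ratio = base_minutes / new_minutes  # how much sooner the job ends
    core_ratio = base_cores / new_cores         # how many fewer cores are used
    return runtime_ratio, core_ratio, runtime_ratio * core_ratio


# Numbers from the slide: 100 min on 1,200 cores vs. 8 min on 12 cores.
runtime_ratio, core_ratio, combined = per_core_speedup(100, 1200, 8, 12)
# runtime_ratio is 12.5, core_ratio is 100.0, combined is 1250.0
```

In wall-clock terms the job finishes 12.5 times sooner; the larger number also credits the reduction from 1,200 cores to 12.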
  35. Asset Intensive: Predict Parts Failure through Telemetric Data • Business challenge – Early infant mortality of parts due to rapid aging is not easily detectable during manufacturing and environmental acceptance tests – Utilize diagnostic data such as impedance, voltage, temperature (multidimensional data) • Machine learning solution – Detect transient indicators of rapid aging through telemetric data (time between Beginning of Life and first transient is random; time between first transient and End of Life is deterministic) – Automatic parameter tuning – Data fusion • Business benefit – Efficient parts inventory management – Higher customer satisfaction – Optimize preventative maintenance scheduling based on predicted Time To Failure (TTF)
  36. Predict Parts Failure through Telemetric Data. Data stored on Hadoop cluster: manufacturing data, telemetric data, geo-location data. Steps: build failure model from manufacturing test data; blend in data from telemetric and other big data sources; real-time discovery of transient part behavior patterns to predict Time-To-Failure
  37. Improve Customer Retention with Machine Learning • Business challenge – Cost of attracting new customers is many times more than retaining customers – Greater customer sophistication and competition increase churn levels • Machine learning solution – Identify events that predict customer needs – Isolate best targets and best offers for individual customers (predict what offer or service would prevent a customer from switching) – Discover purchase patterns and profiles of customers who leave for a deeper understanding • Business benefit – Reduced churn and increased customer loyalty – Increased margins and marketing effectiveness – Improved up/cross sell opportunities
  38. Performance Studies by Customers: Next Logical Product – Right Offer to Right Customer • Global Fortune 20 Company • Major pain points: speed & accuracy of legacy approach • Current solution: homegrown • 1M data points for a “Pilot”. Results: Precision@5: legacy 35%, Skytree 42%. Runtime (mins): legacy 97, Skytree .07. 20% increase in recommendation relevance in a fraction of the time. “We are literally speechless” - Skytree Customer
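
Precision@5 is the fraction of the top five recommendations that turn out to be relevant, and moving from 35% to 42% is a 20% relative increase. The sketch below shows both calculations in plain Python. The function names and the sample product identifiers are assumptions for this example; only the 35% and 42% figures come from the slide.

```python
def precision_at_k(recommended, relevant, k=5):
    """Fraction of the top-k recommended items that are relevant.

    Divides by k even if fewer than k items were recommended.
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


def relative_increase(baseline, improved):
    """Relative change from baseline to improved, e.g. 0.2 for +20%."""
    return (improved - baseline) / baseline


# Hypothetical ranked recommendations and the items the customer wanted.
score = precision_at_k(["a", "b", "c", "d", "e", "f"], {"a", "c", "z"})

# Figures from the slide: legacy 35%, Skytree 42%.
lift = relative_increase(0.35, 0.42)
```

Here `score` is 2 hits out of 5, and `lift` comes out at approximately 0.2, matching the "20% increase" quoted on the slide.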
  39. Real-Time Fraud Detection • Business challenge – Growing complexity of fraud patterns – Increased frequency of fraud – Minimize false positives without compromising fraud accuracy • Machine learning solution – Leverage diverse big data for better context – Real-time update of model parameters – Faster and more accurate model for better fraud detection • Business benefit – More accurate and agile fraud detection system – Improved customer satisfaction – Improved financial results
  40. Global 2000 Credit Card Network: Before. Flow: transaction data transferred from database to Linux server → Modeling: fraud model created to detect fraud; model is exported → Real-time detection: model is re-coded by a new set of engineers for the mainframe; once the new model is “loaded”, fraud could be detected in real time. • Customer wanted a more accurate model • Current model in system was designed to be updated on a yearly basis • Running a model on a large dataset took over 2 days • Skytree’s goal is to move update of the model to daily or real time. Hardware: Linux x86 server, mainframe. Software: internally developed random decision forests. SLA: fraud scored in real time; fraud model updated yearly
  41. Global 2000 Credit Card Network: Now. Modeling & real-time score environment: data stored on MapR Hadoop cluster; fraud model and unsupervised ML models created; fraud model updated daily / real-time. • Customer can use the same environment for modeling and for production • Models can be updated on a daily or real-time basis depending on needs • More frequent updates lead to a significant increase in lift. Hardware: Linux x86 server. Software: MapR, Skytree fraud detection models. SLA: fraud scored in real time; fraud model updated daily / real-time
  42. Global 2000 Credit Card Network: Now. “Key to increasing fraud detection accuracy” • Use all of the data: sampling can decrease accuracy of results • Semi-supervised learning: combination of supervised and unsupervised learning can improve fraud detection rates • Weight transactions based on date: Skytree Server allows each transaction to be weighted differently and allows fraud models to preferentially weight recent fraud vs. older fraud • Use the most important variables: – Were the last few transactions at an unmanned location? – Is the transaction over the credit limit? – Which day of the week was the fraud committed? – Has the card been reported for fraud before? – And more… • Weight based on transaction value: we should care more about larger transactions
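
One way to picture the two weighting ideas above (recent fraud counts more than older fraud, larger transactions count more than smaller ones) is a per-transaction sample weight. The sketch below is an illustrative assumption only: the exponential half-life decay, the logarithmic value scaling, the parameter defaults, and the function name are all invented for this example and are not Skytree Server's actual weighting scheme.

```python
import math


def transaction_weight(age_days, amount, half_life_days=90.0, amount_scale=100.0):
    """Illustrative sample weight for one labeled transaction.

    recency: halves every `half_life_days`, so newer records count more.
    value:   grows slowly (logarithmically) with the transaction amount.
    """
    recency = 0.5 ** (age_days / half_life_days)
    value = 1.0 + math.log1p(amount / amount_scale)
    return recency * value


# Hypothetical transactions: (age in days, amount).
recent_large = transaction_weight(age_days=1, amount=2500.0)
old_small = transaction_weight(age_days=365, amount=20.0)
```

Weights like these would then be passed to whatever training routine accepts per-example weights, so that a recent large fraud influences the model more than an old small one.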
  43. Skytree Maximizes Predictive Accuracy (advantages and their benefits) 1. Breadth of advanced methods (more powerful/advanced methods and options): greater chance of having the best model for your data 2. Speed & scalability (use more data, test more parameters): improved accuracy in the time available 3. Automation / ease of use (shorter time to most accurate models): more productive modelers, more people in the company can use it. Skytree is designed from the ground up for these benefits.
  44. Sources of Generalization Error. Motivations: the excess error splits into three terms, attributed to an improper model, finite samples, and algorithmic accuracy respectively:
      $$\mathbb{E}_{\xi}\Big[f(x_t,\xi) - \inf_{x\in H^*} f(x,\xi)\Big] = \underbrace{\mathbb{E}_{\xi}\Big[\inf_{x\in H} f(x,\xi) - \inf_{x\in H^*} f(x,\xi)\Big]}_{\mathrm{Err}_{\text{Approximation}}} + \underbrace{\mathbb{E}_{\xi}\Big[f(x^*(N),\xi) - \inf_{x\in H} f(x,\xi)\Big]}_{\mathrm{Err}_{\text{Estimation}}} + \underbrace{\mathbb{E}_{\xi}\Big[f(x_t,\xi) - f(x^*(N),\xi)\Big]}_{\mathrm{Err}_{\text{Expected-Optimization}}}$$
      where $\xi$: data sample; $N$: number of data samples; $H$: hypothesis space of the model; $H^*$: “true” hypothesis space that contains the optimal $x^*$. (Figure from Hua Ouyang, Optimal Stochastic & Distributed Algorithms for Machine Learning.)
  45. First Principles: Sources of prediction error. The slide repeats the generalization-error decomposition from slide 44 (Excess Error = Err_Approximation + Err_Estimation + Err_Expected-Optimization; figure from Hua Ouyang, Optimal Stochastic & Distributed Algorithms for Machine Learning) and pairs each source with a remedy: • Improper model: use the right model (try many) • Finite samples: use more data (all of it) • Algorithmic accuracy: use the right parameters (try many)
  46. Skytree and Spark. [Diagram labels] 1.x; MapR Data Platform; Spark; 2.x/YARN; ZooKeeper; Web Services; Data Sources/Targets; OLTP / EDW; Command Line Interface
  47. Why Skytree? Why do companies pick us for Big Data analytics? • Investors (22M+) • Built on Solid Foundation
  48. SAME DATA. BETTER RESULTS. Thank You. www.skytree.net
  49. Q&A: Engage with us! 1. Download the MapR Sandbox for Hadoop: www.mapr.com/sandbox 2. Download machine learning e-books from Ted Dunning: http://www.mapr.com/resources/white-papers#e-books 3. Visit Skytree at www.skytree.net 4. Learn best practices for Hadoop ETL: www.mapr.com/EDH
