This presentation is part of the joint webinar that BigML hosted with Trifacta on July 6, 2017, to showcase how seamlessly the two platforms fit together in turning raw data into real-life predictive use cases. What makes these tools special is their emphasis on ease of use: they cast the net wide enough that Machine Learning becomes viable for a thousand times more professionals.
BigML Machine Learning meets Trifacta Data Wrangling
1.
2. Speakers
Speaker: Poul Petersen
Resources: https://bigml.com
Contact: info@bigml.com
Twitter: @bigmlcom
Speaker: Victor Coustenoble
Resources: https://www.trifacta.com
Contact: sales@trifacta.com
Twitter: @trifacta
Questions: Enter questions into the chat box; we’ll answer some via chat and others at the end of the session
4. Promise of ML
BigML & Trifacta - BigML, Inc
Have: Lots of Data
Want: Decisions
•Reduce churn
•Increase conversion
•Improve diagnosis
•Reduce fraud
•Etc.
[Diagram: from "Have" to "Want" over time]
5. The need for ML
• Can you find any pattern in this tiny data set?
Talk  Text  Purchases  Data  Age  Churn?
148   72    0          33,6  50   TRUE
85    66    0          26,6  31   FALSE
183   64    0          23,3  32   TRUE
89    66    94         28,1  21   FALSE
115   0     0          35,3  29   FALSE
166   72    175        25,8  51   TRUE
100   0     0          30    32   TRUE
118   84    230        45,8  31   TRUE
171   110   240        45,4  54   TRUE
159   64    0          27,4  40   FALSE
… but this is a simple example
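To make the question on this slide concrete, here is a minimal sketch (not part of the original slides) that searches for the simplest possible pattern: a rule of the form "value > threshold means churn" on a single column. The function name `best_stump` and the rule form are assumptions chosen for illustration, and the slide's decimal commas are written as decimal points.

```python
# Rows copied from the slide's table; decimal commas written as points.
COLUMNS = ["Talk", "Text", "Purchases", "Data", "Age"]
ROWS = [
    (148, 72, 0, 33.6, 50, True),
    (85, 66, 0, 26.6, 31, False),
    (183, 64, 0, 23.3, 32, True),
    (89, 66, 94, 28.1, 21, False),
    (115, 0, 0, 35.3, 29, False),
    (166, 72, 175, 25.8, 51, True),
    (100, 0, 0, 30, 32, True),
    (118, 84, 230, 45.8, 31, True),
    (171, 110, 240, 45.4, 54, True),
    (159, 64, 0, 27.4, 40, False),
]


def best_stump(rows, columns):
    """Return (correct, column, threshold) for the best single-column rule
    'value > threshold means churn' (hypothetical helper, illustration only)."""
    best = (-1, None, None)
    for idx, name in enumerate(columns):
        values = sorted({row[idx] for row in rows})
        # Candidate thresholds: midpoints between neighbouring distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            correct = sum((row[idx] > threshold) == row[-1] for row in rows)
            if correct > best[0]:  # keep the first rule that reaches a new best
                best = (correct, name, threshold)
    return best


correct, column, threshold = best_stump(ROWS, COLUMNS)
print(f"{column} > {threshold}: {correct} of {len(ROWS)} rows correct")
```

Even on ten rows, no single threshold of this form classifies more than 8 rows correctly, which hints at why combining several columns (as tree-based models do) is needed as data grows.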
7. Why BigML / Why Now?
Machine Learning techniques have been around for decades… why now?
•CART Trees: 1980
•Deep Learning: 1984
•Convolutional Neural Network: 1988
Why now:
•Maturity of ML techniques
•Cost of computation
•Abundance of data
•Speed of computation
•Easy Tooling
8. BigML Platform
Web-based Frontend: Visualizations
Tools - https://bigml.com/tools
REST API - https://bigml.com/api
Distributed Machine Learning Backend: SOURCE SERVER, DATASET SERVER, MODEL SERVER, PREDICTION SERVER, EVALUATION SERVER, SAMPLE SERVER, WHIZZML SERVER
Smart Infrastructure (auto-deployable, auto-scalable): SERVERS, EVENTS, GEARMAN QUEUE, MESSAGE QUEUE, DESIRED TOPOLOGY, ACTUAL TOPOLOGY, AUTO TOPOLOGY, AWS COSTS, RUNQUEUE SCALER, BUSY SCALER
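The backend servers named on this slide (source, dataset, model, prediction) suggest a chain of resources, each built from the previous one. The sketch below is not from the slides; it assumes an `api` object that exposes `create_source`, `create_dataset`, `create_model`, `create_prediction`, and `ok`, which are the method names used by BigML's Python bindings. `FakeApi`, `train_and_predict`, and the file name `loans.csv` are hypothetical and exist only so the example runs offline.

```python
def train_and_predict(api, csv_path, input_data):
    """Chain source -> dataset -> model -> prediction (illustrative helper)."""
    source = api.create_source(csv_path)
    api.ok(source)    # wait until the source has been processed
    dataset = api.create_dataset(source)
    api.ok(dataset)   # wait until the dataset is ready
    model = api.create_model(dataset)
    api.ok(model)     # wait until training has finished
    return api.create_prediction(model, input_data)


class FakeApi:
    """Hypothetical offline stand-in: records calls instead of contacting BigML."""

    def __init__(self):
        self.calls = []

    def _make(self, kind, parent):
        self.calls.append(kind)
        return {"resource": f"{kind}/demo", "parent": parent}

    def create_source(self, path):
        return self._make("source", path)

    def create_dataset(self, source):
        return self._make("dataset", source["resource"])

    def create_model(self, dataset):
        return self._make("model", dataset["resource"])

    def create_prediction(self, model, input_data):
        result = self._make("prediction", model["resource"])
        result["input"] = input_data
        return result

    def ok(self, resource):
        return True  # a real client would poll until the resource is finished


fake = FakeApi()
prediction = train_and_predict(fake, "loans.csv", {"Age": 50})
print(fake.calls)
```

With the real bindings installed, the same function could presumably be called with `api = BigML()` from `bigml.api`, which reads credentials from the `BIGML_USERNAME` and `BIGML_API_KEY` environment variables.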
9. Promise of ML
Have: Lots of Data
Want: Decisions
•Reduce churn
•Increase conversion
•Improve diagnosis
•Reduce fraud
•Etc.
[Diagram: from "Have" to "Want" over time]
10. Reality of ML
Have: Lots of Data
Want: Decisions
•Reduce churn
•Increase conversion
•Improve diagnosis
•Reduce fraud
•Etc.
[Diagram: path from "Have" to "Want" over time, labeled "Crazy"]
12. Reality of ML
Have: Data
Need: ML Ready Data
Want: Decisions
•churn
•conversion
•diagnosis
•fraud
•Etc.
[Diagram: path from "Have" to "Want" over time, labeled "Crazy"]
13. Today’s Demo
Have: Lending Club
Need: ML Ready Data
Want: Decisions
•Which loans are low risk
[Diagram: from "Have" to "Want" over time]
14. BigML + Trifacta
•Best of Breed solutions
  •Trifacta: Data Wrangling
  •BigML: Machine Learning
•Both
  •Easy to use / self-service
  •Scalable / Interoperable
  •Enable repeatability & collaboration
  •Cost effective
15. BigML + Trifacta
Together: BigML combined with Trifacta makes it possible to easily go from the data you have to the decisions you want.
Questions?
info@bigml.com sales@trifacta.com
18. Trifacta - Company Overview
Background
➔Headquartered in San Francisco, with offices in Boston, London, Berlin, and Paris
➔100+ Employees
➔Created in 2012
Focus
➔100% focused on Data Wrangling and data preparation
➔Accelerate time to value and business use of Big Data
➔Visual, interactive and Self-Service Data Preparation
19. What is Data Wrangling?
Before analytics can begin, the majority of the time (50%-80%) is spent on data preparation activities.
20. Self-service access for business analysts to raw data, operated under IT control
Data sources: Business System Data, Machine Generated Data, Third Party Data
Users: Business Analyst, LOB, IT
Wrangling steps: Explore, Structure, Clean, Enrich, Validate, Publish
Platform: Distributed Data Platform
Downstream uses:
•Reporting / BI
•Predictive Analytics / Data Science
•Machine Data / Enterprise Processes
•Applications / processes
•Reporting / Data driven decision
•Data Mining / Machine Learning
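As a rough illustration of three of the wrangling steps named on this slide (Structure, Clean, Validate), the sketch below processes a few raw rows in the form shown on the earlier churn slide. It is plain Python, not Trifacta's product or its wrangling language; the function names and the non-negativity rule are assumptions made up for the example.

```python
RAW = """Talk Text Purchases Data Age Churn?
148 72 0 33,6 50 TRUE
89 66 94 28,1 21 FALSE
100 0 0 30 32 TRUE"""


def structure(raw):
    """Structure: split raw text into one dict of string cells per row."""
    lines = raw.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def clean(record):
    """Clean: decimal commas become floats, TRUE/FALSE become booleans."""
    cleaned = {}
    for key, value in record.items():
        if value in ("TRUE", "FALSE"):
            cleaned[key] = value == "TRUE"
        else:
            cleaned[key] = float(value.replace(",", "."))
    return cleaned


def validate(record):
    """Validate: every numeric field must be non-negative (illustrative rule)."""
    return all(v >= 0 for v in record.values() if not isinstance(v, bool))


records = [clean(r) for r in structure(RAW)]
ml_ready = [r for r in records if validate(r)]
print(ml_ready[0])
```

The point of the sketch is only that "ML ready" data means typed, consistent, checked values; an interactive wrangling tool aims to let an analyst express such steps without writing them by hand.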
22. Trifacta: The Global Leader in Data Wrangling
No. 1 by Analysts
•#1 End User Data Preparation Vendor (2015)
•Leader in Forrester Wave for Data Preparation Tools (2017)
No. 1 by Users
No. 1 by Customers
No. 1 by Partners
[Chart: y-axis 0 to 50.000; x-axis Oct 2015, Oct 2016, Oct 2017; additional year labels 2016 and 2017]
23. Demonstration: Loan Risk Analysis
Available data: Members CRM, Loan Purpose, Loans History
Trifacta – BigML common workspace: Data Wrangling, Modeling & Deployment
Business solution
24. Predictive Modeling Data Pipeline: An Iterative Process
Data Design
Preparation Training
Operationalization
Scoring Monitoring
Action