This session was recorded in San Francisco on February 5th, 2019 and can be viewed here: https://youtu.be/otq2nQUSV3s
We will talk about the AI transformation journey at Vision Banco - Paraguay, from the early initiatives to future use cases, and how we adopted open source H2O.ai and Driverless AI in our organization.
Bio:
Ruben Diaz
My name is Ruben Diaz, from Asunción, Paraguay. I am married and the father of 3 children. I work as a Data Scientist at Vision Banco.
Luis Armenta:
Luis holds a BSc in Electrical Engineering from the National University of Mexico and an MSc in Electrical Engineering/Computer Science from the University of Waterloo in Canada. He is also currently completing an Executive MBA at McCombs School of Business at the University of Texas in Austin. Luis has about 14 years of experience, having started his career as a Research Scientist at Intel Labs before being promoted to 2nd Line Engineering Manager, leading the high-speed interconnect hardware design of Intel’s server portfolio. Luis has also held roles as Product Manager of EM simulators at Ansys, Inc. and as a Systems Engineer of 4K and 8K UHDTVs at Macom.
5. #H2OWORLD
Paraguay
Population: 6.811 million (2017)
Area: 406,752 km² / 157,048 sq mi (4.1% of USA area)
Official languages: Spanish & Guarani
ROI: 2nd best (22%) in Latin America
Points of interest: Asunción, Itaipu Dam, Jesuit Missions, Chaco, Pantanal
Exports:
• 1st global Electric power exporter
• 1st global Organic sugar exporter
• 2nd global Stevia exporter
• 3rd global Yerba Mate exporter (Ilex paraguariensis)
• 4th global Soybean exporter
• 5th global Chia exporter
• 6th global Meat exporter
6. #H2OWORLD
• Paraguayan bank, member of the "Global Alliance for Banking on Values"
• Top 10 in the country
• The largest number of agencies (95) and non-banking correspondents (2,492), and the largest employer of the financial system (1,758 employees) in Paraguay
• Inclusive: ⅓ of population have accounts
• 800,000+ customers
• Microfinance & SMBs
Tus metas nos inspiran / Your goals inspire us
7. #H2OWORLD
Vision Banco ML Journey
• 1st Generation: Credit scoring models using Logistic Regression
Models developed with IBM SPSS, deployed as a stored procedure on an IBM Db2 for i (AS/400) database
• 2nd Generation: State-of-the-art algorithms: Random Forest, GBM, etc.
Developed with SPSS, KNIME and R, exported to the standard PMML format and implemented as REST web services using openscoring.io
• 3rd Generation: Open source H2O.ai. More algorithms: XGBoost, Deep Learning, Ensembles, etc.
The open source H2O platform surprised us with how fast it trains models. The migration to H2O involved changing model deployment from PMML format to H2O’s POJOs and MOJOs
• 4th Generation: AutoML with Driverless AI on an IBM Power System AC922 server accelerated with NVIDIA® Tesla® V100 GPUs
Each step improved the accuracy and speed of model building and deployment
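The first-generation approach (a logistic regression scored inside the database) comes down to a small amount of arithmetic, which is why it can live in a stored procedure. The sketch below shows that arithmetic in Python. The feature names, coefficients and threshold are hypothetical values chosen for illustration and are not taken from Vision Banco's models.

```python
import math

# Hypothetical coefficients for illustration only; not Vision Banco's model.
INTERCEPT = -2.0
COEFFICIENTS = {
    "debt_to_income": 3.0,      # higher ratio -> higher default risk
    "years_as_customer": -0.2,  # longer relationship -> lower risk
    "late_payments_12m": 0.8,   # more late payments -> higher risk
}


def default_probability(applicant: dict) -> float:
    """Return P(default) from a logistic regression: sigmoid(b + sum(w_i * x_i))."""
    z = INTERCEPT + sum(w * applicant[name] for name, w in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def decide(applicant: dict, threshold: float = 0.5) -> str:
    """Reject when the predicted default probability reaches the threshold."""
    return "reject" if default_probability(applicant) >= threshold else "approve"


low_risk = {"debt_to_income": 0.2, "years_as_customer": 5, "late_payments_12m": 0}
high_risk = {"debt_to_income": 0.9, "years_as_customer": 1, "late_payments_12m": 3}
print(decide(low_risk), decide(high_risk))
```

Later generations replace the hand-portable formula with tree ensembles and other models, which is one reason a portable model format (PMML, then POJOs and MOJOs) becomes necessary for deployment.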
8. #H2OWORLD
Vision Banco Use Cases
Risk Management
• Credit Scoring
• Default Prediction
• Fraud Detection
Business
• Propensity to Purchase (Predictive Lead Scoring)
• Customer Churn Prediction
• Customer segmentation
• Recommendation Engines
9. #H2OWORLD
Automatic Machine Learning: Driverless AI !!!
Joined the Driverless AI Beta circa November 2017
What is Driverless AI?
• Automates a large part of the Data Science process
POC
• Env: Cloud VM with GPU
• Scenario: Entered the AnalyticsVidhya.com contest "Data Science Hackathon: Churn Prediction"
• Result: Surprisingly, I got 8th place!
https://www.linkedin.com/pulse/how-get-eighth-place-data-science-competition-using-driverless-diaz/
10. #H2OWORLD
Driverless AI
[Diagram: Data Integration + Data Quality and Transformation; Modeling Table (Features, Target); Model Building; Model]
Driverless AI: Automates Data Science and ML Workflows
11. Confidential
Automatic Machine Learning 101
1 Drag and Drop Data
Bring data in from cloud, big data and desktop systems: SQL, Local, Amazon S3, HDFS, Google BigQuery, Azure Blob Storage, Snowflake
2 Automatic Visualization
Understand the data shape, outliers, missing values, etc.
3 Automatic Model Optimization
Use best practice model recipes and the power of high performance computing to iterate across thousands of possible models including advanced feature engineering and parameter tuning
• Survival of the Fittest: Advanced Feature Engineering + Algorithm + Model Tuning
• Model Recipes: i.i.d. data, Time-series, more on the way
4 Automatic Scoring Pipelines
Deploy ultra-low latency Python or Java Automatic Scoring Pipelines that include feature transformations and models
[Diagram labels: Modelling Dataset (X, Y); Machine Learning Interpretability; Model Documentation; Deploy Low-latency Scoring to Production]
Powered by GPU Acceleration
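The "Survival of the Fittest" step on this slide refers to iterating over many candidate models and keeping the better ones. A toy evolutionary search can illustrate the general idea. This is a minimal sketch only: it is not Driverless AI's actual algorithm, and the `fitness` function, parameter names and bounds are hypothetical stand-ins for a real validation score and search space.

```python
import random


def fitness(params: dict) -> float:
    """Toy stand-in for a validation score; peaks at depth=6, learning_rate=0.1."""
    return -((params["depth"] - 6) ** 2) - 100 * (params["learning_rate"] - 0.1) ** 2


def random_candidate(rng: random.Random) -> dict:
    """Draw a random point from the (hypothetical) search space."""
    return {"depth": rng.randint(1, 12), "learning_rate": rng.uniform(0.01, 0.5)}


def mutate(params: dict, rng: random.Random) -> dict:
    """Perturb a surviving candidate slightly, staying inside the search bounds."""
    depth = min(12, max(1, params["depth"] + rng.choice([-1, 0, 1])))
    rate = min(0.5, max(0.01, params["learning_rate"] + rng.uniform(-0.05, 0.05)))
    return {"depth": depth, "learning_rate": rate}


def evolve(generations: int = 20, population_size: int = 10, seed: int = 0):
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        # Keep the fitter half, then refill with mutated copies of survivors.
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        best_per_generation.append(fitness(survivors[0]))
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    best = max(population, key=fitness)
    return best, best_per_generation


best, history = evolve()
print(best, history[0], history[-1])
```

Because the survivors are carried over unchanged, the best score never gets worse from one generation to the next. In a real AutoML system each fitness evaluation means training and validating a model, which is where GPU acceleration matters.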
12. #H2OWORLD
Solution Architecture (Hardware)
The IBM Power System AC922 server
● Faster I/O - up to 5.6x more I/O bandwidth than x86 servers
● The best GPUs - 2-6 NVIDIA® Tesla® V100 GPUs with NVLink
● Extraordinary CPUs - 2x POWER9 CPUs, designed for AI
● Simplest AI architecture - Share RAM across CPUs & GPUs
● Enterprise-ready - PowerAI DL frameworks with IBM support
● Next Gen PCIe - PCIe Gen4 2x faster vs PCIe Gen3 in x86
● Built for the world's biggest AI challenges
The best server for enterprise AI
Powered by GPU Acceleration
18. #H2OWORLD
Lessons Learned
• Automating the data science process saves time and "costs less money"
• Machine learning models must be interpretable, so that the decisions made by the algorithms can be explained to business people and even to the company's customers
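One common way to explain an individual decision from a linear or logistic model is to rank features by how far they move the score away from a reference customer (often called reason codes). The sketch below illustrates that idea under stated assumptions: the weights, the baseline "typical customer" and the feature names are all hypothetical, and this is not the interpretability method used by Driverless AI or by Vision Banco.

```python
# Hypothetical linear credit model; weights and baseline values are illustrative only.
WEIGHTS = {"debt_to_income": 3.0, "years_as_customer": -0.2, "late_payments_12m": 0.8}
# A hypothetical "typical" customer used as the reference point.
BASELINE = {"debt_to_income": 0.3, "years_as_customer": 4, "late_payments_12m": 0}


def reason_codes(applicant: dict, top_n: int = 2) -> list:
    """Rank features by how much they shift the log-odds of default from the baseline."""
    contributions = {
        name: weight * (applicant[name] - BASELINE[name])
        for name, weight in WEIGHTS.items()
    }
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]


applicant = {"debt_to_income": 0.9, "years_as_customer": 1, "late_payments_12m": 3}
for feature, contribution in reason_codes(applicant):
    direction = "raises" if contribution > 0 else "lowers"
    print(f"{feature} {direction} the risk score by {abs(contribution):.2f} log-odds")
```

Output of this kind can be turned into plain-language statements for a loan officer or a customer, for example that recent late payments were the main factor in a decision.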
19. #H2OWORLD
Looking to the future
Use Cases
• Money laundering prevention
• Time series forecasting
• NLP
• Chatbots
• Voice / Sound recognition
• Image recognition
• Video detection