© Copyright 2019 Pivotal Software, Inc. All rights Reserved.
Sridhar Paladugu
Frank McQuillan
AI on Greenplum Using
Apache MADlib and MADlib Flow
Greenplum Integrated Analytics
Data Transformation
Traditional BI
Machine Learning
Graph
Data Science Productivity Tools
Geospatial
Text
Deep Learning
Build
Manage
Deploy
■ Machine learning
■ Deep learning
■ Model management
■ Deployment and orchestration of models
Agenda
1. Machine Learning with
Apache MADlib
Scalable, In-Database
Machine Learning
• Open source https://github.com/apache/madlib
• Downloads and docs http://madlib.apache.org/
• Wiki https://cwiki.apache.org/confluence/display/MADLIB/
Apache MADlib: Big Data Machine Learning in SQL
Open source,
top level
Apache project
For PostgreSQL
and Greenplum
Database
Powerful machine
learning, graph,
statistics and analytics
for data scientists
History
The MADlib project was initiated in 2011 by EMC/Greenplum architects and
Professor Joe Hellerstein from the University of California, Berkeley.
UrbanDictionary.com:
mad (adj.): an adjective used to enhance a
noun.
1- dude, you got skills.
2- dude, you got mad skills.
Functions
Data Types and Transformations
Array and Matrix Operations
Matrix Factorization
• Low Rank
• Singular Value Decomposition (SVD)
Norms and Distance Functions
Sparse Vectors
Encoding Categorical Variables
Path Functions
Pivot
Sessionize
Stemming
Apache MADlib 1.15.1
Graph
All Pairs Shortest Path (APSP)
Breadth-First Search
Hyperlink-Induced Topic Search (HITS)
Average Path Length
Closeness Centrality
Graph Diameter
In-Out Degree
PageRank and Personalized PageRank
Single Source Shortest Path (SSSP)
Weakly Connected Components
Model Selection
Cross Validation
Prediction Metrics
Train-Test Split
Statistics
Descriptive Statistics
• Cardinality Estimators
• Correlation and Covariance
• Summary
Inferential Statistics
• Hypothesis Tests
Probability Functions
Supervised Learning
Neural Networks
Support Vector Machines (SVM)
Conditional Random Field (CRF)
Regression Models
• Clustered Variance
• Cox-Proportional Hazards Regression
• Elastic Net Regularization
• Generalized Linear Models
• Linear Regression
• Logistic Regression
• Marginal Effects
• Multinomial Regression
• Naïve Bayes
• Ordinal Regression
• Robust Variance
Tree Methods
• Decision Tree
• Random Forest
Time Series Analysis
• ARIMA
Unsupervised Learning
Association Rules (Apriori)
Clustering (k-Means)
Principal Component Analysis (PCA)
Topic Modelling (Latent Dirichlet Allocation)
Utility Functions
Columns to Vector
Conjugate Gradient
Linear Solvers
• Dense Linear Systems
• Sparse Linear Systems
Mini-Batching
PMML Export
Term Frequency for Text
Vector to Columns
Nearest Neighbors
• k-Nearest Neighbors
Sampling
Balanced
Random
Stratified
Comprehensive and mature
data science library
Why MADlib on Greenplum?
• Better parallelism
• Better scalability
• Higher predictive accuracy
• Top level ASF project
“Apache MADlib Comes of Age”, Frank McQuillan, Oct. 2017,
https://content.pivotal.io/blog/apache-madlib-comes-of-age
Greenplum Database with MADlib
Master Host (SQL), Standby Master
Interconnect
Segment Hosts: Node1, Node2, Node3, … NodeN
Connected storage and systems: Local Storage, Other RDBMSes, Spark, GemFire, Cloud Object Storage, HDFS, Kafka, ETL, Spring Cloud Data Flow
In-Database Functions: Machine learning & statistics & math & graph & utilities
Massively Parallel Processing
Iterative Model Execution
Master
model = init(…)
WHILE model not converged
    model =
        SELECT model.aggregation(…)
        FROM data table
ENDWHILE
Stored Procedure for Model
…
Broadcast
Segment 2
Segment n
…
1. Transition Function: Operates on tuples or mini-batches to update transition state (model)
2. Merge Function: Combines transition states
3. Final Function: Transforms transition state into output value
Segment 1
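The transition, merge, and final phases can be sketched in plain Python. This is only an illustration of the pattern, not MADlib's implementation: it assumes the "model" is a one-variable least-squares fit, and the function names and the `segments` data are hypothetical.

```python
from functools import reduce

# Transition state: sufficient statistics (n, sum_x, sum_y, sum_xx, sum_xy)
# for a one-variable least-squares fit y = intercept + slope * x.
EMPTY_STATE = (0, 0.0, 0.0, 0.0, 0.0)

def transition(state, row):
    """Runs on each segment: fold one tuple (x, y) into the local state."""
    n, sx, sy, sxx, sxy = state
    x, y = row
    return (n + 1, sx + x, sy + y, sxx + x * x, sxy + x * y)

def merge(a, b):
    """Combines two segments' transition states element-wise."""
    return tuple(u + v for u, v in zip(a, b))

def final(state):
    """Runs once on the combined state: statistics -> (intercept, slope)."""
    n, sx, sy, sxx, sxy = state
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Hypothetical rows spread over three segments; all lie on y = 1 + 2x.
segments = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]

# Each segment folds its own rows; the states are then merged and finalized.
local_states = [reduce(transition, rows, EMPTY_STATE) for rows in segments]
intercept, slope = final(reduce(merge, local_states))
```

This toy fit needs a single pass. An iterative algorithm would run the same aggregate once per iteration of the WHILE loop, feeding the previous model back in.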
Familiar SQL Interface
Train (build a predictive model)
Predict (use model on new data)
Familiar SQL Interface From house pricing model
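The slide's actual queries did not survive in the text, so here is a rough sketch of what train and predict statements for a house pricing model might look like. It builds SQL strings around MADlib's `linregr_train` and `linregr_predict` functions; the table names (`houses`, `houses_linregr`) and feature columns (`tax`, `bath`, `size`) are assumptions chosen for illustration.

```python
def feature_array(features):
    """SQL array expression; the leading 1 is the intercept term."""
    return "ARRAY[1, " + ", ".join(features) + "]"

def train_sql(source, model, target, features):
    """Statement that trains a linear regression and stores it in a model table.

    Identifiers are interpolated directly, which is acceptable only in a sketch.
    """
    return (
        f"SELECT madlib.linregr_train('{source}', '{model}', "
        f"'{target}', '{feature_array(features)}');"
    )

def predict_sql(source, model, features):
    """Statement that applies the stored coefficients to every row of a table."""
    return (
        f"SELECT s.*, madlib.linregr_predict(m.coef, {feature_array(features)}) "
        f"AS predicted FROM {source} s, {model} m;"
    )

# Hypothetical table and column names.
features = ["tax", "bath", "size"]
train_stmt = train_sql("houses", "houses_linregr", "price", features)
predict_stmt = predict_sql("houses", "houses_linregr", features)
```

Both statements run inside the database, so the data is never moved to a separate modeling server.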
SVM Scale with Data Size
Greenplum cluster:
● 1 master
● 4 segment hosts with
6 segments per host
Support Vector Machines
PageRank Scale with Graph Size
Greenplum cluster:
● 1 master
● 4 segment hosts with
6 segments per host
Normal random graphs with a mean degree of 50 edges per vertex
(i.e., 5B edges in the largest case)
[Figure: PageRank runtime (1 s to 1M s) versus graph size (1K to 100M vertices), log-log scale]
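The edge counts behind the benchmark follow directly from the mean degree. The short sketch below assumes the axis ticks 1K to 100M are vertex counts, which is consistent with the 5B-edge figure quoted for the largest case.

```python
MEAN_DEGREE = 50  # edges per vertex, from the benchmark description

# Assumed vertex counts for the six graph sizes: 1K, 10K, ..., 100M.
vertex_counts = [10**k for k in range(3, 9)]

# Total edges for each graph size.
edge_counts = {v: v * MEAN_DEGREE for v in vertex_counts}

largest = max(vertex_counts)
print(f"{largest:,} vertices -> {edge_counts[largest]:,} edges")
```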
“Graph Processing on Greenplum Database using Apache MADlib”, Frank McQuillan, Jan 2018,
https://content.pivotal.io/blog/graph-processing-on-greenplum-database-using-apache-madlib
But modeling is only part of the story...
“It’s an absolute myth that you can send an algorithm
over raw data and have insights pop up.”
- Jeffrey Heer, Professor of Computer Science at the University of Washington and Co-founder of Trifacta
“For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights”, Aug. 17, 2014
https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html
Feature Engineering
Example
data science
workflow
2. Deep Learning
Deep Learning
• Type of machine
learning inspired by
biology of the brain
• Artificial neural
networks with
multiple layers
between input and
output
Example Deep Learning Algorithms
Multilayer
perceptron (MLP)
“The Original”
Recurrent
neural network (RNN)
E.g., machine translation
Convolutional
neural network (CNN)
E.g., image classification
Convolutional Neural Networks (CNN)
• Effective for computer vision
• Fewer parameters than fully
connected networks
• Translational invariance
• Classic networks: LeNet-5,
AlexNet, VGG
Graphics Processing Units (GPUs)
• Great at performing a
lot of simple
computations such as
matrix operations
• Well suited to deep
learning algorithms
Single Server: Host Node 1 with GPU 1 … GPU N
Moving Data Greenplum <-> Single Server
Greenplum: Data preparation, feature generation, machine learning, geospatial, etc.
Single server: Deep learning
Large data transfer between the two is suboptimal
Integrated Deep Learning with Greenplum
Master Host (SQL), Standby Master
Interconnect
Segment Hosts: Node1, Node2, Node3, … NodeN, each with GPU 1 … GPU N
In-Database Functions: Machine learning & statistics & math & graph & utilities
Massively Parallel Processing
Deep Learning on a Cluster
Num. Approach: Description
1. Distributed deep learning: Train single model architecture across the cluster. Data distributed (usually randomly) across segments.
2. Data parallel models: Train same model architecture in parallel on different data groups (e.g., build separate models per country).
3. Hyperparameter tuning: Train same model architecture in parallel with different hyperparameter settings and incorporate cross validation. Same data on each segment.
4. Neural architecture search: Train different model architectures in parallel. Same data on each segment.
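To make the hyperparameter tuning approach more concrete, the sketch below expands a small search space into configurations and spreads them over segments round-robin. The search space, the segment count, and both function names are hypothetical; this does not reflect how MADlib schedules work.

```python
from itertools import product

def build_grid(space):
    """Expand {name: [values]} into a list of configuration dicts."""
    keys = sorted(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]

def assign_round_robin(configs, num_segments):
    """Map each segment id to the configurations it would train."""
    plan = {seg: [] for seg in range(num_segments)}
    for i, cfg in enumerate(configs):
        plan[i % num_segments].append(cfg)
    return plan

# Hypothetical search space and cluster size.
space = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [32, 64]}
grid = build_grid(space)
plan = assign_round_robin(grid, num_segments=4)

for seg, configs in plan.items():
    print(seg, configs)
```

Every configuration shares one model architecture, so each segment can train its assigned settings independently on its own copy of the data.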
Current
work
Data Loading and Formatting
Testing Infrastructure
• Google Cloud Platform (GCP)
• Type n1-highmem-32 (32 vCPUs, 208 GB memory)
• NVIDIA Tesla P100 GPUs
• Greenplum database config
– Tested up to 20 segment (worker node) clusters
– 1 GPU per segment
6-layer CNN - Runtime (CIFAR-10)
Method: Model weight averaging
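Model weight averaging can be illustrated with a few lines of Python. This is a minimal sketch, assuming each segment trains on its own data and returns its weights as one flat list per layer; the data and the function name are hypothetical, and the slide does not give the actual procedure.

```python
def average_weights(segment_weights):
    """Element-wise mean of per-segment weights.

    segment_weights[s][l] is the flat weight list of layer l on segment s.
    """
    n = len(segment_weights)
    return [
        [sum(vals) / n for vals in zip(*layers)]
        for layers in zip(*segment_weights)
    ]

# Hypothetical weights from three segments, two layers each.
segment_weights = [
    [[1.0, 2.0], [0.0]],
    [[3.0, 4.0], [3.0]],
    [[5.0, 6.0], [6.0]],
]

averaged = average_weights(segment_weights)
print(averaged)
```

In a scheme like this, the averaged weights would typically be sent back to every segment as the starting point for the next round of training.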
3. Model Management
Try and try and try...
• Data scientists typically try many different types of
models with many different parameter combinations
Model Persistence in MADlib 1.x
One model at a time
Model Persistence in MADlib 2.0
Multiple models at a time in
model library
4. MADlib Flow
Data Science Process
Model Evaluation
Operationalization
Model Building
Feature Engineering
Data
Review
User Feedback
Problem
Definition
Setup
Model Operationalization
Model operationalization is the process of deploying data
science models to production for ongoing use by other
software
Common Challenges With Operationalizing Models
● Handling production data
● Engineering for scale and
performance
● Model transportation
● Managing and orchestrating
deployed models
● Data Scientists are not
developers or platform
experts
Patterns For Operationalizing Models

BATCH TRAINING, BATCH INFERENCE
~40% of today’s use cases
EXAMPLE: Tax Return Fraud: Score database of tax returns - on a nightly basis - to flag likely fraudulent returns for audit
PostgreSQL/Greenplum with MADlib supports this pattern

EVENT DRIVEN TRAINING, EVENT DRIVEN INFERENCE
<5% of today’s use cases
EXAMPLE: Online Advertising: Maximize click-through rate by algorithmically selecting and testing advertisement placement in real time
Highly specialized – low number of enterprise use cases

BATCH TRAINING, EVENT DRIVEN INFERENCE
~55% of today’s use cases (growing)
EXAMPLE: Real Time Transaction Fraud: Train an ML model on historical data to classify - in real time - whether or not new credit/debit transactions are likely to be fraudulent
PostgreSQL/Greenplum with MADlib & MADlib Flow supports this pattern
AI For The PostgreSQL Community
Standardized end-to-end Data Science in SQL with the Greenplum/Postgres stack
Experimentation
Initial code development and testing,
model experimentation on samples.
Modeling at Scale
Heavy compute tasks such as model
training across big data
Deployment
Production deployment of models to feed
downstream applications and reports
Artificial Intelligence: Closed Loop Machine Learning
Model Deployment With MADlib Flow
User Operations
1. ML Training: Train ML model in Postgres or Greenplum using Apache MADlib
2. madlibflow --deploy: Set configs in .yml and deploy model from Greenplum to Docker, PCF or Kubernetes
Automated Backend Operations
3. Docker pull: Pull docker containers with optimized Postgres and MADlib
4. Pull Model: Extract model and feature table schema layout from Greenplum database
5. Load Model: Load model and feature table schema into optimized Postgres
6. Deploy: Deploy docker container to target environment
Containerized Deployment Of Models
$ madlibflow --deploy --target kubernetes --type model
Key benefits of MADlib Flow
● Easy to deploy & lightweight
● Highly scalable REST and Streaming
● End-to-end SQL workflow
● Low latency inference/predictions
● Feature Transformations
Single command to deploy a MADlib
trained model from GPDB/Postgres to
Docker, PCF or Kubernetes
Containerized deployment of Apache MADlib Machine Learning workflows for low
latency event driven inference and scale
MADlib Flow Components
MADlib Flow : Hello World!
Let us demonstrate a Linear Regression Model deployment
Dependent Variable:
● patient has had a second heart attack within 1 year
Independent Variables:
● patient completed a treatment on anger control
● anxiety scale score
Workflow: Create schema -> Load data -> Train model -> Deploy model -> Test -> Batch prediction
Model Deployment
Deployment manifest
$ madlibflow --name patient-lr --type model --action deploy --target kubernetes --inputJson config.json
Model Deployment
Greenplum Database
Feature Engine
Credit/Debit Card Transaction
(Input)
Message
{
“transaction_ts”: ,
“credit_card_number”: ,
“transaction_amt”:,
“merchant_id”:
}
Approved Credit/Debit Card
Transaction
(Output)
Message
{
“transaction_ts”: ,
“transaction_amt”:,
“credit_card_number”:,
“num_transactions_30days”:,
“max_transactions_30days”:,
“merchant_id”:,
“num_fraud_cases”:,
“avg_transaction_amount_30days”:,
“fraud_risk_score”: 0.92,
“approved”: True
}
Accounts
credit_card_number
num_transactions_30days
max_transactions_30days
Merchants
merchant_id
num_fraud_cases
avg_transaction_amount_30days
Cache
(Gemfire, PCC, Redis, etc.)
Cache Abstraction
Cache Abstraction
SELECT mch.*,
       acct.*,
       log(msg.transaction_amt + 1) AS log_transaction_amt
FROM message msg
JOIN merchants mch ON msg.merchant_id = mch.merchant_id
JOIN accounts acct ON msg.credit_card_number = acct.credit_card_number;
MADlib REST
Cache Loader
Automated deployment
of scalable low latency
end-to-end ML pipelines
(“Data Science Ops.”)
No code conversion -
engineer features and
populate cache in SQL
Join data from the
incoming message with
cached data
Accounts Merchants
SELECT create_accounts(); SELECT create_merchants();
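The feature-engine join above can be mimicked in plain Python to show what happens to one incoming message. This is only a sketch: the dictionaries stand in for the cached Accounts and Merchants tables, and all keys and values are made up for illustration.

```python
import math

# Hypothetical cache contents keyed by card number and merchant id.
accounts = {
    "4111-0000": {"num_transactions_30days": 12, "max_transactions_30days": 250.0},
}
merchants = {
    "m-42": {"num_fraud_cases": 3, "avg_transaction_amount_30days": 80.5},
}

def build_features(message, accounts, merchants):
    """Join one message with cached rows and add the log-amount feature.

    A missing key raises KeyError here, whereas the SQL inner join would
    simply return no row.
    """
    features = dict(message)
    features.update(merchants[message["merchant_id"]])
    features.update(accounts[message["credit_card_number"]])
    # PostgreSQL's one-argument log() is base 10, hence log10.
    features["log_transaction_amt"] = math.log10(message["transaction_amt"] + 1)
    return features

message = {
    "transaction_ts": "2019-03-20T10:15:00",
    "credit_card_number": "4111-0000",
    "transaction_amt": 99.0,
    "merchant_id": "m-42",
}

features = build_features(message, accounts, merchants)
print(features)
```

The resulting record carries both the original message fields and the cached aggregates, which is the shape of input a deployed model would score.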
Example Flow for Fraud Detection
5. Learn More!
• Download
– http://madlib.apache.org/
• ~40 Jupyter notebooks
– https://github.com/apache/madlib-site/tree/asf-site/community-artifacts
• Wednesday March 20 @PostgresConf
#ScaleMatters
Backup Slides
MADlib 2.0
● More deep learning
capabilities
○ Improved model
performance
○ Hyperparameter
tuning
● Model repositories and
management for
streamlined data science
workflows
● New and improved SQL
interface for MADlib
functions
MADlib Flow
● Support for PL/Python and
PL/R
● Native deployment to Pivotal Cloud Foundry as a buildpack
● Beta Release in May’19
● Metrics collector.
MADlib 1.16
● Initial deep learning
release for image
classification
(Keras/TensorFlow)
● Postgres 11 support
● Improve speed of k-nearest neighbors via approximate method
Looking Ahead
Apache MADlib Resources
• Web site
– http://madlib.apache.org/
• Wiki
– https://cwiki.apache.org/confluence/display/MADLIB/Apache+MADlib
• User docs
– http://madlib.apache.org/docs/latest/index.html
• Jupyter notebooks
– https://github.com/apache/madlib-site/tree/asf-site/community-artifacts
• Technical docs
– http://madlib.apache.org/design.pdf
• Pivotal commercial site
– http://pivotal.io/madlib
• Mailing lists and JIRAs
– https://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/
– http://mail-archives.apache.org/mod_mbox/incubator-madlib-user/
– https://issues.apache.org/jira/browse/MADLIB
• PivotalR
– https://cran.r-project.org/web/packages/PivotalR/index.html
• Github
– https://github.com/apache/madlib
– https://github.com/pivotalsoftware/PivotalR
Execution Flow
Client
Database
Server
Master
Segment 1
Segment 2
Segment n
…
SQL
Stored
Procedure
Result
Set
String
Aggregation
psql
…
Artificial Intelligence Landscape
Deep
Learning
Distributed Deep Learning Methods
• Open area of research*
• Methods we have investigated so far:
– Simple averaging
– Ensembling
– Elastic averaging stochastic gradient descent
(EASGD)
* Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
https://arxiv.org/pdf/1802.09941.pdf
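Of the methods listed, elastic averaging SGD is the least self-explanatory, so a small sketch may help. It follows the synchronous update commonly given for EASGD: each worker takes a gradient step while being pulled toward a shared center variable, and the center moves toward the workers. Scalars stand in for weight tensors, and the names and values are hypothetical rather than taken from MADlib.

```python
def easgd_step(workers, center, grads, lr, rho):
    """One synchronous EASGD update on scalar parameters.

    workers: current parameter value on each worker
    center:  shared center variable
    grads:   each worker's local gradient at its current value
    lr, rho: learning rate and elastic coupling strength
    """
    alpha = lr * rho
    # Gradient step plus an elastic pull toward the center.
    new_workers = [
        w - lr * g - alpha * (w - center) for w, g in zip(workers, grads)
    ]
    # The center moves toward the workers' current positions.
    new_center = center + alpha * sum(w - center for w in workers)
    return new_workers, new_center

# Two workers on either side of the center, with zero gradients so that
# only the elastic term acts.
workers, center = easgd_step([1.0, 3.0], 2.0, [0.0, 0.0], lr=0.1, rho=1.0)
print(workers, center)
```

With zero gradients the workers drift toward the center while the center stays put, which shows the averaging effect in isolation.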
Some Results with CIFAR-10
• 60k 32x32 color
images in 10 classes,
with 6k images per
class
• 50k training images
and 10k test images
https://www.cs.toronto.edu/~kriz/cifar.html
■ Experimentation -> Modeling at scale -> Deployment all in SQL
■ Single platform from model development to Deployment using Postgres/Greenplum
■ Low latency inference
■ Easy to deploy both feature generation code and model
■ Join data from event message with Feature cache objects using ANSI SQL
■ Continuously generate features and feed them into the feature engine.
■ Multiple versions of Models can be deployed for accuracy measurement.
■ Same tool can deploy to multiple Container Environments, PKS, AKS, GKE, etc.
MADlib Flow Benefits
Model Training
Model Testing

More Related Content

What's hot

PG-REXで学ぶPacemaker運用の実例
PG-REXで学ぶPacemaker運用の実例PG-REXで学ぶPacemaker運用の実例
PG-REXで学ぶPacemaker運用の実例kazuhcurry
 
Apache Flink Stream Processing
Apache Flink Stream ProcessingApache Flink Stream Processing
Apache Flink Stream ProcessingSuneel Marthi
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyYaroslav Tkachenko
 
Building Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache KafkaBuilding Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache Kafkaconfluent
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
[오픈소스컨설팅]RHEL7/CentOS7 Pacemaker기반-HA시스템구성-v1.0
[오픈소스컨설팅]RHEL7/CentOS7 Pacemaker기반-HA시스템구성-v1.0[오픈소스컨설팅]RHEL7/CentOS7 Pacemaker기반-HA시스템구성-v1.0
[오픈소스컨설팅]RHEL7/CentOS7 Pacemaker기반-HA시스템구성-v1.0Ji-Woong Choi
 
Microsoft Edgeで サポートされる 新しい API について
Microsoft Edgeでサポートされる新しい API についてMicrosoft Edgeでサポートされる新しい API について
Microsoft Edgeで サポートされる 新しい API についてOsamu Monoe
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache FlinkAKASH SIHAG
 
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustStructuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustSpark Summit
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsSlim Baltagi
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!Guido Schmutz
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flinkdatamantra
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
DevNation Live: Kafka and Debezium
DevNation Live: Kafka and DebeziumDevNation Live: Kafka and Debezium
DevNation Live: Kafka and DebeziumRed Hat Developers
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 

What's hot (20)

Greenplum Roadmap
Greenplum RoadmapGreenplum Roadmap
Greenplum Roadmap
 
PG-REXで学ぶPacemaker運用の実例
PG-REXで学ぶPacemaker運用の実例PG-REXで学ぶPacemaker運用の実例
PG-REXで学ぶPacemaker運用の実例
 
Apache Flink Stream Processing
Apache Flink Stream ProcessingApache Flink Stream Processing
Apache Flink Stream Processing
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at Shopify
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Building Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache KafkaBuilding Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache Kafka
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
[오픈소스컨설팅]RHEL7/CentOS7 Pacemaker기반-HA시스템구성-v1.0
[오픈소스컨설팅]RHEL7/CentOS7 Pacemaker기반-HA시스템구성-v1.0[오픈소스컨설팅]RHEL7/CentOS7 Pacemaker기반-HA시스템구성-v1.0
[오픈소스컨설팅]RHEL7/CentOS7 Pacemaker기반-HA시스템구성-v1.0
 
Microsoft Edgeで サポートされる 新しい API について
Microsoft Edgeでサポートされる新しい API についてMicrosoft Edgeでサポートされる新しい API について
Microsoft Edgeで サポートされる 新しい API について
 
Advanced Jasper Reports
Advanced Jasper ReportsAdvanced Jasper Reports
Advanced Jasper Reports
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
 
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustStructuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Ramesh kutumbaka resume
Ramesh kutumbaka resumeRamesh kutumbaka resume
Ramesh kutumbaka resume
 
DevNation Live: Kafka and Debezium
DevNation Live: Kafka and DebeziumDevNation Live: Kafka and Debezium
DevNation Live: Kafka and Debezium
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 

Similar to AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019

Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...VMware Tanzu
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkDataWorks Summit/Hadoop Summit
 
Scalable machine learning
Scalable machine learningScalable machine learning
Scalable machine learningArnaud Rachez
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkDatabricks
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-Systeminside-BigData.com
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning InfrastructureSigOpt
 
GOAI: GPU-Accelerated Data Science DataSciCon 2017
GOAI: GPU-Accelerated Data Science DataSciCon 2017GOAI: GPU-Accelerated Data Science DataSciCon 2017
GOAI: GPU-Accelerated Data Science DataSciCon 2017Joshua Patterson
 
The Challenges of Bringing Machine Learning to the Masses
The Challenges of Bringing Machine Learning to the MassesThe Challenges of Bringing Machine Learning to the Masses
The Challenges of Bringing Machine Learning to the MassesAlice Zheng
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018Adam Gibson
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to productionGeorg Heiler
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLDESMOND YUEN
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptSanket Shikhar
 
Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?Ahmed Kamal
 
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...Srivatsan Ramanujam
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform Seldon
 
NVIDIA Rapids presentation
NVIDIA Rapids presentationNVIDIA Rapids presentation
NVIDIA Rapids presentationtestSri1
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onDony Riyanto
 
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systemsTraditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systemsGanesan Narayanasamy
 

Similar to AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019 (20)

Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache Spark
 
Scalable machine learning
Scalable machine learningScalable machine learning
Scalable machine learning
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache Spark
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-System
 
An Analytics Platform for Connected Vehicles
An Analytics Platform for Connected VehiclesAn Analytics Platform for Connected Vehicles
An Analytics Platform for Connected Vehicles
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning Infrastructure
 
GOAI: GPU-Accelerated Data Science DataSciCon 2017
GOAI: GPU-Accelerated Data Science DataSciCon 2017GOAI: GPU-Accelerated Data Science DataSciCon 2017
GOAI: GPU-Accelerated Data Science DataSciCon 2017
 
The Challenges of Bringing Machine Learning to the Masses
The Challenges of Bringing Machine Learning to the MassesThe Challenges of Bringing Machine Learning to the Masses
The Challenges of Bringing Machine Learning to the Masses
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to production
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDL
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.ppt
 
Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?
 
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
 
NVIDIA Rapids presentation
NVIDIA Rapids presentationNVIDIA Rapids presentation
NVIDIA Rapids presentation
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systemsTraditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
 





AI on Greenplum Using Apache MADlib and MADlib Flow - Greenplum Summit 2019

  • 1.
  • 2. © Copyright 2019 Pivotal Software, Inc. All rights Reserved. Sridhar Paladugu Frank McQuillan AI on Greenplum Using Apache MADlib and MADlib Flow
  • 3. Greenplum Integrated Analytics Data Transformation Traditional BI Machine Learning Graph Data Science Productivity Tools Geospatial Text Deep Learning Build Manage Deploy
  • 4. ■ Machine learning ■ Deep learning ■ Model management ■ Deployment and orchestration of models Agenda
  • 5. 1. Machine Learning with Apache MADlib
  • 6. Scalable, In-Database Machine Learning • Open source https://github.com/apache/madlib • Downloads and docs http://madlib.apache.org/ • Wiki https://cwiki.apache.org/confluence/display/MADLIB/ Apache MADlib: Big Data Machine Learning in SQL Open source, top level Apache project For PostgreSQL and Greenplum Database Powerful machine learning, graph, statistics and analytics for data scientists
  • 7. History MADlib project was initiated in 2011 by EMC/Greenplum architects and Professor Joe Hellerstein from University of California, Berkeley. UrbanDictionary.com: mad (adj.): an adjective used to enhance a noun. 1- dude, you got skills. 2- dude, you got mad skills.
  • 8. Functions (Apache MADlib 1.15.1): comprehensive and mature data science library
      Data Types and Transformations: Array and Matrix Operations; Matrix Factorization (Low Rank, Singular Value Decomposition (SVD)); Norms and Distance Functions; Sparse Vectors; Encoding Categorical Variables; Path Functions; Pivot; Sessionize; Stemming
      Graph: All Pairs Shortest Path (APSP); Breadth-First Search; Hyperlink-Induced Topic Search (HITS); Average Path Length; Closeness Centrality; Graph Diameter; In-Out Degree; PageRank and Personalized PageRank; Single Source Shortest Path (SSSP); Weakly Connected Components
      Model Selection: Cross Validation; Prediction Metrics; Train-Test Split
      Statistics: Descriptive Statistics (Cardinality Estimators, Correlation and Covariance, Summary); Inferential Statistics (Hypothesis Tests); Probability Functions
      Supervised Learning: Neural Networks; Support Vector Machines (SVM); Conditional Random Field (CRF); Regression Models (Clustered Variance, Cox-Proportional Hazards Regression, Elastic Net Regularization, Generalized Linear Models, Linear Regression, Logistic Regression, Marginal Effects, Multinomial Regression, Naïve Bayes, Ordinal Regression, Robust Variance); Tree Methods (Decision Tree, Random Forest)
      Time Series Analysis: ARIMA
      Unsupervised Learning: Association Rules (Apriori); Clustering (k-Means); Principal Component Analysis (PCA); Topic Modelling (Latent Dirichlet Allocation)
      Utility Functions: Columns to Vector; Conjugate Gradient; Linear Solvers (Dense Linear Systems, Sparse Linear Systems); Mini-Batching; PMML Export; Term Frequency for Text; Vector to Columns
      Nearest Neighbors: k-Nearest Neighbors
      Sampling: Balanced; Random; Stratified
  • 9. Why MADlib on Greenplum? • Better parallelism • Better scalability • Higher predictive accuracy • Top level ASF project “Apache MADlib Comes of Age”, Frank McQuillan, Oct. 2017, https://content.pivotal.io/blog/apache-madlib-comes-of-age
  • 10. Greenplum Database with MADlib. Cluster: Master Host and Standby Master; SQL; Interconnect; Segment Hosts (Node1, Node2, Node3, … NodeN) with Local Storage; Massively Parallel Processing. In-Database Functions: machine learning & statistics & math & graph & utilities. Connected systems: Other RDBMSes, Spark, GemFire, Cloud Object Storage, HDFS, Kafka, ETL, Spring Cloud Data Flow.
  • 11. Iterative Model Execution. Stored procedure for the model, run on the master: model = init(…); WHILE model not converged: model = SELECT model.aggregation(…) FROM data table; ENDWHILE. The model is broadcast to Segment 1, Segment 2, … Segment n. (1) Transition Function: operates on tuples or mini-batches to update the transition state (model). (2) Merge Function: combines transition states. (3) Final Function: transforms the transition state into the output value.
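The transition/merge/final pattern on slide 11 can be sketched in plain Python. This is a minimal illustration and not MADlib's implementation: it fits a one-variable least-squares line, and the function names `transition`, `merge`, `final`, the state layout, and the two-segment data split are all assumptions made for the example.

```python
from functools import reduce

# State = sufficient statistics (n, sum_x, sum_y, sum_xx, sum_xy).
EMPTY = (0, 0.0, 0.0, 0.0, 0.0)

def transition(state, row):
    """Fold one (x, y) tuple into a segment-local state."""
    n, sx, sy, sxx, sxy = state
    x, y = row
    return (n + 1, sx + x, sy + y, sxx + x * x, sxy + x * y)

def merge(a, b):
    """Combine the states produced by two segments."""
    return tuple(u + v for u, v in zip(a, b))

def final(state):
    """Turn the merged state into (intercept, slope)."""
    n, sx, sy, sxx, sxy = state
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Hypothetical data spread over two segments; every point lies on y = 1 + 2x.
segments = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
states = [reduce(transition, rows, EMPTY) for rows in segments]
intercept, slope = final(reduce(merge, states))
```

Least squares needs only one pass over the data, so the sketch has no loop. For an iterative method, the WHILE loop on the slide would repeat this aggregate, feeding the previous model back in until convergence.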
  • 12. Familiar SQL Interface Train (build a predictive model) Predict (use model on new data)
  • 13. Familiar SQL Interface From house pricing model
  • 14. SVM Scale with Data Size Greenplum cluster: ● 1 master ● 4 segment hosts with 6 segments per host Support Vector Machines
  • 15. PageRank Scale with Graph Size. Greenplum cluster: ● 1 master ● 4 segment hosts with 6 segments per host. Normal random graphs with a mean degree of 50 edges per vertex (i.e., 5B edges in the largest case). [Plot on a log-log scale; axis ticks span 1K to 100M and 1 s to 1M s.] “Graph Processing on Greenplum Database using Apache MADlib”, Frank McQuillan, Jan 2018, https://content.pivotal.io/blog/graph-processing-on-greenplum-database-using-apache-madlib
  • 16. But modeling is only part of the story... “It’s an absolute myth that you can send an algorithm over raw data and have insights pop up.” - Jeffrey Heer, Professor of Computer Science at the University of Washington and Co-founder of Trifacta. “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights”, Aug. 17, 2014 https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html
  • 18. 2. Deep Learning
  • 19. Deep Learning • Type of machine learning inspired by biology of the brain • Artificial neural networks with multiple layers between input and output
  • 20. Example Deep Learning Algorithms Multilayer perceptron (MLP) “The Original” Recurrent neural network (RNN) E.g., machine translation Convolutional neural network (CNN) E.g., image classification
  • 21. Convolutional Neural Networks (CNN) • Effective for computer vision • Fewer parameters than fully connected networks • Translational invariance • Classic networks: LeNet-5, AlexNet, VGG
  • 22. Graphics Processing Units (GPUs) • Great at performing a lot of simple computations such as matrix operations • Well suited to deep learning algorithms
  • 24. Moving Data Greenplum <-> Single Server; Deep learning; Data preparation, feature generation, machine learning, geospatial, etc.; Large data transfer; Suboptimal
  • 25. Integrated Deep Learning with Greenplum. Cluster: Master Host and Standby Master; SQL; Interconnect; Segment Hosts (Node1, Node2, Node3, … NodeN), each with GPU 1 … GPU N; Massively Parallel Processing. In-Database Functions: machine learning & statistics & math & graph & utilities.
  • 26. Deep Learning on a Cluster (Num, Approach, Description)
      1. Distributed deep learning: Train single model architecture across the cluster. Data distributed (usually randomly) across segments.
      2. Data parallel models: Train same model architecture in parallel on different data groups (e.g., build separate models per country).
      3. Hyperparameter tuning: Train same model architecture in parallel with different hyperparameter settings and incorporate cross validation. Same data on each segment.
      4. Neural architecture search: Train different model architectures in parallel. Same data on each segment.
      Current work
  • 27. Data Loading and Formatting
  • 28. Testing Infrastructure • Google Cloud Platform (GCP) • Type n1-highmem-32 (32 vCPUs, 208 GB memory) • NVIDIA Tesla P100 GPUs • Greenplum database config – Tested up to 20 segment (worker node) clusters – 1 GPU per segment
  • 29. 6-layer CNN - Runtime (CIFAR-10) Method: Model weight averaging
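Slide 29 names model weight averaging as the method. A minimal sketch of the averaging step alone is below; it assumes each segment has trained its own copy of the same network and exposes its weights as a flat list of floats. The function name, the flat layout, and the numbers are hypothetical, and the slide does not say how often the averaging is performed.

```python
def average_weights(segment_weights):
    """Element-wise mean of the weight vectors trained on each segment."""
    n = len(segment_weights)
    return [sum(ws) / n for ws in zip(*segment_weights)]

# Hypothetical flattened weights from three segments after one training pass.
segment_weights = [
    [0.0, 1.0, 2.0],
    [1.0, 1.0, 4.0],
    [2.0, 1.0, 0.0],
]
averaged = average_weights(segment_weights)
```

In a real run the averaged weights would be sent back to every segment as the starting point for the next pass.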
  • 30. 3. Model Management
  • 31. Try and try and try... • Data scientists typically try many different types of models with many different parameter combinations
  • 32. Model Persistence in MADlib 1.x One model at a time
  • 33. Model Persistence in MADlib 2.0 Multiple models at a time in model library
  • 34. 4. MADlib Flow
  • 35. Data Science Process Model Evaluation Operationalization Model Building Feature Engineering Data Review User Feedback Problem Definition Setup
  • 36. Model Operationalization is the process of deploying data science models to production for ongoing use by other software.
  • 37. Common Challenges With Operationalizing Models: ● Handling production data ● Engineering for scale and performance ● Model transportation ● Managing and orchestrating deployed models ● Data Scientists are not developers or platform experts
  • 38. Patterns For Operationalizing Models
      BATCH TRAINING, BATCH INFERENCE (~40% of today’s use cases). Example: Tax Return Fraud: Score database of tax returns - on a nightly basis - to flag likely fraudulent returns for audit. PostgreSQL/Greenplum with MADlib supports this pattern.
      BATCH TRAINING, EVENT DRIVEN INFERENCE (~55% of today’s use cases, growing). Example: Real Time Transaction Fraud: Train a ML model on historical data to classify - in real time - whether or not new credit/debit transactions are likely to be fraudulent. PostgreSQL/Greenplum with MADlib & MADlib Flow supports this pattern.
      EVENT DRIVEN TRAINING, EVENT DRIVEN INFERENCE (<5% of today’s use cases). Example: Online Advertising: Maximize Click Thru Rate by algorithmically selecting and testing advertisement placement in real time. Highly specialized – low number of enterprise use cases.
  • 39. AI For The PostgreSQL Community Standardized end-to-end Data Science in SQL with the Greenplum/Postgres stack Experimentation Initial code development and testing, model experimentation on samples. Modeling at Scale Heavy compute tasks such as model training across big data Deployment Production deployment of models to feed downstream applications and reports Artificial Intelligence : Closed Loop Machine Learning
  • 40. Model Deployment With MADlib Flow
      User Operations: 1. ML Training: Train ML model in Postgres or Greenplum using Apache MADlib. 2. madlibflow --deploy: Set configs in .yml and deploy model from Greenplum to Docker, PCF or Kubernetes.
      Automated Backend Operations: 3. Docker pull: Pull docker containers with optimized Postgres and MADlib. 4. Pull Model: Extract model and feature table schema layout from Greenplum database. 5. Load Model: Load model and feature table schema into optimized Postgres. 6. Deploy: Deploy docker container to target environment.
  • 41. Containerized Deployment Of Models $ madlibflow --deploy --target kubernetes --type model Key benefits of MADlib Flow ● Easy to deploy & light weight ● Highly scalable REST and Streaming ● End-to-end SQL workflow ● Low latency inference/predictions ● Feature Transformations Single command to deploy a MADlib trained model from GPDB/Postgres to Docker, PCF or Kubernetes Containerized deployment of Apache MADlib Machine Learning workflows for low latency event driven inference and scale
  • 43. MADlib Flow: Hello World! Let us demonstrate a Linear Regression Model deployment. Dependent variable: ● patient has had a second heart attack within 1 year. Independent variables: ● patient completed a treatment on anger control ● anxiety scale score. Workflow: Create schema, Load data, Train model, Deploy model, Test, Batch prediction
  • 44. Model Deployment Deployment manifest $ madlibflow --name patient-lr --type model --action deploy --target kubernetes --inputJson config.json
  • 46. Example Flow for Fraud Detection: automated deployment of scalable low latency end-to-end ML pipelines (“Data Science Ops.”)
      Credit/Debit Card Transaction (Input) Message: { "transaction_ts": , "credit_card_number": , "transaction_amt": , "merchant_id": }
      Greenplum Database: SELECT create_accounts(); SELECT create_merchants(); Accounts (credit_card_number, num_transactions_30days, max_transactions_30days); Merchants (merchant_id, num_fraud_cases, avg_transaction_amount_30days)
      Cache Loader: Cache (Gemfire, PCC, Redis, etc.) behind a Cache Abstraction. No code conversion - engineer features and populate cache in SQL.
      Feature Engine: join data from the incoming message with cached data: SELECT mch.*, acct.*, log(msg.transaction_amt + 1) AS log_transaction_amt FROM message msg JOIN merchants mch ON msg.merchant_id = mch.merchant_id JOIN accounts acct ON msg.credit_card_number = acct.credit_card_number;
      MADlib REST: Approved Credit/Debit Card Transaction (Output) Message: { "transaction_ts": , "transaction_amt": , "credit_card_number": , "num_transactions_30days": , "max_transactions_30days": , "merchant_id": , "num_fraud_cases": , "avg_transaction_amount_30days": , "fraud_risk_score": 0.92, "approved": True }
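The feature-engine join on slide 46 can be mimicked in a few lines of Python to show what the SQL computes for one message. This is a sketch under assumptions: the caches are plain dictionaries, and every sample value (card number, merchant id, amounts, counts) is invented for illustration. The key columns returned by `mch.*` and `acct.*` are left out for brevity.

```python
import math

# Hypothetical cached feature rows, keyed by the columns the SQL joins on.
accounts = {"4111": {"num_transactions_30days": 12, "max_transactions_30days": 3}}
merchants = {"m-7": {"num_fraud_cases": 1, "avg_transaction_amount_30days": 40.0}}

def build_features(msg):
    """Join an incoming message with cached account and merchant rows."""
    features = {}
    features.update(merchants[msg["merchant_id"]])
    features.update(accounts[msg["credit_card_number"]])
    # PostgreSQL's single-argument log() is base 10, hence log10 here.
    features["log_transaction_amt"] = math.log10(msg["transaction_amt"] + 1)
    return features

# Hypothetical incoming transaction message.
msg = {
    "transaction_ts": "2019-03-19T10:00:00",
    "credit_card_number": "4111",
    "transaction_amt": 99.0,
    "merchant_id": "m-7",
}
features = build_features(msg)
```

The resulting dictionary is the row the deployed model would score to produce `fraud_risk_score`. A missing cache entry raises `KeyError` here, which matches the inner-join behaviour of the SQL only loosely, since the SQL would return no row instead.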
  • 47. 5. Learn More!
  • 48. • Download – http://madlib.apache.org/ • ~40 Jupyter notebooks – https://github.com/apache/madlib-site/tree/asf-site/community-artifacts • Wednesday March 20 @PostgresConf
  • 49. #ScaleMatters
  • 50. Backup Slides
  • 51. Looking Ahead
      MADlib 1.16: ● Initial deep learning release for image classification (Keras/TensorFlow) ● Postgres 11 support ● Improve speed of k-nearest neighbors via approximate method
      MADlib 2.0: ● More deep learning capabilities ○ Improved model performance ○ Hyperparameter tuning ● Model repositories and management for streamlined data science workflows ● New and improved SQL interface for MADlib functions
      MADlib Flow: ● Support for PL/Python and PL/R ● Native deployment to Pivotal Cloud Foundry as a buildpack ● Beta release in May ’19 ● Metrics collector
  • 52. Apache MADlib Resources • Web site – http://madlib.apache.org/ • Wiki – https://cwiki.apache.org/confluence/display/MADLIB/Apache+MADlib • User docs – http://madlib.apache.org/docs/latest/index.html • Jupyter notebooks – https://github.com/apache/madlib-site/tree/asf-site/community-artifacts • Technical docs – http://madlib.apache.org/design.pdf • Pivotal commercial site – http://pivotal.io/madlib • Mailing lists and JIRAs – https://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/ – http://mail-archives.apache.org/mod_mbox/incubator-madlib-user/ – https://issues.apache.org/jira/browse/MADLIB • PivotalR – https://cran.r-project.org/web/packages/PivotalR/index.html • Github – https://github.com/apache/madlib – https://github.com/pivotalsoftware/PivotalR
  • 53. Execution Flow Client Database Server Master Segment 1 Segment 2 Segment n … SQL Stored Procedure Result Set String Aggregation psql …
  • 55. Distributed Deep Learning Methods • Open area of research* • Methods we have investigated so far: – Simple averaging – Ensembling – Elastic averaging stochastic gradient descent (EASGD) * Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis https://arxiv.org/pdf/1802.09941.pdf
  • 56. Some Results with CIFAR-10 • 60k 32x32 color images in 10 classes, with 6k images per class • 50k training images and 10k test images https://www.cs.toronto.edu/~kriz/cifar.html
  • 57. MADlib Flow Benefits ■ Experimentation -> Modeling at scale -> Deployment, all in SQL ■ Single platform from model development to deployment using Postgres/Greenplum ■ Low latency inference ■ Easy to deploy both feature generation code and model ■ Join data from event message with feature cache objects using ANSI SQL ■ Continuously generate the features and feed them into the feature engine ■ Multiple versions of models can be deployed for accuracy measurement ■ Same tool can deploy to multiple container environments (PKS, AKS, GKE, etc.)