SlideShare a Scribd company logo
1 of 32
11Pivotal Confidential–Internal Use Only
Apache HAWQ and
Apache MADlib
Journey to Apache
Pivotal San Francisco
Dec 3, 2015
2
Topics
• Journey to Apache
• HAWQ overview
• MADlib overview
33Pivotal Confidential–Internal Use Only
Journey to Apache
1986 … 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014
1995 1997 1999 2001 2003 2005 2007 2009 2011 2013 2015
Journey to Apache
Michael Stonebraker develops Postgres at UCB
Postgres adds support for SQL
Open Source PostgreSQL
PostgreSQL 7.0 released
PostgreSQL 8.0 released
Greenplum forks
PostgreSQL
Hadoop 1.0 Released
HAWQ & MADlib
go Apache
HAWQ
launched
Hadoop 2.0 Released
MADlib
launched
Greenplum
open sourced
5
Pivotal is Committed to
Open Source
Pivotal GemFire Apache Geode (April 2015)
Pivotal HDB Apache HAWQ (Sept 2015)
Pivotal Query OptimizerComing
Soon
MADlib OSS
(BSD License)
Apache MADlib (Sept 2015)
Pivotal Greenplum Greenplum Database (Oct 2015)
(Apache 2 License)
6
 Collaborate on software in open and productive ways
 Need strong community for innovation
 MADlib and HAWQ are complementary technologies
Why Apache?
77Pivotal Confidential–Internal Use Only
Apache HAWQ Overview
8
What is Apache HAWQ?
9
Key Features
of
HAWQ
5
10
5 • Up to 30x SQL-on-Hadoop performance
advantage
• Faster time to insight
• Massive MPP scalability to petabytes
Benefits: Near real-time latency, complex
queries and advanced analytics
at scale
1. Advanced Analytics Performance
Key Features
of
HAWQ
11
5 • ANSI SQL-92, -99, -2003
• All 99 TPC-DS queries tested, no
modifications
• Plus, OLAP extensions
• Complete ACID integrity and reliability
Benefits: 100% SQL compliant
No risk to SQL applications
All native on HDP via HAWQ
2. 100% ANSI SQL Compliant
Key Features
of
HAWQ
12
HAWQ Performance vs Impala
HAWQ
Faster
Impala
Faster
2 28 46 66 73 76 79 80 88 90
96
HAWQ
• Faster on 46 of 62
TPC-DS queries
completed*
• 4.55x mean avg.
• 12 hrs faster total
* Impala supported 74 of 99
queries, 12 crashed mid-run
13
HAWQ vs Apache Hive w/Tez
HAWQ
Faster
Hive
Faster
3 7 15 25 27 34 46 48 76 79 89 90
96
HAWQ
• Faster on 45 of 60
TPC-DS queries
completed*
• 3.44x mean avg.
• 9 hrs faster total
* Hive supported 65 of 99 queries,
5 crashed mid-run
14
5 • Advanced machine learning for big data
• Local, in-database operation
• Exceptional MPP/parallel performance
• Open source, Postgres-based
Benefits: Advanced, highly scalable,
machine learning, directly on
data in Hadoop
3. Integrated Machine Learning
Key Features
of
HAWQ
15
5 • HDP, PHD, other ODPi-derived distros
• Easily managed via Ambari
• On premises, in cloud, or PaaS
• HBase, Avro, Parquet and more
• Connectors to make HAWQ data
available to other SQL query tools
Benefits: Flexibility
Accessibility
Portability
4. Flexible Deployment
Key Features
of
HAWQ
16
Open Data Platform
A shared industry effort to advance the state of Apache Hadoop® and Big Data
technologies for the enterprise
17
The open ecosystem of big data
September 25, 2015
http://odpi.org
18
5 • Cost-based query optimization
• Robust query plan optimization
• Complex big data management
Benefits: Optimize performance and costs
Maximize Hadoop cluster resources
Offload EDW w/o compromise
5. Query Optimization Options
Key Features
of
HAWQ
19
Apache HAWQ
● Discover New Relationships
● Enable Data Science
● Analyze External Sources
● Query All Data Types!
Multi-level
Fault Tolerance
Granular
Authorization
Resource Mgmt
(+ YARN)
high multi-tenancy
ANSI SQL
Standard
OLAP
Extensions
JDBC ODBC
Connectivity
Parallel
Processing
Online
Expansion
HDFS
Petabyte Scale
Cost Based Optimizer
Dynamic
Pipelining
ACID +
Transactional
Multi-Language
UDF Support
Built-in Data
Science Library
Extensible
(PXF)
Query External
Sources
Hardened, 10+ Years Investment, Production
Proven
Accessibility + Usability
HDFS Native
File Formats
● Manage Multiple Workloads
● Petabyte Scale Analytics
● Security controls
● Leverage Existing
SQL Skills & BI Tools
● Easily Integrate with
Other Tools
● Sub-second
Performance Compression
+ Partitioning
core
compliance
● Hadoop-Native
● Supports Pivotal HD
and Hortonworks
Data Platform
● Ambari-Integrated
20
Apache HAWQ 2.0 (in beta)
Areas of Enhancement New Features
Elastic & Scalable Architecture
Hadoop-Native Integrations
Simplified External Data Access/Queries
Performance & Optimizations
On-Demand Virtual Segments
Flexible Query Dispatch on subset nodes
3 Tier RM: YARN level>User>Query-Operator
Dynamic Cluster Expansion (no redistribute)
New Fault Tolerance Service
HCatalog integration - Read Access
HDFS Catalog Cache
Per Table Directory storage (user friendly)
Single physical segment per node
Easier Administration/Usage
Cloud-Ready
Simpler Management Commands
21
HAWQ
Segments
HAWQ
Masters
Yarn
Physical Segment
Client
Parser/
Analyzer
Optimizer
Dispatcher
DataNode
NodeManager
NameNodeNameNode
External Data Stores via Xtension Framework (Hive/HBase/etc)
Resource
Manager
Fault Tolerance
Service
Catalog
Service
Virtual
Segment
Virtual
Segment
Physical Segment
DataNode
NodeManager
Virtual
Segment
Virtual
Segment
Physical Segment
DataNode
NodeManager
Virtual
Segment
Virtual
Segment
Resource
Broker
libYARN
HDFS Catalog Cache
Interconnect Interconnect
Apache HAWQ 2.0
Architecture
22
Example Use Cases
Smart/connected car
• PHD, HAWQ
• Ability to have numerous data
in Hadoop
• Generate new business models
• Predictive analytics
Network & Call Center Analysis
• PHD, HAWQ
• Store and maintain 2B records/day
• Analyze drop and completed calls
• Analyze networks, care-center
responsiveness
• 5X capacity of EDW at half the cost
Revenue Prediction
• PHD, HAWQ, GPDB
• Predict ad revenue
to within 1%
• Transform into data-driven
company that builds
close relationships with
customers
Archive Analytics, Customer
Behavior Analytics
• PHD, HAWQ
• Mainframe alternative
• Archive analytics
• Customer behavior
profiling and analytics
2323Pivotal Confidential–Internal Use Only
Apache MADlib Overview
24
Scalable, In-Database
Machine Learning
• Open Source https://github.com/apache/incubator-madlib
• Supports Greenplum DB, Apache HAWQ/HDB and PostgreSQL
• Downloads and Docs: http://madlib.incubator.apache.org/
Apache (incubating)
25
History
MADlib project was initiated in 2011 by EMC/Greenplum architects and
Joe Hellerstein from Univ. of California, Berkeley.
• MAD stands for:
• lib stands for SQL library of:
• advanced (mathematical, statistical, machine learning)
• parallel & scalable in-database functions
UrbanDictionary.com:
mad (adj.): an adjective used to enhance a noun.
1- dude, you got skills.
2- dude, you got mad skills.
26
Functions
Predictive Modeling Library
Linear Systems
• Sparse and Dense Solvers
• Linear Algebra
Matrix Factorization
• Singular Value Decomposition (SVD)
• Low Rank
Generalized Linear Models
• Linear Regression
• Logistic Regression
• Multinomial Logistic Regression
• Cox Proportional Hazards Regression
• Elastic Net Regularization
• Robust Variance (Huber-White), Clustered
Variance, Marginal Effects
Other Machine Learning Algorithms
• Principal Component Analysis (PCA)
• Association Rules (Apriori)
• Topic Modeling (Parallel LDA)
• Decision Trees
• Random Forest
• Support Vector Machines
• Conditional Random Field (CRF)
• Clustering (K-means)
• Cross Validation
• Naïve Bayes
• Support Vector Machines (SVM)
Descriptive Statistics
Sketch-Based Estimators
• CountMin (Cormode-Muth.)
• FM (Flajolet-Martin)
• MFV (Most Frequent Values)
Correlation
Summary
Support Modules
Array Operations
Sparse Vectors
Random Sampling
Probability Functions
Data Preparation
PMML Export
Conjugate Gradient
Inferential Statistics
Hypothesis Tests
Time Series
• ARIMA
Oct 2014
27
MADlib Advantages
 Better parallelism
– Algorithms designed to leverage MPP and
Hadoop architecture
 Better scalability
– Algorithms scale as your data set scales
 Better predictive accuracy
– Can use all data, not a sample
 ASF open source (incubating)
– Available for customization and optimization
28
Supported Platforms
GPDB PostgreSQL
PHD
HDP
Other ODPi distros
Scale-out machine learning
now available on open
source, MPP execution
engines.
29
Example Usage
Train a model
Predict for new data
30
Linear Regression on 10 Million Rows in Seconds
Hellerstein, Joseph M., et al. "The MADlib analytics library: or MAD skills, the SQL." Proceedings of
the VLDB Endowment 5.12 (2012): 1700-1711.
31
Pivotal is very proud to deepen
our relationship with the ASF to
advance SQL-on-Hadoop and
machine learning technologies.
Please join us!
32
Contributors Welcome!
• Web sites
– http://hawq.incubator.apache.org/
– http://madlib.incubator.apache.org/
– https://cran.r-project.org/web/packages/PivotalR/index.html
• Github
– https://github.com/apache/incubator-hawq
– https://github.com/apache/incubator-madlib
– https://github.com/pivotalsoftware/PivotalR

More Related Content

What's hot

SQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQSQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQpivotalny
 
The Zoo Expands: Labrador *Loves* Elephant, Thanks to Hamster
The Zoo Expands: Labrador *Loves* Elephant, Thanks to HamsterThe Zoo Expands: Labrador *Loves* Elephant, Thanks to Hamster
The Zoo Expands: Labrador *Loves* Elephant, Thanks to HamsterMilind Bhandarkar
 
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...DataWorks Summit
 
HBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQLHBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQLDataWorks Summit
 
Seattle Scalability Meetup - Ted Dunning - MapR
Seattle Scalability Meetup - Ted Dunning - MapRSeattle Scalability Meetup - Ted Dunning - MapR
Seattle Scalability Meetup - Ted Dunning - MapRclive boulton
 
GPORCA: Query Optimization as a Service
GPORCA: Query Optimization as a ServiceGPORCA: Query Optimization as a Service
GPORCA: Query Optimization as a ServicePivotalOpenSourceHub
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing DataWorks Summit
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresDataWorks Summit
 
Hive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenchesHive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenchesDataWorks Summit
 
An introduction to apache drill presentation
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentationMapR Technologies
 
Extending Hadoop for Fun & Profit
Extending Hadoop for Fun & ProfitExtending Hadoop for Fun & Profit
Extending Hadoop for Fun & ProfitMilind Bhandarkar
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfCharles Givre
 
Maintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single ClusterMaintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single ClusterMapR Technologies
 
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...Databricks
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/PigHivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/PigDataWorks Summit/Hadoop Summit
 
Hivemail: Scalable Machine Learning Library for Apache Hive
Hivemail: Scalable Machine Learning Library for Apache HiveHivemail: Scalable Machine Learning Library for Apache Hive
Hivemail: Scalable Machine Learning Library for Apache HiveDataWorks Summit
 

What's hot (20)

SQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQSQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQ
 
The Zoo Expands: Labrador *Loves* Elephant, Thanks to Hamster
The Zoo Expands: Labrador *Loves* Elephant, Thanks to HamsterThe Zoo Expands: Labrador *Loves* Elephant, Thanks to Hamster
The Zoo Expands: Labrador *Loves* Elephant, Thanks to Hamster
 
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
 
HBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQLHBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQL
 
Seattle Scalability Meetup - Ted Dunning - MapR
Seattle Scalability Meetup - Ted Dunning - MapRSeattle Scalability Meetup - Ted Dunning - MapR
Seattle Scalability Meetup - Ted Dunning - MapR
 
Is hadoop for you
Is hadoop for youIs hadoop for you
Is hadoop for you
 
GPORCA: Query Optimization as a Service
GPORCA: Query Optimization as a ServiceGPORCA: Query Optimization as a Service
GPORCA: Query Optimization as a Service
 
February 2014 HUG : Pig On Tez
February 2014 HUG : Pig On TezFebruary 2014 HUG : Pig On Tez
February 2014 HUG : Pig On Tez
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Hive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenchesHive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenches
 
An introduction to apache drill presentation
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentation
 
Extending Hadoop for Fun & Profit
Extending Hadoop for Fun & ProfitExtending Hadoop for Fun & Profit
Extending Hadoop for Fun & Profit
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
 
Maintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single ClusterMaintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single Cluster
 
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/PigHivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
 
Hivemail: Scalable Machine Learning Library for Apache Hive
Hivemail: Scalable Machine Learning Library for Apache HiveHivemail: Scalable Machine Learning Library for Apache Hive
Hivemail: Scalable Machine Learning Library for Apache Hive
 

Viewers also liked

Apache HAWQ : An Introduction
Apache HAWQ : An IntroductionApache HAWQ : An Introduction
Apache HAWQ : An IntroductionSandeep Kunkunuru
 
Zettaset Elastic Big Data Security for Greenplum Database
Zettaset Elastic Big Data Security for Greenplum DatabaseZettaset Elastic Big Data Security for Greenplum Database
Zettaset Elastic Big Data Security for Greenplum DatabasePivotalOpenSourceHub
 
Pivotal Strata NYC 2015 Apache HAWQ Launch
Pivotal Strata NYC 2015 Apache HAWQ LaunchPivotal Strata NYC 2015 Apache HAWQ Launch
Pivotal Strata NYC 2015 Apache HAWQ LaunchVMware Tanzu
 
New Security Framework in Apache Geode
New Security Framework in Apache GeodeNew Security Framework in Apache Geode
New Security Framework in Apache GeodePivotalOpenSourceHub
 
gsoc_mentor for Shivram Mani
gsoc_mentor for Shivram Manigsoc_mentor for Shivram Mani
gsoc_mentor for Shivram ManiShivram Mani
 
Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16 Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16 PivotalOpenSourceHub
 
Hawq Hcatalog Integration
Hawq Hcatalog IntegrationHawq Hcatalog Integration
Hawq Hcatalog IntegrationShivram Mani
 
Managing Apache HAWQ with Apache AMBARI
Managing Apache HAWQ with Apache AMBARIManaging Apache HAWQ with Apache AMBARI
Managing Apache HAWQ with Apache AMBARIMithun (Matt) Mathew
 
PXF HAWQ Unmanaged Data
PXF HAWQ Unmanaged DataPXF HAWQ Unmanaged Data
PXF HAWQ Unmanaged DataShivram Mani
 
HAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopHAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopBigData Research
 
Pivotal HAWQ - High Availability (2014)
Pivotal HAWQ - High Availability (2014)Pivotal HAWQ - High Availability (2014)
Pivotal HAWQ - High Availability (2014)saravana krishnamurthy
 
Massively Parallel Processing with Procedural Python - Pivotal HAWQ
Massively Parallel Processing with Procedural Python - Pivotal HAWQMassively Parallel Processing with Procedural Python - Pivotal HAWQ
Massively Parallel Processing with Procedural Python - Pivotal HAWQInMobi Technology
 
Phd tutorial hawq_v0.1
Phd tutorial hawq_v0.1Phd tutorial hawq_v0.1
Phd tutorial hawq_v0.1seungdon Choi
 
Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개Seungdon Choi
 
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...VMware Tanzu
 

Viewers also liked (20)

Apache HAWQ : An Introduction
Apache HAWQ : An IntroductionApache HAWQ : An Introduction
Apache HAWQ : An Introduction
 
Zettaset Elastic Big Data Security for Greenplum Database
Zettaset Elastic Big Data Security for Greenplum DatabaseZettaset Elastic Big Data Security for Greenplum Database
Zettaset Elastic Big Data Security for Greenplum Database
 
Pivotal Strata NYC 2015 Apache HAWQ Launch
Pivotal Strata NYC 2015 Apache HAWQ LaunchPivotal Strata NYC 2015 Apache HAWQ Launch
Pivotal Strata NYC 2015 Apache HAWQ Launch
 
Apache HAWQ Architecture
Apache HAWQ ArchitectureApache HAWQ Architecture
Apache HAWQ Architecture
 
New Security Framework in Apache Geode
New Security Framework in Apache GeodeNew Security Framework in Apache Geode
New Security Framework in Apache Geode
 
Geode Meetup Apachecon
Geode Meetup ApacheconGeode Meetup Apachecon
Geode Meetup Apachecon
 
gsoc_mentor for Shivram Mani
gsoc_mentor for Shivram Manigsoc_mentor for Shivram Mani
gsoc_mentor for Shivram Mani
 
PXF BDAM 2016
PXF BDAM 2016PXF BDAM 2016
PXF BDAM 2016
 
Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16 Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16
 
Hawq Hcatalog Integration
Hawq Hcatalog IntegrationHawq Hcatalog Integration
Hawq Hcatalog Integration
 
Managing Apache HAWQ with Apache AMBARI
Managing Apache HAWQ with Apache AMBARIManaging Apache HAWQ with Apache AMBARI
Managing Apache HAWQ with Apache AMBARI
 
PXF HAWQ Unmanaged Data
PXF HAWQ Unmanaged DataPXF HAWQ Unmanaged Data
PXF HAWQ Unmanaged Data
 
HAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopHAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoop
 
Pivotal HAWQ - High Availability (2014)
Pivotal HAWQ - High Availability (2014)Pivotal HAWQ - High Availability (2014)
Pivotal HAWQ - High Availability (2014)
 
Build & test Apache Hawq
Build & test Apache Hawq Build & test Apache Hawq
Build & test Apache Hawq
 
Massively Parallel Processing with Procedural Python - Pivotal HAWQ
Massively Parallel Processing with Procedural Python - Pivotal HAWQMassively Parallel Processing with Procedural Python - Pivotal HAWQ
Massively Parallel Processing with Procedural Python - Pivotal HAWQ
 
Phd tutorial hawq_v0.1
Phd tutorial hawq_v0.1Phd tutorial hawq_v0.1
Phd tutorial hawq_v0.1
 
SQL On Hadoop
SQL On HadoopSQL On Hadoop
SQL On Hadoop
 
Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개
 
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
 

Similar to Apache HAWQ and Apache MADlib: Journey to Apache

Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014cdmaxime
 
Advanced Analytics in Hadoop
Advanced Analytics in HadoopAdvanced Analytics in Hadoop
Advanced Analytics in HadoopAnalyticsWeek
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- lev...
Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- lev...Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- lev...
Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- lev...Christian Tzolov
 
Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)Thomas W. Dinsmore
 
Hawq wp 042313_final
Hawq wp 042313_finalHawq wp 042313_final
Hawq wp 042313_finalEMC
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...Amazon Web Services
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJANicolas Poggi
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsGeoffrey Fox
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsGeoffrey Fox
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceHortonworks
 
MySQL London Tech Tour March 2015 - MySQL Fabric
MySQL London Tech Tour March 2015 - MySQL FabricMySQL London Tech Tour March 2015 - MySQL Fabric
MySQL London Tech Tour March 2015 - MySQL FabricMark Swarbrick
 
Introduction to Impala
Introduction to ImpalaIntroduction to Impala
Introduction to Impalamarkgrover
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchHortonworks
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystemJakub Stransky
 
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015NoSQLmatters
 

Similar to Apache HAWQ and Apache MADlib: Journey to Apache (20)

Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
Advanced Analytics in Hadoop
Advanced Analytics in HadoopAdvanced Analytics in Hadoop
Advanced Analytics in Hadoop
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- lev...
Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- lev...Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- lev...
Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- lev...
 
Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)
 
Hawq wp 042313_final
Hawq wp 042313_finalHawq wp 042313_final
Hawq wp 042313_final
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJA
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
MySQL London Tech Tour March 2015 - MySQL Fabric
MySQL London Tech Tour March 2015 - MySQL FabricMySQL London Tech Tour March 2015 - MySQL Fabric
MySQL London Tech Tour March 2015 - MySQL Fabric
 
Introduction to Impala
Introduction to ImpalaIntroduction to Impala
Introduction to Impala
 
Empower Data-Driven Organizations with HPE and Hadoop
Empower Data-Driven Organizations with HPE and HadoopEmpower Data-Driven Organizations with HPE and Hadoop
Empower Data-Driven Organizations with HPE and Hadoop
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and Search
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystem
 
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
 

More from PivotalOpenSourceHub

Apache Geode Clubhouse - WAN-based Replication
Apache Geode Clubhouse - WAN-based ReplicationApache Geode Clubhouse - WAN-based Replication
Apache Geode Clubhouse - WAN-based ReplicationPivotalOpenSourceHub
 
#GeodeSummit: Easy Ways to Become a Contributor to Apache Geode
#GeodeSummit: Easy Ways to Become a Contributor to Apache Geode#GeodeSummit: Easy Ways to Become a Contributor to Apache Geode
#GeodeSummit: Easy Ways to Become a Contributor to Apache GeodePivotalOpenSourceHub
 
#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"
#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"
#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"PivotalOpenSourceHub
 
#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R...
#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R...#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R...
#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R...PivotalOpenSourceHub
 
#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future Design#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future DesignPivotalOpenSourceHub
 
#GeodeSummit - Redis to Geode Adaptor
#GeodeSummit - Redis to Geode Adaptor#GeodeSummit - Redis to Geode Adaptor
#GeodeSummit - Redis to Geode AdaptorPivotalOpenSourceHub
 
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & GeodePivotalOpenSourceHub
 
#GeodeSummit - Spring Data GemFire API Current and Future
#GeodeSummit - Spring Data GemFire API Current and Future#GeodeSummit - Spring Data GemFire API Current and Future
#GeodeSummit - Spring Data GemFire API Current and FuturePivotalOpenSourceHub
 
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode#GeodeSummit - Modern manufacturing powered by Spring XD and Geode
#GeodeSummit - Modern manufacturing powered by Spring XD and GeodePivotalOpenSourceHub
 
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...PivotalOpenSourceHub
 
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...PivotalOpenSourceHub
 
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)PivotalOpenSourceHub
 
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...PivotalOpenSourceHub
 
#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics
#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics
#GeodeSummit - Apex & Geode: In-memory streaming, storage & analyticsPivotalOpenSourceHub
 
#GeodeSummit - Where Does Geode Fit in Modern System Architectures
#GeodeSummit - Where Does Geode Fit in Modern System Architectures#GeodeSummit - Where Does Geode Fit in Modern System Architectures
#GeodeSummit - Where Does Geode Fit in Modern System ArchitecturesPivotalOpenSourceHub
 
#GeodeSummit - Design Tradeoffs in Distributed Systems
#GeodeSummit - Design Tradeoffs in Distributed Systems#GeodeSummit - Design Tradeoffs in Distributed Systems
#GeodeSummit - Design Tradeoffs in Distributed SystemsPivotalOpenSourceHub
 
#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode
#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode
#GeodeSummit - Wall St. Derivative Risk Solutions Using GeodePivotalOpenSourceHub
 
Building Apps with Distributed In-Memory Computing Using Apache Geode
Building Apps with Distributed In-Memory Computing Using Apache GeodeBuilding Apps with Distributed In-Memory Computing Using Apache Geode
Building Apps with Distributed In-Memory Computing Using Apache GeodePivotalOpenSourceHub
 
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby AnandanPivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby AnandanPivotalOpenSourceHub
 

More from PivotalOpenSourceHub (20)

Apache Geode Clubhouse - WAN-based Replication
Apache Geode Clubhouse - WAN-based ReplicationApache Geode Clubhouse - WAN-based Replication
Apache Geode Clubhouse - WAN-based Replication
 
#GeodeSummit: Easy Ways to Become a Contributor to Apache Geode
#GeodeSummit: Easy Ways to Become a Contributor to Apache Geode#GeodeSummit: Easy Ways to Become a Contributor to Apache Geode
#GeodeSummit: Easy Ways to Become a Contributor to Apache Geode
 
#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"
#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"
#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"
 
#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R...
#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R...#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R...
#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R...
 
#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future Design#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future Design
 
#GeodeSummit - Redis to Geode Adaptor
#GeodeSummit - Redis to Geode Adaptor#GeodeSummit - Redis to Geode Adaptor
#GeodeSummit - Redis to Geode Adaptor
 
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode
 
#GeodeSummit - Spring Data GemFire API Current and Future
#GeodeSummit - Spring Data GemFire API Current and Future#GeodeSummit - Spring Data GemFire API Current and Future
#GeodeSummit - Spring Data GemFire API Current and Future
 
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode#GeodeSummit - Modern manufacturing powered by Spring XD and Geode
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode
 
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
 
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...
 
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)
 
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...
 
#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics
#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics
#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics
 
#GeodeSummit - Where Does Geode Fit in Modern System Architectures
#GeodeSummit - Where Does Geode Fit in Modern System Architectures#GeodeSummit - Where Does Geode Fit in Modern System Architectures
#GeodeSummit - Where Does Geode Fit in Modern System Architectures
 
#GeodeSummit - Design Tradeoffs in Distributed Systems
#GeodeSummit - Design Tradeoffs in Distributed Systems#GeodeSummit - Design Tradeoffs in Distributed Systems
#GeodeSummit - Design Tradeoffs in Distributed Systems
 
#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode
#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode
#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode
 
Building Apps with Distributed In-Memory Computing Using Apache Geode
Building Apps with Distributed In-Memory Computing Using Apache GeodeBuilding Apps with Distributed In-Memory Computing Using Apache Geode
Building Apps with Distributed In-Memory Computing Using Apache Geode
 
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby AnandanPivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
 
Apache Geode Offheap Storage
Apache Geode Offheap StorageApache Geode Offheap Storage
Apache Geode Offheap Storage
 

Recently uploaded

Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 

Recently uploaded (20)

VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 

Apache HAWQ and Apache MADlib: Journey to Apache

  • 1. 11Pivotal Confidential–Internal Use Only Apache HAWQ and Apache MADlib Journey to Apache Pivotal San Francisco Dec 3, 2015
  • 2. 2 Topics • Journey to Apache • HAWQ overview • MADlib overview
  • 3. 33Pivotal Confidential–Internal Use Only Journey to Apache
  • 4. 1986 … 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 1995 1997 1999 2001 2003 2005 2007 2009 2011 2013 2015 Journey to Apache Michael Stonebraker develops Postgres at UCB Postgres adds support for SQL Open Source PostgreSQL PostgreSQL 7.0 released PostgreSQL 8.0 released Greenplum forks PostgreSQL Hadoop 1.0 Released HAWQ & MADlib go Apache HAWQ launched Hadoop 2.0 Released MADlib launched Greenplum open sourced
  • 5. 5 Pivotal is Committed to Open Source Pivotal GemFire Apache Geode (April 2015) Pivotal HDB Apache HAWQ (Sept 2015) Pivotal Query OptimizerComing Soon MADlib OSS (BSD License) Apache MADlib (Sept 2015) Pivotal Greenplum Greenplum Database (Oct 2015) (Apache 2 License)
  • 6. 6  Collaborate on software in open and productive ways  Need strong community for innovation  MADlib and HAWQ are complementary technologies Why Apache?
  • 7. 77Pivotal Confidential–Internal Use Only Apache HAWQ Overview
  • 10. 10 5 • Up to 30x SQL-on-Hadoop performance advantage • Faster time to insight • Massive MPP scalability to petabytes Benefits: Near real-time latency, complex queries and advanced analytics at scale 1. Advanced Analytics Performance Key Features of HAWQ
  • 11. 11 5 • ANSI SQL-92, -99, -2003 • All 99 TPC-DS queries tested, no modifications • Plus, OLAP extensions • Complete ACID integrity and reliability Benefits: 100% SQL compliant No risk to SQL applications All native on HDP via HAWQ 2. 100% ANSI SQL Compliant Key Features of HAWQ
  • 12. 12 HAWQ Performance vs Impala HAWQ Faster Impala Faster 2 28 46 66 73 76 79 80 88 90 96 HAWQ • Faster on 46 of 62 TPC-DS queries completed* • 4.55x mean avg. • 12 hrs faster total * Impala supported 74 of 99 queries, 12 crashed mid-run
  • 13. 13 HAWQ vs Apache Hive w/Tez HAWQ Faster Hive Faster 3 7 15 25 27 34 46 48 76 79 89 90 96 HAWQ • Faster on 45 of 60 TPC-DS queries completed* • 3.44x mean avg. • 9 hrs faster total * Hive supported 65 of 99 queries, 5 crashed mid-run
  • 14. 14 5 • Advanced machine learning for big data • Local, in-database operation • Exceptional MPP/parallel performance • Open source, Postgres-based Benefits: Advanced, highly scalable, machine learning, directly on data in Hadoop 3. Integrated Machine Learning Key Features of HAWQ
  • 15. 15 5 • HDP, PHD, other ODPi-derived distros • Easily managed via Ambari • On premises, in cloud, or PaaS • HBase, Avro, Parquet and more • Connectors to make HAWQ data available to other SQL query tools Benefits: Flexibility Accessibility Portability 4. Flexible Deployment Key Features of HAWQ
  • 16. 16 Open Data Platform A shared industry effort to advance the state of Apache Hadoop® and Big Data technologies for the enterprise
  • 17. 17 The open ecosystem of big data September 25, 2015 http://odpi.org
  • 18. 18 5 • Cost-based query optimization • Robust query plan optimization • Complex big data management Benefits: Optimize performance and costs Maximize Hadoop cluster resources Offload EDW w/o compromise 5. Query Optimization Options Key Features of HAWQ
  • 19. 19 Apache HAWQ ● Discover New Relationships ● Enable Data Science ● Analyze External Sources ● Query All Data Types! Multi-level Fault Tolerance Granular Authorization Resource Mgmt (+ YARN) high multi-tenancy ANSI SQL Standard OLAP Extensions JDBC ODBC Connectivity Parallel Processing Online Expansion HDFS Petabyte Scale Cost Based Optimizer Dynamic Pipelining ACID + Transactional Multi-Language UDF Support Built-in Data Science Library Extensible (PXF) Query External Sources Hardened, 10+ Years Investment, Production Proven Accessibility + Usability HDFS Native File Formats ● Manage Multiple Workloads ● Petabyte Scale Analytics ● Security controls ● Leverage Existing SQL Skills & BI Tools ● Easily Integrate with Other Tools ● Sub-second Performance Compression + Partitioning core compliance ● Hadoop-Native ● Supports Pivotal HD and Hortonworks Data Platform ● Ambari-Integrated
  • 20. 20 Apache HAWQ 2.0 (in beta) Areas of Enhancement New Features Elastic & Scalable Architecture Hadoop-Native Integrations Simplified External Data Access/Queries Performance & Optimizations On-Demand Virtual Segments Flexible Query Dispatch on subset nodes 3 Tier RM: YARN level>User>Query-Operator Dynamic Cluster Expansion (no redistribute) New Fault Tolerance Service HCatalog integration - Read Access HDFS Catalog Cache Per Table Directory storage (user friendly) Single physical segment per node Easier Administration/Usage Cloud-Ready Simpler Management Commands
  • 21. 21 HAWQ Segments HAWQ Masters Yarn Physical Segment Client Parser/ Analyzer Optimizer Dispatcher DataNode NodeManager NameNodeNameNode External Data Stores via Xtension Framework (Hive/HBase/etc) Resource Manager Fault Tolerance Service Catalog Service Virtual Segment Virtual Segment Physical Segment DataNode NodeManager Virtual Segment Virtual Segment Physical Segment DataNode NodeManager Virtual Segment Virtual Segment Resource Broker libYARN HDFS Catalog Cache Interconnect Interconnect Apache HAWQ 2.0 Architecture
  • 22. 22 Example Use Cases Smart/connected car • PHD, HAWQ • Ability to have numerous data in Hadoop • Generate new business models • Predictive analytics Network & Call Center Analysis • PHD, HAWQ • Store and maintain 2B records/day • Analyze drop and completed calls • Analyze networks, care-center responsiveness • 5X capacity of EDW at half the cost Revenue Prediction • PHD, HAWQ, GPDB • Predict ad revenue to within 1% • Transform into data-driven company that builds close relationships with customers Archive Analytics, Customer Behavior Analytics • PHD, HAWQ • Mainframe alternative • Archive analytics • Customer behavior profiling and analytics
  • 23. 2323Pivotal Confidential–Internal Use Only Apache MADlib Overview
  • 24. 24 Scalable, In-Database Machine Learning • Open Source https://github.com/apache/incubator-madlib • Supports Greenplum DB, Apache HAWQ/HDB and PostgreSQL • Downloads and Docs: http://madlib.incubator.apache.org/ Apache (incubating)
  • 25. 25 History MADlib project was initiated in 2011 by EMC/Greenplum architects and Joe Hellerstein from Univ. of California, Berkeley. • MAD stands for: • lib stands for SQL library of: • advanced (mathematical, statistical, machine learning) • parallel & scalable in-database functions UrbanDictionary.com: mad (adj.): an adjective used to enhance a noun. 1- dude, you got skills. 2- dude, you got mad skills.
  • 26. 26 Functions Predictive Modeling Library Linear Systems • Sparse and Dense Solvers • Linear Algebra Matrix Factorization • Singular Value Decomposition (SVD) • Low Rank Generalized Linear Models • Linear Regression • Logistic Regression • Multinomial Logistic Regression • Cox Proportional Hazards Regression • Elastic Net Regularization • Robust Variance (Huber-White), Clustered Variance, Marginal Effects Other Machine Learning Algorithms • Principal Component Analysis (PCA) • Association Rules (Apriori) • Topic Modeling (Parallel LDA) • Decision Trees • Random Forest • Support Vector Machines • Conditional Random Field (CRF) • Clustering (K-means) • Cross Validation • Naïve Bayes • Support Vector Machines (SVM) Descriptive Statistics Sketch-Based Estimators • CountMin (Cormode-Muth.) • FM (Flajolet-Martin) • MFV (Most Frequent Values) Correlation Summary Support Modules Array Operations Sparse Vectors Random Sampling Probability Functions Data Preparation PMML Export Conjugate Gradient Inferential Statistics Hypothesis Tests Time Series • ARIMA Oct 2014
  • 27. 27 MADlib Advantages  Better parallelism – Algorithms designed to leverage MPP and Hadoop architecture  Better scalability – Algorithms scale as your data set scales  Better predictive accuracy – Can use all data, not a sample  ASF open source (incubating) – Available for customization and optimization
  • 28. 28 Supported Platforms GPDB PostgreSQL PHD HDP Other ODPi distros Scale-out machine learning now available on open source, MPP execution engines.
  • 29. 29 Example Usage Train a model Predict for new data
  • 30. 30 Linear Regression on 10 Million Rows in Seconds Hellerstein, Joseph M., et al. "The MADlib analytics library: or MAD skills, the SQL." Proceedings of the VLDB Endowment 5.12 (2012): 1700-1711.
  • 31. 31 Pivotal is very proud to deepen our relationship with the ASF to advance SQL-on-Hadoop and machine learning technologies. Please join us!
  • 32. 32 Contributors Welcome! • Web sites – http://hawq.incubator.apache.org/ – http://madlib.incubator.apache.org/ – https://cran.r-project.org/web/packages/PivotalR/index.html • Github – https://github.com/apache/incubator-hawq – https://github.com/apache/incubator-madlib – https://github.com/pivotalsoftware/PivotalR