Pivotal Data Science Technical Workshop
Scott Hajek
shajek@pivotal.io
April Song
asong@pivotal.io
Workshop Agenda
1. Data Science and Parallelism
2. Pivotal Greenplum Fundamental Concepts
3. Airline Optimization Use Case
Parallelism: Crucial for Analytics at Scale
• Embarrassingly parallel problems
– Easy to break up into a number of parallel tasks
– No dependency (or communication) between those parallel tasks
• Examples
– Have each person in this room weigh themselves
– Count a deck of cards by dividing it among the people in this room
– map() function in Python
– apply() family of functions in R
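The card-counting example can be sketched with Python's map() interface. This is an illustrative sketch, not workshop code: the deck layout and the four-person split are assumptions. Each worker counts only its own pile, so the workers never communicate, and the partial counts are simply added at the end.

```python
from concurrent.futures import ThreadPoolExecutor

def count_cards(pile):
    """Each 'person' counts only their own pile; no communication is needed."""
    return len(pile)

# A standard 52-card deck, dealt round-robin to 4 people (hypothetical setup).
deck = [(rank, suit) for suit in "SHDC" for rank in range(1, 14)]
piles = [deck[i::4] for i in range(4)]

# pool.map has the same shape as the built-in map(): one call per pile.
with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(count_cards, piles))

print(counts, sum(counts))  # [13, 13, 13, 13] 52
```

For CPU-bound work in CPython, ProcessPoolExecutor offers the same map() interface and sidesteps the global interpreter lock.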
Pivotal Big Data Technology: Pivotal Greenplum & HDB
• Performance through massive parallelism
– Automatic parallelization
– Load and query like any database
– Tables automatically distributed across nodes
– Analytics-oriented query optimization
• Scalable MPP architecture
– All nodes can scan and process in parallel
– Linear scalability by adding nodes
Think of it as multiple PostgreSQL servers: a master coordinates many segments (workers).
Rows are distributed across segments by a particular field (or randomly).
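In Greenplum, the distribution is declared in the table DDL with `DISTRIBUTED BY (column)` or `DISTRIBUTED RANDOMLY`. The Python sketch below only mimics the idea. It assumes `zlib.crc32` modulo the segment count as a stand-in hash, and Greenplum's internal hash function differs. The row fields (`route`, `fare`) are hypothetical. The property to notice is that all rows sharing a distribution key land on the same segment.

```python
import random
import zlib
from collections import defaultdict

def distribute(rows, n_segments, key=None, seed=0):
    """Assign each row to a segment: by hash of `key` (like DISTRIBUTED BY)
    or uniformly at random (like DISTRIBUTED RANDOMLY)."""
    rng = random.Random(seed)
    segments = defaultdict(list)
    for row in rows:
        if key is None:
            seg = rng.randrange(n_segments)
        else:
            # Stand-in hash; same key value -> same segment every time.
            seg = zlib.crc32(str(row[key]).encode()) % n_segments
        segments[seg].append(row)
    return segments

rows = [{"route": r, "fare": f}
        for r in ("BOS-ORD", "ORD-SFO", "SFO-JFK")
        for f in (100, 150, 200)]
by_route = distribute(rows, n_segments=4, key="route")
randomly = distribute(rows, n_segments=4)
print({seg: [r["route"] for r in seg_rows] for seg, seg_rows in by_route.items()})
```

Hashing on a column that later appears in a GROUP BY lets each segment aggregate its groups locally, while random distribution spreads rows evenly but gives no such locality.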
Pivotal Greenplum Fundamental Concepts
Shared-Nothing Massively Parallel Processing Architecture
• External sources: loading, streaming
• Segment servers: query processing and data storage
• Interconnect
• Master servers: query planning and dispatch
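The master/segment split can be illustrated with a two-phase aggregate: each segment computes a partial result over its own rows, and the master merges the small partials. This is a minimal sketch of the general scatter/gather idea, not Greenplum's executor. The per-segment fare values are hypothetical.

```python
from typing import Iterable, List, Tuple

def segment_partial(rows: Iterable[float]) -> Tuple[float, int]:
    """Runs on each segment: aggregate only local rows (no communication)."""
    total, count = 0.0, 0
    for value in rows:
        total += value
        count += 1
    return total, count

def master_combine(partials: List[Tuple[float, int]]) -> float:
    """Runs on the master: merge the small per-segment results into an average."""
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

# Hypothetical fares stored on three segments.
segments = [[100.0, 200.0], [300.0], [150.0, 250.0, 200.0]]
partials = [segment_partial(s) for s in segments]  # "scatter": done in parallel
print(master_combine(partials))                    # "gather": 200.0
```

Shipping (sum, count) pairs instead of averages matters: averaging the three segment means would give about 216.67, not the true mean of 200.0.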
Key Features and Benefits of Pivotal Greenplum
• Client access and tools
– Client access: ODBC, JDBC, OLEDB, MapReduce, etc.
– 3rd-party tools: BI tools, ETL tools, data mining, etc.
– Admin tools: Command Center, Package Manager
• Product features
– Loading and external access: Petabyte-Scale Loading, Hadoop Integration, Trickle Micro-Batching, Anywhere Data Access
– Storage/data access: Hybrid Storage and Execution, In-Database Compression, Multi-Level Partitioning, Indexes (B-tree, Bitmap), External Table Support
– Language support: Comprehensive SQL, Native MapReduce, SQL 2003 OLAP Extensions, Programmable Analytics, Package Support
• GPDB adaptive services
– Workload Management, Multi-Level Fault Tolerance, Online System Expansion
• Core MPP architecture
– Shared-Nothing MPP, Parallel Query Optimizer, Polymorphic Data Storage™, Parallel Dataflow Engine, gNet™ Software Interconnect, MPP Scatter/Gather Streaming™
Benefits of Greenplum
• Faster performance
• Analytics where the data lives
• Flexibility and control
• Centralized management
• Enterprise-class reliability
• Linear scalability
Airline Optimization Use Case
Revenue Management Solution for the Airline Industry
Data Analytics Training: Machine Learning, Optimization, MADlib, PL/R
Revenue Management
What is revenue management?
• Charging different prices to different customer segments to maximize revenue
When is it used?
• There is a fixed amount of resources available for sale
• The resources for sale are perishable
• Customers are willing to pay different prices for the same resources
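A toy calculation shows why segmented pricing helps when capacity is fixed and perishable. The numbers are hypothetical: 100 seats, 30 business travelers willing to pay $400, and 120 leisure travelers willing to pay $150. The sketch also assumes perfect fencing, meaning each segment can be charged its own fare and nobody trades down.

```python
def single_price_revenue(segments, capacity):
    """Best revenue when everyone pays one price (try each willingness-to-pay)."""
    best = 0
    for price, _ in segments:
        demand = sum(n for wtp, n in segments if wtp >= price)
        best = max(best, price * min(demand, capacity))
    return best

def segmented_revenue(segments, capacity):
    """Revenue when each segment pays its own fare, highest-paying seated first."""
    total, seats = 0, capacity
    for wtp, n in sorted(segments, reverse=True):
        sold = min(n, seats)
        total += wtp * sold
        seats -= sold
    return total

segments = [(400, 30), (150, 120)]  # (willingness to pay, number of customers)
print(single_price_revenue(segments, 100))  # 15000
print(segmented_revenue(segments, 100))     # 22500
```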
What is the current state of revenue management?
• Revenue management has been in use since the early 1980s
• The technology is dispersed
• Competitor prices are now available
What we will demonstrate
• Combining machine learning and optimization in the same flow, to highlight the benefits of having a single platform for analytics
• Two highlighted technologies: MADlib and PL/R
Data Generation - Airports
We imported airport information from Wikipedia
Route Creation: Hub & Spoke Model
Route Creation: Point-to-Point Model
Data Generation
Airports:
• Serve only airports with more than 1 million annual enplanements
• 65 airports
Routes:
• Picked the hub & spoke model
• From any airport to any other airport through one of the 5 hubs
• Nonstop routes from/to any airport to/from any hub
• No itineraries with two connections allowed!
• 4,180 routes (640 nonstop; the rest connect through a hub)
Daily Flights:
• Up to 5 flight times per route
• 2,200 flights/day
How realistic is 2,200 flights/day?
According to the Bureau of Transportation Statistics, the average number of flights per day in 2011, by airline:
• American Airlines ~1,500
• Delta ~2,000
• JetBlue ~600
• Southwest ~3,200
• United ~910
Note: These numbers are averages; there are small seasonal deviations.
Data Generation
Sales History:
• Close to two years of history
• Own price + competitors' prices
• Flight date, month, weekday, holiday indicators
• Flights go on sale only 20 days before the flight date
• Sales are captured for each sales date
• 3 classes: first class, business, and economy
4,180 routes × 665 flight dates × up to 5 flight times × 20 sales dates × 3 classes
→ > 500 million observations (> 150 GB)
Not big data, but big enough that it will not fit in memory.
Note: With networks, data grows fast: the number of possible routes grows roughly with the square of the number of airports. Remember, we started with only 65 airports.
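Multiplying the dimensions above gives an upper bound on the table size, because not every route has all 5 flight times. The stated > 500 million observations sit below that bound. Dividing the two stated lower bounds also gives a rough average row width. This is just arithmetic on the slide's figures.

```python
from math import prod

# Dimensions from the slide; "up to 5" flight times makes the product an upper bound.
dims = {
    "routes": 4180,
    "flight_dates": 665,
    "flight_times_max": 5,
    "sales_dates": 20,
    "classes": 3,
}
upper_bound = prod(dims.values())
print(f"{upper_bound:,}")  # 833,910,000

# Rough average row size implied by "> 150 GB" for "> 500 million" rows.
bytes_per_row = 150e9 / 500e6
print(bytes_per_row)  # 300.0
```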
Problem Statement
• Decision variables: $p_t$, where $t$ is the number of days to the flight
• Assume a linear relationship between demand and price
– $D_t(p_t)$ is the demand on day $t$ when the price is set to $p_t$
• Demand also depends on
– Competitor prices
– Trend
– Seasonality
▪ Day of the week, month, holiday indicator
Once the demand functions $D_t(p_t) = a_t p_t + b_t$ are determined, the problem is a quadratic programming problem with linear constraints:
$$\max_{p}\ \sum_t D_t(p_t)\,p_t \quad \text{s.t.} \quad \sum_t D_t(p_t) \le \text{Capacity}, \qquad D_t(p_t) \ge 0 \ \ \forall t$$
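The workshop solves this QP in PL/R. The sketch below uses a different, swapped-in technique: Lagrangian bisection on the single capacity constraint, with no general QP solver. It assumes $a_t < 0$ (demand falls as price rises) and $b_t > 0$. Under those assumptions each day's revenue $a_t p_t^2 + b_t p_t$ is concave. For a seat shadow price $\lambda$, the best demand on day $t$ is $\max(0, (b_t + a_t\lambda)/2)$, which shrinks as $\lambda$ grows. Bisection finds the $\lambda$ that fills capacity exactly. The coefficients in the usage lines are hypothetical.

```python
from typing import List, Tuple

def optimal_prices(a: List[float], b: List[float], capacity: float,
                   iters: int = 200) -> Tuple[List[float], List[float]]:
    """Maximize sum_t p_t * D_t(p_t), D_t(p_t) = a_t*p_t + b_t,
    s.t. sum_t D_t(p_t) <= capacity and D_t(p_t) >= 0.
    Assumes a_t < 0 and b_t > 0 for every t."""
    def demands(lam: float) -> List[float]:
        # Per-day maximizer of a_t p^2 + b_t p - lam*(a_t p + b_t), clipped at D >= 0.
        return [max(0.0, (bt + at * lam) / 2) for at, bt in zip(a, b)]

    lam = 0.0
    if sum(demands(0.0)) > capacity:  # capacity binds: bisect on the shadow price
        lo, hi = 0.0, max(-bt / at for at, bt in zip(a, b))  # at hi all demands are 0
        for _ in range(iters):
            mid = (lo + hi) / 2
            if sum(demands(mid)) > capacity:
                lo = mid
            else:
                hi = mid
        lam = hi  # feasible side of the bracket
    d = demands(lam)
    # Invert D_t(p) = a_t*p + b_t to recover each day's price.
    prices = [(dt - bt) / at for at, bt, dt in zip(a, b, d)]
    return prices, d

# Hypothetical demand curves for three days to departure, 90 seats left.
prices, demand = optimal_prices([-0.5, -0.4, -0.3], [100.0, 90.0, 80.0], 90)
print([round(p, 2) for p in prices], [round(x, 2) for x in demand])
```

With one linear capacity constraint and separable concave revenue, this bisection reaches the same optimum a QP solver would. A real pipeline with extra constraints per flight would need a general solver.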
Linear Regression: Other possible features
• Prices: own price, competitor 1–4 prices, price gaps, minimum price
• Calendar: month, weekday, holiday indicator, individual holidays, season
• Trends and economy: linear trend, projected BTS trends, unemployment, consumer confidence
• Itinerary: long layovers, short connections, number of connections
• Other: distribution, frequent flyer, promos, delay statistics, weather, school schedules, marketing spend
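In the workshop, the regressions run in-database with MADlib, whose `linregr_train` function accepts a grouping column so that one model is fit per group. As an in-memory illustration only, the NumPy sketch below fits ordinary least squares separately for each group key. The grouping key (a route string), the two features (own price, competitor price), and the true coefficients are hypothetical. The data are noiseless, so the fit should recover them.

```python
from collections import defaultdict

import numpy as np

def fit_per_group(rows):
    """rows: iterable of (group_key, features, demand).
    Returns {group_key: [intercept, coef_1, coef_2, ...]} via least squares."""
    groups = defaultdict(lambda: ([], []))
    for key, x, y in rows:
        X, Y = groups[key]
        X.append([1.0, *x])  # leading 1.0 is the intercept column
        Y.append(y)
    models = {}
    for key, (X, Y) in groups.items():
        coef, *_ = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)
        models[key] = coef
    return models

# Hypothetical demand = c0 + c1*own_price + c2*competitor_price, per route.
true_coefs = {"BOS-ORD": (150.0, -0.5, 0.2), "ORD-SFO": (200.0, -0.8, 0.1)}
rows = [
    (route, (own, comp), c0 + c1 * own + c2 * comp)
    for route, (c0, c1, c2) in true_coefs.items()
    for own in (100.0, 150.0, 200.0)
    for comp in (90.0, 140.0, 210.0)
]
models = fit_per_group(rows)
print({k: np.round(v, 3) for k, v in models.items()})
```

The fitted own-price coefficient plays the role of the slope $a_t$ in the demand model $D_t(p_t) = a_t p_t + b_t$.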
Solution
• Get insight from sales history
• Optimize the pricing decisions for 4.3 million flights
Step 1: Linear Regression (MADlib): sales history, ~500 million rows → 627 K models, ~60 secs
Step 2: Scoring (SQL + MADlib): to-be-priced routes, 86 million rows → 86 million scores, ~13 secs
Step 3: Aggregate (SQL): 86 million scores → input for QP, 4.3 million rows, ~10 secs
Step 4: Optimize (PL/R): 4.3 million QP inputs → optimal prices, 4.3 million × 20, ~45 secs
• Approximately 2 minutes in total
• With 4 SELECT statements
• Without the data ever leaving the database
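One plausible reading of Steps 2 and 3 can be sketched in Python. The slides do not show the actual SQL, so this is an assumption, not the workshop's implementation. Scoring fixes every feature except own price at the values known for a given flight and day. That collapses the fitted model into $D_t(p) = a_t p + b_t$. Aggregation then gathers the 20 per-day rows of each flight into one record, which is how 86 million rows could become 4.3 million QP inputs. The coefficient names and feature values are hypothetical.

```python
from collections import defaultdict

def score_row(coef, features):
    """Step 2 (sketch): collapse a fitted linear model into D_t(p) = a_t*p + b_t.
    a_t is the own-price coefficient; b_t absorbs every other fixed term."""
    a = coef["own_price"]
    b = coef["intercept"] + sum(
        coef[k] * v for k, v in features.items() if k != "own_price")
    return a, b

def aggregate(scored_rows):
    """Step 3 (sketch): one record per flight, holding (t, a_t, b_t) sorted by t."""
    by_flight = defaultdict(list)
    for flight_id, t, a, b in scored_rows:
        by_flight[flight_id].append((t, a, b))
    return {f: sorted(rows) for f, rows in by_flight.items()}

# Hypothetical model for one route/class and features for two flights.
coef = {"intercept": 50.0, "own_price": -0.5, "comp_price": 0.2, "weekend": 10.0}
to_price = [
    ("F1", 2, {"comp_price": 100.0, "weekend": 1.0}),
    ("F1", 1, {"comp_price": 120.0, "weekend": 1.0}),
    ("F2", 1, {"comp_price": 90.0, "weekend": 0.0}),
]
scored = [(f, t, *score_row(coef, feats)) for f, t, feats in to_price]
qp_input = aggregate(scored)
print(qp_input)
```

Each flight's list of $(a_t, b_t)$ pairs is the kind of input the per-flight optimization step needs.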
Next Steps to Continue your Learning Journey
• Download the Greenplum Sandbox with MADlib
– http://greenplum.org/
• Learn more about Pivotal Data Science
– https://content.pivotal.io/data-science
• Pivotal Academy
– http://academy.pivotal.io
Thank You.

Pivotal Data Science Technical Workshop - PAIR LA
