Knowing how to quickly create advanced analytical models and deploy them efficiently is the true calling card of the data-driven enterprise. Join us for the Pivotal Analytics Innovation Roadshow, a day-long event for data professionals, where you will learn how to roll out successful analytics initiatives while avoiding traps that lead to delays and setbacks. Participants will get practical advice and best practices for building a data-driven enterprise from a seasoned team of data and analytics experts.
The Tanzu Developer Connect is a hands-on workshop that dives deep into TAP and gives attendees hands-on experience. This is a great program to leverage for accounts with current TAP opportunities.
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
2. Workshop Agenda
1. Data Science and Parallelism
2. Pivotal Greenplum Fundamental Concepts
3. Airline Optimization Use Case
3. Parallelism: Crucial for Analytics at Scale
• Explicit Parallelism
• Problems that are easy to break up into a number of parallel tasks
• No dependency (or communication) between those parallel tasks
• Examples
• Have each person in this room weigh themselves
• Count a deck of cards by dividing it up between people in this room
• map() function in Python
• apply() family of functions in R
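The card-counting example above can be sketched with Python's map(). This is a minimal illustration, not material from the workshop: the deck, the number of people, and the helper count_cards are made up for the example. A thread pool is used only to keep the sketch self-contained; for CPU-heavy work in Python a process pool would be the more usual choice.

```python
from concurrent.futures import ThreadPoolExecutor


def count_cards(pile):
    """Each 'person' counts only their own pile; no communication is needed."""
    return len(pile)


# Hypothetical 52-card deck: 13 ranks in each of 4 suits.
deck = [f"{rank}{suit}" for suit in "SHDC" for rank in "A23456789TJQK"]

# Deal the deck out to 4 people, one card at a time.
n_people = 4
piles = [deck[i::n_people] for i in range(n_people)]

# Built-in map(): the same function applied independently to every pile.
serial_total = sum(map(count_cards, piles))

# The same map pattern, with the piles handed to a pool of workers.
with ThreadPoolExecutor(max_workers=n_people) as pool:
    parallel_total = sum(pool.map(count_cards, piles))

print(serial_total, parallel_total)  # 52 52
```

Because no pile depends on any other, the only coordination step is the final sum of the partial counts.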
4. Pivotal Big Data Technology: Pivotal Greenplum & HDB
• Performance through massive parallelism
• Automatic parallelization
• Load and query like any database
• Automatically distributed tables across nodes
• Analytics-oriented query optimization
• Scalable MPP architecture
• All nodes can scan and process in parallel
• Linear scalability by adding nodes
5. Pivotal Big Data Technology: Pivotal Greenplum & HDB
Think of it as multiple PostgreSQL servers
Segments (Workers)
Master
Rows are distributed across segments by a particular field (or randomly)
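The idea of distributing rows by a field can be sketched in a few lines of Python. This is an illustration only: the segment count, the row layout, and the use of zlib.crc32 as the hash are assumptions for the example, not Greenplum's actual distribution function.

```python
import zlib
from collections import defaultdict

N_SEGMENTS = 4  # hypothetical number of segments (workers)


def segment_for(key, n_segments=N_SEGMENTS):
    """Map a distribution-key value to a segment number.

    crc32 is a stand-in hash chosen for this sketch.
    """
    return zlib.crc32(str(key).encode("utf-8")) % n_segments


# Hypothetical table: 1000 rows distributed by route_id.
rows = [{"route_id": r, "price": 100 + r} for r in range(1000)]

segments = defaultdict(list)
for row in rows:
    segments[segment_for(row["route_id"])].append(row)

# Each segment computes a partial (sum, count) over its own rows only...
partials = [(sum(r["price"] for r in seg), len(seg)) for seg in segments.values()]

# ...and the master combines the partial results into the final answer.
total, count = map(sum, zip(*partials))
avg_price = total / count
print(count, avg_price)  # 1000 599.5
```

The same key always lands on the same segment, so rows that share a key can be joined or aggregated without moving data between segments.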
12. Revenue Management
What is Revenue Management?
• Charge different prices to different segments to maximize revenue
When is it used?
• There is a fixed amount of resources available for sale
• The resources to sell are perishable
• Customers are willing to pay different prices for using the same resources
What is the current state of Revenue Management?
• Revenue Management has been in use since the early 1980s
• Technology is dispersed
• Competitor prices are now available
What we will demonstrate
• We wanted to combine machine learning and optimization in the same flow to highlight the benefits of having a single address for analytics.
• Highlight two technologies: MADlib and PL/R
13. Data Generation - Airports
We imported airport information from Wikipedia
17. Data Generation
Airports:
• Serve only airports with annual enplanements larger than 1 M
• 65 airports
Routes:
• Picked the hub-and-spoke model
• From any airport to any other airport through one of the 5 hubs
• Nonstop routes from/to any airport to/from any hub
• No two-connection routes allowed!
• 4180 routes (640 are nonstop; the others go through a hub)
Daily Flights:
• Each flight is offered up to 5 times a day
• 2200 flights/day
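The route figures can be reproduced with a short calculation. The counting convention below is an assumption that happens to match both numbers on the slide; the slide does not state how the routes were counted. Under this convention a hub-to-hub pair is counted once from each hub's side.

```python
n_airports = 65
n_hubs = 5
n_spokes = n_airports - n_hubs  # 60 non-hub airports

# Nonstop: every hub paired with each of the other 64 airports, both directions.
nonstop = n_hubs * (n_airports - 1) * 2

# One-stop: every ordered pair of distinct non-hub airports, via a hub.
connecting = n_spokes * (n_spokes - 1)

total_routes = nonstop + connecting
print(nonstop, connecting, total_routes)  # 640 3540 4180
```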
18. How realistic is 2200 flights/day?
According to the Bureau of Transportation Statistics, the average number of flights per day in 2011, by airline:
• American Airlines ~1500
• Delta ~2000
• JetBlue ~600
• Southwest ~3200
• United ~910
Note: The numbers presented here are averages. There are small seasonal deviations.
19. Data Generation
Sales History:
• Close to two years of history
• Own Price + Competitors Prices
• Flight Date, Month, Weekday, Holiday Indicators
• Flights are available starting only 20 days before the flight date
• Sales for each sales date are captured
• 3 Classes: First Class, Business, and Economy
4180 Routes x 665 Flight Dates x Up to 5 Flight Times x 20 Sales Dates x 3 Classes
> 500 Million observations (> 150 GB)
Not big data, but big enough that it will not fit in memory
Note: With networks, data grows exponentially! Remember, we started with only 65 airports.
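As a rough check on the size of the sales history, the factors above can be multiplied out. Because each route has "up to 5" flight times, the product is an upper bound; the average number of flight times computed at the end is an inference from the "> 500 Million" figure, not a number given in the slides.

```python
routes = 4180
flight_dates = 665
max_flight_times = 5   # "up to 5", so this gives an upper bound
sales_dates = 20
classes = 3

upper_bound = routes * flight_dates * max_flight_times * sales_dates * classes
print(f"{upper_bound:,}")  # 833,910,000

# Average flight times per route needed to reach 500 million observations.
per_flight_time = routes * flight_dates * sales_dates * classes
avg_flight_times = 500_000_000 / per_flight_time
print(round(avg_flight_times, 2))  # 3.0
```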
20. Problem Statement
• Decision Variables
– $p_t$, where $t$ is the number of days to flight
• Assume a linear relationship between demand and price
– $D_t(p_t)$ is the demand on day $t$ when the price is set to $p_t$
• Demand also depends on
– Competitor Prices
– Trend
– Seasonality
▪ Day of the Week, Month, Holiday Indicator
Once you determine the $D(p_i)$s, with $D(p_i) = a_i p_i + b_i$, the problem is a Quadratic Programming problem with Linear Constraints:
$$\max \sum_i D(p_i)\, p_i \quad \text{s.t.} \quad \sum_i D(p_i) \le \text{Capacity}, \quad D(p_i) \ge 0$$
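To make the optimization concrete, here is a minimal pure-Python sketch of this quadratic program. It is not the PL/R solver used in the workshop: it swaps in a Lagrange-multiplier search by bisection, which works here because the objective is separable. It assumes every slope is negative and every intercept is positive; the function name optimal_prices and the sample coefficients are made up for illustration.

```python
def optimal_prices(a, b, capacity, iters=100):
    """Maximize sum(D_i * p_i) with D_i = a_i * p_i + b_i,
    subject to sum(D_i) <= capacity and D_i >= 0.

    Assumes a_i < 0 < b_i for every i (demand falls as price rises).
    """
    def demands(lam):
        # Stationarity of the Lagrangian gives D_i = (b_i + a_i * lam) / 2,
        # clipped at zero to respect D_i >= 0.
        return [max(0.0, (bi + ai * lam) / 2) for ai, bi in zip(a, b)]

    lam = 0.0  # multiplier of the capacity constraint
    if sum(demands(0.0)) > capacity:
        # Capacity binds: raise lam until total demand drops to capacity.
        lo, hi = 0.0, max(-bi / ai for ai, bi in zip(a, b))
        for _ in range(iters):
            lam = (lo + hi) / 2
            if sum(demands(lam)) > capacity:
                lo = lam
            else:
                hi = lam
        lam = hi  # the feasible end of the final bracket

    d = demands(lam)
    prices = [(di - bi) / ai for ai, bi, di in zip(a, b, d)]
    return prices, d


# Hypothetical two-day example: D_1 = 100 - 2*p_1, D_2 = 80 - p_2, 60 seats.
prices, sold = optimal_prices(a=[-2.0, -1.0], b=[100.0, 80.0], capacity=60.0)
revenue = sum(p * d for p, d in zip(prices, sold))
print([round(p, 2) for p in prices], round(revenue, 2))  # [35.0, 50.0] 2550.0
```

Without the capacity limit each day would be priced at its revenue-maximizing point (demand of b_i / 2); the multiplier raises every price just enough for total demand to fit the available seats.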
21. Linear Regression: Other possible features
Own Price
Comp 1 Price
Comp 2 Price
Comp 3 Price
Comp 4 Price
Price Gaps
Minimum Price
Month
Week day
Holiday Ind
Individual Holidays
Linear Trend
Projected BTS Trends
Unemployment
Consumer Confidence
Long Overlays
Short Connections
No of Connections
Distribution
Frequent Flyer
Promos
Delay Statistics
Weather
School Schedules
Marketing Spend
Season
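The simplest version of the demand model uses own price as the only feature and fits one line per group. The sketch below does this in plain Python with the closed-form least-squares solution. It stands in for the MADlib regression used in the workshop; the route, class, and sales figures are invented for illustration.

```python
from collections import defaultdict


def fit_line(prices, demands):
    """Ordinary least squares for demand = a * price + b (single feature)."""
    n = len(prices)
    mean_p = sum(prices) / n
    mean_d = sum(demands) / n
    sxx = sum((p - mean_p) ** 2 for p in prices)
    sxy = sum((p - mean_p) * (d - mean_d) for p, d in zip(prices, demands))
    a = sxy / sxx
    b = mean_d - a * mean_p
    return a, b


# Hypothetical sales history: (route, class, own price, units sold).
sales = [
    ("SFO-JFK", "Economy", 200, 60),
    ("SFO-JFK", "Economy", 250, 50),
    ("SFO-JFK", "Economy", 300, 40),
    ("SFO-JFK", "First", 800, 12),
    ("SFO-JFK", "First", 900, 10),
    ("SFO-JFK", "First", 1000, 8),
]

# Group the observations, then fit one independent model per group.
groups = defaultdict(lambda: ([], []))
for route, cls, price, units in sales:
    groups[(route, cls)][0].append(price)
    groups[(route, cls)][1].append(units)

models = {key: fit_line(p, d) for key, (p, d) in groups.items()}
for key, (a, b) in models.items():
    print(key, round(a, 3), round(b, 3))
```

Each group is fitted independently of the others, so the fitting step parallelizes in the same way as the map() examples earlier in the deck.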
22. Solution
• Sales History: ~500 Million
• Model Results: 627 K Models
• To Be Priced Routes: 86 Million
• Scoring: 86 Million
• Input for QP: 4.3 Million
• Optimal Prices: 4.3*20 Million
Technologies shown: MADlib, SQL+MADlib, SQL, PL/R
Timings shown: ~60 secs, ~13 secs, ~10 secs, ~45 secs
Step 1: Linear Regression
Step 2: Scoring
Step 3: Aggregate
Step 4: Optimize
• Get insight from sales history
• Optimize the pricing decisions for 4.3 Million flights
• Approximately 2 Minutes
• With 4 Select Statements
• Without data ever leaving the DB
23. Next Steps to Continue your Learning Journey
• Download the Greenplum Sandbox with MADlib
– http://greenplum.org/
• Learn more about Pivotal Data Science
– https://content.pivotal.io/data-science
• Pivotal Academy
– http://academy.pivotal.io