Justin Brandenburg
Machine Learning Architect @ Databricks
Using PySpark to Scale Markov Decision Problems for Policy Exploration
#UnifiedDataAnalytics #SparkAISummit
About Me
• Member of the Professional Services team at Databricks
• Background in economics, cyber analytics and IT
• Based in Washington, DC USA
• Education:
– Bachelors in Economics – Virginia Tech
– Masters in Applied Economics – Johns Hopkins University
– Masters in Computational Social Science – George Mason University
• Previously worked as:
– Senior Data Scientist for big data platform vendor
– Lead Data Scientist for consulting company
Agenda
• Systems, Policies, Complexity and Modeling
• Markov Decision Processes for Policy evaluation
• Using PySpark for MDP modeling
• Example
• Demo
• Summary
Systems & Policies
• A system is a group or combination of interrelated,
interdependent, or interacting elements
– Systems have purposes or goals
– Policies are created to achieve desired outcomes
• A policy is a combination of principles that are created to
guide decisions and achieve rational outcomes
• Designing policies that lead to ideal outcomes for a system is one
of the most difficult challenges facing decision makers
within an organization.
Uncertainty Impacting Policy
• A complex system is one in which each entity may be
perfectly understood, but the behavior of the system as a
whole cannot necessarily be predicted
• Complex systems do not provide perfect information and
never achieve equilibrium
• Uncertainty and non-rational logic can lead to emergent
behavior that policies can’t always account for
Complex System Modeling
• Agent-Based Modeling
– Schelling’s Segregation Model
– Sugarscape
• Game Theory
– Prisoner’s Dilemma
– Texas Hold’em Poker
• Discrete Event Simulation
– A clinical diagnosis
– Traffic accidents
• Markov Decision Process
– Traveling Salesman
Markov Decision Process
Evaluate all Strategies and Outcomes
Source: Screenrant
Markov Decision Process
• Framework for modeling decisions
• A Markov Process describes the state of a system
• When there is a possibility of making a decision (action)
from a list of possible decisions it becomes a Markov
Decision Process
• Often applied in:
– Energy Grid Optimization
– Economic planning
– Logistics
– Risk Management
– Robotics
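The difference between a Markov process and a Markov decision process can be sketched as two data structures. This is only an illustration: the states, actions, and probabilities below are hypothetical and do not come from the talk. A Markov process maps each state to transition probabilities, while an MDP adds a chosen action that selects which transition probabilities apply.

```python
import random

# Hypothetical two-state commuter example; all numbers are illustrative only.
# Markov process: state -> {next_state: probability}
markov_process = {
    "gas": {"gas": 0.9, "ev": 0.1},
    "ev": {"ev": 1.0},
}

# Markov decision process: state -> action -> {next_state: probability}
mdp = {
    "gas": {
        "keep": {"gas": 1.0},
        "switch": {"ev": 1.0},
    },
    "ev": {
        "keep": {"ev": 1.0},
    },
}

def step(transitions, rng):
    """Sample the next state from a {next_state: probability} mapping."""
    states = list(transitions)
    weights = [transitions[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]

rng = random.Random(0)
next_without_decision = step(markov_process["gas"], rng)  # chance alone decides
next_with_decision = step(mdp["gas"]["switch"], rng)      # the chosen action decides
```

In the MDP form, evaluating a policy amounts to choosing which action key to look up in each state.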
Why PySpark for MDP?
“The power of intelligence stems from our vast diversity, not
from any single, perfect principle.”
- Marvin Minsky, 1986. The Society of Mind.
• Efforts to accurately represent real-world problems have highlighted the
inability of a single, all-encompassing model (one state-action space for
one objective) to scale
• Spark provides a distributed computing engine for scaling data analysis
• MDPs are run as simulations that create a large amount of data, which is used to
identify optimal processes
Performing MDP in PySpark
• MDPs are run using the Spark resilient distributed dataset
(RDD)
• This makes it possible to map functions to specific
environments through key-value attributes
• Each row in the RDD is an independent entity that does not
interact with other entities, only with the policy and states
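As a rough illustration of the row independence described above, the function below reads only the row it is given plus a shared, read-only policy. The names advance_agent, policy, and the row attributes are hypothetical. Plain Python map is used so the sketch runs without a cluster; in PySpark a function of this shape could be passed to RDD.map.

```python
# Each row is a (key, attributes) pair; keys and attribute names are hypothetical.
rows = [
    ("driver_1", {"car_type": "gas", "monthly_cost": 300.0}),
    ("driver_2", {"car_type": "ev", "monthly_cost": 150.0}),
]

policy = {"ev_credit": 2000}  # shared, read-only input

def advance_agent(row, policy):
    """Return an updated copy of one row; no other row is read or modified."""
    key, attrs = row
    updated = dict(attrs)
    updated["eligible_credit"] = policy["ev_credit"] if attrs["car_type"] == "gas" else 0
    return key, updated

# Locally: built-in map. On a cluster: rdd.map(lambda r: advance_agent(r, policy)).
results = list(map(lambda r: advance_agent(r, policy), rows))
```

Because each call depends only on its own row and the policy, the rows can be processed in any order and on any worker.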
Agents
• Agents are the entities interacting with the environment
by executing certain actions, taking observations, and
receiving eventual rewards
• Goal is to identify optimal behavior based on policy
parameters
• Behavior is often a transition done in a sequential manner:
1. Decision is made
2. Action is performed
3. Outcome is evaluated
4. New decision is made
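The four steps above can be sketched as a loop. The decision rule and the numbers are hypothetical and only show the shape of the cycle, not the logic used in the talk.

```python
def run_agent(balance, periods):
    """Repeat decide -> act -> evaluate for a fixed number of periods."""
    history = []
    for period in range(periods):
        # 1. Decision is made (hypothetical rule: save when the balance is low).
        action = "save" if balance < 100 else "spend"
        # 2. Action is performed.
        balance += 50 if action == "save" else -30
        # 3. Outcome is evaluated and recorded.
        history.append((period, action, balance))
        # 4. The next iteration makes a new decision from the new state.
    return history

history = run_agent(balance=60, periods=3)
```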
Generating Agents from Existing Data
Suppose you have an existing dataset of
projects and want to run a what-if
analysis to find the optimal schedule
based on factors such as equipment
availability, cost, or external market
factors.
In this case, each line[x] would be
mapped to a column in our DataFrame,
which is converted into an RDD.
class Projects_Agent:
    def __init__(self, line):
        # Each line[x] is one column of the source row.
        self.project_id = line[0]
        self.start_date = line[1]
        self.cur_date = line[2]
        self.labor = line[3]
        self.equipment = line[4]
        self.week_prod = line[5]
        self.ex_prod = line[6]
        self.num_weeks = line[7]
        self.weekly_labor_costs = line[8]
        self.weekly_equip_costs = line[9]
        self.active = True

def initialize_project(line):
    # Wrap one row in an agent object.
    proj = Projects_Agent(line)
    return proj
Generating Agents using Parameters
Agents can also be generated from
parameters assigned as attributes. This
sets boundaries on the behavior that
the agents can transition through
based on policy decisions.
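import random

class Agent:
    def __init__(self, row):
        self.id = row[0]

def create_agents(row):
    # Build one agent and assign its starting attributes.
    agent = Agent(row)
    agent.car_type_index = random.uniform(0, 1)
    agent.car_type = 'gas'
    agent.car_loan = random.randint(0, 30000)
    # loan_payment is not defined on this slide; it is assumed to be computed elsewhere.
    agent.avg_car_payment = loan_payment
    agent.annual_depreciation = 0.10
    agent.number_of_payments = 0
    agent.personal_property_tax = 0.04
    # Returned so that run_mdp can use the result of create_agents(row).
    return agent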
Actions and State Transitions
def policy_per_agent(row):
    agent = row
    credit = 2000  # price credit applied to a new EV purchase
    # Only gas commuters can switch, and only when costs match savings;
    # otherwise the agent is returned unchanged.
    if agent.car_type == 'gas' and agent.avg_car_payment >= 0:
        if agent.transportation_costs == agent.transportation_savings:
            switched_to_ev_vehicle(agent, credit)
    return agent

def switched_to_ev_vehicle(row, credit):
    # Transition the agent to the EV state by overwriting its attributes.
    agent = row
    agent.car_type = 'ev'
    agent.car_loan = 40000 - credit
    agent.avg_car_payment = 500
    agent.car_value = 35000
    agent.gas_price = 0.00
    agent.gallons = 0
    agent.monthly_refuels = 0.0
    agent.percentage_time_express_lanes = 1.00
    agent.tolls_paid = 0
    agent.commute_time = 30
    agent.commuting_costs = 150.00
    return agent
Specify actions and transitions with
RDD transformation functions.
Executing the MDP
Create an MDP Function
that executes the actions
and transitions
def run_mdp(row, time, policy):
    # initialize_agent_attributes and apply_mdp_using_policy are not shown on this slide.
    mdp_data = []
    agent = create_agents(row)
    initialize_agent_attributes(agent)
    apply_mdp_using_policy(agent, time, mdp_data, policy)
    return mdp_data
Instantiate the number of agents needed and convert to RDD.
Apply function via flatMap()
car_agents = 50000
agentRDD = spark.createDataFrame(zip(range(1, car_agents + 1)), ["driver_id"]).rdd
t = 36  # number of time periods passed to run_mdp
policy = 1  # which policy to apply
mdp_results = agentRDD.flatMap(lambda x: run_mdp(x, t, policy)).toDF()
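flatMap is used here because run_mdp returns a list (mdp_data) for each agent, and flatMap concatenates those per-agent lists into one dataset of records. The sketch below imitates this locally with itertools.chain. It assumes, which the slide does not state, that each agent emits one record per time step; fake_run_mdp is a hypothetical stand-in for run_mdp.

```python
from itertools import chain

def fake_run_mdp(agent_id, time):
    """Hypothetical stand-in for run_mdp: one (agent_id, step) record per time step."""
    return [(agent_id, step) for step in range(time)]

agents = range(1, 4)  # 3 agents instead of 50,000
t = 2                 # 2 time steps instead of 36

# map would keep one list per agent; flatMap yields the flattened records.
mapped = [fake_run_mdp(a, t) for a in agents]
flat_mapped = list(chain.from_iterable(mapped))

# Under the one-record-per-step assumption, the settings on the slide would give
# 50,000 agents * 36 steps = 1,800,000 records.
expected_rows = 50000 * 36
```

The flattened form is what lets the result be converted to a DataFrame with one row per agent and time step.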
Example
Electric Vehicles and Toll Lanes
• A local government enacted a policy to reduce vehicle
congestion during the periods of the day when commuters are
on their way to and from work
• To reduce congestion along key routes, toll lanes were put
in place to speed up commutes
• The toll lanes are free for electric vehicle commuters
• Commuters who drive gas-powered vehicles can use the toll
lanes, but the toll increases as more cars merge onto them
Use Case
• As more commuters switch to electric vehicles, the toll
lanes are becoming increasingly congested, leading to
longer commute times
• Could the incentives put in place by the policy makers have
led to changes in commuter behavior at a faster pace than
originally planned?
Agents
• The agents in this example are commuters
– Approximately 10% drive electric vehicles
– Among the commuters that drive gas vehicles:
• 50% have paid off their vehicles
• 50% still have payments to make
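A population with these proportions could be drawn as follows. This is a minimal sketch, not the author's generator: make_commuter and the attribute names are hypothetical, and the loan status of EV commuters is left unset because the slide does not specify it.

```python
import random

def make_commuter(agent_id, rng):
    """Assign a car type and loan status using the proportions listed above."""
    car_type = "ev" if rng.random() < 0.10 else "gas"
    # Only gas commuters get the 50/50 paid-off split; EV loan status is unspecified.
    paid_off = (rng.random() < 0.50) if car_type == "gas" else None
    return {"id": agent_id, "car_type": car_type, "paid_off": paid_off}

rng = random.Random(42)  # fixed seed so the draw is repeatable
commuters = [make_commuter(i, rng) for i in range(10000)]

ev_share = sum(c["car_type"] == "ev" for c in commuters) / len(commuters)
gas = [c for c in commuters if c["car_type"] == "gas"]
paid_off_share = sum(c["paid_off"] for c in gas) / len(gas)
```

With a large enough population, ev_share and paid_off_share should land close to 0.10 and 0.50 respectively.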
State
• Each month the commuter evaluates the current state of
transportation costs vs transportation savings
• Commuters in gas vehicles show preference for short term
rewards associated with:
– Lower car loan payments or no payments
– Lower property taxes
• Commuters in EVs show preference for long term rewards
associated with:
– Increased savings due to no tolls or gas
Actions
• If the commuter uses a gas vehicle:
– Has the ability to switch to an EV if the costs associated with
transportation meet a threshold where the short-term benefits
of low or zero monthly payments no longer outweigh the
savings associated with purchasing an EV
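The switching rule can be sketched as a threshold test evaluated once per month. Two assumptions are made here: the comparison uses >= rather than the == that appears in policy_per_agent, so that the rule still fires if costs jump past the threshold between months, and the cost figures are hypothetical.

```python
def should_switch(car_type, transportation_costs, transportation_savings):
    """A gas commuter switches once monthly costs reach the savings an EV would offer."""
    return car_type == "gas" and transportation_costs >= transportation_savings

def first_switch_month(start_cost, monthly_increase, savings, horizon):
    """Return the first month in which the rule fires, or None if it never does."""
    for month in range(horizon):
        cost = start_cost + monthly_increase * month
        if should_switch("gas", cost, savings):
            return month
    return None

# Hypothetical numbers: costs start at 200 and rise by 25 per month; EV savings are 300.
switch_month = first_switch_month(200.0, 25.0, 300.0, horizon=36)
never = first_switch_month(200.0, 0.0, 300.0, horizon=36)
```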
Policies
• Policy makers are evaluating updates to their commuter
policy.
• The policies under consideration are:
A. Remove the price credit awarded to new EV owners, thereby
increasing the cost of ownership
B. Remove the price credit awarded to new EV owners and toll EV
commuters, but at a lower rate than gas vehicle commuters
C. Toll EV commuters at a lower rate but keep the price credit for
new purchases
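One way to make the three options comparable in a simulation is to encode them as flags. The layout below is a hypothetical sketch; it assumes policy A leaves the toll lanes free for EVs, since that option does not mention tolls. The price of 40000 and credit of 2000 are the figures used in switched_to_ev_vehicle and policy_per_agent.

```python
# Flags read from the three policy descriptions above.
POLICIES = {
    "A": {"ev_purchase_credit": False, "ev_tolled": False},
    "B": {"ev_purchase_credit": False, "ev_tolled": True},
    "C": {"ev_purchase_credit": True, "ev_tolled": True},
}

def ev_loan(policy_name, price=40000, credit=2000):
    """Loan amount for a new EV under the named policy."""
    policy = POLICIES[policy_name]
    return price - credit if policy["ev_purchase_credit"] else price

loans = {name: ev_loan(name) for name in POLICIES}
```

The reduced toll rate for EVs is not given on the slide, so it is left out rather than guessed.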
Optimization Algos for MDPs
• Value Iteration Method
– Discrete time method
– Start from some state S and respond to transitions according to the stated policy over a horizon
of N time periods, repeatedly updating an estimate of the optimal value
• Policy Iteration
– 2 Steps:
1. Value Determination - arbitrarily select an initial policy P, then calculate its
marginal utility
2. Policy Improvement - select a better policy and repeat the value determination
step
• Linear Programming
– Identify the minimum or maximum value of a function subject to a set of constraints
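As an illustration of the value iteration method listed above, here is a minimal sketch on a tiny, hypothetical MDP. The states, rewards, and discount factor are invented for the example and are not taken from the talk. Each sweep replaces the value of every state with the best one-step lookahead until the values stop changing.

```python
# TRANSITIONS[state][action] = list of (probability, next_state, reward); all values hypothetical.
TRANSITIONS = {
    "gas": {
        "keep": [(1.0, "gas", -3.0)],
        "switch": [(1.0, "ev", -10.0)],
    },
    "ev": {
        "keep": [(1.0, "ev", -1.0)],
    },
}
GAMMA = 0.9  # discount factor (assumed)

def q_value(state, action, values):
    """Expected reward of taking `action` in `state`, then following `values`."""
    return sum(p * (r + GAMMA * values[s2]) for p, s2, r in TRANSITIONS[state][action])

def value_iteration(tol=1e-9, max_iter=10000):
    values = {s: 0.0 for s in TRANSITIONS}
    for _ in range(max_iter):
        new_values = {
            s: max(q_value(s, a, values) for a in TRANSITIONS[s]) for s in TRANSITIONS
        }
        delta = max(abs(new_values[s] - values[s]) for s in TRANSITIONS)
        values = new_values
        if delta < tol:
            break
    # Read the greedy policy off the converged values.
    policy = {s: max(TRANSITIONS[s], key=lambda a: q_value(s, a, values)) for s in TRANSITIONS}
    return values, policy

values, policy = value_iteration()
```

In this toy setup, staying in the EV state costs 1 per step, so its value converges to -1 / (1 - 0.9) = -10, and switching from gas is worth -10 + 0.9 * (-10) = -19, which beats keeping the gas car.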
Optimization for this Example
• This example uses Policy Iteration
– Set of states is defined and static
– There are simultaneous calculations for actions
– Infinite horizon
• Evaluate the results to identify the optimal outcome
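The two steps of policy iteration can be sketched on a small, hypothetical MDP with a static state set and a discounted infinite horizon. This is not the code from the talk: the states, rewards, and discount factor are invented, and value determination is approximated by repeated sweeps rather than solved exactly.

```python
# TRANSITIONS[state][action] = list of (probability, next_state, reward); all values hypothetical.
TRANSITIONS = {
    "gas": {
        "keep": [(1.0, "gas", -3.0)],
        "switch": [(1.0, "ev", -10.0)],
    },
    "ev": {
        "keep": [(1.0, "ev", -1.0)],
    },
}
GAMMA = 0.9  # discount factor (assumed)

def q_value(state, action, values):
    return sum(p * (r + GAMMA * values[s2]) for p, s2, r in TRANSITIONS[state][action])

def evaluate_policy(policy, sweeps=500):
    """Step 1, value determination: estimate the value of following a fixed policy."""
    values = {s: 0.0 for s in TRANSITIONS}
    for _ in range(sweeps):
        values = {s: q_value(s, policy[s], values) for s in TRANSITIONS}
    return values

def improve_policy(values):
    """Step 2, policy improvement: pick the best action in each state."""
    return {s: max(TRANSITIONS[s], key=lambda a: q_value(s, a, values)) for s in TRANSITIONS}

def policy_iteration(initial_policy):
    policy = dict(initial_policy)
    while True:
        values = evaluate_policy(policy)
        better = improve_policy(values)
        if better == policy:  # no state changed its action: stop
            return values, policy
        policy = better

values, policy = policy_iteration({"gas": "keep", "ev": "keep"})
```

Starting from the arbitrary policy of always keeping the current car, one improvement step flips the gas state to "switch", and the second pass confirms that no further change helps.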
Experiment
Walkthrough
Additional Considerations
• Discounting was included but was static
• Transition probabilities may not stay the same over time
• Did the policies choose the right agent attributes to subject to
actions and transitions?
• Adding a random percentage of commuters who switch to EV
from gas vehicles regardless of financial impact
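To make the first consideration concrete, the sketch below reads "static" as a single discount factor held constant over time, which is an interpretation rather than something the slide spells out. The monthly savings figure and the factor of 0.99 are hypothetical.

```python
def discounted_total(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

monthly_savings = [100.0] * 36  # hypothetical: 100 saved per month for 36 months
static = discounted_total(monthly_savings, gamma=0.99)
undiscounted = discounted_total(monthly_savings, gamma=1.0)
```

A time-varying factor would replace gamma ** t with a product of per-period factors, which is one way the static assumption could be relaxed.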
Future Project Goals
• Leverage Deep Learning frameworks for additional
optimization for each agent
• Considering each agent is looking to achieve best results,
are those results the best for the group?
• How can we share information between epochs?
– In a distributed environment this is very challenging
– Possibly only among agents in each partition (local information sharing)
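Partition-local sharing could look roughly like the sketch below, in which every agent in a partition receives a statistic computed over that partition only. The function and attribute names are hypothetical and the partitions are simulated with plain lists. In PySpark, a function of this shape (iterator in, iterable out) could be passed to RDD.mapPartitions.

```python
from statistics import mean

def share_within_partition(agents):
    """Take an iterator over one partition's agents and return the updated agents."""
    agents = list(agents)  # materialize this partition only
    if not agents:
        return []
    local_avg = mean(a["commute_time"] for a in agents)
    # Each agent sees the partition average, never data from other partitions.
    return [{**a, "partition_avg_commute": local_avg} for a in agents]

# Two hypothetical partitions.
partitions = [
    [{"id": 1, "commute_time": 30}, {"id": 2, "commute_time": 50}],
    [{"id": 3, "commute_time": 20}],
]
shared = [share_within_partition(iter(p)) for p in partitions]
```

The trade-off is that what an agent learns depends on how the data happens to be partitioned, which would need to be controlled for when comparing runs.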
Thank You
https://github.com/JustinBurg
Code for the simulations can be found on GitHub
DON’T FORGET TO RATE
AND REVIEW THE SESSIONS
SEARCH SPARK + AI SUMMIT

More Related Content

What's hot

Analysis of covariance
Analysis of covarianceAnalysis of covariance
Analysis of covariancemikko656
 
Categorical data analysis
Categorical data analysisCategorical data analysis
Categorical data analysis
Sumit Das
 
Correlationanalysis
CorrelationanalysisCorrelationanalysis
Correlationanalysis
Libu Thomas
 
Anatomy of an econometric modelling (1)
Anatomy of an econometric modelling (1)Anatomy of an econometric modelling (1)
Anatomy of an econometric modelling (1)
Jai Dewan
 
In Anova
In  AnovaIn  Anova
In Anova
ahmad bassiouny
 
Descriptive statistics and use of excel
Descriptive statistics and use of excelDescriptive statistics and use of excel
Descriptive statistics and use of excel
EhtishamAliHussain
 
Meta analysis
Meta analysisMeta analysis
Meta analysis
Dinesh Chaurasiya
 
Multiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA IMultiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA I
James Neill
 
Commonly used statistical tests in research
Commonly used statistical tests in researchCommonly used statistical tests in research
Commonly used statistical tests in researchNaqeeb Ullah Khan
 
Data processing & Analysis: SPSS an overview
Data processing & Analysis: SPSS an overviewData processing & Analysis: SPSS an overview
Data processing & Analysis: SPSS an overview
ATHUL RAVI
 
Journal club presentation
Journal club presentationJournal club presentation
Journal club presentation
Ravilla Jyothsna Naidu
 
Statistical tests for data involving quantitative data
Statistical tests for data involving quantitative dataStatistical tests for data involving quantitative data
Statistical tests for data involving quantitative data
Rizwan S A
 
DIstinguish between Parametric vs nonparametric test
 DIstinguish between Parametric vs nonparametric test DIstinguish between Parametric vs nonparametric test
DIstinguish between Parametric vs nonparametric test
sai prakash
 
Bureau statistics-shamim-rafique
Bureau statistics-shamim-rafiqueBureau statistics-shamim-rafique
Bureau statistics-shamim-rafique
abdulrehman saeed
 
The nature of the data
The nature of the dataThe nature of the data
The nature of the data
Ken Plummer
 
Statistical methods for the life sciences lb
Statistical methods for the life sciences lbStatistical methods for the life sciences lb
Statistical methods for the life sciences lbpriyaupm
 
Introduction to Computational Statistics
Introduction to Computational StatisticsIntroduction to Computational Statistics
Introduction to Computational Statistics
Setia Pramana
 

What's hot (20)

Analysis of covariance
Analysis of covarianceAnalysis of covariance
Analysis of covariance
 
Categorical data analysis
Categorical data analysisCategorical data analysis
Categorical data analysis
 
Correlationanalysis
CorrelationanalysisCorrelationanalysis
Correlationanalysis
 
Anatomy of an econometric modelling (1)
Anatomy of an econometric modelling (1)Anatomy of an econometric modelling (1)
Anatomy of an econometric modelling (1)
 
In Anova
In  AnovaIn  Anova
In Anova
 
Descriptive statistics and use of excel
Descriptive statistics and use of excelDescriptive statistics and use of excel
Descriptive statistics and use of excel
 
Meta analysis
Meta analysisMeta analysis
Meta analysis
 
Multiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA IMultiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA I
 
Commonly used statistical tests in research
Commonly used statistical tests in researchCommonly used statistical tests in research
Commonly used statistical tests in research
 
Data processing & Analysis: SPSS an overview
Data processing & Analysis: SPSS an overviewData processing & Analysis: SPSS an overview
Data processing & Analysis: SPSS an overview
 
Journal club presentation
Journal club presentationJournal club presentation
Journal club presentation
 
Data analysis
Data analysisData analysis
Data analysis
 
Statistical tests for data involving quantitative data
Statistical tests for data involving quantitative dataStatistical tests for data involving quantitative data
Statistical tests for data involving quantitative data
 
DIstinguish between Parametric vs nonparametric test
 DIstinguish between Parametric vs nonparametric test DIstinguish between Parametric vs nonparametric test
DIstinguish between Parametric vs nonparametric test
 
Bureau statistics-shamim-rafique
Bureau statistics-shamim-rafiqueBureau statistics-shamim-rafique
Bureau statistics-shamim-rafique
 
The nature of the data
The nature of the dataThe nature of the data
The nature of the data
 
Mixed models
Mixed modelsMixed models
Mixed models
 
Statistical methods for the life sciences lb
Statistical methods for the life sciences lbStatistical methods for the life sciences lb
Statistical methods for the life sciences lb
 
Introduction to Computational Statistics
Introduction to Computational StatisticsIntroduction to Computational Statistics
Introduction to Computational Statistics
 
Correlation
CorrelationCorrelation
Correlation
 

Similar to Using PySpark to Scale Markov Decision Problems for Policy Exploration

Chapter_6_Prescriptive_Analytics_Optimization_and_Simulation.pptx.pdf
Chapter_6_Prescriptive_Analytics_Optimization_and_Simulation.pptx.pdfChapter_6_Prescriptive_Analytics_Optimization_and_Simulation.pptx.pdf
Chapter_6_Prescriptive_Analytics_Optimization_and_Simulation.pptx.pdf
AndresBelloAvila
 
How to optimise renewables & energy storage
How to optimise renewables & energy storageHow to optimise renewables & energy storage
How to optimise renewables & energy storage
Iain Beveridge
 
Using Demand Side Management to Support Electricity Grids
Using Demand Side Management to Support Electricity GridsUsing Demand Side Management to Support Electricity Grids
Using Demand Side Management to Support Electricity Grids
Leonardo ENERGY
 
Using Demand-Side Management to Support Electricity Grids
Using Demand-Side Management to Support Electricity GridsUsing Demand-Side Management to Support Electricity Grids
Using Demand-Side Management to Support Electricity Grids
Leonardo ENERGY
 
AITPM Risk and Governance
AITPM Risk and GovernanceAITPM Risk and Governance
AITPM Risk and Governance
JumpingJaq
 
Machine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskMachine Learning Applications in Credit Risk
Machine Learning Applications in Credit Risk
QuantUniversity
 
Designing Data Products
Designing Data ProductsDesigning Data Products
Designing Data Products
Vassilis Protonotarios
 
Management information system prepared by samena
Management information system prepared by samenaManagement information system prepared by samena
Management information system prepared by samena
samena shawon
 
Cscmp 2014 technology and transportation—creating, deploying, and benefiting...
Cscmp 2014  technology and transportation—creating, deploying, and benefiting...Cscmp 2014  technology and transportation—creating, deploying, and benefiting...
Cscmp 2014 technology and transportation—creating, deploying, and benefiting...
Matt Douglass
 
Multiple Criteria for Decision
Multiple Criteria for DecisionMultiple Criteria for Decision
Multiple Criteria for Decision
Subhash sapkota
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
Roger Barga
 
SEIN Advanced Rate Design Project
SEIN Advanced Rate Design ProjectSEIN Advanced Rate Design Project
SEIN Advanced Rate Design Project
Storn White
 
Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...
STEP_scotland
 
Multicriteria and cost benefit analysis for smart grid projects
Multicriteria and cost benefit analysis for smart grid projectsMulticriteria and cost benefit analysis for smart grid projects
Multicriteria and cost benefit analysis for smart grid projects
Leonardo ENERGY
 
Automated Data Mining for Everyone
Automated Data Mining for EveryoneAutomated Data Mining for Everyone
Automated Data Mining for Everyone
Exponea
 
Trends in-om-scm-27-july-2012-2
Trends in-om-scm-27-july-2012-2Trends in-om-scm-27-july-2012-2
Trends in-om-scm-27-july-2012-2
Sanjeev Deshmukh
 
4. PAE AcFn621Ch-4a Project Alaysis and Selection.ppt
4. PAE AcFn621Ch-4a Project Alaysis and Selection.ppt4. PAE AcFn621Ch-4a Project Alaysis and Selection.ppt
4. PAE AcFn621Ch-4a Project Alaysis and Selection.ppt
ProfDrAnbalaganChinn
 
Supply chain managemen1
Supply chain managemen1Supply chain managemen1
Supply chain managemen1Rohit Dhaware
 
GridMAP: Next generation energy analysis tools.
GridMAP: Next generation energy analysis tools.GridMAP: Next generation energy analysis tools.
GridMAP: Next generation energy analysis tools.
Iain Beveridge
 
Parametric Estimation in a nutshell
Parametric Estimation in a nutshellParametric Estimation in a nutshell
Parametric Estimation in a nutshell
Planisware
 

Similar to Using PySpark to Scale Markov Decision Problems for Policy Exploration (20)

Chapter_6_Prescriptive_Analytics_Optimization_and_Simulation.pptx.pdf
Chapter_6_Prescriptive_Analytics_Optimization_and_Simulation.pptx.pdfChapter_6_Prescriptive_Analytics_Optimization_and_Simulation.pptx.pdf
Chapter_6_Prescriptive_Analytics_Optimization_and_Simulation.pptx.pdf
 
How to optimise renewables & energy storage
How to optimise renewables & energy storageHow to optimise renewables & energy storage
How to optimise renewables & energy storage
 
Using Demand Side Management to Support Electricity Grids
Using Demand Side Management to Support Electricity GridsUsing Demand Side Management to Support Electricity Grids
Using Demand Side Management to Support Electricity Grids
 
Using Demand-Side Management to Support Electricity Grids
Using Demand-Side Management to Support Electricity GridsUsing Demand-Side Management to Support Electricity Grids
Using Demand-Side Management to Support Electricity Grids
 
AITPM Risk and Governance
AITPM Risk and GovernanceAITPM Risk and Governance
AITPM Risk and Governance
 
Machine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskMachine Learning Applications in Credit Risk
Machine Learning Applications in Credit Risk
 
Designing Data Products
Designing Data ProductsDesigning Data Products
Designing Data Products
 
Management information system prepared by samena
Management information system prepared by samenaManagement information system prepared by samena
Management information system prepared by samena
 
Cscmp 2014 technology and transportation—creating, deploying, and benefiting...
Cscmp 2014  technology and transportation—creating, deploying, and benefiting...Cscmp 2014  technology and transportation—creating, deploying, and benefiting...
Cscmp 2014 technology and transportation—creating, deploying, and benefiting...
 
Multiple Criteria for Decision
Multiple Criteria for DecisionMultiple Criteria for Decision
Multiple Criteria for Decision
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
SEIN Advanced Rate Design Project
SEIN Advanced Rate Design ProjectSEIN Advanced Rate Design Project
SEIN Advanced Rate Design Project
 
Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...
 
Multicriteria and cost benefit analysis for smart grid projects
Multicriteria and cost benefit analysis for smart grid projectsMulticriteria and cost benefit analysis for smart grid projects
Multicriteria and cost benefit analysis for smart grid projects
 
Automated Data Mining for Everyone
Automated Data Mining for EveryoneAutomated Data Mining for Everyone
Automated Data Mining for Everyone
 
Trends in-om-scm-27-july-2012-2
Trends in-om-scm-27-july-2012-2Trends in-om-scm-27-july-2012-2
Trends in-om-scm-27-july-2012-2
 
4. PAE AcFn621Ch-4a Project Alaysis and Selection.ppt
4. PAE AcFn621Ch-4a Project Alaysis and Selection.ppt4. PAE AcFn621Ch-4a Project Alaysis and Selection.ppt
4. PAE AcFn621Ch-4a Project Alaysis and Selection.ppt
 
Supply chain managemen1
Supply chain managemen1Supply chain managemen1
Supply chain managemen1
 
GridMAP: Next generation energy analysis tools.
GridMAP: Next generation energy analysis tools.GridMAP: Next generation energy analysis tools.
GridMAP: Next generation energy analysis tools.
 
Parametric Estimation in a nutshell
Parametric Estimation in a nutshellParametric Estimation in a nutshell
Parametric Estimation in a nutshell
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Using PySpark to Scale Markov Decision Problems for Policy Exploration

  • 1. Justin Brandenburg Machine Learning Architect @ Databricks Using PySpark to scale Markov Decision Problems for Policy Exploration #UnifiedDataAnalytics #SparkAISummit
  • 2. About Me • Member of the Professional Services team at Databricks • Background in economics, cyber analytics and IT • Based in Washington, DC, USA • Education: – Bachelor's in Economics – Virginia Tech – Master's in Applied Economics – Johns Hopkins University – Master's in Computational Social Science – George Mason University • Previously worked as: – Senior Data Scientist for a big data platform vendor – Lead Data Scientist for a consulting company
  • 3. Agenda • Systems, Policies, Complexity and Modeling • Markov Decision Processes for Policy evaluation • Using PySpark for MDP modeling • Example • Demo • Summary
  • 4. Systems & Policies • A system is a group or combination of interrelated, interdependent, or interacting elements – Systems have purposes or goals – Policies are created to achieve desired outcomes • A policy is a combination of principles that are created to guide decisions and achieve rational outcomes • Designing policies that lead to ideal outcomes for a system is one of the most difficult challenges facing decision makers within an organization.
  • 5. Uncertainty Impacting Policy • A complex system is one in which each entity may be perfectly understood, but the behavior of the system as a whole cannot necessarily be predicted • Complex systems do not provide perfect information and never achieve equilibrium • Uncertainty and non-rational logic can lead to emergent behavior that policies can’t always account for
  • 6. Complex System Modeling • Agent-Based Modeling – Schelling’s Segregation Model – Sugarscape • Game Theory – Prisoner’s Dilemma – Texas Hold’em Poker • Discrete Event Simulation – A clinical diagnosis – Traffic accidents • Markov Decision Process – Traveling Salesman
  • 8. Evaluate all Strategies and Outcomes (image source: Screenrant)
  • 9. Markov Decision Process • Framework for modeling decisions • A Markov Process describes the state of a system • When a decision (action) can be chosen from a list of possible decisions, it becomes a Markov Decision Process • Often applied in: – Energy Grid Optimization – Economic planning – Logistics – Risk Management – Robotics
  • 10. Why PySpark for MDP? “The power of intelligence stems from our vast diversity, not from any single, perfect principle.” - Marvin Minsky, 1986. The Society of Mind. • Efforts to accurately represent real-world problems have highlighted the inability of a single all-encompassing model (one state-action space for one objective) to scale • Spark provides a distributed computing engine for scaling data analysis • MDPs are simulations: they create a large amount of data that is used to identify optimal processes
  • 11. Performing MDP in PySpark • MDPs are run using the Spark resilient distributed dataset (RDD) • This allows functions to be mapped to specific environments through key-value attributes • Each row in the RDD is an independent entity that does not interact with other entities, only with the policy and states
  • 12. Agents • Agents are the entities interacting with the environment by executing certain actions, taking observations, and receiving eventual rewards • The goal is to identify optimal behavior based on policy parameters • Behavior is often a transition carried out sequentially: 1. Decision is made 2. Action is performed 3. Outcome is evaluated 4. New decision is made
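The four-step cycle on this slide can be sketched as a small loop. This is a minimal illustration, not code from the talk: the `Agent` class, the `balance` attribute, the save/spend rule, and the reward definition are all hypothetical names chosen only to make the decision, action, outcome, new-decision sequence concrete.

```python
import random

class Agent:
    """Hypothetical agent whose whole state is a single balance."""
    def __init__(self, agent_id, balance):
        self.agent_id = agent_id
        self.balance = balance

def decide(agent):
    # Step 1: make a decision from the current state (assumed rule).
    return "save" if agent.balance < 100 else "spend"

def act(agent, action, rng):
    # Step 2: perform the action; the size of the change is random.
    change = rng.randint(5, 15)
    agent.balance += change if action == "save" else -change
    return change

def evaluate(action, change):
    # Step 3: evaluate the outcome as a signed reward.
    return change if action == "save" else -change

def run_episode(agent, steps, seed=0):
    rng = random.Random(seed)
    history = []
    for t in range(steps):
        action = decide(agent)            # Step 4: a new decision each step
        change = act(agent, action, rng)
        reward = evaluate(action, change)
        history.append((t, action, reward, agent.balance))
    return history

history = run_episode(Agent(agent_id=1, balance=90), steps=5)
```

Each tuple in `history` records one pass through the cycle, which is the same kind of per-step record the MDP run later collects per agent.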
  • 13. Generating Agents from Existing Data Suppose you have an existing dataset of projects and want to run a what-if analysis to find the optimal schedule based on factors such as equipment availability, cost, or external market conditions. In this case, each line[x] maps to a column in the data frame that is converted into an RDD.

      class Projects_Agent:
          # Each positional field of the RDD row becomes an agent attribute.
          def __init__(self, line):
              self.project_id = line[0]
              self.start_date = line[1]
              self.cur_date = line[2]
              self.labor = line[3]
              self.equipment = line[4]
              self.week_prod = line[5]
              self.ex_prod = line[6]
              self.num_weeks = line[7]
              self.weekly_labor_costs = line[8]
              self.weekly_equip_costs = line[9]
              self.active = True

      def initialize_project(line):
          proj = Projects_Agent(line)
          return proj
  • 14. Generating Agents using Parameters Agents can also be generated with their attributes set from parameters. This establishes boundaries of behavior that the agents can transition through based on policy decisions.

      import random

      class Agent:
          def __init__(self, row):
              self.id = row[0]

      def create_agents(row):
          agent = Agent(row)
          agent.car_type_index = random.uniform(0, 1)
          agent.car_type = 'gas'
          agent.car_loan = random.randint(0, 30000)
          # loan_payment is not defined on the slide; it must be set elsewhere.
          agent.avg_car_payment = loan_payment
          agent.annual_depreciation = 0.10
          agent.number_of_payments = 0
          agent.personal_property_tax = .04
          # The slide omits the return; run_mdp (slide 16) uses the returned agent.
          return agent
  • 15. Actions and State Transitions Specify actions and transitions with RDD transformation functions.

      def policy_per_agent(row):
          agent = row
          credit = 2000
          # Only gas-vehicle commuters are candidates for switching.
          if agent.car_type == 'gas' and agent.avg_car_payment >= 0:
              if agent.transportation_costs == agent.transportation_savings:
                  switched_to_ev_vehicle(agent, credit)
          return agent

      def switched_to_ev_vehicle(row, credit):
          agent = row
          agent.car_type = 'ev'
          agent.car_loan = 40000 - credit
          agent.avg_car_payment = 500
          agent.car_value = 35000
          agent.gas_price = 0.00
          agent.gallons = 0
          agent.monthly_refuels = 0.0
          agent.percentage_time_express_lanes = 1.00
          agent.tolls_paid = 0
          agent.commute_time = 30
          agent.commuting_costs = 150.00
          return agent
  • 16. Executing the MDP Create an MDP function that executes the actions and transitions:

      def run_mdp(row, time, policy):
          mdp_data = []  # records collected for this agent
          agent = create_agents(row)
          initialize_agent_attributes(agent)
          apply_mdp_using_policy(agent, time, mdp_data, policy)
          return mdp_data

    Instantiate the number of agents needed and convert to an RDD. Apply the function via flatMap():

      car_agents = 50000
      agentRDD = spark.createDataFrame(zip(range(1, car_agents + 1)), ["driver_id"]).rdd
      t = 36       # time horizon passed to run_mdp
      policy = 1   # policy identifier passed to run_mdp
      mdp_results = agentRDD.flatMap(lambda x: run_mdp(x, t, policy)).toDF()
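The slide calls `initialize_agent_attributes` and `apply_mdp_using_policy` without showing them. The sketch below fills them with hypothetical stand-ins (the `monthly_cost` attribute and the cost rule are assumptions, not the talk's logic) to show why `flatMap` is the right shape: each agent returns a list of per-step records, and the lists are flattened into one dataset. To stay runnable without a cluster, it emulates `flatMap` with `itertools.chain.from_iterable`.

```python
from itertools import chain

class Agent:
    def __init__(self, row):
        self.id = row[0]

def initialize_agent_attributes(agent):
    # Hypothetical starting state; the talk's real attributes are not shown here.
    agent.car_type = "gas"
    agent.monthly_cost = 300.0

def apply_mdp_using_policy(agent, time, mdp_data, policy):
    # Hypothetical transition rule: emit one record per agent per time step.
    for month in range(1, time + 1):
        agent.monthly_cost += 10.0 * policy  # assumed effect of the policy id
        mdp_data.append((agent.id, month, agent.car_type, agent.monthly_cost))

def run_mdp(row, time, policy):
    mdp_data = []
    agent = Agent(row)
    initialize_agent_attributes(agent)
    apply_mdp_using_policy(agent, time, mdp_data, policy)
    return mdp_data

# Local stand-in for agentRDD.flatMap(lambda x: run_mdp(x, t, policy)):
rows = [(driver_id,) for driver_id in range(1, 4)]
results = list(chain.from_iterable(run_mdp(row, 36, 1) for row in rows))
```

Because every agent is simulated independently, the same `run_mdp` function can be handed to Spark unchanged and the work is split across partitions.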
  • 18. Electric Vehicles and Toll Lanes • A local government enacted a policy to reduce vehicle congestion during periods of the day when commuters are on their way to and from work • To reduce congestion along key routes, toll lanes were put in place to speed up commutes • The toll lanes are free for electric vehicle commuters • Commuters who drive gas-powered vehicles can use the toll lanes, but the tolls increase as more cars merge onto them
  • 19. Use Case • As more commuters switch to electric vehicles, the toll lanes are becoming increasingly congested, leading to longer commute times • Could the incentives put in place by the policy makers have led to changes in commuter behavior at a faster pace than originally planned?
  • 20. Agents • The agents in this example are commuters – Approximately 10% drive electric vehicles – Among the commuters that drive gas vehicles • 50% have paid off their vehicles • 50% have more payments to make
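A population with these proportions could be generated as follows. This is a sketch under stated assumptions: the function name `make_commuters`, the dictionary layout, and the outstanding-loan range are hypothetical (the 30000 upper bound simply echoes the `random.randint(0, 30000)` call on slide 14).

```python
import random

def make_commuters(n, ev_share=0.10, paid_off_share=0.50, seed=0):
    """Generate n commuters: roughly 10% EV, and half of gas cars paid off."""
    rng = random.Random(seed)
    commuters = []
    for driver_id in range(1, n + 1):
        car_type = "ev" if rng.random() < ev_share else "gas"
        if car_type == "gas" and rng.random() < paid_off_share:
            car_loan = 0                      # vehicle already paid off
        else:
            car_loan = rng.randint(1, 30000)  # assumed remaining balance
        commuters.append(
            {"driver_id": driver_id, "car_type": car_type, "car_loan": car_loan}
        )
    return commuters

commuters = make_commuters(10000)
```

Fixing the seed keeps runs reproducible, which matters when comparing policies against the same starting population.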
  • 21. State • Each month the commuter evaluates the current state of transportation costs vs transportation savings • Commuters in gas vehicles show a preference for short-term rewards associated with: – Lower car loan payments or no payments – Lower property taxes • Commuters in EVs show a preference for long-term rewards associated with: – Increased savings due to no tolls or gas
  • 22. Actions • If the commuter uses a gas vehicle: – Has the ability to switch to an EV if the costs associated with transportation meet a threshold where the short-term benefits of low or zero monthly payments no longer outweigh the savings associated with purchasing an EV
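The switching rule described here amounts to comparing monthly costs against a threshold. The sketch below is one way to express it; the function names, the cost components, and the default threshold of zero are assumptions for illustration and are not taken from the talk's code.

```python
def monthly_cost(car_payment, fuel_or_charging, tolls, property_tax):
    """Total monthly transportation cost for one commuter (assumed components)."""
    return car_payment + fuel_or_charging + tolls + property_tax

def should_switch(gas_cost, ev_cost, threshold=0.0):
    """Switch once the monthly saving from an EV reaches the threshold."""
    return (gas_cost - ev_cost) >= threshold

# A paid-off gas car with rising tolls versus a financed EV with no tolls.
gas = monthly_cost(car_payment=0.0, fuel_or_charging=180.0, tolls=400.0, property_tax=40.0)
ev = monthly_cost(car_payment=500.0, fuel_or_charging=40.0, tolls=0.0, property_tax=60.0)
decision = should_switch(gas, ev)
```

Using an inequality against a threshold, rather than testing two floating-point totals for exact equality, makes the trigger robust to small rounding differences.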
  • 23. Policies • Policy makers are evaluating updates to their commuter policy. • The policies under consideration are: A. Remove the price credit awarded to new EV owners, thereby increasing the cost of ownership B. Remove the price credit awarded to new EV owners and toll EV commuters, but at a lower rate than gas vehicle commuters C. Toll EV commuters at a lower rate but keep the price credit for new purchases
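The three candidate policies differ in only two levers, the purchase credit and the EV toll rate, so they can be encoded as parameter sets. In this sketch the 2000 credit and 40000 EV price come from the code on slide 15; the `CommuterPolicy` class, the baseline entry, and the 0.5 toll rate are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CommuterPolicy:
    name: str
    ev_credit: float     # price credit for a new EV purchase
    ev_toll_rate: float  # EV toll as a fraction of the gas-vehicle toll

# Current rules: credit in place, EVs ride the toll lanes for free.
BASELINE = CommuterPolicy("baseline", ev_credit=2000.0, ev_toll_rate=0.0)

POLICIES = {
    "A": CommuterPolicy("A", ev_credit=0.0, ev_toll_rate=0.0),     # remove credit
    "B": CommuterPolicy("B", ev_credit=0.0, ev_toll_rate=0.5),     # remove credit, toll EVs
    "C": CommuterPolicy("C", ev_credit=2000.0, ev_toll_rate=0.5),  # keep credit, toll EVs
}

def ev_loan(policy, ev_price=40000.0):
    """Loan needed for a new EV after the policy's credit is applied."""
    return ev_price - policy.ev_credit

def ev_monthly_toll(policy, gas_monthly_toll):
    """Monthly toll an EV commuter pays, relative to a gas commuter's toll."""
    return policy.ev_toll_rate * gas_monthly_toll
```

Passing one of these objects into the per-agent simulation, instead of a bare integer, keeps each run self-describing when results from several policies are compared.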
  • 24. Optimization Algos for MDPs • Value Iteration Method – Discrete time method – Start from some state, S, respond to transitions according to the stated policy for a horizon of N time periods, and repeatedly update an estimate of the optimal value • Policy Iteration – 2 Steps: 1. Value Determination - arbitrarily select an initial policy P and then calculate its marginal utility 2. Policy Improvement - select a better policy and repeat the value determination step • Linear Programming – Identify the minimum or maximum value of a function subject to a set of constraints
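As a concrete reference for the first method, here is a minimal value iteration on a toy two-state MDP. The states, actions, rewards, and discount factor are illustrative assumptions and not the model from the talk; rewards are negative monthly costs, so maximizing value means minimizing cost.

```python
# P[state][action] = list of (probability, next_state, reward)
P = {
    "gas": {
        "stay":   [(1.0, "gas", -4.0)],
        "switch": [(1.0, "ev", -10.0)],
    },
    "ev": {
        "stay":   [(1.0, "ev", -1.0)],
    },
}

def q_value(P, V, state, action, gamma):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[state][action])

def value_iteration(P, gamma=0.9, tol=1e-9):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:  # stop once the estimates no longer change
            break
    # Read off the greedy policy from the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy

V, policy = value_iteration(P)
```

In this toy setup the one-off cost of switching is outweighed by the lower recurring cost afterwards, so the greedy policy for the `gas` state is `switch`.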
  • 25. Optimization for this Example • This example uses Policy Iteration – Set of states is defined and static – There are simultaneous calculations for actions – Infinite horizon • Evaluate the results to find the optimal outcome
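The two-step loop of policy iteration (evaluate the current policy, then improve it) can be sketched on a small MDP with a fixed state set and an infinite horizon. All states, probabilities, and rewards below are illustrative assumptions; in particular, the 50% chance that a switch attempt fails is invented to show a stochastic transition.

```python
# P[state][action] = list of (probability, next_state, reward)
P = {
    "gas": {
        "stay":   [(1.0, "gas", -4.0)],
        "switch": [(0.5, "ev", -10.0), (0.5, "gas", -4.0)],
    },
    "ev": {
        "stay":   [(1.0, "ev", -1.0)],
    },
}

def expected(P, V, s, a, gamma):
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def evaluate_policy(P, policy, gamma, tol=1e-9):
    """Value determination: iterate until the fixed policy's values settle."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v = expected(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def improve_policy(P, V, gamma):
    """Policy improvement: act greedily with respect to the current values."""
    return {s: max(P[s], key=lambda a: expected(P, V, s, a, gamma)) for s in P}

def policy_iteration(P, gamma=0.9):
    policy = {s: next(iter(P[s])) for s in P}  # arbitrary initial policy
    while True:
        V = evaluate_policy(P, policy, gamma)
        new_policy = improve_policy(P, V, gamma)
        if new_policy == policy:  # stable policy, so stop
            return V, policy
        policy = new_policy

V, policy = policy_iteration(P)
```

The loop terminates when an improvement step leaves the policy unchanged, which for a finite state and action set happens after finitely many rounds.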
  • 27. Additional Considerations • Discounting was included but was static • Transition probabilities may not stay the same over time • Did the policies choose the right agent attributes to subject to actions and transitions? • Adding a random percentage of commuters who switch from gas vehicles to EVs regardless of financial impact
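To make the first point concrete, the sketch below contrasts a static discount factor with one that changes per time step. Both function names and the sample numbers are hypothetical; the talk does not show how its discounting was implemented.

```python
def discounted_return(rewards, gamma):
    """Total reward with a single, static discount factor."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def discounted_return_varying(rewards, gammas):
    """Total reward where gammas[t] discounts everything after step t."""
    total, weight = 0.0, 1.0
    for reward, gamma in zip(rewards, gammas):
        total += weight * reward
        weight *= gamma
    return total

static = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
varying = discounted_return_varying([1.0, 1.0, 1.0], gammas=[1.0, 0.5, 0.5])
```

With identical per-step factors the two functions agree; letting the factor vary is one way to model commuters whose patience changes over time.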
  • 28. Future Project Goals • Leverage Deep Learning frameworks for additional optimization for each agent • Given that each agent seeks its own best result, are those results also the best for the group? • How can information be shared between epochs? – In a distributed environment this is very challenging – Possibly only among agents in each partition -> local information sharing
  • 30. DON’T FORGET TO RATE AND REVIEW THE SESSIONS SEARCH SPARK + AI SUMMIT