This document discusses predictive analytics for transportation in a high dimensional heterogeneous data world. It covers several topics:
1) The transportation world is generating large amounts of high dimensional data from sources like cameras, GPS, cell phones, and probe vehicles that needs to be combined and analyzed.
2) Connected and automated vehicles will generate huge amounts of detailed data on vehicle movements, passenger activities and intentions that can be used to infer travel patterns and predict crashes.
3) Applying machine learning, advanced computation and domain knowledge is necessary to make sense of this non-standardized, high volume data.
Predictive Analytics for Transportation in a High Dimensional Heterogeneous Data World
1. Predictive Analytics for
Transportation in a High Dimensional
Heterogeneous Data World
Dr. Chandra Bhat
Center for Transportation Research
The University of Texas at Austin
Acknowledgments: D-STOP, Humboldt Award, Dr. Ram Pendyala, Dr. Kostas Goulias,
all my graduate/undergraduate students
2. World of high dimensional
heterogeneous data
• Providing accurate traffic
information is becoming an
imperative
• Cameras, GPS, cell phone tracking,
and probe vehicles are used to
supplement the information
provided by conventional
measurement systems.
• Methodologies to combine and
aggregate high dimensional
heterogeneous data are needed
3. Connected/Automated Vehicles (CAVs)
and big data
• The car of the near future will be a part of a gigantic data-collection
engine.
• Vehicles have
embedded computers,
GPS receivers,
short-range wireless network interfaces,
in-car sensors,
cameras, and
internet.
• Vehicles interact with
roadside wireless sensor networks,
passengers’ wireless devices, and
other cars.
4. Data required to keep a CAV safely on
the road
• Highly detailed map information:
Shape and elevation of roadways,
lane lines,
intersections,
crosswalks,
speed limits, and
traffic signals.
• Position, speed and intentions of other vehicles and pedestrians.
• Position, speed and intentions of other road users, such as…
jaywalking pedestrians,
cars coming out of hidden driveways,
a stop sign held up by a crossing guard, and
cyclists signaling their riding intent.
5. What can be inferred from CAV and
smartphone data
• Where people drive,
• when people drive,
• what route people take,
• where people stop,
• what people put in their car,
• why, how and when people make decisions on the fly and change
their activity plan or route, and
• detailed crash data (speed, position, and intention at the moment
of the crash).
6. Data Science
• Not enough humans to process the data manually
• Machine learning, visualization, and
advanced computation techniques
• Statistics, social sciences, and domain knowledge
• High-dimensional heterogeneous data
9. Five Pillars of ABM Design
• Based on sound behavioral theory/paradigm
• Computationally feasible and tractable
– Model estimation
– Model implementation
• Optimal use of available data (present and future)
• ABM should be both an Activity-Based Model and an Agent-Based
Model
• Sensitive to policy issues and planning applications of interest
10. Behavioral Basis of ABM
• Decision hierarchies and choice processes
– A variety of behavioral decision structures possible
– Virtually all models assume a sequential decision structure
similar to traditional four-step models for computational
convenience
• Considerable evidence of simultaneity in behavioral choice
mechanisms
– Several choices made simultaneously as a lifestyle package
11. Behavioral Basis of ABM
• Examples of simultaneous choice packages
– Residential location, vehicle ownership, mode to pre-planned
activities (e.g., work)
– Activity type, activity duration, and activity timing
(scheduling)
• Behavioral heterogeneity
– Differences in choice processes across market segments
– Identify market segments both exogenously and
endogenously (latent market segments)
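Endogenous (latent) segmentation is often written as a latent class choice model. The slides do not give a formula, so the block below is only a generic sketch; the symbols (segments s, membership probabilities, covariates w_n, parameters delta_s) are assumptions introduced here for illustration.

```latex
% Probability that individual n chooses alternative i, mixing over S latent segments
P_n(i) = \sum_{s=1}^{S} \pi_{ns} \, P_n(i \mid s),
\qquad
\pi_{ns} = \frac{\exp(\delta_s' w_n)}{\sum_{r=1}^{S} \exp(\delta_r' w_n)}
```

Here the segment membership probabilities are estimated together with the segment-specific choice models, rather than being fixed in advance as in exogenous segmentation.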
13. Agent Interactions
[Figure: speech bubbles showing household members negotiating car use and drop-offs]
• “I have a client meeting today; so I will take the car.”
• “I have to pick up Jane from school and go shopping later; I need the car.”
• “My meeting is in the morning. I can pick up Jane from school today. And we can go shopping together in the evening.”
• “OK, that sounds good. I’ll go ahead and take light rail today to work. See you later.”
• “Hey, Mom and Dad, don’t forget; you have to drop me off at Johnny’s house in the evening today.”
• “Don’t worry Jane; we’ll drop you off on the way to the store and pick you up later. Run along now, you’ll miss the bus.”
14. Definition of an Activity
• Disaggregate activity purpose definition
– Challenge traditional notion of mandatory and discretionary
activities/trips
– Movie, ball game, and child’s tennis lesson or soccer game often
have spatial and/or temporal fixity
– Characterize activities and trips by level of spatial and temporal
fixity/constraints (besides purpose)
– Can be accomplished using concepts of time-space
geography
– Automated method to add attributes describing degrees of
freedom according to set of spatial/temporal fixity criteria to
activity records in data set
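The automated tagging idea in the last bullet could look roughly like the sketch below. It is a minimal illustration, not the method used in any particular model: the record fields (`fixed_location`, `fixed_start`) and the four fixity labels are hypothetical names chosen here.

```python
from dataclasses import dataclass


@dataclass
class ActivityRecord:
    purpose: str
    fixed_location: bool  # hypothetical flag: must happen at one specific place
    fixed_start: bool     # hypothetical flag: must start at a set time


def fixity_level(rec: ActivityRecord) -> str:
    """Label a record by its spatial/temporal fixity (illustrative criteria)."""
    if rec.fixed_location and rec.fixed_start:
        return "space-time fixed"
    if rec.fixed_location:
        return "space fixed"
    if rec.fixed_start:
        return "time fixed"
    return "flexible"


records = [
    ActivityRecord("child's soccer game", fixed_location=True, fixed_start=True),
    ActivityRecord("grocery shopping", fixed_location=False, fixed_start=False),
    ActivityRecord("movie", fixed_location=False, fixed_start=True),
]

# Attach the fixity attribute to every activity record in the data set.
labels = {rec.purpose: fixity_level(rec) for rec in records}
```

Under these assumed criteria, a nominally discretionary activity such as a soccer game ends up as constrained as a work trip, which is the point the slide makes about the mandatory/discretionary split.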
15. Central Role of Time Use
• Notion of time is central to activity-based modeling
– Explicit modeling of activity durations (daily activity time allocation and
individual episode duration)
– Treat time as “continuous” and not as “discrete choice” blocks
• Activity engagement is the focus of attention
– Travel patterns are inferred as an outcome of activity participation and
time use decisions
– Continuous treatment of time dimension allows explicit consideration of
time constraints on human activities
• Reconcile activity durations with network travel durations (feedback
processes)
16. In Summary
• ABM should…
– Capture the central role of activities, time, and space
in a continuum
– Explicitly recognize constraints and interactions
– Represent simultaneity in behavioral choice processes
– Account for heterogeneity in behavioral decision
hierarchies
– Incorporate feedback processes to facilitate
integration with land use and network models
• SimAGENT does it all and more…
17. SIMAGENT (SIMULATOR OF ACTIVITIES,
GREENHOUSE GAS EMISSIONS, ENERGY,
NETWORKS, AND TRAVEL):
AN OVERVIEW
20. CEMDAP – The Core ABM in SimAGENT
Socio-Economic Data
PopGen
CEMSELTS
CEMDAP
• Simulates activity schedule and
travel characteristics for each
individual of the region
• Core module of SimAGENT
• 52 sub-models.
• Developed by UT Austin
21. Features of CEMDAP (continued)
• Changes in the activity-travel pattern of one individual in a
household may bring about changes in activity-travel patterns
of other household members
• MDCEV approach facilitates modeling activity participation at a
household level with joint activity participation incorporated in
a simple fashion
– MDCEV – Multiple Discrete Continuous Extreme Value
econometric choice modeling method
• Includes a model of household vehicle ownership by type and
make/model, and primary driver assignment
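For reference, a commonly cited form of the MDCEV utility function from Bhat's published work is sketched below. The slides do not state the formula, and the specification used inside SimAGENT may differ; the symbols (x_k, psi_k, gamma_k, alpha_k, T) follow the usual notation and are assumptions here.

```latex
% Utility over K alternatives with time (or budget) allocations x_k
U(\mathbf{x}) = \sum_{k=1}^{K} \frac{\gamma_k}{\alpha_k} \, \psi_k
  \left[ \left( \frac{x_k}{\gamma_k} + 1 \right)^{\alpha_k} - 1 \right],
\qquad \text{subject to} \quad \sum_{k=1}^{K} x_k = T

% Baseline marginal utility with an extreme value error term
\psi_k = \exp(\beta' z_k + \varepsilon_k)
```

In this form, gamma_k allows corner solutions (zero participation in alternative k) and, together with alpha_k, governs satiation, which is what lets one model handle both which activities are chosen and how much time each receives.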
23. Joint Activities and Household Interactions
MDCEV Model
• Most activity based models accommodate activity type choice
as a series of models for each individual in the household
• These approaches do not explicitly recognize that activity
participation is a collective decision of household members
• MDCEV approach – simple and relatively inexpensive for
modeling activity participation at a household level
• SimAGENT now features MDCEV modeling methodology to
capture household-level activity participation
24. Joint Activities and Interactions
MDCEV Model
• Conventional discrete choice frameworks need to generate
mutually exclusive alternatives, which results in an explosion in
the number of alternatives
• MDCEV allows us to tackle the problem by considering
activity participation as a household decision
• MDCEV offers substantial computational and behavioral
advantages
– Employ one model to generate activities
– Accommodate substitution/complementarity in activity participation
and household member dimensions
26. MDCEV Model
[Figure: grid of boxes; each box represents an alternative]
A1-P1    A1-P2    A1-P1P2
A2-P1    A2-P2    A2-P1P2
None
Total: 7 alternatives, versus 64 in the traditional case
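One reading of these counts can be checked with a short script. It assumes that A1/A2 are two activity types, P1/P2 are two household members, and that the traditional case treats every combination of the six activity-participant boxes as its own mutually exclusive alternative. The slide does not spell out this derivation, so treat it as an assumption.

```python
from itertools import combinations

activities = ["A1", "A2"]
persons = ["P1", "P2"]

# Participant groups: every non-empty subset of household members.
groups = [
    combo
    for size in range(1, len(persons) + 1)
    for combo in combinations(persons, size)
]

# Elemental alternatives: one per (activity, participant group) box.
elemental = [(activity, group) for activity in activities for group in groups]

# MDCEV: the elemental alternatives plus a single "None" alternative.
mdcev_alternatives = len(elemental) + 1

# Assumed traditional case: each subset of the boxes (including the empty
# subset, i.e. no activity) is a separate mutually exclusive alternative.
traditional_alternatives = 2 ** len(elemental)

print(mdcev_alternatives, traditional_alternatives)
```

Under these assumptions the MDCEV count grows linearly with the number of boxes, while the traditional count doubles with each added box.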
28. Vehicle Type Choice Simulation Component
• Vehicle type choice determines vehicle fleet mix; critical to
energy and emissions analysis
• SimAGENT incorporates joint vehicle type choice and primary
driver allocation model which jointly determines:
– Multiple vehicle holdings
– Body type (Sub-compact, Compact car, Mid-sized car, Large car,
Small SUV, Mid-sized SUV, Large SUV, Van, and Pickup)
– Age (Less than 2 years old, 2 to 3 years old, 4 to 5 years old, 6 to
9 years old, 10 to 12 years old, Older than 12 years)
– Make/model and use (miles)
– Primary driver of each vehicle
29. Vehicle Holdings and Use
[Figure: vehicle type/vintage categories, each branching into its makes/models (between 5 and 33 per category), plus a non-motorized vehicles alternative]
Vehicle type/vintage categories:
• Coupe (Old, New)
• Sedan Mini/Subcompact (Old, New)
• Sedan Compact (Old, New)
• Sedan Mid-size (Old, New)
• Sedan Large (Old, New)
• Hatchback/Station Wagon (Old, New)
• SUV (Old, New)
• Pickup Truck (Old, New)
• Minivan (Old, New)
• Van (Old, New)
• Non-motorized vehicles
31. Portable & Flexible Software Architecture
[Figure: software architecture diagram; component labels listed below]
• Input Database
• ODBC
• Application Driver
• Data Queries
• Zone to Zone Data
• Coordinator
• Run-Time Data Objects: Household, Person, Zone Data, LOS (level-of-service) Data, Pattern, Tour, Stop
• Simulation Coordinator
• Modeling Modules: Decision to Work Model, Work Start/End Time model, …
• Output Files
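The run-time data objects named on the slide suggest a nested structure. The sketch below is one plausible arrangement (household, person, pattern, tour, stop); the nesting, the field names, and the `total_stops` helper are all assumptions made for illustration, not the actual class design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stop:
    zone_id: int          # hypothetical field: zone where the activity occurs
    purpose: str
    start_min: float      # minutes after midnight, time kept continuous
    duration_min: float


@dataclass
class Tour:
    stops: List[Stop] = field(default_factory=list)


@dataclass
class Pattern:
    tours: List[Tour] = field(default_factory=list)


@dataclass
class Person:
    person_id: int
    pattern: Pattern = field(default_factory=Pattern)


@dataclass
class Household:
    household_id: int
    persons: List[Person] = field(default_factory=list)


def total_stops(household: Household) -> int:
    """Count all stops made by all members of a household."""
    return sum(
        len(tour.stops)
        for person in household.persons
        for tour in person.pattern.tours
    )


# Usage: one worker with a single tour containing a work stop and a shop stop.
worker = Person(person_id=1)
worker.pattern.tours.append(
    Tour(stops=[Stop(12, "work", 480.0, 510.0), Stop(7, "shopping", 1020.0, 30.0)])
)
household = Household(household_id=100, persons=[worker, Person(person_id=2)])
n_stops = total_stops(household)
```

Keeping start times and durations as continuous minutes, rather than time-of-day blocks, mirrors the continuous treatment of time described earlier in the deck.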
32. Ability to Integrate and Enhance
• Successfully interfaced with
– Multi-period static assignment (the current four-step
approach of SCAG)
– TRANSIMS and MATSim (second-by-second assignment of
people and vehicles on networks)
• Continuous-time evolutionary framework facilitates real-time
dynamic integration of ABM and DTA models
33. • SimAGENT is successfully implemented in the LA region
• Existing SimAGENT code (CEMDAP, PopGen, CEMSELTS) is
open source
• Being implemented currently in the New York region; selected
based on behavioral realism and ability to accommodate CAVs
• Elements of system being used for long distance travel
modeling by CDOT; UT-Austin working with CDOT
35. Why is joint modeling of data important?
• Borrows information on other outcomes
• Able to answer intrinsically multivariate questions, such as the
effect of a covariate on a multidimensional outcome
• Obviates the need for multiple tests and facilitates global tests,
offering superior testing power and better control of Type 1 error
rates
• If some endogenous outcomes are used to explain other
endogenous outcomes, and if the outcomes are not modeled
jointly, the result can be inconsistent estimation of the effects of
one endogenous outcome on another.
• Problem? Mixed data, high-dimensional data
36. A Way-Out
• The new Spatial Generalized Heterogeneous Data Model (GHDM);
Bhat (2015)
• Correlations across various dimensions (of the dependent variables)
are captured using latent constructs.
• Accommodates all possible types of data (dependent variables).
• Dimension of integration is independent of number of latent
constructs.
• Bhat’s Maximum Approximate Composite Marginal Likelihood
(MACML) estimation approach is used for estimation of GHDM.
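The slides do not give the estimator itself. As a rough illustration of the general idea that composite marginal likelihood methods build on, a pairwise version replaces the full joint likelihood with a product of low-dimensional marginal likelihoods. The notation below (H outcomes y_h for one observation, parameter vector theta) is assumed for this sketch, and the actual GHDM likelihood is more involved.

```latex
% Pairwise composite marginal likelihood over H outcomes of one observation
L_{\mathrm{CML}}(\theta) = \prod_{h=1}^{H-1} \prod_{h'=h+1}^{H}
  \Pr\left( y_h, \, y_{h'} ; \theta \right)
```

Each factor involves only a pair of outcomes, so the dimension of integration stays small as the number of outcomes grows. MACML additionally uses an analytic approximation to evaluate the multivariate normal cumulative distribution functions that appear in those factors.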