PyData Global 2023 talk overviewing case studies in network science, including stock market crash prediction, food price pattern mining, and stopping the spread of epidemics.
Hands-On Network Science, PyData Global 2023
1. Hands-On Network Science
Colleen M. Farrelly, Post Urban Ventures
Yae Ulrich Gaba, Quantum Leap Africa
Franck Kalala Mutumbo, University of Lubumbashi
2. • Network: a set with defined relationships across items in the set
• Examples:
• People connected to each other on
social media
• Geographic areas connected by
animal migration patterns
• Stocks connected by buyer behavior
• Goods connected by supply chains
• Ideas connected semantically
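The definition above (a set of items plus relationships among them) can be sketched as an adjacency structure. This is a minimal illustration only; the supply-chain names and the `build_adjacency` helper are hypothetical and not from the talk.

```python
from collections import defaultdict

# Hypothetical supply-chain relationships (illustrative names only).
edges = [
    ("farm", "mill"),
    ("mill", "bakery"),
    ("mill", "grocer"),
    ("bakery", "grocer"),
]


def build_adjacency(edge_list):
    """Map each item to the set of items it is directly related to."""
    adjacency = defaultdict(set)
    for u, v in edge_list:
        adjacency[u].add(v)  # relationships are treated as undirected
        adjacency[v].add(u)
    return dict(adjacency)


network = build_adjacency(edges)
print(sorted(network["mill"]))
```

The same structure works for any of the examples listed: only the meaning of an item and of a relationship changes.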
3. Network Structures
• Hubs
• Densely-connected regions
• Tight-knit friend groups, cities with many international flights, watering holes where many animals congregate
• Bridges
• Connections between regions
• Individuals that span many social groups, manufacturers that provide common parts to many industries, common food sources for many types of animals
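One common way to look for hubs and bridges, sketched here under assumptions: degree is used as a stand-in for "hub" and betweenness centrality (Brandes' algorithm, unweighted) as a stand-in for "bridge". The toy graph, the node names, and the helper functions are hypothetical; libraries such as NetworkX provide equivalents (`degree_centrality`, `betweenness_centrality`).

```python
from collections import deque
from itertools import combinations


def build_adjacency(edge_list):
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


# Two tight-knit groups of four, joined only through node "x".
group_a = ["a1", "a2", "a3", "a4"]
group_b = ["b1", "b2", "b3", "b4"]
edges = list(combinations(group_a, 2)) + list(combinations(group_b, 2))
edges += [("a1", "x"), ("x", "b1")]

adj = build_adjacency(edges)
degree = {v: len(nbrs) for v, nbrs in adj.items()}
bc = betweenness(adj)

print("highest degree:", max(degree, key=degree.get))
print("highest betweenness:", max(bc, key=bc.get))
```

In this toy graph the node with the fewest connections ("x") carries every path between the two groups, which is why degree alone can miss a bridge.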
4. • More computationally feasible for many data
science problems than traditional approaches
• Spatial regression vs. network science for change point
detection
• Time series methods vs. network science methods
• Nice visual representation of data and algorithms
• Many deep connections to mathematics
• Topology
• Geometry
• Dynamic systems
7. Case 1: Epidemic Spread Data
• Collected dataset of friendships
• Static relationships (no changes over time)
• Represents medical school friendships and veterans’ group friendships
• Theoretical disease spread to predict severity within network and strategies to prevent disease spread
8. Case 1: Epidemic Spread Methods
• SIR model
• System of differential
equations
• Adapted for connectivity of
network
• Forman-Ricci curvature
• Geometric measurement of
centrality
• Removal of highest-ranked
vertex (highest risk for
epidemic spread)
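The methods above can be sketched roughly as follows. This is not the speakers' code; it assumes (1) the simplified Forman-Ricci curvature for unweighted edges, 4 - deg(u) - deg(v), summed over a vertex's edges to score it, (2) that the "highest-ranked" vertex is the one with the most negative score, and (3) a per-node SIR system of differential equations in which each node's infection pressure comes from its neighbors, integrated with Euler steps. The toy friendship graph and all parameter values are hypothetical.

```python
def build_adjacency(edge_list):
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def forman_ricci_vertex_scores(adj):
    """Per-vertex sum of the simplified edge curvature 4 - deg(u) - deg(v)."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    return {v: sum(4 - deg[v] - deg[w] for w in adj[v]) for v in adj}


def remove_vertex(adj, target):
    """Return a copy of the graph without `target` and its edges."""
    return {
        v: {w for w in nbrs if w != target}
        for v, nbrs in adj.items()
        if v != target
    }


def sir_final_size(adj, beta=0.05, gamma=0.1, dt=0.1, steps=3000, seed=0.01):
    """Euler integration of a node-level SIR model on the network.

    s, i, r hold each node's probability of being susceptible, infected,
    or recovered. Returns the expected number of nodes ever infected
    (sum of r at the end of the run).
    """
    s = {v: 1.0 - seed for v in adj}
    i = {v: seed for v in adj}
    r = {v: 0.0 for v in adj}
    for _ in range(steps):
        # Infection pressure on each node comes from its neighbors only.
        force = {v: beta * sum(i[w] for w in adj[v]) for v in adj}
        for v in adj:
            new_inf = s[v] * force[v] * dt
            new_rec = gamma * i[v] * dt
            s[v] -= new_inf
            i[v] += new_inf - new_rec
            r[v] += new_rec
    return sum(r.values())


# Hypothetical friendship network: one well-connected person "h" plus pairs.
edges = [("h", f"n{k}") for k in range(6)]
edges += [("n0", "n1"), ("n2", "n3"), ("n4", "n5"), ("n5", "p1"), ("p1", "p2")]
adj = build_adjacency(edges)

scores = forman_ricci_vertex_scores(adj)
target = min(scores, key=scores.get)  # most negative curvature (assumed ranking)

size_before = sir_final_size(adj)
size_after = sir_final_size(remove_vertex(adj, target))
print(target, round(size_before, 2), round(size_after, 2))
```

Comparing the two final sizes gives a rough sense of how much removing (for example, vaccinating or isolating) the top-ranked vertex limits spread in this toy setting.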
9. Case 2: Stock Market Prediction Problem
• American stock exchange crash
forecasting
• Change-point problem in time series
analytics
• Caveats of non-stationary data
• Difficult to model time series data at scale
10. Case 2: Stock Market Prediction Data
• Apple, Alphabet, Nvidia, and Microsoft
• 8/19/2004-4/1/2020
• Periodic trends of constant growth, crashes, and accelerated growth that sometimes overlap across stocks (and sometimes don’t!)
11. Case 2: Stock Market Prediction Methods
• Overlapping time windows
• Thresholded correlation
networks
• Changes in Forman-Ricci
curvature, betweenness
centrality, PageRank
centrality, and degree
centrality to assess risk
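A minimal sketch of the windowing idea, assuming Pearson correlation, an absolute-value threshold, and degree centrality as the tracked metric (the other metrics listed would slot in the same way). The series are synthetic, generated so that they move independently at first and together later; the series names, window length, step, and threshold are all hypothetical choices, not values from the talk.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def correlation_network(window, threshold):
    """Edges between series whose |correlation| meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(window), 2)
        if abs(pearson(window[a], window[b])) >= threshold
    ]


def degree_centrality(names, edges):
    deg = {n: 0 for n in names}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return {n: d / (len(names) - 1) for n, d in deg.items()}


# Synthetic returns: independent noise for 120 steps, then a shared factor.
rng = random.Random(0)
names = ["S1", "S2", "S3", "S4"]
returns = {n: [] for n in names}
for t in range(240):
    common = rng.gauss(0, 1)
    for n in names:
        noise = rng.gauss(0, 1)
        returns[n].append(noise if t < 120 else common + 0.3 * noise)

window_len, step, threshold = 60, 20, 0.6  # step < window_len gives overlap
mean_centrality = []
for start in range(0, 240 - window_len + 1, step):
    window = {n: returns[n][start:start + window_len] for n in names}
    edges = correlation_network(window, threshold)
    centrality = degree_centrality(names, edges)
    mean_centrality.append(sum(centrality.values()) / len(names))

# Window-to-window change in the metric is the signal to monitor.
changes = [b - a for a, b in zip(mean_centrality, mean_centrality[1:])]
print([round(c, 2) for c in mean_centrality])
```

A sharp change in a network metric between consecutive windows is the kind of signal that would be examined as a candidate change point.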
12. Case 3: Food Pricing Problem
• Predicting millet price in markets across Burkina Faso
• Impacted by supply chain and global trends (COVID-19, war in Ukraine…)
• Spatiotemporal aspects
• Computational cost of spatial
regression models
• Non-stationarity
13. Case 3: Food Pricing Data
• Quarterly millet prices
• Time period of 2015 (Quarter 2) to 2022 (Quarter 2)
• 45 administrative provinces (averaged market prices)
14. Case 3: Food Pricing Methods
• Overlapping time windows
• Local Moran statistic thresholding to
create network
• PageRank and Forman-Ricci curvature
centrality to assess risk
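The slide does not spell out the thresholding rule, so the following is only one plausible sketch for a single time window. It computes the standard local Moran statistic with row-standardized neighbor weights, then, as an assumption, keeps an edge between two neighboring provinces when both of their statistics meet a threshold. Province names, prices, the neighbor map, and the threshold are hypothetical.

```python
def local_moran(values, neighbors):
    """Local Moran's I per region, using row-standardized neighbor weights."""
    n = len(values)
    mean = sum(values.values()) / n
    z = {k: v - mean for k, v in values.items()}  # deviations from the mean
    m2 = sum(d * d for d in z.values()) / n       # second moment
    scores = {}
    for k in values:
        lag = sum(z[j] for j in neighbors[k]) / len(neighbors[k])
        scores[k] = z[k] * lag / m2
    return scores


def thresholded_network(scores, neighbors, threshold):
    """Assumed rule: link neighbors whose local statistics both pass."""
    edges = set()
    for k, nbrs in neighbors.items():
        for j in nbrs:
            if scores[k] >= threshold and scores[j] >= threshold:
                edges.add(tuple(sorted((k, j))))
    return sorted(edges)


# Hypothetical average prices for six provinces arranged in a line.
prices = {"P0": 100, "P1": 102, "P2": 101, "P3": 150, "P4": 152, "P5": 151}
neighbors = {
    "P0": ["P1"],
    "P1": ["P0", "P2"],
    "P2": ["P1", "P3"],
    "P3": ["P2", "P4"],
    "P4": ["P3", "P5"],
    "P5": ["P4"],
}

scores = local_moran(prices, neighbors)
edges = thresholded_network(scores, neighbors, threshold=0.5)
print(edges)
```

Repeating this over overlapping time windows would give a sequence of networks on which PageRank or curvature could be tracked, as in the stock market case.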
15. • Benefits of network science
approaches
• Computational feasibility
• Easy visualizations
• Interpretable results
• Future directions
• Spatiotemporal data applications
• Temporal data applications
• Scaling of problems