Multimedia on the Mountaintop: Using
Public Snow Images to Improve Water
Systems Operation
A. Castelletti, R. Fedorov, P. Fraternali, M. Giuliani
Politecnico di Milano, Italy
ACM MM 2016, Amsterdam
BNI session
The (hopefully brave new) idea
• There is a lot of multimedia content out there,
produced by
– People
– Ground sensors
• There are many environmental problems that
lack affordable and accessible input data
• Question: is public web visual content good
enough to help in such environmental
problems?
Observing the earth
• Not everything can be done from above
• No single satellite product suits every need
• (Useful) satellite products are costly
• Clouds may be a problem
The grand challenge: water scarcity
• Climate change, urban concentration and agriculture
put water resources under stress
• Predicting future availability is key
• When you have mountains, water is stored as snow
UK_WATER SUPPLY UTILITY
15 million customers
2.6 Gl/day drinking water
3 billion $ revenue (2013-14)
The content
Input
• User generated
– 700,000 Flickr images crawled so far within a 300x160 km area
• Sensor generated
– 2,000 webcams queried every minute (10 to 1,500 images per webcam per day)
– More than 10M images crawled so far
Output
• Virtual Snow Indexes:
numerical time series that
are a proxy of the quantity
of water stored in the snow
pack (Snow Water
Equivalent – SWE)
The multimedia pipelines
• Differences
– Webcam images have high temporal density, user-generated (UG)
images have broader spatial coverage
– UG photos searched by keywords may be irrelevant,
webcam images always portray mountains
– UG photo mountain classifier already discards bad
weather images
UG Image relevance
• 7000 images randomly sampled and used for a
crowdsourcing experiment: “Do you see a
mountain in this picture?”
• Classifier trained (94% precision, 96.3% recall)
Webcam image enhancement
Remove/attenuate:
• Variability of illumination
• Shadows
• People & irrelevant objects
Daily median image
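A minimal sketch of the daily median image, assuming each webcam capture is a grayscale frame of the same size: taking the per-pixel median over one day's frames attenuates transient objects and illumination changes. The function name and the tiny sample frames are illustrative assumptions, not the authors' code.

```python
from statistics import median

def daily_median_image(frames):
    """Per-pixel median over a list of equally sized grayscale frames.

    frames: list of 2D lists (rows of pixel intensities), one per capture.
    """
    if not frames:
        raise ValueError("need at least one frame")
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[r][c] for frame in frames) for c in range(cols)]
        for r in range(rows)
    ]

# Three 2x2 frames; a bright transient object (255) appears in one frame only.
frames = [
    [[10, 20], [30, 40]],
    [[12, 255], [28, 42]],
    [[11, 22], [29, 41]],
]
print(daily_median_image(frames))  # [[11, 22], [29, 41]]
```

The outlier value 255 does not survive, because the median ignores values that appear in only a minority of the frames.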
Mountain peak identification
original image
edge maps
skyline estimation
DEM-generated virtual panorama
VCC (vector cross-correlation) best matching
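A simplified sketch of the matching idea, assuming the photo skyline and the DEM-generated panorama skyline are each reduced to one height value per image column. It slides the photo profile around a circular panorama and keeps the offset with the highest normalized cross-correlation. This is a 1D stand-in for illustration only; the vector cross-correlation used in the talk operates on edge maps, and all names and sample values here are hypothetical.

```python
import math

def best_azimuth_offset(photo_skyline, panorama_skyline):
    """Return (offset, score) where the photo skyline fits the panorama best.

    Both inputs are lists of skyline heights, one value per column.
    The panorama is treated as circular (360 degrees).
    """
    n, m = len(panorama_skyline), len(photo_skyline)
    photo_mean = sum(photo_skyline) / m
    photo_c = [v - photo_mean for v in photo_skyline]
    photo_norm = math.sqrt(sum(v * v for v in photo_c))
    best_offset, best_score = 0, -math.inf
    for offset in range(n):
        window = [panorama_skyline[(offset + i) % n] for i in range(m)]
        w_mean = sum(window) / m
        w_c = [v - w_mean for v in window]
        w_norm = math.sqrt(sum(v * v for v in w_c))
        if photo_norm == 0 or w_norm == 0:
            continue  # flat profile: correlation is undefined
        score = sum(a * b for a, b in zip(photo_c, w_c)) / (photo_norm * w_norm)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score

panorama = [0, 1, 3, 7, 4, 2, 1, 0, 0, 2, 5, 2]
photo = [7, 4, 2, 1]  # identical to panorama columns 3..6
offset, score = best_azimuth_offset(photo, panorama)
print(offset, round(score, 3))
```

Normalizing each window makes the score insensitive to a constant vertical shift or scale between the photo and the rendered panorama.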
Snow mask extraction
Snow classification at the pixel level
Snow Virtual Indexes
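The slides do not give a formula for the index, so the following is only one plausible sketch: for each day, take the fraction of mountain pixels that the snow mask labels as snow, which yields a numerical time series. The function name, the masks, and the dates are hypothetical.

```python
def snow_cover_fraction(snow_mask, mountain_mask):
    """Share of mountain pixels labelled as snow (0.0 to 1.0).

    Both masks are 2D lists of 0/1 flags with the same shape.
    """
    mountain = snow = 0
    for snow_row, mountain_row in zip(snow_mask, mountain_mask):
        for is_snow, is_mountain in zip(snow_row, mountain_row):
            if is_mountain:
                mountain += 1
                snow += 1 if is_snow else 0
    if mountain == 0:
        raise ValueError("mountain mask is empty")
    return snow / mountain

# Hypothetical masks for one webcam on two days.
mountain_mask = [[0, 1, 1], [1, 1, 1]]
daily_snow_masks = {
    "2016-01-10": [[0, 1, 1], [1, 1, 0]],
    "2016-04-10": [[0, 1, 0], [0, 0, 0]],
}
index = {day: snow_cover_fraction(mask, mountain_mask)
         for day, mask in daily_snow_masks.items()}
print(index)  # {'2016-01-10': 0.8, '2016-04-10': 0.2}
```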
The case study
• Regulation of lakes that depend on mountain inflow
[Map of the Lake Como catchment. Labels: Lake Como, Como city, River Adda, hydropower reservoir, power plant, penstock. Legend: Lario, Lario catchment, river, irrigated area. Scale bar in kilometers.]
• Catchment area
– Lake Como: 4,500 km²
• Reservoirs
– Lake Como: 247 Mm³
– Alpine HP: 545 Mm³
• Stakeholders
– Farmers: irrigated area 1,400 km²
– Floods: lake and downstream
– …
Local folklore
Formalization: two-objective optimization
• Decide the daily lake outflow (
lake level)
• So as to
– Maximize water for downstream
irrigation
– Minimize # of flood days
• Respecting
– Minimum outflow requirement for
ecological preservation of effluents
• Based on
– Policy input (X)
• Regulator's policies
– Baseline: regulator only considers
lake level and day of year
– Upper bound: regulator knows the
water that will be available (lake
inflow) in the future
– P_x: regulator knows partial
information (x) on the water that
will be available (lake inflow) in the
future
• What is X?
– P1: Official snow water equivalent
data estimated from Region
Lombardy
– P2: virtual snow indexes from
nearby mountain images
– P3: official SWE data + virtual snow
indexes
PS: Upper bound policy can be calculated retrospectively for the past,
where you know how much water you actually got day by day
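A toy sketch of the formalization, assuming a daily mass balance (storage plus inflow minus release) and expressing the irrigation objective as a deficit to minimize. The release rule, thresholds, units, and numbers are all hypothetical and only illustrate how a policy maps to the two objective values.

```python
def simulate(policy, inflows, demand, storage0, flood_storage, min_release):
    """Run a daily lake mass balance and score the two objectives.

    policy(day, storage) returns the desired release for that day.
    Returns (total irrigation deficit, number of flood days).
    """
    storage, deficit, flood_days = storage0, 0.0, 0
    for day, inflow in enumerate(inflows):
        available = storage + inflow
        # Enforce the minimum outflow requirement, then physical availability.
        release = max(min_release, policy(day, storage))
        release = min(release, available)
        storage = available - release
        deficit += max(0.0, demand[day] - release)
        if storage > flood_storage:
            flood_days += 1
    return deficit, flood_days

def baseline_policy(day, storage):
    # Hypothetical rule: release a fixed share of the current storage.
    return 0.5 * storage

inflows = [10.0, 40.0, 5.0, 0.0]
demand = [8.0, 8.0, 8.0, 8.0]
result = simulate(baseline_policy, inflows, demand,
                  storage0=10.0, flood_storage=30.0, min_release=2.0)
print(result)  # (3.5, 1)
```

An informed policy P_x would take an extra argument carrying the information x (for example a snow index) and be scored by the same simulator.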
Assessment method
• Select information based on its expected value (Iterative Input Selection)
– Input: input data series (exogenous variables)
– Output: Most Valuable Information (X)
• Design control policy based on selected input information
– Output: X_informed control policy (P_x)
• Quantify performance of policy + selected information
– J(P_x): performance of P_x
• Quantify value of perfect information
– Expected Value of Perfect Information (EVPI)
– Inflow data series, outflow data series
– Baseline policy, upper bound policy
• Performance metrics
– Hyper Volume Indicator (HV)
– Performance improvement over baseline (ΔHV)
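A minimal sketch of the two performance metrics for a two-objective minimization problem: the hypervolume is the area dominated by a set of solutions up to a reference point, and ΔHV is taken here as the plain difference from the baseline. The reference point, the sample fronts, and the absence of any normalization are assumptions.

```python
def hypervolume_2d(points, reference):
    """Area dominated by `points` and bounded by `reference` (minimization)."""
    ref_x, ref_y = reference
    # Keep only points strictly better than the reference in both objectives.
    pts = sorted(p for p in points if p[0] < ref_x and p[1] < ref_y)
    area, best_y = 0.0, ref_y
    for x, y in pts:
        if y < best_y:  # not dominated by earlier points: adds a new rectangle
            area += (ref_x - x) * (best_y - y)
            best_y = y
    return area

reference = (10.0, 10.0)
baseline_front = [(6.0, 8.0), (8.0, 5.0)]
informed_front = [(4.0, 7.0), (6.0, 4.0), (9.0, 2.0)]
hv_base = hypervolume_2d(baseline_front, reference)
hv_informed = hypervolume_2d(informed_front, reference)
print(hv_base, hv_informed, hv_informed - hv_base)  # 14.0 32.0 18.0
```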
Assessment results
Thank you & … see you soon in
the PlayStore
Content processing pipeline
• Binary classifier: photo contains / does not contain a mountain landscape
– SVM with Dense SIFT, Spatial Histograms. 7k annotated
images (majority of 3 votes). 95.1% accuracy on balanced
dataset.
• Peak identification / photo orientation estimation
– Ad-hoc algorithm with edge extraction and vector
cross-correlation. 160 images manually aligned w.r.t. Digital
Elevation Model. 75-81% of images correctly aligned
(depending on weather conditions).
• Pixel-wise snow/non-snow classifier
– Random Forest, trained/evaluated on 60 manually segmented
images (single annotator) for a total of 7M labeled pixels. 91%
accuracy.
Iterative input selection
D=distance metric
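The Iterative Input Selection loop described in the editor's notes (rank the candidates, add the best one, fit a model, continue on the residuals) can be sketched as below. This is a simplified stand-in, not the authors' Algorithm 1: ranking uses absolute correlation and the model is ordinary least squares instead of Extra-Trees, with the coefficient of determination as distance metric D. The variable names, the tolerance, and the synthetic signals are assumptions.

```python
import numpy as np

def iterative_input_selection(candidates, output, tol=0.01, max_iter=10):
    """Simplified IIS: correlation ranking + linear model on the selected set.

    candidates: dict name -> 1D array; output: 1D array of the same length.
    Returns the list of selected input names and the final R^2.
    """
    names = list(candidates)
    selected, best_r2 = [], 0.0
    residual = output - output.mean()
    for _ in range(max_iter):
        # Rank candidates by |correlation| with the current residual.
        scores = {n: abs(np.corrcoef(candidates[n], residual)[0, 1]) for n in names}
        best = max(scores, key=scores.get)
        if best in selected:
            break  # stopping rule: the best input is already selected
        trial = selected + [best]
        X = np.column_stack([candidates[n] for n in trial] + [np.ones(len(output))])
        coef, *_ = np.linalg.lstsq(X, output, rcond=None)
        prediction = X @ coef
        r2 = 1.0 - np.sum((output - prediction) ** 2) / np.sum((output - output.mean()) ** 2)
        if r2 - best_r2 < tol:
            break  # crude proxy for the overfitting check
        selected, best_r2, residual = trial, r2, output - prediction
    return selected, best_r2

# Synthetic, mutually orthogonal signals (hypothetical data).
t = np.arange(200)
snow = np.sin(2 * np.pi * 5 * t / 200)
rain = np.sin(2 * np.pi * 8 * t / 200)
unrelated = np.sin(2 * np.pi * 25 * t / 200)
snow_lagged = np.sin(2 * np.pi * 5 * t / 200 + 0.5)  # redundant with snow
unobserved = np.cos(2 * np.pi * 3 * t / 200)
inflow = 2.0 * snow + 1.0 * rain + 0.1 * unobserved

selected, r2 = iterative_input_selection(
    {"snow": snow, "rain": rain, "unrelated": unrelated, "snow_lagged": snow_lagged},
    inflow,
)
print(selected, round(float(r2), 4))
```

Once `snow` is selected, the redundant `snow_lagged` series no longer correlates with the residual, which is the redundancy-avoidance behaviour the notes describe.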
Policy search
Good decisions matter
WATER DEFICIT
FLOOD THRESHOLD
EFFECT OF REGULATION
For more info
• A. Castelletti, R. Fedorov, P. Fraternali, M. Giuliani:
name.surname@polimi.it
• http://snowwatch.polimi.it/

Multimedia on the mountaintop: presentation at ACM MM2016

Editor's Notes

  • #20 Several techniques can be used to solve this feature selection problem [11], such as cross-correlation analysis, mutual information analysis, or input variable selection methods. We use the hybrid model-based/model-free Iterative Input Selection (IIS) algorithm (Algorithm 1), which can approximate strongly non-linear functions and scale to large datasets made of long time series and many candidate variables [11]. Given a generic output variable v_o and the set of candidate inputs v_i, IIS first ranks the inputs w.r.t. a statistical measure of significance and adds the best performing input v to the current set of selected variables V. This step avoids the inclusion of redundant variables: after an input is selected, all the other inputs highly correlated with it will rank low in the next iterations. Then, the algorithm estimates a model of v_o with input V, such that v_o = m̂(V), and estimates the model performance with a distance metric D (e.g., the coefficient of determination) as well as the model residuals (v_o - m̂(V)), which become the new output at the next iteration. The algorithm stops when the next best input variable selected is already in the set V, or when overfitting conditions are reached. Among the many alternative model classes, IIS relies on extremely randomized trees (Extra-Trees), a tree-based method proposed by [12] that was empirically demonstrated to outperform other models in terms of modeling flexibility, efficiency, and scalability with respect to the input dimensionality. Moreover, Extra-Trees structures can be exploited to infer the relative importance of variables, as required for their ranking [3].
  • #21 After selecting the most valuable information I_t, the next step is to design the Informed Control Policy (ICP) that exploits such information to make decisions. The ICP is defined by extending the input z_t of the baseline control policy with the selected information, i.e., z_t = (t, l_t, I_t), and searching the optimal control policy with approximate dynamic programming methods. We use the evolutionary multi-objective direct policy search (EMODPS), a simulation-based technique that combines direct policy search, nonlinear approximating networks, and multi-objective evolutionary algorithms [13]. EMODPS exploits the parameterization of the control policies p_θ and explores the parameter space Θ to find a policy (p*_θ) that optimizes the expected system performance (J, conventionally assumed to be a cost), i.e., p*_θ = arg min_{p_θ} J(p_θ), where the policy p_θ is parameterized by parameters θ ∈ Θ and the problem is constrained by the dynamics of the system. Finding p*_θ is equivalent to finding the corresponding optimal policy parameters θ*. A tabular version of the EMODPS method is illustrated in Algorithm 2. In general, we expect the ICP to fill the performance gap between the upper and lower bound solutions (i.e., the PCP and BCP), and to produce a performance J_ICP as close as possible to J_PCP. The benefit associated with the use of the selected information is called Expected Value of Sample Information (EVSI) and can be quantified by means of the same metrics used for evaluating the EVPI (see Section 5.1).
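A heavily simplified sketch of multi-objective direct policy search, under stated assumptions: the policy is a linear release rule with two parameters rather than a nonlinear approximating network, and the search is a bare Gaussian-mutation loop with a Pareto archive rather than a full multi-objective evolutionary algorithm. The lake simulator, thresholds, and inflow values are hypothetical.

```python
import random

def simulate(theta, inflows, demand=8.0, flood_storage=30.0):
    """Score the linear release rule u = theta[0] + theta[1] * storage."""
    storage, deficit, flood_days = 10.0, 0.0, 0
    for inflow in inflows:
        available = storage + inflow
        release = min(max(0.0, theta[0] + theta[1] * storage), available)
        storage = available - release
        deficit += max(0.0, demand - release)
        flood_days += int(storage > flood_storage)
    return deficit, flood_days

def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_filter(scored):
    """Keep only the (theta, objectives) pairs that no other pair dominates."""
    return [s for s in scored if not any(dominates(o[1], s[1]) for o in scored)]

def direct_policy_search(inflows, start=(0.0, 0.5), generations=20,
                         offspring=10, seed=0):
    rng = random.Random(seed)
    archive = [(start, simulate(start, inflows))]
    for _ in range(generations):
        children = []
        for _ in range(offspring):
            parent = rng.choice(archive)[0]
            # Gaussian mutation of the policy parameters.
            child = tuple(p + rng.gauss(0.0, 0.2) for p in parent)
            children.append((child, simulate(child, inflows)))
        archive = pareto_filter(archive + children)
    return archive

inflows = [10.0, 40.0, 5.0, 0.0, 20.0, 3.0]
archive = direct_policy_search(inflows)
for theta, objectives in sorted(archive, key=lambda s: s[1]):
    print([round(p, 2) for p in theta], objectives)
```

The archive approximates the trade-off between irrigation deficit and flood days; an informed policy would add the selected information I_t as a further input to the release rule.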