1. Government 3.0
The Tools: Big Data and Open Data
Michael Holland
February 27, 2013
2. The CUSP Partnership
• The University Partners:
– NYU, NYU-Poly, Univ. of Toronto, Warwick
University, CUNY, IIT-Bombay, Carnegie Mellon
University
• The Industrial Partners:
– IBM, Cisco, Xerox, ConEdison, [Lutron,] National
Grid, Siemens, ARUP, IDEO, AECOM
• City and State Agency Partners:
– NYC Agencies, MTA, Port Authority
• National Laboratories:
– [Lawrence Livermore National Laboratory, Los
Alamos National Laboratory, Sandia National
Laboratories, Brookhaven National Laboratory]
A diverse set of other organizations have
expressed interest in joining the partnership
3. Big data can be brought to bear on societal issues
• Sensing/transmission/storage/analysis capabilities are growing rapidly
• How can you “instrument
society”?
• What do you want to know?
• How can you find out?
• What could you do with the
information?
– Descriptive, predictive
• Greenhouse Gas Treaty
Verification methodology is an
example of this
• Fuse surveys, direct measurements,
proxies to independently verify GHG
emissions
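One common way to fuse independent estimates of the same quantity is inverse-variance weighting. The sketch below is only an illustration of that idea, not the verification methodology referred to above; the function names, the three sample estimates, and the two-sigma consistency threshold are all assumptions made for the example.

```python
import math


def fuse_estimates(estimates):
    """Combine independent (value, standard_error) pairs by inverse-variance weighting.

    Assumes the estimates are independent and unbiased; returns the fused
    value and its standard error.
    """
    weights = [1.0 / (se ** 2) for _, se in estimates]
    total_weight = sum(weights)
    fused = sum(w * value for w, (value, _) in zip(weights, estimates)) / total_weight
    fused_se = math.sqrt(1.0 / total_weight)
    return fused, fused_se


def is_consistent(reported, fused, fused_se, z=2.0):
    """Check whether a reported figure lies within z standard errors of the fused estimate."""
    return abs(reported - fused) <= z * fused_se


# Hypothetical emissions estimates (arbitrary units): survey, direct measurement, proxy.
estimates = [(105.0, 10.0), (98.0, 5.0), (110.0, 20.0)]
fused, fused_se = fuse_estimates(estimates)
print(f"fused estimate: {fused:.1f} +/- {fused_se:.1f}")
print("reported value of 100 consistent?", is_consistent(100.0, fused, fused_se))
```

The more precise an individual source is, the more it pulls the fused value toward itself, so a single noisy proxy cannot dominate the result.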
4. What does it mean to instrument a city?
• Infrastructure: condition, operations
• Environment: meteorology, pollution, noise, flora, fauna
• People: relationships, location, economic/communications activities, health, nutrition, opinions, …
Properly acquired, integrated, and analyzed, data can
•Take government beyond imperfect understanding
– Better (and more efficient) operations, better planning, better policy
•Improve governance and citizen engagement
•Enable the private sector to develop new services for
governments, firms, citizens
•Enable a revolution in the social sciences
5. Urban Data Sources
• Organic data flows
– Administrative records (census, permits, …)
– Transactions (sales, communications, …)
– Operational (traffic, transit, utilities, health system, …)
• Sensors
– Personal (location, activity, physiological)
– Fixed in situ sensors
– Crowd sourcing (mobile phones, …)
– Choke points (people, vehicles)
• Opportunities for “novel” sensor technologies
– Visible, infrared and spectral imagery
– RADAR, LIDAR
– Gravity and magnetic
– Seismic, acoustic
– Ionizing radiation, biological, chemical
– …
7. Building Energy Use
[Figure: histograms of source EUI for multi-family buildings and for office buildings. x-axis: current weather-normalized source energy intensity (kBtu/sq. ft.), 0 to 500; y-axis: percent, 0 to 10.]
D. Hsu and C. Kontokosta, NYC Local Law 84 Benchmarking Report, 2012
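Energy use intensity (EUI) is annual energy use divided by floor area, and the chart above reports the share of buildings in each EUI range. The sketch below shows that calculation under stated assumptions: the building records are made up, the 100 kBtu/sq. ft. bin width is chosen for illustration, and no weather normalization is applied.

```python
def source_eui(source_energy_kbtu, floor_area_sqft):
    """Return source EUI in kBtu per square foot per year."""
    if floor_area_sqft <= 0:
        raise ValueError("floor area must be positive")
    return source_energy_kbtu / floor_area_sqft


def percent_histogram(values, bin_width=100, max_value=500):
    """Return the percent of all values falling in each bin [k*w, (k+1)*w) below max_value.

    Values outside [0, max_value) are left out of the bins but still count
    toward the denominator.
    """
    n_bins = max_value // bin_width
    counts = [0] * n_bins
    for v in values:
        if 0 <= v < max_value:
            counts[int(v // bin_width)] += 1
    if not values:
        return [0.0] * n_bins
    return [100.0 * c / len(values) for c in counts]


# Hypothetical buildings: (annual source energy in kBtu, gross floor area in sq. ft.)
buildings = [
    (12_000_000, 100_000),
    (5_000_000, 50_000),
    (30_000_000, 100_000),
    (9_000_000, 60_000),
]
euis = [source_eui(energy, area) for energy, area in buildings]
shares = percent_histogram(euis)
print(euis)
print(shares)
```

Dividing by floor area is what lets buildings of very different sizes be compared on one axis.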
8. Some Sensor Stats: United States
• 300 million mobile phones; 494,151 cell towers
• Approximately 400,000 ATMs record video of all
transactions
• 30 million commercial surveillance cameras
• 4,214 red-light cameras; 761 speed-trap cameras
• A third of large police forces equip patrol cars with
automatic license plate-readers that can check 1,000
plates per minute
Source: Wall Street Journal (January 3, 2013) – “In Privacy Wars, It’s iSpy vs. gSpy”
9. Visualization of TLC GPS Data
Drop-off
Pick-up
Most drop-offs occur
on the avenues, most
pick-ups on the streets
Lauro Lins, Fernando Chirigati, Nivan Ferreira, Claudio Silva and Juliana Freire - NYU-Poly
(Data obtained from TLC on June 6th, 2012)
11. Cell Tower Records for Traffic Analysis
Wang, P., Hunter, T., Bayen, A.M., Schechtner, K. & Gonzalez, M.C.
Understanding Road Usage Patterns in Urban Areas. Nature Sci. Rep. 2, 1001 (2012); DOI:10.1038/srep01001.
12. Urban Observatory
• Provisioned urban vantage point(s)
– MetroTech (1 MT and 388 Bridge St)
– 277 Park Ave (at 47th Street)
– Governors Island
• Suite of bore-sighted instruments
– Photometric and colorimetric optical imaging
– Broad-band IR imaging (SWIR, MWIR, and thermal?)
– Hyperspectral imaging (trace gases)
– LIDAR (building motions, pollution)
– Radar (building/street vibrations, building motion, traffic flow)
• Correlative data on the urban scenes
– Meteorology (temperature, winds, visibility)
– Scene geometry (distances, directions, identities of features visible)
– Parcel and land use data, building characteristics and activities,
building utility consumption, and real estate valuation data
– In situ pollution data and location/nature of major sources
– In situ vehicle and pedestrian traffic for the streets visible
– Demographic and economic data
• Capability to archive, process, and analyze data acquired
– Image processing chains
– Data warehouse, GIS, Visualization tools
– Software and procedures to enhance privacy protection
• Personnel and funding to create and operate the above
14. Manhattan in the Thermal IR
199 Water Street
Built 1993 :: 998,000 sq ft
electricity, natural gas, steam
LEED Certified
Photo by Tyrone Turner/National Geographic
Other synoptic modalities: Hyperspectral, RADAR, LIDAR, Gravity, Magnetic, …
15. Quantified Community
• Fully instrument a slice of the city
– 10-100k people within 20 blocks of MetroTech or
a new development
– Create a well-characterized test bed for
technologies/policies and behavioral
interventions
• What constitutes “complete instrumentation”?
– In situ vs. choke points vs. synoptic?
– Acoustic/traffic/mobile
phones/video/IR/magnetic/CBRN/…
– Economic data? Physiological data? Nutrition? …
• How to fully engage people who live/work in the community to provide data,
participate in citizen science, create educational opportunities, …?
– Foster improved quality of life: “cleanest/greenest/healthiest/most livable/…”
– “I’ll show you the parking spaces …”
– ???
• What might we expect to learn?
16. What can cities do with the data?
• Optimize operations
– traffic flow, utility loads, services delivery, …
• Monitor infrastructure conditions
– bridges, potholes, leaks, …
• Infrastructure planning
– zoning, public transit, utilities
• Improve regulatory compliance (“nudges”, efficient enforcement)
• Public health
– Nutrition, epidemiology, environmental impacts
• Abnormal conditions
– Hazard detection, emergency management
• Data-driven formulation of policies and investments
– Road pricing and congestion charging, time-of-day power, …
• Better inform the citizenry
• Enhance economic performance and competitiveness
17. Among the projects we’re considering
• Normalization, interoperability of city data sets
• 3D Urban GIS capability
• Multi-data correlations to improve city resource
allocation
• Noise / Temperature / Pollution
• Mobility
• Novel sensing of public health
• Building efficiency
• Living Lab definition
18. Privacy Issues
• Privacy issues are structural: you can’t study society
without studying people at some level
• People will voluntarily give up their data if they can see
a personal or societal benefit
– Social networks, voltstats.net, …
• Norms/expectations are changing with generations
• There are technical fixes for multi-level
privacy/classification
• Privacy is eroding in any event and we should do our
best to ensure it is done sensibly
• We don’t yet know what the optimal level of privacy is
for studies of interest
20. Context, Context, Context
[Diagram: nested levels of context for a research program, with the labels VALUE and MERIT]
• Society / Political (Macro)
– Societal demands: defense, energy, economic security, health, environment, food/water, discovery
• Agency (Corporate)
• Research Program (Competitive)
• Scientific Disciplines
– Opportunities: AMO, bio, nano, NP, EPP, astro, cosmology
21. One Systematic Evaluation Process: OMB/OSTP R&D Investment Criteria
• Prospective
– Quality: [1] Mechanism of award (e.g., 10 CFR 605); [2] Justification of funding distribution among classes of performers
– Relevance: Planning & prioritization: strategy
– Performance: “Top N” milestones (5 < N < 10)
• Retrospective
– Quality: [1] Expert reviews of successes and failures; [2] Information on major awards
– Relevance: Evaluation of utility of R&D results to both field and broader “users”
– Performance: Report on “Top N” milestones
• Advisory Committees & NAS; GPRA-style Annual Metrics
23. Roles of “Data”
• Scientific Understanding: Data improves unbiased explanation
of natural or social phenomena
• Administrative Action: Data ensures that Agencies
transparently exercise their delegated authorities in a fashion
that is not "arbitrary and capricious, an abuse of discretion, or
otherwise not in accordance with the law."
• Legal or Political Action: Data as a tool for adjudicating
disputes, i.e., winning contests and seeing one’s priorities
implemented.
24. Is USG Robust Against “Big Data?”
[T]he median Congressional district is now about five points Republican-leaning relative
to the country as a whole. Why this asymmetry? It’s partly because Republicans created
boundaries efficiently in redistricting and partly because the most Democratic districts in
the country, like those in urban portions of New York or Chicago, are even more
Democratic than the reddest districts of the country are Republican, meaning there are
fewer Democratic voters remaining to distribute to swing districts.
“As Swing Districts Dwindle, Can a Divided House Stand?”
Nate Silver, NYT, Dec 27, 2012
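The asymmetry described in the quote can be reproduced with a small calculation: compare the Republican vote share in the median district with the Republican share of the total vote. The sketch below uses five made-up districts and measures lean as a difference in two-party vote share; that definition and the function name are assumptions for illustration and may differ from the measure used in the article.

```python
from statistics import median


def median_district_lean(districts):
    """Return the median district's Republican share minus the national share, in points.

    `districts` is a list of (republican_votes, democratic_votes) pairs.
    A positive result means the median district leans Republican relative
    to the electorate as a whole.
    """
    total_rep = sum(rep for rep, _ in districts)
    total_dem = sum(dem for _, dem in districts)
    national_rep_share = total_rep / (total_rep + total_dem)
    district_shares = [rep / (rep + dem) for rep, dem in districts]
    return 100.0 * (median(district_shares) - national_rep_share)


# Hypothetical map: four modestly Republican districts and one heavily Democratic one.
districts = [(55, 45), (55, 45), (55, 45), (55, 45), (10, 90)]
lean = median_district_lean(districts)
print(f"median district leans {lean:.1f} points Republican relative to the total vote")
```

In this toy map Democrats win the overall vote yet carry only one of five districts, because their voters are concentrated in a single district and few remain for the others.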
• Political Level (President, Congress)
– How does the science benefit society? (jobs, economy, defense, …)
– How does this alleviate/placate constituent concerns? (budget growth!)
– How has the program been managing and performing?
– What have we gotten for our investment to date?
• Agency Head/Department Secretary Level
– How does the agency mission address administration priorities?
– How does the science further the mission of the agency?
– How does the science impact or strengthen other programs or related activities across the Government?
– How has the program been managing and performing?
– What have we gotten for our investment to date?
• Competitive Environment (Program Level)
– How does the program further agency mission and administration priorities?
– How does science advance the program’s objectives?
– How does the science impact or strengthen other programs or related activities across the Government?
– How has the program been managing and performing?
– What have we gotten for our investment to date?
• Internal Environment (Portfolio Balance)