The U.S. Department of Commerce collects, processes and disseminates data on a range of issues that impact our nation. Having a host of data and ensuring that this data is open and accessible to all are two separate issues. This session will cover the Commerce Data Usability Project (CDUP) - a community-driven public-private partnership to help data scientists, programmers and other users to access open knowledge from our open data.
The Team
• data engineers, data scientists and technologists
• user-centered design, lean, agile, open.
Data Products, Data Academy, Data Science
CodeConf LA 2016 July 27 - 29
designed for impact
+14
data science
development
education
operations
in-residence
leadership
how to collaborate
• a novel analysis or question posed to the data
• visually arresting graphics & engagement with the public
• open, free code and data for the public to use
It is an honor to be here.
I have been a fan of Git and GitHub for a very long time. Mention DOC announcement.
My talk today is focused on the sort of collaborations that a platform like GitHub enables.
However, I will focus on collaborations that involve the Federal government, specifically the US Department of Commerce.
But first, you are probably wondering who I am and why the hell I am standing in front of you speaking.
I am Jamaican – and I know the organizers were trying to round out their Caribbean diversity component.
I started coding in Basic and Assembly. Yes, yes, I know. I am ancient.
Luckily, I learned and built solutions in everything from COBOL, Prolog, Lisp and Fortran to Java, C and C++.
More importantly, I love Latin dancing and I am an impact junkie – I work on projects that have a high probability of having a positive impact on the daily lives of people.
And of course, this led me to the White House and now to the Department of Commerce.
I remember the exact moment when I realized how significant the impact of the Department of Commerce was on my every day.
I was eight days into my current job as the Deputy Chief Data Officer of the Department and …..
Jeff Chen (right there, who is truly one of the most brilliant data scientists of our time) and I were doing an evaluation of the data assets of the 12 bureaus of the Department.
There is a lot of diversity in the data at Commerce.
And it dawned on us that the Department had made important contributions to some of the most critical things that we use today.
Take the cell phone.
Moment of honesty: How many of you would have a panic attack if I told you that you could never have a cell phone ever again?
The material standards for manufacturing rely on standards from NIST - The National Institute of Standards and Technology.
The Intellectual Property for the technology on this device is safeguarded by the USPTO - Patent and Trademark Office.
Your weather app relies at some point on data collected by NOAA - National Oceanic and Atmospheric Administration.
Your stock app will show the impact of the GDP statistical release from the BEA - Bureau of Economic Analysis.
Telecommunications and spectra on these devices will most likely be influenced by NTIA - National Telecommunications and Information Administration.
The components in the devices are part of trade as advocated by the ITA - International Trade Administration.
The way an app or product is positioned geographically most likely relies on Census Bureau data.
The startups that create software for this device have either directly or indirectly accessed resources from the EDA (Economic Development Administration) or MBDA (Minority Business Development Agency).
When you examine it, the bureaus not only perform functions that are critical to consumer products, but also collect data that ranges from information on surface temperatures on the sun to acoustic emanations at the floor of our oceans.
When it comes to the open data sets available on data.gov, the Department provides a significant portion of them: 36%.
However, our own internal calculations indicate that only 1% of our data is used.
Even so, that small proportion of our data fuels multi-billion dollar industries.
USD 1.5 Billion in 2015.
Imagine what is possible when the other 90% is unlocked.
The Secretary of the Department, Penny Pritzker, did something that, at the time, no other federal agency had ever contemplated.
She made Data an integral part of the Department's Strategic plan.
Nov 9th, 2015: Publicly launched the Commerce Data Service.
Focused on helping the data initiatives of the twelve bureaus
Specifically, we help our 12 Bureaus to rapidly create and develop projects to advance the Department's mission.
We are an Agile, Lean and user-centered tiger team that 1) builds data products and services, 2) provides data education, and 3) develops data science insight to improve operations.
January 2015: We hired 13 data scientists, data engineers and front-end engineers
Jun 2015: The team grew to over 30 members
One of the first projects that the Commerce Data Service took on was:
How do we meaningfully work with our users to unlock the untapped data resources within the Department?
We called it the Commerce Data Usability project.
We started with 4 data guides that provided the necessary context and code (all hosted on GitHub).
We show how to use RStudio, leaflet.js, and Google Charts to visualize hail data from NOAA's SWDI.
We use Python, Gephi, d3.js, and dimple.js to show how to process and include computer security vulnerability data from NIST to understand the patterns in security bug dissemination and patch release.
We use R + Plotly to unpack satellite data from NOAA and demonstrate that it is a proxy for human activity.
Hinting at possible other uses.
Suppose you are a company that wants to know the Census tracts where Americans who meet certain criteria live.
We show how to find the right customers for your non-profit or for-profit business using the American Community Survey (ACS) from the US Census Bureau.
Python and leaflet.js
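The idea of picking out Census tracts whose residents meet certain criteria can be sketched in a few lines of Python. This is a minimal illustration and not the code from the data guide: the `Tract` fields, the thresholds, and the sample rows are all hypothetical, and real values would come from ACS tables.

```python
from dataclasses import dataclass


@dataclass
class Tract:
    tract_id: str          # hypothetical identifier, not a real Census GEOID
    median_income: float   # hypothetical field name, not an ACS variable code
    share_under_35: float  # hypothetical field name, fraction between 0 and 1


def matching_tracts(tracts, min_income, min_share_under_35):
    """Return tracts meeting both criteria, largest under-35 share first."""
    hits = [
        t for t in tracts
        if t.median_income >= min_income
        and t.share_under_35 >= min_share_under_35
    ]
    return sorted(hits, key=lambda t: t.share_under_35, reverse=True)


# Made-up rows for illustration only; they are not real ACS estimates.
SAMPLE = [
    Tract("tract-A", 72000.0, 0.41),
    Tract("tract-B", 48000.0, 0.55),
    Tract("tract-C", 91000.0, 0.28),
    Tract("tract-D", 65000.0, 0.47),
]

selected = matching_tracts(SAMPLE, min_income=60000.0, min_share_under_35=0.40)
print([t.tract_id for t in selected])
```

The selected tracts could then be joined to tract boundaries and drawn on a leaflet.js map.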
The previous tutorials were internal collaborations.
We worked with Census, NOAA and NIST.
We recognize that this has to be a community-driven initiative.
We have over 20 data guides "in flight", in collaboration with 12 partners.
We have a few examples of data-driven public-private collaboration.
Zillow worked with us, using R, to create data guides that showcase the affordability of housing for different job categories, such as firemen, in various neighborhoods in the US.
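One way to think about affordability by job category is to compare rent with a share of income. The sketch below is an assumption-laden illustration in Python (the guide itself used R): the 30% rent-to-income threshold is a common rule of thumb that is assumed here and not taken from the guide, and the salary, neighborhood names, and rents are made up.

```python
def affordable(annual_salary, monthly_rent, max_share=0.30):
    """True if rent takes at most max_share of gross monthly income.

    max_share=0.30 is an assumed rule of thumb, not the guide's definition.
    """
    return monthly_rent <= max_share * annual_salary / 12


def affordable_neighborhoods(annual_salary, rents_by_neighborhood, max_share=0.30):
    """Return the names of neighborhoods whose rent passes the threshold, sorted."""
    return sorted(
        name
        for name, rent in rents_by_neighborhood.items()
        if affordable(annual_salary, rent, max_share)
    )


# Made-up monthly rents and salary for illustration only.
RENTS = {"Northside": 1100.0, "Harbor": 1900.0, "Midtown": 1450.0}

result = affordable_neighborhoods(48000.0, RENTS)
print(result)
```

Repeating the call with a different salary per job category gives a simple per-job view of which neighborhoods are within reach.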
Mapbox: bash, rasterio, GDAL, gribdoctor
Atmospheric Rivers (AR) are narrow regions in the atmosphere that transport water across the world.
Like waterways on the ground, ARs are wide ranging in size, with the ability to hold vast amounts of water.
When ARs slow and stall, vulnerable areas are at risk of heavy, damaging rainfalls and floods. Alternatively, the more common, weaker ARs bring much needed rain to resupply water reserves.
Earth Genome: bash, rasterio, GDAL, gribdoctor
The Dow Chemical facility in Freeport, Texas is the largest chemical manufacturing complex in the Western Hemisphere. The site produces billions of dollars of product each year and employs thousands of people. To do this, the complex requires a lot of fresh water, drawing 100,000 gallons per minute from the Brazos River.
By looking at topological data from NOAA, this guide shows how Dow could save hundreds of millions of dollars in water infrastructure investment by moving to a more socially optimal allocation: a solution that delivers environmental benefits without detracting from financial benefits.
USPTO: Tableau
The first patent in the United States was issued on July 31, 1790 to one Samuel Hopkins for an improvement in the making of pot ash.
This guide explores the most innovative sectors in each state.
Use the tutorials
Create issues and PRs – help us improve the current set.
Collaborate with us to create your own tutorial
This is your community.