Open Raleigh

Among other things, Open Raleigh is an open data portal. Development of Open Raleigh began in February 2012 and continues today.

Timeline
- Open Data Resolution passed by Raleigh City Council, February 2012
- Open Knowledge Foundation's Open Data Handbook used as the foundation of the Open Data Resolution, March 2012
- Socrata discussed and advocated by city staff as the preferred vendor for the City of Raleigh during a council meeting, April 2012
- Open Data Program Manager position created, June 2012
- Open Data Program Manager hired, September 2012
- Open Data Program Charter and Strategy in place, December 2012
- Open Data Strategy presented to the IRMC, January 2013
- Open Data Portal beta launch, March 2013

Raleigh's open data portal is a civic engagement tool. The original Open Raleigh charter had elements of transparency as well as accessibility and the "democratization" of data. The current program charter labels transparency as one of the inevitable outcomes of Open Raleigh.

Open data, and access to it, gives citizens a way to understand the city's story through the visualization of data. These visualizations can also be created by citizens themselves; this is what we mean by "democratization."

Open data is a public asset. Data, like a streetlamp or any other city asset, can be used to enhance the quality of life for Raleigh citizens. Open Raleigh is concerned with opening data to the public as a service, a utility, and as "infrastructure."
Open data is not a new concept: open standards and government/private partnerships have been around a long time. Open data is a philosophy around a deceptively complex way of accessing and standing up data sets for public consumption. Currently there is no globally agreed-upon standard, or even scope, for what "open data" means. This lack of standards and interoperability affects the ability of different agencies to work together, or even to reuse code and effort expended during an open data initiative.

Open data has its origins in the Freedom of Information Act, enacted in 1966. Linked data was the reason Tim Berners-Lee evangelized HTML and the World Wide Web starting in the early 90s. The "semantic web" movement of a decade ago can also be seen as a precursor to the Obama Administration's 2008 campaign pledge to be more transparent (after the election, Data.gov was launched in 2009). We will return to this claim and the website a little later.

Previous efforts made government data available via several protocols, and those protocols and standards have given rise to the foundations of a new knowledge-based economy. Some successful examples:
- TCP/IP: first published standard in 1982. TCP/IP became the foundation for the Internet and was part of a federally funded series of projects.
- FTP: current RFC released in 1985; allowed for the transfer of binary and text files between clients and servers.
- NNTP: RFC published in 1986; Usenet was established to allow distributed discussion boards on a wide range of topics.
- HTTP: first RFC published in 1991; led to the World Wide Web. The W3C consortium oversees all standards concerning devices and applications using the HTTP protocol.

Standards and government/private partnerships have a deep, rich history.
Open data eventually needs to evolve toward a globally accepted definition/RFC and schematic/semantic specifications in order to become truly mainstream.

Data available for various government verticals:
- Weather (NOAA)
- GPS data via satellite feeds
- What other data is the next big thing?

Open data today is different than in the past (the past meaning 12 months ago):
- An expectation of "open by default" (White House Memorandum M-13-13)
- An increasing expectation of machine readability by citizens, by business, and by internal government data consumers

The emerging ambiguity of open data vs. open government:
- Open government data became popular during the 2008 presidential campaign.
- The mission of open data has changed from a transparency model to one of accessibility and "data as infrastructure."

Policy issue: "The New Ambiguity of 'Open Government'" by Harlan Yu and David Robinson.
- Open data represents a primary change in public policy to make more government data available in machine-readable formats.
- Open, machine-readable formats allow for the reuse of data by others.
- Policy initiatives that promote "open data" are often cast in political terms as "open government."
- This ambiguity can affect both open data and transparency initiatives.
- The Open Government Partnership, signed by the US and seven other countries in 2011, uses open data as a catalyst for data sharing with political accountability.

Is open data the same as open government? It is not.
The OGP as implemented in the United States is very different from how it is implemented in the EU. Payment gateways limit access to transparency information both in the US and in the EU.

Policy issue: "When Transparency and Collaboration Collide: The USA Open Data Program."
- In 2005, Hurricane Katrina was seen as much a communications and data-sharing failure as a policy failure.
- President Obama, after the 2009 inauguration, suggested the OGP as one solution to the agency data-sharing problem.
- On his first full day in office, the White House OMB unleashed a barrage on federal agencies to "free the data."
- Initially, Data.gov was a failure: passive-aggressive attitudes on the part of agencies, and an unfunded mandate during the downturn in the US economy and in US tax revenues.

The goals of open data have since emerged as a debate between transparency advocates and those promoting data accessibility and utility.
The OKFN and the Sebastopol principles of 2007 define open government data in slightly different ways, but the gist is below.

Government data shall be considered open if it is made public in a way that complies with the principles below:
1. Data Must Be Complete
2. Data Must Be Primary
3. Data Must Be Timely
4. Data Must Be Accessible
5. Data Must Be Machine Readable
6. Access Must Be Non-Discriminatory
7. Data Formats Must Be Non-Proprietary
8. Data Must Be License-free

Corollaries to the 8 principles:
- Compliance must be reviewable.
- A contact person must be designated to respond to people trying to use the data.
- A contact person must be designated to respond to complaints about violations of the principles.

So then, what specifically defines open data? I would argue there is a 9th principle of open government data: Authority. Authority means acknowledgement by the data steward that the data being accessed is representative and approved to be as accurate as possible. This requires the data to be accessed from an authoritative source: a public sector agency website, or a website with an acknowledgement that the data has been approved as accurate by a public sector agency. Does this negate any of the above principles with regard to accessibility or licensing? No. It ensures that the data being used in an API or downloaded is accurate and timely.

Annotations on the Sebastopol Open Government Data principles (additional notes):

Data Must Be Complete: While non-electronic information resources, such as physical artifacts, are not subject to the Open Government Data principles, it is always encouraged that such resources be made available electronically to the extent feasible.

Data Must Be Primary: Primary data is an important aspect of compliance with the Open Government Data principles. All too often, audio, video, and images are made available only at low resolution to Internet users, making the data impossible to use in any professional application.
The choice of an appropriate "low" resolution format yesterday begins to look unusable by the standards of today. If an entity chooses to transform data by aggregation or transcoding for use on an Internet site built for end users, it still has an obligation to make the full-resolution information available in bulk, both for others to build their own sites with and to preserve the data for posterity. Just as one should not destroy information by presenting and preserving only low-resolution imagery, numeric or tabular data should not be aggressively aggregated for use in one particular Internet application at the cost of throwing away public information that could otherwise be used. The determination of an acceptable level of granularity to present and preserve is a moving target and should be based on the best practices of the time, with a heavy bias toward "more is better."

Data Must Be Timely: What is reasonable depends on the nature of the data set. For example, when the data is a record of ongoing events, is relevant to a current policy debate, or is otherwise time-sensitive, a delay of more than one month is not acceptable. On the other hand, geographic data collected for purposes independent of any current policy debate may be released periodically in bulk. Newly updated complete data sets should also be provided in a timely manner. Time-sensitive data sets should be updated at the same frequency with which the data changes, and when individual records change, notices of the changes should likewise be made available in a timely manner. Despite the foregoing, if data has not been released in a timely manner because of technical constraints, that is not a reason to continue delaying release. Better late than never!

Data Must Be Accessible: Data must be made available on the Internet so as to accommodate the widest practical range of users and uses.
This means considering how choices in data preparation and publication affect access for the disabled, and how they may impact users of a variety of software and hardware platforms. Data must be published with current industry-standard protocols and formats, as well as alternative protocols and formats when industry standards impose burdens on wide reuse of the data; this includes honoring handicapped-accessibility initiatives. If the data is retrievable from a Web interface, there must be some straightforward means of exporting (flattening) it so that it can be inspected in raw form directly, downloaded, and imported into other tools. Data is not accessible if it can be retrieved only by navigating web forms, or if automated tools are not permitted to access it because of a robots.txt file or other statement of policy.

Data Must Be Machine Readable: The ability for data to be widely used requires that the data be properly encoded. Free-form text is not a substitute for tabular and normalized records, and images of text are not a substitute for the text itself. Sufficient documentation on the data format and the meanings of normalized data items must be available to users of the data. Following the principle that data must be accessible, accessibility must extend to automated access: if the data is accessible from some kind of interface, it must be possible to download the complete data set in raw form through an automated process.

Access Must Be Non-Discriminatory: Anonymous access to the data must be allowed for public data, including access through anonymous proxies. Data should not be hidden behind "walled gardens," accessible only to certain classes of Internet users. To use analogies from earlier periods of the Internet, data accessible only via AOL, Internet2, or Bloomberg would be considered to be presented in a discriminatory manner.
This principle reiterates some of the goals of principle 4, accessibility.

Data Formats Must Be Non-Proprietary: Proprietary formats add unnecessary restrictions over who can use the data, how it can be used and shared, and whether the data will be usable in the future. While some proprietary formats are nearly ubiquitous, it is nevertheless not acceptable to use only proprietary formats. Likewise, the relevant non-proprietary formats may not reach a wide audience. In these cases, it may be necessary to make the data available in multiple formats.

Data Must Be License-free: Because government information is a mix of public records, personal information, copyrighted work, and other non-open data, it is important to be clear about what data is available and what licensing, terms of service, and legal restrictions apply. Data for which no restrictions apply should be marked clearly as being in the public domain.

Recommendations

4. Data Must Be Accessible: The first part of the accessibility principle speaks of availability, meaning the ability for the entirety of the data to be acquired over the Internet. A data set being large does not exempt it from the requirements in this section; disks are cheap, and even high-definition video is no longer hard to store and distribute. When data sets are too large to be made available in whole, in bulk, directly from the source, assistance from the nonprofit and private sectors should be sought. As a last resort, a rotation scheme can be deployed to make available a limited window of data at a time. Accessibility also relates to use by disabled individuals.
Accessibility initiatives to be followed include the World Wide Web Consortium's Web Accessibility Initiative and, in the United States, Sections 504 and 508 of the federal Rehabilitation Act and Section 255 of the federal Telecommunications Act. Benchmarks for accessibility include whether existing tools are available to process the data; whether tools that use the data could enable vision-impaired individuals to achieve the same comprehension of the data as a sighted individual, for instance using a Braille workstation or a screen reader; and whether non-English-speaking individuals can use a web service to translate the data (in this case, a document) into another language.

5. Data Must Be Machine Readable: For tabular or structured data, each record should include an identifier. This identifier should be persistent across revisions to the data set so that external references to individual records can follow updates. The identifier can, for instance, be a globally unique URI following Semantic Web best practices. The data format should be documented so that those familiar with the domain of the data set can understand it. All columns, tags, and abbreviations should be described; however, an XML schema or the like is not necessary. A benchmark for meeting this requirement is whether a programmer can build a parser for the data in a scripting language in just an afternoon. That parser should be able to crawl through the published data set and push the data into a database. There should also be a means of notifying users of changes to the data format; a mailing list or RSS feed aimed at data users, plus a document describing the history of the data format, are recommended.

7. Data Formats Must Be Non-Proprietary; 8. Data Must Be License-free: Benchmarks for meeting these requirements include whether the data can be used in applications based entirely on free software (including license- and patent-free software), and whether individuals are able to redistribute the data without restriction and without requiring the permission of any third party (including the government).
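The "afternoon parser" benchmark above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: the file layout, column names, and sample rows are stand-ins for a real published data set, not an actual city extract. Note the persistent record identifier in the first column, per principle 5.

```python
# Minimal sketch of the "afternoon parser" benchmark: parse a published
# CSV data set and push each record into a local SQLite database.
# SAMPLE_CSV and its columns are hypothetical stand-ins for a real extract.
import csv
import io
import sqlite3

SAMPLE_CSV = """permit_id,issue_date,permit_type,valuation
P-2013-0001,2013-03-04,Residential,125000
P-2013-0002,2013-03-05,Commercial,480000
"""

def load_dataset(csv_text, db_path=":memory:"):
    """Parse a CSV data set and load each record into SQLite.

    The persistent permit_id is the primary key, so re-running the loader
    against a revised data set updates records in place instead of
    duplicating them.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS permits ("
        "permit_id TEXT PRIMARY KEY, issue_date TEXT, "
        "permit_type TEXT, valuation INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (r["permit_id"], r["issue_date"], r["permit_type"], int(r["valuation"]))
        for r in reader
    ]
    conn.executemany("INSERT OR REPLACE INTO permits VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn

conn = load_dataset(SAMPLE_CSV)
print(conn.execute("SELECT COUNT(*) FROM permits").fetchone()[0])  # 2
```

Because the data is normalized and documented, the whole loader fits well within the "one afternoon" budget; free-form text or images of text would make this impossible.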
New technology platforms: SaaS solutions allow government to create public/private partnerships that rapidly deploy cloud open data platforms. Government can focus on the policy and the data rather than the technology.

Resource constraints: We can no longer throw money and resources at a problem until it goes away. Resource constraints have a tendency to stress an organization, which can lead to innovation. This is a natural human response to environmental stress: invent something, or make something more suitable to the new environment.

Rethink the role of government (as a service):
- Government cannot provide every service.
- Citizens do engage if given an opportunity: self-service, socially enriched data and metadata, events and feedback that create engagement.
- Developers will build apps if provided reasonable access through standardized schemas and APIs.
- Data and services are exposed in useful ways. Use your imagination, but your imagination won't match what really emerges.
Open data is an ecosystem: the more a government publishes data and engages the public, the more demand there is for data, and hence the more the government publishes. Open data is an ecosystem of collaboration. A single city's open data portal does not represent all of the citizens living in a culturally or politically defined area (think RTP for the former, and Wake County or North Carolina for the latter).

"Network effects" offer a "virtuous circle": macroeconomics describes a virtuous circle in which the availability of a resource compounds demand for that resource as a need is established. Open data is also subject to deflated expectations and collapse if there is no policy or sustainability plan for the open data portal/initiative.
This is a picture of all of the different types of people who attended Data Jam. Some of you in this room were there!

Consumers of data fall into three basic categories:
- Casual users: the general public
- Programmatic users: generally those hitting an API to power an application
- Bulk users: typically data scientists looking to parse an entire data set at one time

The latter two user types are looking for data sets that are machine readable and fall into the OKFN's definition of "open data." It is the first type that the COR needs to concentrate on for purposes of transparency; the latter two resolve to research and economic development.

Open Raleigh engages the public at large by visualizing data sets designated as key performance indicators of the city. These data sets include:
- Crime data
- Budget and financial data
- Planning and land management data
- Emergency event data

Our citizen engagement strategy also involves planning and executing local and regional events highlighting data and the potential for data use, and partnering with federal, state, county, and private groups to sponsor these events.
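The difference between programmatic and bulk consumers can be made concrete: a bulk user downloads the entire data set once, while a programmatic user walks it page by page through an API. A minimal sketch of the paging arithmetic (page size and totals here are illustrative, not tied to any real endpoint):

```python
# Sketch of paged (programmatic) access: generate the limit/offset pairs
# a client would send to an API to walk an entire data set. A bulk user
# would instead fetch all total_rows rows in a single download.
def page_params(total_rows, page_size=1000):
    """Yield (limit, offset) pairs covering a data set of total_rows rows."""
    offset = 0
    while offset < total_rows:
        # The final page may be smaller than page_size.
        yield (min(page_size, total_rows - offset), offset)
        offset += page_size

# Walking a 2,500-row data set in 1,000-row pages:
print(list(page_params(2500)))  # [(1000, 0), (1000, 1000), (500, 2000)]
```

Either access path presumes the same thing: a machine-readable data set with stable record identifiers, so that pages (or successive bulk downloads) can be stitched back together.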
Acculturation of Open Data within Government

The acceptance and adoption of an open data initiative, resolution, policy, and ultimately an online accessible repository takes executive as well as rank-and-file buy-in. Open data is not open government, but the data can be used to foster innovation as well as transparency. Open data and open government also intersect with both proprietary platforms like Socrata and open source technologies like the Python-powered CKAN.

Acceptance of open data as a process, rather than as an event, will take time. The technology is agnostic to the overall process of developing a culture amenable to producing and publishing open data as a general rule. The only way to develop and nurture the idea of open data within any organization is persuasion, education, and a demonstration of the inherent benefits to the organization at the department level and at the municipal level.

This transformation takes the "data exhaust" from applications' operations and converts this by-product into a valuable asset for reuse by the public. The public is not the only beneficiary of this transformation: internal data consumers benefit from re-sharing data, reducing the man-hours spent on duplicated data-gathering activities. Data can be accessed through an internal as well as an external central data repository.

Acculturation of "Open by Default"

In four years, the Obama Administration's Data.gov (launched in 2009) has evolved to include an open data policy that is "open by default" (Memorandum M-13-13). Open by default pushes agencies to make available all data that is not shielded by security or other classifications. The effectiveness of this policy has yet to be evaluated.
During our policy meetings, Open Raleigh's steering committee has strongly urged us to move forward with an "Open by Default" policy. For the White House, and for any other government entity looking to adopt such a policy, there are consequences and benefits (depending on how one looks at the issue):
- Buy-in from stakeholders within departments, agencies, and divisions
- The ability to deal with the data deluge such a policy creates
- Privacy backlash and accountability
- Vendors having to retool applications
- Changes in government workflow
Open data governance model established:
- Steering committee made up of COR employees and citizens
- Advises on policy and supports the strategic direction of Open Raleigh's portal

Open data long-range vision:
- Regional in nature
- Focused on usability and economic impact
- Data as infrastructure via APIs and web services
Raleigh is seen as a regional leader in open data:
- Raleigh is creating a model and a philosophy around open data that makes it easier for other jurisdictions to start their own initiatives.
- Raleigh sees benefits from collaboration with neighboring cities and counties.
- Supporting open data initiatives in the Triangle.
- Developing a set of "best practices" that serve as the model for other open data initiatives.

Civic engagement is the key to sustainable open data:
- Building consensus through civic engagement is the key to Raleigh's success and acceptance by the community.
- Raleigh is attracting national attention through events like the NC DataJam and Datapalooza.

DataJam and Datapalooza

As U.S. Chief Technology Officer Todd Park explains, "The Open Data Initiative is a program to … liberate government data … and to actively stimulate the development of new tools and services, and enhance existing tools and services, leveraging the data to help improve Americans' lives in very tangible ways, and create jobs for the future."

Pointing to the billions of dollars in private sector growth stimulated by publicly available GPS and weather data (think Google Maps and The Weather Channel), the White House has helped launch open data efforts across multiple sectors, including health, education, energy, public safety, and international development. This initiative has spurred several "datapaloozas" across the country, sparking hundreds of promising entrepreneurial ventures.

On April 22, 2013, the Triangle kicked off the country's first regional datapalooza.
With support from the White House and the City of Raleigh, the event (hosted by HUB Raleigh) brought together dozens of local entrepreneurs and companies like Blue Cross Blue Shield, Cisco, and SAS to brainstorm how to use the vast amounts of publicly available data. City and state data, helpfully consolidated on Triangle Wiki by Raleigh, includes city-specific information on crime, building permit trends, real estate statistics, voters and elected officials, the animal center, and more. At the national level (most easily found at alpha.data.gov) there is everything from the energy use of your appliances to hospital quality data, education standards, and natural hazards data.

The lists are endless, and so are the possibilities for innovative uses of this data, as was evident in the brainstorming sessions that ensued with local entrepreneurs. Among the ideas that emerged: a personalized health dashboard allowing you to track and compare your health against national and local data and plug into locally available, publicly rated resources; a public parking app; a traffic app that guides you onto new routes based on trend analysis; real-time data on how your local representatives are voting; a rating system for schools and child-care centers; and others.

These teams now have ninety days to develop their solutions and submit them for judging. In September, the top ideas will be showcased publicly, hopefully accelerating their speed to market and sparking new ideas for data-driven innovation.
The Open Raleigh home page launched in beta form in March 2013, and we won the 2013 PTI award for the Open Raleigh project. This is the new DataSlate product from Socrata. Our open data portal can now contextualize data sets with copy that explains methodology, data collection, data vetting, and several other attributes. We will also have a privacy impact assessment, as well as documentation for developers and site visitors alike. This will allow for greater "democratization" of data by enhancing the customer experience.
The visualization of data in geo-spatial format as well as in graphs has made data easier to understand for our citizens. This is an ESRI map layer of fire stations with a fire incident map overlay.
This shows all of the sustainability projects underway or completed across the city. My colleague from the sustainability office can speak a little more about how open data and sustainability are working together. We are helping the sustainability office gather and store their data sets.
Citizens can log in and create their own visualizations of COR data sets. There is also documentation on developer APIs and toolkits for using our Socrata SODA API foundry.
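For developers, access typically starts with a SODA resource URL and a few SoQL query parameters. The sketch below only builds such a URL; the dataset identifier ("abcd-1234") and the field name in the sort clause are hypothetical placeholders, and real identifiers come from the portal's developer documentation.

```python
# Sketch of constructing a Socrata SODA query URL with SoQL parameters.
# The dataset id and field names below are hypothetical placeholders.
from urllib.parse import urlencode

def soda_url(domain, dataset_id, **params):
    """Build a SODA resource URL; SoQL parameters are prefixed with '$'."""
    query = urlencode({f"${k}": v for k, v in params.items()})
    return f"https://{domain}/resource/{dataset_id}.json?{query}"

url = soda_url("data.raleighnc.gov", "abcd-1234",
               limit=5, order="incident_date DESC")
print(url)
# https://data.raleighnc.gov/resource/abcd-1234.json?%24limit=5&%24order=incident_date+DESC
```

Any HTTP client can then fetch that URL and receive JSON records, which is what makes app-building against standardized schemas practical.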
Data analytics for Open Raleigh. The CityCampNC event closed on May 31, 2013. In general, the trend since the beta launch is one of growth; May was an especially strong month, and the first three days of June alone totaled more than 14,000 page views.

Site statistics:
- Over 155,000 page views
- Over 3.4 million record rows
- Activity correlates with, and rises during, civic engagement events: Data Jam, CityCampNC, and the National Day of Hacking
Open Raleigh will launch GovStat in October 2013. Open Raleigh is committed to rolling out new features that enhance the "democratization of data" through the visual tools supplied by the Socrata platform, will continue to enhance the availability of government services through public/private partnerships, and is committed to open data for the long term.

What is GovStat? GovStat blends technology and professional services so that Raleigh can openly set goals, connect to stakeholders, track progress, and achieve results. The GovStat solution helps governments centralize priority data and reduce the friction in accessing it. It also facilitates data analysis across agencies, breaking through traditional data silos. GovStat helps you get started with simple ways to analyze and report on your data. You can then create citizen-friendly maps, charts, dashboards, and graphs so that your reports are easy to follow, and all of those visualizations can be updated automatically in real time.

While all of this public, real-time data serves Raleigh's constituency, it also assists in managing departments' data access issues. We have an agile environment that enables continuous, evidence-based decision making and implementation.

Intelligent, Informed Decisions: GovStat transforms our government into a data-driven organization with tools that:
1. Collect data
2. Map data to key priority areas
3. Track your progress
4. Dynamically visualize your data
5. Build custom reports
6. Build custom dashboards
Open Data Trends and Issues: Open Raleigh
Open Data Portal Implementation
• Open Raleigh
– What is Open Raleigh
– What is Open Data?
– Who benefits? How?
– Where are we now?
– Where are we headed?
What is Open Raleigh?
Beta Launch: March 2013
The evolution of the Open Raleigh portal. Above is the beta launch from March 2013.

DataSlate Launch: August 2013
To the right is the Open Raleigh portal with DataSlate. Anticipated launch is August 2013.
Who benefits? How?
Why is now a good opportunity for government to
benefit from open data?
Who Benefits? How?
The “Virtuous Circle” of a Data Ecology
Who Benefits? How?
Citizens and government working together
Where are we now?
Data Policy → Data Collector → Data Producer → Data Publisher
The Data Evolution in Raleigh
Open Data Policy as a Living Document
Where are we now?
Launch of Beta
Timeline of Open Raleigh
Where are we now?
The White House comes to the Triangle #NCDataJam
Where are we now?
Open Raleigh’s DataSlate Project
Where are we now?
Geospatial Visualizations Open Raleigh
Where are we now?
Sustainability Projects in Open Raleigh
Where are we now?
Citizen Tools to Create Visualizations