The document discusses the United States Census Bureau's plans to use big data for the 2020 Census. Specifically:
1) The Census Bureau plans to use governmental and corporate data sources to update addresses and maps, reducing the need for in-person address canvassing. Satellite imagery and change detection techniques will identify areas still needing in-field verification.
2) Administrative records from government agencies will be used to identify vacant housing units and pre-fill census responses to reduce costly in-person follow-ups.
3) A new operational control system using a predictive model and real-time data will efficiently assign and track field workers, improving productivity.
Prototype Global Coding of Political Geographies for Library and Data Managem... – Tom Christoffel
Abstract
Regional geographic analysis in the United States is constrained by the alphabetically assigned FIPS codes of the 1960s. Base codes were assigned alphabetically for states, then alphabetically for counties and comparable geographies within states, making it simple to look up individual state or county data in a list but offering no geographic information on proximity. Some regional aggregation was done in the establishment of Metropolitan Statistical Areas (MSAs), which were separately coded. At the same time, there was no comparable national system to aggregate non-metropolitan counties into standard regions, although most states established some form of multi-county regional councils. Some, like Virginia, used sub-state districts for data aggregation and use by other state agencies, allowing the region number to act like a FIPS code that also embedded geographic information.
The author began work on the issue in 1998. A review of other national and international systems led to the conclusion that a global geocode system was needed, since existing formats were based on the alphabetic approach suited to early computing. Economists Jeffrey Sachs and James K. Galbraith have expressed interest in such a system; Professor Sachs opened his 2012 AAG address by saying that "economists think counties are arranged alphabetically on the globe, since that is the way the data appears." The purpose of this paper is to present the prototype design for further consideration by the user communities.
The system is based on a geocode scheme for the earth that uses established political boundaries as the basis for regional grouping of nations, states, and localities. It is decimal-based, to take advantage of numeric sorting in computers, and it uses the Sector, Group, and Region codes of the United Nations and ISO. Geographic information system technology does not solve the problem, but its tools can be used with the geocodes.
The geocode system effectively organizes Wikipedia entries as a library management scheme, and the geocodes can be used for data aggregation. The system has been developed under a Creative Commons license and would benefit from a global network implementation in which local users cooperatively relate subnational geographic regions and their component political geography.
Papers in Applied Geography, Volume 36, 2013
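As a minimal illustration of the decimal-sort idea in the abstract above – the geocodes here are invented placeholders, not the paper's actual prototype codes – hierarchical numeric codes let a plain lexical sort group geographic neighbors, which alphabetic FIPS-style listing cannot do:

```python
# Hypothetical hierarchical decimal geocodes (sector.region.state.county).
# Sorting the fixed-width numeric strings groups geographic neighbors,
# unlike an alphabetical FIPS-style listing, which scatters them.
places = {
    "1.2.51.069": "Frederick County, VA",    # invented codes for illustration
    "1.2.51.171": "Shenandoah County, VA",
    "1.2.54.003": "Berkeley County, WV",
    "1.3.06.037": "Los Angeles County, CA",
}

# Alphabetic listing: Berkeley, Frederick, Los Angeles, Shenandoah --
# the three neighboring Shenandoah Valley counties end up separated.
for name in sorted(places.values()):
    print(name)

# Decimal geocode listing: the three neighboring counties sort
# adjacently, ahead of distant Los Angeles.
for code in sorted(places):
    print(code, places[code])
```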
The document summarizes the state of GIS in Colorado. It discusses how the Governor's Office of Information Technology (OIT) is working to coordinate GIS data across state agencies through initiatives like the Colorado Information Marketplace and a statewide GIS plan. It provides details on specific datasets like hydrography, municipal boundaries, and addresses that are under active stewardship. It also announces an upcoming winter 2015 data call where local governments will be requested to provide updated thematic data to the state.
The Shape of the 2020 Census (Jim Castagneri) – GIS Colorado
The 2020 census will be the most significant redesign since 1970, moving from reliance on paper questionnaires to online and telephone responses. This change aims to make the census more affordable and efficient, with the potential to save $5 billion compared to using 1970s methods. Some households without internet access will still receive paper questionnaires or visits from census staff. New technologies like administrative records and mobile devices integrated with mapping software will help with address listing, self-response, and non-response follow-up to reach all households. These changes are part of a larger Census Bureau effort called CEDCaP to modernize data collection, processing, and dissemination for future surveys.
The document discusses analyzing COVID-19 case data using a data science pipeline approach. It outlines collecting data from sources like the COVID Tracking Project and Johns Hopkins University, then cleaning and manipulating the data in Pandas dataframes. It focuses on Maryland county-level case and testing data. The goals are to make predictions about how the crisis may evolve over time to better inform policymakers.
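A minimal sketch of the kind of pipeline described, assuming the Johns Hopkins CSSE time-series layout on GitHub (the file path and column names below follow that repository's convention but should be verified, since they may have changed):

```python
import pandas as pd

# Johns Hopkins CSSE US confirmed-case time series (assumed layout:
# one row per county, one column per date).
URL = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
       "csse_covid_19_data/csse_covid_19_time_series/"
       "time_series_covid19_confirmed_US.csv")

raw = pd.read_csv(URL)

# Keep Maryland counties only, then reshape wide date columns to long form.
md = raw[raw["Province_State"] == "Maryland"]
date_cols = md.columns[md.columns.str.match(r"\d+/\d+/\d+")]
long = md.melt(id_vars=["Admin2"],  # Admin2 = county name in CSSE files
               value_vars=list(date_cols),
               var_name="date", value_name="cumulative_cases")
long["date"] = pd.to_datetime(long["date"])

# Daily new cases per county, a typical input for trend models.
long = long.sort_values(["Admin2", "date"])
long["new_cases"] = (long.groupby("Admin2")["cumulative_cases"]
                     .diff().clip(lower=0))
print(long.tail())
```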
PREDICTING ELECTION OUTCOME FROM SOCIAL MEDIA DATA – kevig
In this era of technology, online social networking sites (OSNs) have emerged as a medium for expressing opinions and declaring support or opposition on social and political matters. People on these networks increasingly use the platforms to signal their positions on the political organizations contesting an election throughout the election period. The aim of this paper is to predict the outcome of the Australian federal election of May 18, 2019, from the tweets posted on Twitter pertaining to it. We combined two techniques to extract information from the tweet data and count a virtual vote for each political group. Our findings closely match the official election results published by the Australian Electoral Commission.
NDGISUC2017 - Introduction to the Boundary and Annexation Survey (BAS) – North Dakota GIS Hub
The BAS is an annual voluntary survey conducted by the Census Bureau to collect legal boundary and status updates for governments. Participating governments submit boundary changes by certain deadlines to ensure accurate population and statistical data reporting. The BAS involves state and local governments and aims to maintain accurate governmental unit boundaries and data nationwide.
The document discusses preparations for the 2020 Census and Geographic Partnerships. It provides updates on the 2020 Census program, the Geographic Support System Initiative (GSS-I), the Local Update of Census Addresses Program (LUCA), and the Redistricting Data Program (RDP). It summarizes various census tests conducted in 2015, 2016, and future tests planned for 2017 and 2018. The goals are to test new address canvassing methods, self-response optimization, use of administrative records, and field operations to improve the 2020 Census.
This document provides instructions for analyzing data on limited English proficient (LEP) individuals living in poverty using Census Bureau datasets. It explains how to access LEP and combined LEP-poverty information from the American Community Survey using FactFinder and Data Ferrett. It also provides guidance and formulas for manipulating the downloaded Public Use Microdata Sample data in Excel to calculate cross-tabulations of LEP status and poverty level. The goal is to summarize the number of LEP and non-LEP individuals below or above the poverty threshold.
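The cross-tabulation the document builds in Excel can be sketched in pandas. The field names below (ENG for English-speaking ability, POVPIP for income as a percentage of the poverty threshold, PWGTP for the person weight) follow the ACS PUMS data dictionary, but treat the exact codings and the Maryland filename as assumptions to verify against your PUMS year:

```python
import pandas as pd

# Load ACS PUMS person records (hypothetical Maryland person file;
# verify field codings against the PUMS data dictionary).
pums = pd.read_csv("psam_p24.csv", usecols=["ENG", "POVPIP", "PWGTP"])

# LEP is commonly defined as speaking English less than "very well"
# (ENG codes 2-4); ENG is blank for English-only speakers and young children.
pums["lep"] = pums["ENG"].isin([2, 3, 4]).map({True: "LEP", False: "Not LEP"})

# Below poverty: income under 100% of the poverty threshold; POVPIP is
# blank for people not in the poverty universe, so drop those records.
pums = pums.dropna(subset=["POVPIP"])
pums["poverty"] = (pums["POVPIP"] < 100).map({True: "Below poverty",
                                              False: "At/above poverty"})

# Weighted cross-tabulation of LEP status by poverty level.
table = pd.pivot_table(pums, values="PWGTP", index="lep",
                       columns="poverty", aggfunc="sum")
print(table)
```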
TFTN (Transportation for the Nation) refers to a nationwide transportation dataset comprising core transportation features and attributes that meet federal data needs and accuracy requirements. It incorporates data from federal, state, tribal, county, and local sources, with an initial focus on road centerlines. The federal government's role should be to meet federal road requirements, which will capture many national needs, while state and local governments add to and maintain the national dataset. The private sector has a role in collecting and maintaining some attributes, and public participation through tools like volunteered geographic information (VGI) can also contribute.
Federal Statistical System, Transparency Camp West – bradstenger
Peter Orszag, the Director of the Office of Management and Budget, cites "evidence-based policy" to support healthcare reform. However, his evidence comes from Dartmouth University rather than the Federal Statistical System overseen by Katherine Wallman. The statistical system faces challenges in meeting transparency goals due to cultural and technical issues. While statistics are underfunded at just $10-25 per taxpayer, they provide crucial information and were important in WWII. Collaboration between journalists, programmers, statisticians, and policymakers could help improve the system.
This document discusses how think tanks can harness big data and machine learning to enhance policy analysis. It covers the following key points:
I. The rise of big data and artificial intelligence worldwide and how countries and companies are applying these technologies.
II. Examples of how the Korea Development Institute (KDI) has used machine learning and big data to analyze policy issues like selecting R&D grant recipients and assessing regional economies.
III. The future potential for statistical agencies and think tanks to build integrated data platforms, modernize analytical methods, and conduct predictive policy analysis and evaluation using big data.
1. The U.S. Census Bureau faces challenges from the rise of big data sources produced outside of traditional government surveys. These new sources are generated faster and more cheaply than surveys.
2. To remain reliable sources of demographic and economic information, the Census Bureau must integrate these new big data sources with traditional surveys. This requires linking massive datasets and developing new statistical modeling techniques.
3. The Census Bureau is exploring ways to use new big data sources like web search data, social media, and e-commerce transactions to improve surveys and provide more timely, detailed information. However, maintaining privacy and developing new technology is difficult.
OpenDataCommunities and Hampshire Hub presentation for Hampshire and Isle of ... – Mark Braggins
Joint presentation by Steve Peters and Mark Braggins for Hampshire and Isle of Wight local authority Chief Executives about OpenDataCommunities and the Hampshire Hub linked open data initiatives.
What does “BIG DATA” mean for official statistics? – Vincenzo Patruno
In our modern world more and more data are generated on the web and produced by sensors in the ever growing number of electronic devices surrounding us. The amount of data and the frequency at which they are produced have led to the concept of 'Big data'. Big data is characterized as data sets of increasing volume, velocity and variety; the 3 V's. Big data is often largely unstructured, meaning that it has no pre-defined data model and/or does not fit well into conventional relational databases.
EUBrasilCloudFORUM actively participated in the Beyond 2020 event, where Professor Sergio Takeo Kofuji from the University of São Paulo (USP) discussed the importance of Open Data for cities.
The Beyond 2020 event took place at the "Centro de Convenções de Pernambuco" from the 27th to the 29th of July, focusing on how urban innovation ecosystems can support citizens and what the future will look like for cities, politics, citizens, local development, and tourism, based on the use of Open Data platforms.
As an organisation that works extensively with the chief statistical organisations on both sides of the Tasman, our demographer and Census expert, Glenn Capuano, shares his experience working with data from the Australian Bureau of Statistics and Stats New Zealand.
Web scraping, a technique to extract data from websites, plays a pivotal role in gathering information from government sites and the public sector. This data holds immense value for making informed decisions across various industries and enhancing public services.
Government websites are treasure troves of information, offering insights into policies, regulations, public spending, demographics, and more. By scraping data from these sources, analysts can monitor changes in legislation, track government expenditures, and identify emerging trends.
For example, in healthcare, web scraping can be used to collect data on public health initiatives, disease outbreaks, hospital performance metrics, and pharmaceutical regulations. Analyzing this data enables policymakers and healthcare providers to allocate resources effectively, improve patient care, and develop targeted interventions.
Similarly, in finance and economics, scraping data from government websites provides access to economic indicators, budget allocations, trade statistics, and labor market trends. This information aids investors, economists, and businesses in making strategic decisions, assessing market conditions, and forecasting economic trends.
In the education sector, web scraping facilitates the collection of data on school enrollment, academic performance, funding allocations, and educational policies. This data helps education policymakers, administrators, and researchers identify areas for improvement, allocate resources efficiently, and implement evidence-based reforms.
Furthermore, web scraping empowers governments and organizations to enhance public services by gathering feedback and sentiment data from social media, forums, and review websites. Analyzing this data enables policymakers to understand public opinion, identify issues, and address citizen concerns effectively.
However, web scraping also raises ethical and legal considerations, especially when scraping data from government websites. It's crucial to ensure compliance with data protection regulations, respect website terms of service, and obtain consent when necessary. Additionally, data quality and accuracy must be carefully evaluated to prevent misinformation and bias in decision-making processes.
In conclusion, web scraping plays a vital role in collecting data from government sites and the public sector to inform decision-making and enhance public services across various industries. By leveraging this technique responsibly, organizations can gain valuable insights to drive positive change and improve societal outcomes.
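As a minimal illustration of the technique described above (the URL and table structure here are placeholders, not a real endpoint), a polite scraper fetches a page, checks the response, and parses the first HTML table:

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL: substitute a real government open-data page, and
# check its robots.txt and terms of service before scraping.
URL = "https://example.gov/public-spending/reports"

response = requests.get(URL, headers={"User-Agent": "research-bot/0.1"},
                        timeout=30)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Extract the rows of the first HTML table on the page.
rows = []
table = soup.find("table")
if table is not None:
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)

for row in rows[:5]:
    print(row)
```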
A research poster presented as part of the Exploring the Emerging Impacts of Open Data in Developing Countries project at the Research Sharing Event in Berlin, 15th July 2014. For more see http://www.opendataresearch.org/emergingimpacts/
The document summarizes the progress and plans of the UK Office for National Statistics' (ONS) Administrative Data Census Project. The project aims to replace the traditional census with population statistics derived from administrative data by 2021. So far, the project has had success producing population estimates from linked health and tax records. However, fully replacing the census will require improved access to additional administrative data, better data linkage methods, and methods to produce a wider range of statistical outputs to meet user needs. The assessment concludes that while estimates of population size and numbers of households may be feasible by 2023, fully replacing the census with administrative data alone is unlikely due to limitations in available data and methods. Continued progress will depend on new legislation, engagement with
No More Half Fast: Improving US Broadband Download Speed. Georgetown Universi... – Brittne Kakulla, Ph.D.
Big Data Science analysis of economic drivers impacting US broadband development using Census data and State Broadband Initiative Broadband Map data from 2011-2014.
ODDC Context - Opening the Cities: Open Government Data in Local Governments ... – Open Data Research Network
Presentation in the first workshop of the Exploring the Emerging Impacts of Open Data in Developing Countries project. Looking at the context of open data, and the research case study planned for 2013 - 2014. See http://www.opendataresearch.org/project/2013/jcv
This report was prepared for the City of Syracuse by a Master of Public Administration class at the Maxwell School of Citizenship and Public Affairs at Syracuse University. The team consisted of Jinsol Park, Dan Petrick, Krishna Kesari, and Sarah Baumunk, and was overseen by Jesse Lecy.
This document discusses the international statistical community's movement towards open data and the case for statistics being a key category of open data. It notes that the UN Statistical Commission organized a seminar in 2010 to discuss emerging trends in data communication, including innovations in making data freely available and reusable through open licensing. The document argues that statistics should be considered a public good since they are produced using public funds and are essential for development. It outlines how statistics have historically been collected by governments for fiscal and policy purposes. Key principles for official statistics emphasize making data available to the public on an impartial basis and presenting metadata to facilitate proper interpretation.
The Mapping Revolution: Incorporating Geographic Information Systems in Gover... – GovLoop
Since the beginning of civilization, humans have used images as a means to tell stories. We have used images to educate, entertain or to build a just and moral society. Our ancient ancestors would use images to remember stories and archive information for future generations. Similar to our ancient ancestors, we use images today to convey meaning, understand complex relationships and improve communication.
The use of mapping and geospatial technology is at the heart of storytelling and improved communication. As the challenges of the public sector continue to grow in complexity, efficient and effective communication tools are essential. Today, government is more interconnected than ever before, and the complexity has led to increased integration between state, local, and federal officials. At all levels of government, agencies are looking for solutions that find value in data and improve public sector decision-making. http://www.govloop.com/profiles/blogs/govloop-guide-the-mapping-revolution-incorporating-geographic-inf
Research Poster - Exploring the Impact of Web Publishing Budgetary Information ... – Open Data Research Network
A research poster presented as part of the Exploring the Emerging Impacts of Open Data in Developing Countries project at the Research Sharing Event in Berlin, 15th July 2014. For more see http://www.opendataresearch.org/emergingimpacts
Better data to better monitor status on women in informal employment, unpaid ... – Dr Lendy Spires
The roundtable discussion will bring together key stakeholders to discuss improving gender data and statistics related to labor. It will identify challenges in measuring various forms of unpaid work performed mainly by women. Participants will discuss good practices for establishing comprehensive measures that account for all productive activities, in order to better monitor the status of women worldwide and inform policies to promote gender equality. The roundtable aims to develop concrete actions and partnerships to strengthen countries' statistical capabilities and address remaining data gaps.
RESEARCH ARTICLE: The Government Finance Database: A Common....docx – rgladys1
RESEARCH ARTICLE
The Government Finance Database: A Common Resource for Quantitative Research in Public Financial Analysis
Kawika Pierson*, Michael L. Hand, Fred Thompson
Atkinson Graduate School of Management, Center for Governance and Public Policy Research, Willamette University, Salem, Oregon, United States of America
* [email protected]
Abstract
Quantitative public financial management research focused on local governments is limited by the absence of a common database for empirical analysis. While the U.S. Census Bureau distributes government finance data that some scholars have utilized, the arduous process of collecting, interpreting, and organizing the data has made its adoption prohibitive and inconsistent. In this article we offer a single, coherent resource that contains all of the government financial data from 1967-2012, uses easy-to-understand natural-language variable names, and will be extended when new data is available.
Introduction
Widely shared and easy to use databases facilitate quantitative research and render the replication of findings practical and convenient [1]. Indeed, much of what we know about public finance has been tested against large microdata sets – in the United States, primarily merged information files based on household-level data from the IRS Individual Public-Use Tax Files, the Current Population Survey, the Consumer Expenditure Survey, and the triennial Survey of Consumer Finances. Unfortunately, students of public financial management at the local government level must often rely on one-off, custom-built datasets to pursue their inquiries, which is costly, inimical to replication, and leaves practitioners uncertain about the utility of academic insights.
For someone from outside the field of public financial management the lack of widely used and consistently applied data might seem an unlikely obstacle. After all, scholars of public financial management have access to a database that is in many respects ideally suited to their needs. The U.S. Census Bureau has surveyed state and local governments annually since 1967, and, as the Director of the U.S. Census Bureau stated in a letter accompanying the 2013 request for financial information: “This survey is the only comprehensive source of information on the finances of local governments in the United States.”
Many examples of research using these data exist, including recent papers by Gore [2], Baber and Gore [3], Kido et al. [4], Murray et al. [5], Carroll [6], Mullins [7], and Fisher and
Australian Bureau of Statistics: some initiatives on big data - 23 July 2014 – noviari sugianto
This document discusses the opportunities and challenges of using Big Data in official statistics. It outlines several potential applications of Big Data, including sample frame creation, full or partial data substitution, imputation, and generating new insights. However, the decision to use a Big Data source should be based on a strong business case and cost-benefit analysis. The document provides an example cost-benefit analysis for using satellite imagery to replace agricultural survey data. It also emphasizes that Big Data sources must meet validity criteria for statistical inferences.
Big Data refers to large, complex datasets that traditional data processing applications are unable to handle efficiently. Spark is a fast, general engine for large-scale data processing that supports multiple languages and data sources. Spark uses resilient distributed datasets (RDDs) that operate on data stored in cluster memory for faster performance compared to the disk-based MapReduce model. DataFrames provide a distributed collection of data organized into named columns similar to a relational database, enabling SQL-like queries and optimizations.
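A minimal PySpark sketch of the two abstractions the summary contrasts, assuming a local Spark installation (the sample data is invented):

```python
from pyspark.sql import SparkSession

# Start a local Spark session (assumes pyspark is installed).
spark = SparkSession.builder.appName("rdd-vs-dataframe").getOrCreate()

# RDD: a resilient distributed dataset of raw Python objects, held in
# cluster memory and transformed functionally.
rdd = spark.sparkContext.parallelize([("MD", 120), ("VA", 95), ("MD", 40)])
totals = rdd.reduceByKey(lambda a, b: a + b)
print(totals.collect())  # e.g. [('MD', 160), ('VA', 95)]

# DataFrame: the same data organized into named columns, enabling
# SQL-like queries and query optimization.
df = spark.createDataFrame(rdd, ["state", "cases"])
df.groupBy("state").sum("cases").show()

spark.stop()
```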
Businesses, governments, and the research community can all derive value from the massive amounts of digital data they collect, and analyzing governments' big-data projects offers guidance for follower countries planning their own future initiatives. Decision making in government usually takes much longer than in the private sector and proceeds through consultation and the mutual consent of a large number of diverse actors, including officials, interest groups, and ordinary citizens. Governments deal not only with the general issues of integrating big data from multiple sources, in different formats, and at cost, but also with some special challenges. The biggest is collecting the data itself, which arrives not only through multiple channels but from many different sources. Most governments operating or planning big-data projects need to take a step-by-step approach to setting the right goals and realistic expectations. Success depends on their ability to integrate and analyze information, develop supporting systems, and support decision making through analytics.
Pravin Dattu Gangad | Pranay Pravin Gaikwad, "Use of Big Data in Government Sector," International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN 2456-6470, Volume 2, Issue 4, June 2018. URL: http://www.ijtsrd.com/papers/ijtsrd15676.pdf http://www.ijtsrd.com/computer-science/database/15676/use-of-big-data-in-government-sector/pravin-dattu-gangad
2. Elemental Economics - Mineral demand.pdf – Neal Brewster
After this second session you should be able to: explain the main determinants of demand for any mineral product, and their relative importance; recognise and explain how demand for any product is likely to change with economic activity; recognise and explain the roles of technology and relative prices in influencing demand; and explain the differences between the rates of growth of demand for different products.
Lecture slides titled "Fraud Risk Mitigation," a webinar lecture delivered at the Society for West African Internal Audit Practitioners (SWAIAP) on Wednesday, November 8, 2023.
Seminar: Gender Board Diversity through Ownership Networks – GRAPE
A seminar on gender diversity spillovers through ownership networks at FAME|GRAPE, presenting novel research: studies in economics and management using econometric methods.
1. Elemental Economics - Introduction to mining.pdf – Neal Brewster
After this first session you should: understand the nature of mining; have an awareness of the industry's boundaries, corporate structure, and size; appreciate the complex motivations and objectives of the industry's various participants; and know how mineral reserves are defined and estimated, and how they evolve over time.
Vicinity Jobs' data includes more than three million online job postings (OJPs) from 2023 and thousands of skills. Most skills appear in less than 0.02% of job postings, so most postings rely on a small subset of commonly used terms, like teamwork.
Laura Adkins-Hackett, Economist, LMIC, and Sukriti Trehan, Data Scientist, LMIC, presented their research exploring trends in the skills listed in OJPs to develop a deeper understanding of in-demand skills. This research project uses pointwise mutual information and other methods to extract more information about common skills from the relationships between skills, occupations and regions.
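Pointwise mutual information measures how much more often a skill co-occurs with an occupation than chance would predict: PMI(s, o) = log( p(s, o) / (p(s) p(o)) ). A minimal sketch on invented posting counts (not the presenters' actual method or data):

```python
import math
from collections import Counter

# Toy postings: (occupation, skills mentioned). Data are invented for
# illustration; real inputs would come from millions of OJPs.
postings = [
    ("data_scientist", {"python", "teamwork", "statistics"}),
    ("data_scientist", {"python", "statistics"}),
    ("nurse", {"teamwork", "patient_care"}),
    ("nurse", {"patient_care", "communication"}),
    ("accountant", {"excel", "teamwork"}),
]

n = len(postings)
occ_counts = Counter(occ for occ, _ in postings)
skill_counts = Counter(s for _, skills in postings for s in skills)
pair_counts = Counter((occ, s) for occ, skills in postings for s in skills)

def pmi(skill, occ):
    """PMI(s, o) = log( p(s, o) / (p(s) * p(o)) )."""
    p_joint = pair_counts[(occ, skill)] / n
    p_skill = skill_counts[skill] / n
    p_occ = occ_counts[occ] / n
    return math.log(p_joint / (p_skill * p_occ)) if p_joint else float("-inf")

# A generic skill like "teamwork" carries little information about any
# one occupation (PMI near or below zero); a specific skill like
# "statistics" scores higher for its occupation.
print(round(pmi("teamwork", "data_scientist"), 3))    # about -0.182
print(round(pmi("statistics", "data_scientist"), 3))  # about 0.916
```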
Falcon stands out as a top-tier P2P Invoice Discounting platform in India, bridging esteemed blue-chip companies and eager investors. Our goal is to transform the investment landscape in India by establishing a comprehensive destination for borrowers and investors with diverse profiles and needs, all while minimizing risk. What sets Falcon apart is the elimination of intermediaries such as commercial banks and depository institutions, allowing investors to enjoy higher yields.
In a tight labour market, job-seekers gain bargaining power and leverage it into greater job quality—at least, that’s the conventional wisdom.
Michael, LMIC Economist, presented findings that reveal a weakened relationship between labour market tightness and job quality indicators following the pandemic. Labour market tightness coincided with growth in real wages for only a portion of workers: those in low-wage jobs requiring little education. Several factors—including labour market composition, worker and employer behaviour, and labour market practices—have contributed to the absence of worker benefits. These will be investigated further in future work.
OJP data from firms like Vicinity Jobs have emerged as a complement to traditional sources of labour demand data, such as the Job Vacancy and Wages Survey (JVWS). Ibrahim Abuallail, PhD Candidate, University of Ottawa, presented research relating to bias in OJPs and a proposed approach to effectively adjust OJP data to complement existing official data (such as from the JVWS) and improve the measurement of labour demand.
Big Data and Censuses: The Challenge for the 2020 World Round of Population and Housing Censuses

Talking points developed for the Population and Housing Census Lunchtime Seminar
46th Session of the United Nations Statistical Commission
United Nations Headquarters, New York
3 March 2015

Lisa M. Blumerman
Assistant Director, Decennial Census Programs
U.S. Census Bureau

Disclaimer: This report is released to inform interested parties of research and to encourage discussion. Any views expressed are those of the author and not necessarily those of the U.S. Census Bureau.
Good afternoon. I’d like to begin by thanking the Chair and Organizer of this session for the opportunity to present this afternoon, to share the United States’ experiences with big data as we prepare for our 2020 Census, and to learn from others about their experiences in this area.

The panel was asked to address three issues today:
What is the main focus in terms of using big data for a population and housing census?
Is the primary focus on governmental big data, corporate big data, or both?
What are the main findings so far?

These are exciting questions for me to consider and discuss as the plans for the United States 2020 Census unfold.
To begin, please allow me to provide a little context on the design for our 2020 Census.
A principal goal for the U.S. 2020 Census is to conduct the Census at a lower cost than our
previous 2010 Census while maintaining high quality results. In order to do this, we are
reengineering the majority of our Census processes, including our data collection techniques,
our methodologies, and our field structure. As we have planned over the past 5 years, we have
focused our research and testing efforts in the areas with the greatest opportunity for cost
savings and the introduction of innovations, including the use of big data and tools to manage
big data.
As part of our redesign, the United States has several high-level guiding principles that we are
operating under. The first guiding principal for the design of the 2020 Census is to:
1) Use the Internet to increase self-response; [optimizing self-response]
   - Internet data collection
   - Notify Me – early engagement for people to provide an email address or mobile phone number that we would use to contact them when we are ready to begin data collection.
   - Advertising, partnership, and promotion
   - Non-ID processing, which enables processing of a questionnaire without an identification number for those households that did not receive a notice or questionnaire mailed to their housing unit.
2) Automate operations to increase productivity and reduce temporary staff and offices;
[reengineering field operations]
3) Use information people have already given the government for other purposes to answer Census questions and reduce the follow-up workload; [reengineering nonresponse follow-up] and
4) Update existing maps and addresses to reflect changes rather than walking every block
in every neighborhood in America in 2019. [reengineered address canvassing]
Our work to date in these areas allows us to estimate that we will achieve savings of approximately $5.1 billion, in 2020 dollars, as compared with our 2010 Census.
The use of big data isn’t new for the United States. We’ve been using administrative data, such
as tax data, for decades to improve our data collections. Today, there is a new generation of
big data – as the electronic environment flourishes – that we must keep up with. We are
researching ways to utilize these new data sources in our collections in order to increase
efficiencies and to reduce costs and the time it takes to disseminate statistics. At the same
time, we must also continue to maintain the quality of the official statistics.
With that as a brief background, I would like to delve into the three questions posed and
discuss the United States’ proposed use of big data in the context of two of our innovation
areas and operations – the address canvassing operation and the nonresponse follow-up
operation.
A specific new application of big data proposed for the 2020 Census is in our plans to
reengineer our address canvassing operation. Traditionally, in the years prior to the decennial
census, the Census Bureau employed a large-scale field collection effort to walk every block [1] in the United States (about 6.7 million blocks, driving 137 million miles). The availability of high-quality, high-resolution satellite, aerial, and street-level imagery, updated with increasing frequency, now provides a viable and effective alternative to fieldwork for many parts of the United States, Puerto Rico, and the Island Areas [2]. For the 2020 Census, we have redefined
address canvassing to be a combination of in-office and in-field canvassing where we will
continue to canvass every block, but we will only conduct in-field, on-the-ground canvassing
where it is necessary.
The goal of reengineering address canvassing is to eliminate a nationwide in-field address
canvassing in 2019. Instead, the Census Bureau proposes to use statistical models to help predict where change is occurring, combined with aerial imagery and change detection techniques, to identify the areas where in-field canvassing is required.
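
To make the change detection idea concrete, here is a minimal illustrative sketch in Python. It is not the Census Bureau's pipeline: the pixel-differencing approach, the intensity delta, and the five percent change threshold are all assumptions chosen for illustration.

# Illustrative sketch only -- not the Census Bureau's methodology.
# Flags a census block for in-field canvassing when two epochs of
# aligned grayscale aerial imagery differ beyond a threshold.
import numpy as np

def fraction_changed(img_t1: np.ndarray, img_t2: np.ndarray,
                     pixel_delta: int = 30) -> float:
    """Fraction of pixels whose intensity shifted by more than
    pixel_delta between the two image epochs (hypothetical delta)."""
    diff = np.abs(img_t1.astype(np.int16) - img_t2.astype(np.int16))
    return float(np.mean(diff > pixel_delta))

def blocks_needing_field_canvass(block_images, change_threshold=0.05):
    """block_images maps block ID -> (t1, t2) pair of aligned image
    arrays; returns IDs whose imagery changed enough (assumed 5%
    threshold) to warrant an in-field canvass."""
    return [block_id for block_id, (t1, t2) in block_images.items()
            if fraction_changed(t1, t2) > change_threshold]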
We are now working with terabytes of data from federal, state, and local government sources, as well as the private sector, to update our Master Address File (MAF) and determine
where in-field work is required. Specific federal files we are working with include extracts from
the United States Postal Service, Indian Health Service Registration File, U.S. Department of
Housing and Urban Development Public and Indian Housing Information File, and the Selective
Service Registration File. In addition to the files from the Federal Government, we also
maintain an extensive partnership program with state and local governments where we have
been receiving “partner” files for several years and have processed and ingested those files into our address frame. We are also using private sector data where we have found gaps in the governmental data.

[1] Blocks (Census Blocks) are statistical areas bounded by visible features, such as streets, roads, streams, and railroad tracks, and by nonvisible boundaries, such as selected property lines and city, township, school district, and county limits and short line-of-sight extensions of streets and roads. Generally, census blocks are small in area; for example, a block in a city bounded on all sides by streets. Census blocks in suburban and rural areas may be large and irregular, and bounded by a variety of features, such as roads, streams, and transmission lines. In remote areas, census blocks may encompass hundreds of square miles. Census blocks cover the entire territory of the United States. Census blocks nest within all other tabulated census geographic entities and are the basis for all tabulated data.

[2] The Island Areas of the United States are American Samoa, Guam, the Commonwealth of the Northern Mariana Islands (Northern Mariana Islands), and the United States Virgin Islands. Sometimes the Island Areas are referred to as "Island Territories" or "Insular Areas." For the 1990 and previous censuses, the U.S. Census Bureau referred to the entities as "Outlying Areas."
For the purpose of address canvassing, a large focus of our efforts with big data includes both governmental and commercial (corporate) data. We are looking at the best methodologies to integrate these data, predict change, and then take appropriate action by either updating our Master Address File or conducting an in-field canvass.
Current research in this area is very promising. We just concluded our first test of the new
approach, called the Address Validation Test, and we are in the data analysis stage now. In this
test, we developed two independent statistical models that predict the stability of a block, and
then conducted a dependent listing operation of 10,100 blocks in the United States. The result
of that dependent listing is now being compared to the output from the statistical models to
assess the accuracy of the models. A second component of this test identified 29 counties in the
United States for which we had aerial imagery in-house that overlapped with the 10,100 blocks.
Both automated and manual change detection techniques were utilized in those 29 counties to identify blocks with change; of those, approximately 700 blocks were identified as suitable for an in-field canvass. The results of this in-field canvass will be compared with the results of
the change detection work, as well as the results of the dependent listing operation. We expect
to have these findings by early April and they will inform the proposed methodology for the
2020 Census.
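
As an aside on method, comparing the models with the dependent listing amounts to tallying agreements and disagreements block by block. Here is a minimal Python sketch of that comparison, with hypothetical inputs (a block-level changed/stable prediction and the listing outcome treated as ground truth); it is not the Bureau's actual evaluation code.

# Illustrative sketch only: tally agreement between a model's
# changed/stable prediction per block and the dependent listing
# outcome, treated here as ground truth. Inputs are hypothetical.
def validation_summary(predictions, listing_results):
    """Both arguments map block IDs to True (changed) or False (stable)."""
    tp = fp = fn = tn = 0
    for block_id, predicted_change in predictions.items():
        observed_change = listing_results[block_id]
        if predicted_change and observed_change:
            tp += 1  # model correctly flagged a changed block
        elif predicted_change:
            fp += 1  # model flagged change the listers did not find
        elif observed_change:
            fn += 1  # model missed a block that actually changed
        else:
            tn += 1  # both agree the block is stable
    return {"true_pos": tp, "false_pos": fp,
            "false_neg": fn, "true_neg": tn}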
A second application of big data in the 2020 Census design is in our planning and
operationalization of our nonresponse follow-up operation. The reengineering of the
components of this operation encompasses two of the Census Bureau’s innovation areas:
Utilizing Administrative Data and Reengineering Field Operations.
The goal of Utilizing Administrative Data is to use data that the public has already provided to
the government, and, potentially, third-party (or commercially available) data to reduce the
nonresponse follow-up (NRFU) workload. The Census Bureau proposes to use data from
internal and external sources, such as the 2010 Census, the United States Postal Service (USPS),
and the Internal Revenue Service (IRS) to identify vacant housing units and those units that do
not meet the Census Bureau’s definition of a housing unit (deletes). The data sources may also
be used to enumerate the population in cases of nonresponse.
During the 2010 Census, the nonresponse follow-up universe included 50 million housing units.
Each of those units received at least one personal visit, resulting in the identification of 31
million occupied and 14 million vacant housing units. Another five million units were deleted
because they did not meet the Census Bureau’s definition of a housing unit [3]. Vacant and
deleted units accounted for about 38 percent of the non-responding universe. Conducting a personal visit to an address only to discover a vacant unit, from which no questionnaire was or could be returned, is one of the key cost drivers of the Census. By using administrative data to identify these vacant units in advance, we can reduce or eliminate the field effort involved.
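
As a simplified illustration of how administrative records might flag likely-vacant units before fielding, consider the following Python sketch. The signals, field names, and the two-source agreement rule are assumptions for illustration, not the Bureau's actual decision rules.

# Illustrative sketch only: combine hypothetical administrative
# signals to flag a likely-vacant unit before fielding NRFU. The
# signals and the two-source agreement rule are assumptions.
from dataclasses import dataclass

@dataclass
class AdminSignals:
    usps_undeliverable: bool    # mail marked undeliverable-as-addressed
    no_recent_tax_filing: bool  # no recent filing tied to the address
    vacant_in_2010: bool        # unit was vacant in the previous census

def likely_vacant(signals: AdminSignals, min_sources: int = 2) -> bool:
    """Treat a unit as likely vacant when at least min_sources
    independent signals point to vacancy."""
    votes = sum([signals.usps_undeliverable,
                 signals.no_recent_tax_filing,
                 signals.vacant_in_2010])
    return votes >= min_sources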
In terms of our initial research and testing in this area, our primary focus has been on the use of
government-provided files (both federal and state-level files). We have supplemented our research and testing with commercial files, as necessary, when they could fill gaps in the
existing federal data. We have conducted two field tests of these new methodologies and will
be beginning our third field test in March of this year. To date, the research has been promising
and has informed changes in the methodologies employed in each round of testing. For
example, as a result of the 2013 Census Test, we identified additional information in the
administrative data available from the Postal Service that informed the identification of vacant
housing units in the 2014 Census Test.
[3] The U.S. Census Bureau definition of a housing unit is a house, an apartment, a mobile home, a group of rooms, or a single room that is occupied (or, if vacant, is intended for occupancy) as separate living quarters. Separate living quarters are those in which the occupants live and eat separately from any other persons in the building and which have direct access from the outside of the building or through a common hall.
A second component in our efforts to reduce the cost of our nonresponse follow-up operation
is the reengineering of our field operations. The goal of Reengineering Field Operations is to use
technology to more efficiently and effectively manage the 2020 Census fieldwork. The Census
Bureau plans to develop an operational control system that manages tasks and makes decisions
typically made by humans (e.g., case assignments, contact attempts). Additional modernization
includes a streamlined approach to implementing and managing field operations through a new
field management structure, including the infrastructure, field staff roles, work schedule, and
staffing ratios.
Specifically, we are utilizing big data to help us in the efficient assignment of work. Our
operational control system (case management system) will consider outstanding work,
enumerator home location, and enumerator attributes to tailor a work assignment that is
appropriate for each enumerator on a daily basis. Enumerator assistance from remote
operations centers also reduces the enumerator-supervisor ratio.
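
Here is a minimal sketch of the assignment idea in Python: each case goes to the nearest enumerator with remaining capacity. The greedy distance rule and the daily cap are assumptions chosen for illustration; the actual control system also weighs enumerator attributes and predicted contact windows.

# Illustrative sketch only: greedy assignment of outstanding cases to
# the nearest enumerator with remaining daily capacity. The distance
# rule and the cap of 15 cases per day are assumptions.
import math

def assign_cases(cases, enumerators, daily_cap=15):
    """cases: [(case_id, lat, lon)]; enumerators: [(enum_id, lat, lon)].
    Returns {enum_id: [case_id, ...]}."""
    workload = {enum_id: [] for enum_id, _, _ in enumerators}
    for case_id, lat, lon in cases:
        candidates = [(math.hypot(lat - elat, lon - elon), enum_id)
                      for enum_id, elat, elon in enumerators
                      if len(workload[enum_id]) < daily_cap]
        if candidates:
            _, nearest = min(candidates)
            workload[nearest].append(case_id)
    return workload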
We will be using real-time paradata to manage the work and our enumerators by providing
alerts. Using real-time data within our control system, we will alert supervisory-level employees when specified performance or conduct issues are identified. This could be as simple as an employee completing cases for the day but not sending us the corresponding payroll and expense information, or as complex as an employee’s device location being nowhere near where we believe the work to be.
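
The two alert conditions just described can be expressed as simple rules. A Python sketch follows, with hypothetical field names and a distance tolerance chosen only for illustration.

# Illustrative sketch only: the two supervisory alert conditions
# described above, with hypothetical field names and an assumed
# 5 km location tolerance.
def payroll_alert(cases_completed: int, payroll_submitted: bool) -> bool:
    """Work was completed but no payroll/expense information came in."""
    return cases_completed > 0 and not payroll_submitted

def location_alert(device_km_from_assignment: float,
                   tolerance_km: float = 5.0) -> bool:
    """The employee's device is nowhere near the assigned work."""
    return device_km_from_assignment > tolerance_km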
Behind the scenes, a significant component of this effort, and of our use of big data, is the development of the Constructive Simulation Model. This is the model at the core of our
control system and is a three-dimensional model that utilizes 2010 Census data, data from the
American Community Survey [4], and real-time paradata to determine the best time for our
enumerators to contact households. This information is then combined with household data,
enumerator data, and business stopping rules to determine an enumerator’s case assignments and the route they should follow each day.

[4] The American Community Survey, or ACS, is a nationwide survey that collects and produces information on demographic, social, economic, and housing characteristics about our nation's population every year. The Census Bureau contacts about 3.5 million households each year to participate in the survey. The ACS replaced the census long form starting with the 2010 Census.
Inputs into the model include GIS inputs; 2010 Census data (Census response data, payroll data, employee location data); information from the Master Address File; and paradata from the American Community Survey. The outputs of the model are an assigned daily case list for each enumerator; predicted travel time for all legs within the optimized route; estimated time to find the location for all case contacts; and estimated interview time and contact outcome for all case contacts (in order to set a standard for the number of cases to provide enumerators per shift). We have conducted one successful large-scale simulation of this prototype and are
putting it in the field for the first time next month in the 2015 Census Test. We are very
excited about the promise our new control system holds for the Census and the surveys
conducted by the Census Bureau. The test this year is very important, but it is just the first test
in the field for this prototype system.
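
To give a flavor of what the model's routing output looks like, here is a minimal Python sketch that orders an enumerator's assigned cases with a nearest-neighbor heuristic and accumulates predicted travel and interview time until the shift is full. Every estimate in it (the shift length, travel speed, interview time, and the flat-earth distance approximation) is an assumption for illustration; the Constructive Simulation Model itself is far richer.

# Illustrative sketch only: order an enumerator's assigned cases with
# a nearest-neighbor heuristic, accumulating assumed travel and
# interview times until an assumed 8-hour shift is full.
import math

def plan_route(start, cases, travel_min_per_km=2.0,
               interview_min=20.0, shift_min=480.0):
    """start: (lat, lon); cases: {case_id: (lat, lon)}.
    Returns the ordered case list that fits in one shift."""
    route, elapsed, here = [], 0.0, start
    remaining = dict(cases)
    while remaining:
        # nearest remaining case (rough flat-earth distance in degrees)
        case_id, loc = min(remaining.items(),
                           key=lambda kv: math.hypot(kv[1][0] - here[0],
                                                     kv[1][1] - here[1]))
        km = math.hypot(loc[0] - here[0], loc[1] - here[1]) * 111.0
        if elapsed + km * travel_min_per_km + interview_min > shift_min:
            break  # shift is full; leftover cases roll to the next day
        elapsed += km * travel_min_per_km + interview_min
        route.append(case_id)
        here = loc
        del remaining[case_id]
    return route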
I could continue with additional examples of our reengineering efforts and the integration of
big data into the planning of the 2020 Census, but I think I will pause here. As I’ve discussed,
for our 2020 Census we are focusing our research and testing in a number of areas. We believe
that the availability of technology, combined with the data that are now available, will allow for
a large number of enhancements to our processes and to the 2020 Census.
Thank you.