The document discusses analyzing COVID-19 case data using a data science pipeline approach. It outlines collecting data from sources like the COVID Tracking Project and Johns Hopkins University, then cleaning and manipulating the data in Pandas dataframes. It focuses on Maryland county-level case and testing data. The goals are to make predictions about how the crisis may evolve over time to better inform policymakers.
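The cleaning and manipulation steps described above can be sketched in pandas. The tiny inline frame and column names below are illustrative stand-ins for the COVID Tracking Project / JHU county files, not the project's actual schema:

```python
import pandas as pd

# Hypothetical county-level rows of the kind the pipeline would ingest;
# real data would come from the COVID Tracking Project or JHU CSVs.
raw = pd.DataFrame({
    "county": ["Baltimore", "Baltimore", "Montgomery", "Montgomery"],
    "date": ["2020-04-01", "2020-04-02", "2020-04-01", "2020-04-02"],
    "cases": [120, 135, 200, 230],
    "tests": [900, 1000, 1500, 1600],
})

# Cleaning: parse dates and sort so per-county diffs are meaningful.
raw["date"] = pd.to_datetime(raw["date"])
raw = raw.sort_values(["county", "date"])

# Manipulation: derive daily new cases and a test-positivity rate.
raw["new_cases"] = raw.groupby("county")["cases"].diff()
raw["positivity"] = raw["cases"] / raw["tests"]

print(raw[["county", "date", "new_cases", "positivity"]])
```

Grouping by county before diffing keeps the cumulative-to-daily conversion from bleeding across county boundaries.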
Dr Dev Kambhampati | Cybersecurity Guide for Utilities
This document provides guidance for small and under-resourced utilities to improve cybersecurity, reliability, and resilience. It finds that existing guidance documents are not always scalable to small utilities due to challenges like limited resources, staff expertise, and information sharing. It recommends tailoring approaches to individual utility contexts by starting simply and growing programs over time. The document also proposes several forms of federal and mutual assistance to support improvement efforts.
The goal of this project was to determine the relationship between privacy risk and data utility when using aggregated mobile data for policy planning and crisis response. The project assessed these factors for transportation planning and pandemic control using simulated mobile call data. Experts in these domains evaluated the utility of various aggregation levels for their work. Re-identification risk was also measured for each data set. Results showed that while aggregation reduced risk, it also reduced utility, and this relationship varied by context and purpose. The project aims to help develop evidence-based standards for using mobile data proportionately based on balancing privacy risk and social benefits. Further research is needed applying this methodology to more scenarios and experts to better understand how data aggregation can enable use of mobile data for public good.
The document discusses the United States Census Bureau's plans to use big data for the 2020 Census. Specifically:
1) The Census Bureau plans to use governmental and corporate data sources to update addresses and maps, reducing the need for in-person address canvassing. Satellite imagery and change detection techniques will identify areas still needing in-field verification.
2) Administrative records from government agencies will be used to identify vacant housing units and pre-fill census responses to reduce costly in-person follow-ups.
3) A new operational control system using a predictive model and real-time data will efficiently assign and track field workers, improving productivity.
Global Pulse: Mining Indonesian Tweets to Understand Food Price Crises (UN Global Pulse)
Sudden increases in the price of staple foodstuffs like rice can push whole families below the poverty line and cause regional economic instability; these changes can happen rapidly but food price statistics are generally published only monthly or even less frequently.
This project, in collaboration with the Indonesian Ministry of Development Planning, UNICEF, and WFP in Indonesia, seeks to use social media analysis to provide real-time information from the population that could enable faster responses to food price increases in the form of social protection policies. Global Pulse analysed tweet volumes relevant to food and fuel between March 2011 and April 2013 and found a significant correlation, suggesting that even potential (rather than realised) fuel price rises affect people’s perceptions of food security. Researchers also found a relationship between retrospective official food inflation statistics and the number of tweets referencing food price increases.
http://www.unglobalpulse.org/social-media-social-protection-indonesia
Evaluating the fake news problem at the scale of the information ecosystem (Guy Boulianne)
This document analyzes Americans' daily media consumption using a unique multimode dataset covering mobile, desktop, and television. It finds:
1) News consumption of any sort comprises at most 14.2% of Americans' daily media diets, which are otherwise dominated by entertainment and non-news content.
2) Americans consume news overwhelmingly from television, which accounts for roughly five times as much news consumption as online sources.
3) Fake news only comprises 0.15% of Americans' total daily media consumption. The prevalence of misinformation may be overstated as a problem, and the causes of public misinformedness likely lie more in the content of ordinary news or news avoidance.
This collaborative research project between Global Pulse (www.unglobalpulse.org) and SAS (www.sas.com) investigates how social media and online user-generated content can be used to enrich the understanding of changing job conditions in the US and Ireland by analyzing the moods and topics present in unemployment-related conversations from the open social web and relating them to official unemployment statistics. For more information on this project or the other projects in this series, please visit: http://www.unglobalpulse.org/research.
Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI... (ALexandruDaia1)
Our primary goal was to detect clusters via gensim libraries in news data consisting of information regarding health and threats. We identified clusters for the periods corresponding to: i) January 2006 until the end of 2019, as December 2019 is considered the first month in which information about CORONAVIRUS COVID-19 was made public; ii) between 1 January 2019 and 31 December 2019; and iii) between 31 December 2019 and 14 April 2020. We conducted experiments using natural language processing on open source intelligence data generously offered by brica.de, a provider specialized in Business Risk Intelligence & Cyberthreat Awareness.
Keeping Governments Accountable with Open Data Science: Extracting and Analyz... (odsc)
This document summarizes Marc Joffe's presentation on extracting and analyzing data from municipal financial disclosures. It discusses gathering pension data from over 1,400 PDF reports published by CalPERS on city pension plans in California. It describes downloading the PDFs, extracting text data using Python scripts, and loading the extracted data into spreadsheets. It also discusses combining the pension data with revenue data from the State Controller to calculate ratios of pension costs to total revenue for each city.
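The extraction-and-ratio step might look roughly like the following; the field label, dollar figures, and regex are hypothetical, standing in for whatever a pdfminer-style text extraction returns from a real CalPERS report:

```python
import re

# Hypothetical snippet of text as it might come back from a PDF-to-text
# step (e.g., pdfminer); the field label and figures are illustrative.
page_text = """
City of Example, Miscellaneous Plan
Employer Contribution Amount: $4,512,338
"""

# Pull the dollar figure out of the extracted text.
match = re.search(r"Employer Contribution Amount:\s*\$([\d,]+)", page_text)
pension_cost = int(match.group(1).replace(",", ""))

# Combine with a revenue figure (here a stand-in for the State Controller
# data) to get the pension-cost-to-revenue ratio described in the talk.
total_revenue = 60_000_000
ratio = pension_cost / total_revenue
print(f"pension cost / revenue = {ratio:.2%}")
```

In practice this loop runs over 1,400+ PDFs, with the per-city results written out to spreadsheets for review.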
DARPA has had some success transitioning technologies since 2010, but inconsistently defines and assesses transitions. GAO's analysis identified four key factors for successful transition: military/commercial demand for the technology, sustained DARPA interest in the research area, technology maturity, and partnerships. However, DARPA prioritizes innovation over transition and provides limited transition training and assessment. GAO recommends DARPA regularly assess transition strategies, refine training, and increase sharing of technical data to improve transition success.
Federal Statistical System, Transparency Camp West (bradstenger)
Peter Orszag, the Director of the Office of Management and Budget, cites "evidence-based policy" to support healthcare reform. However, his evidence comes from Dartmouth University rather than the Federal Statistical System overseen by Katherine Wallman. The statistical system faces challenges in meeting transparency goals due to cultural and technical issues. While statistics are underfunded at just $10-25 per taxpayer, they provide crucial information and were important in WWII. Collaboration between journalists, programmers, statisticians, and policymakers could help improve the system.
Dr Dev Kambhampati | USA Cybersecurity R&D Strategic Plan
This document presents the Federal Cybersecurity Research and Development Strategic Plan, which was developed in response to a requirement in the Cybersecurity Enhancement Act of 2014. The plan outlines the US government's strategic approach to guide Federal investments in cybersecurity research over the next 5 years, with the goals of deterring cyber attacks, protecting systems and data, detecting threats, and helping systems adapt. It emphasizes critical areas like the scientific foundations of security, risk management, human aspects, workforce development and transitioning research into practice. The plan aims to establish a position of assurance, strength and trust in cyber systems through advances in cybersecurity science and engineering.
Data privacy and security in ICT4D - Meeting Report (UN Global Pulse)
On May 8th, 2015 UN Global Pulse hosted a workshop on data privacy and security in technology-enabled development projects and programmes, as part of a series of events about the Nine Principles for Digital Development. This report summarizes the presentations and discussions from the workshop. http://unglobalpulse.org/blog/improving-privacy-and-data-security-ict4d-projects
The document provides background on a research project investigating the data breach at the U.S. Office of Personnel Management in 2015. The project aims to interview OPM executives to understand the breach and analyze the relationship between cyber attacks and upgrades to the agency's technology. The researcher plans to enter the OPM for 3 weeks to conduct interviews and examine how often software/hardware patches are implemented each year.
This document discusses information behaviors in the U.S. intelligence community. It notes that intelligence analysts experience information overload due to the vast amounts of data they must process each day from numerous sources. This overload can compromise their efficiency and ability to identify threats in a timely manner. The document also examines issues between different levels of government in the intelligence community, such as a lack of consistent training and information sharing between federal, state, and local agencies. It proposes applying theories of information behavior from library and information science, such as minimizing effort, to help analysts better manage information overload.
This document outlines a presentation on big data for development (BD4D). It discusses the rise of big data and how BD4D techniques like data analytics can be applied. Potential BD4D applications include healthcare, emergency response, and agriculture. Data sources include mobile phones, crowdsourcing, and social media. The presentation also covers BD4D research in Pakistan using mobile data and challenges like data bias, privacy and causation. Open research areas are suggested to further mitigate challenges and advance predictive and multimodal BD4D analytics.
Jason Parker gave a presentation on "Open Data Sources for Grants" to the Tennessee Chapter of the Grant Professionals Association on September 10, 2014. This presentation includes a wide variety of open data resources that grant writers can use to strengthen proposals.
The study Data Journalism in 2017 examines how journalists use data to tell stories.
The analysis offers an overview of the state of data journalism in 2017 and highlights the key challenges for the field to move forward.
Some conclusions:
- 42% of journalists use data to tell stories on a regular basis (twice or more per week).
- 51% of news organizations in the United States and Europe have at least one journalist specialized in data (a data journalist) in their newsrooms. This percentage rises to 60% for digital-only outlets.
- 33% of journalists use data for political stories, followed by 28% for finance and economics and 25% for investigative journalism stories.
Twitter Based Election Prediction and Analysis (IRJET Journal)
This document discusses using Twitter data to predict election outcomes through sentiment analysis. It begins with an introduction to election prediction methods and why social media data is being explored as an alternative. The paper then reviews related work on using features like user profiles, linguistic content, and sentiment analysis of tweets mentioning candidates. It describes the methodology used, including data collection from Twitter's API, preprocessing tweets, and performing sentiment analysis using both machine learning and lexicon-based approaches. The results section shows the sentiment analysis identified more positive tweets for Clinton and more negative tweets for Trump, suggesting Clinton would win. Emotion analysis found more tweets expressing sadness for Clinton and joy for Trump.
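A minimal sketch of the lexicon-based side of such an analysis, using a tiny hand-made word list rather than the paper's actual lexicon or its machine-learning models:

```python
import re

# Toy lexicon for illustration; real lexicon-based work would use an
# established resource such as VADER or a curated sentiment word list.
POSITIVE = {"great", "win", "support", "strong", "hope"}
NEGATIVE = {"sad", "lose", "corrupt", "weak", "fear"}

def lexicon_sentiment(tweet: str) -> str:
    # Lowercase and strip punctuation before matching against the lexicon.
    tokens = re.findall(r"[a-z']+", tweet.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

tweets = ["Great rally tonight, strong support!", "Sad day, fear for the country"]
print([lexicon_sentiment(t) for t in tweets])  # → ['positive', 'negative']
```

A study like the one summarized would apply this per candidate (tweets mentioning each) and compare the resulting sentiment distributions.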
How to handle government-related questions (Kyle Guzik)
This document contains 12 questions asking the reader to find answers from US government sources on the internet. It provides detailed responses to 6 of the questions, citing the specific government websites used and discussing the information found. The responses indicate information on demographics, Medicare plans, a national portrait gallery, historical newspapers, legislative information, country profiles, occupational outlook, state education data, Native American tribes, internet protection requirements, and how to file a Freedom of Information Act request.
The document summarizes the economic projections of Federal Reserve Board members and Federal Reserve Bank presidents from their June 2021 meeting. It includes median projections for real GDP growth, unemployment, and inflation for 2021-2023 and longer-run. It also shows the central tendency and ranges for each indicator. Projections see robust GDP growth of 7% in 2021 slowing to 3.3% in 2022 and 2.4% in 2023. Unemployment is projected to decline to 4.5% in 2021 and 3.8% in 2022. Inflation is projected to rise to 3.4% in 2021 but slow to 2.1% in 2022 and 2.2% in 2023. Appropriate monetary
Newspaper Editors vs the Crowd: On the Appropriateness of Front Page News Sel... (Gabriela Agustini)
Our study questions the current news selection criteria, revealing that while editors focus on picking hard news such as politics for the front page, social media users are more drawn to soft news such as science and fashion.
Arbor Realty's U.S. Economic Overview for 2018 Q4, with insights on U.S. employment growth, the consumer price index, average earnings, and the homeownership rate.
This document discusses workforce development challenges and opportunities in the post-pandemic world. It provides data on the economic impact of the pandemic, including job losses and gains by industry. While some industries like leisure and hospitality saw major declines, others like transportation and warehousing have seen increased demand. The data also shows shifts in the top in-demand occupations, skills, tools/technologies and certifications needed. Overall, basic and interpersonal skills remain highly important for job seekers, along with emerging demand for skills related to distribution/logistics. The document aims to inform workforce strategies to help workers adapt to changes in the post-pandemic labor market.
Report by Google Labs and PolizyViz (ENG) investigating how journalists use data when producing news stories.
It is the result of 56 in-depth interviews with editors, data visualization experts, data journalists, and video journalists from the US, Germany, France, and Great Britain. In addition, a quantitative survey of more than 900 journalists and editors was conducted.
Web page: https://newslab.withgoogle.com/assets/docs/data-journalism-in-2017.pdf
2018 Economic Forecast, Dragas Center for Economic Analysis and Policy - (31 J... (rmcnab67)
This document provides an economic forecast for 2018. It summarizes recent economic trends and forecasts for the coming year. The national economy is expected to see continued growth in 2018, with real GDP growth of 3.0% predicted. Unemployment is forecast to fall to 3.8% while inflation rises to 2.9%. The Virginia economy is also expected to grow, with real GDP increasing 2.2% in 2018. For Hampton Roads, growth has lagged the national rate in recent years. The forecast predicts stagnant growth will continue in the short-term but prospects are improving for the future.
This document proposes a spatial-temporal data science engine to analyze COVID-19 data. The engine integrates data from various sources and preprocesses it by classifying records by location, date, health status, and other attributes. It then organizes the data into a spatial-temporal hierarchy from local to global levels. The engine applies analytics like pattern mining to uncover interesting spatial-temporal trends in the data. It is evaluated on real COVID-19 data and uncovers patterns such as most Canadian cases being transmitted domestically without requiring hospitalization. The engine reveals insights about COVID-19's epidemiological characteristics in different locations over time.
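The location/date classification and local-to-global rollup could be sketched with a pandas groupby; the records and column names below are invented for illustration, not the engine's actual schema:

```python
import pandas as pd

# Illustrative case records; the columns mirror the attributes the engine
# classifies by (location, date, health status).
records = pd.DataFrame({
    "country": ["Canada", "Canada", "Canada", "Canada"],
    "province": ["Ontario", "Ontario", "Quebec", "Quebec"],
    "date": pd.to_datetime(["2020-03-01", "2020-03-01",
                            "2020-03-01", "2020-03-02"]),
    "hospitalized": [False, True, False, False],
})

# Roll the local level up to the provincial and national levels of the
# spatial-temporal hierarchy by counting cases per location and date.
by_province = records.groupby(["province", "date"]).size()
by_country = records.groupby(["country", "date"]).size()
print(by_province)
print(by_country)
```

Pattern-mining steps would then run at each level of this hierarchy, e.g. looking at the share of hospitalized cases per province over time.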
This document discusses forecasting models for dengue outbreaks in San Juan, Puerto Rico. The team analyzed historical dengue case data along with environmental factors like temperature and humidity. They developed autoregressive integrated moving average (ARIMA) models to forecast quarterly dengue cases over a 3-year period. The best model included factors for 6 outbreak periods and an AR(15) term. It produced plausible forecasts with a mean absolute percentage error of 43.42%. Recommendations include targeted insecticide use and increased awareness during peak seasons.
GRBN Trust and Personal Data Survey report - Part 1 - Concern, familiarity, t... (Andrew Cannon)
A detailed report on the results from GRBN's 24 country global survey on the issue of Trust & Personal Data. The report dives into how the level of familiarity with the issue as well as the level of concern about the abuse of personal data varies across the globe. The report compares how trustworthy people consider different types of both public and private organisations to be, and looks at how sensitive people consider different types of personal data to be.
ESOMAR Telephone and Internet Coverage around the World 2016 (T.S. Lim)
This document provides an overview of telephone and internet coverage around the world based on data from various sources. It finds that:
- For the European Union and United States, coverage data comes from robust surveys like Eurobarometer and US government sources, providing reliable estimates of penetration levels.
- For some other countries, estimates come from the Google Connected Consumer Survey, but these data have limitations as they relate to indeterminate populations and mobile phones only.
- Overall, telephone coverage looks adequate in most countries, while internet coverage is approaching universal levels in many markets but still faces issues that require more work.
- The document aims to help researchers design better international studies by understanding coverage levels and the optimal research
This document summarizes the key findings of the 2013 Reuters Institute Digital News Report, which tracked news consumption across multiple countries. Some of the main findings include:
1) Tablet and mobile usage for accessing news has grown substantially since the previous year, with tablet usage doubling in many countries and over 40% of users in some countries accessing news on smartphones weekly.
2) Approximately one-third of users now get news from at least two different devices, indicating a trend toward multi-platform news consumption.
3) However, the pace of this digital transition varies significantly between countries, with Germany and France still showing stronger allegiance to traditional media platforms than countries like the US and Japan.
Página web: https://newslab.withgoogle.com/assets/docs/data-journalism-in-2017.pdf
2018 Economic Forecast Dragas Center for Economic Analysis and Policy - (31 J...rmcnab67
This document provides an economic forecast for 2018. It summarizes recent economic trends and forecasts for the coming year. The national economy is expected to see continued growth in 2018, with real GDP growth of 3.0% predicted. Unemployment is forecast to fall to 3.8% while inflation rises to 2.9%. The Virginia economy is also expected to grow, with real GDP increasing 2.2% in 2018. For Hampton Roads, growth has lagged the national rate in recent years. The forecast predicts stagnant growth will continue in the short-term but prospects are improving for the future.
This document proposes a spatial-temporal data science engine to analyze COVID-19 data. The engine integrates data from various sources, preprocesses the data by classifying it by location, date, health status, and other attributes. It then organizes the data into a spatial-temporal hierarchy from local to global levels. The engine applies analytics like pattern mining to uncover interesting spatial-temporal trends in the data. It is evaluated on real COVID-19 data and finds patterns like most cases being transmitted domestically but not requiring hospitalization in Canada. The engine reveals insights about COVID-19's epidemiological characteristics in different locations over time.
This document discusses forecasting models for dengue outbreaks in San Juan, Puerto Rico. The team analyzed historical dengue case data along with environmental factors like temperature and humidity. They developed autoregressive integrated moving average (ARIMA) models to forecast quarterly dengue cases over a 3-year period. The best model included factors for 6 outbreak periods and an AR(15) term. It produced plausible forecasts with a mean absolute percentage error of 43.42%. Recommendations include targeted insecticide use and increased awareness during peak seasons.
GRBN Trust and Personal Data Survey report - Part 1 - Concern, familiarity, t...Andrew Cannon
A detailed report on the results from GRBN's 24 country global survey on the issue of Trust & Personal Data. The report dives into how the level of familiarity with the issue as well as the level of concern about the abuse of personal data varies across the globe. The report compares how trustworthy people consider different types of both public and private organisations to be, and looks at how sensitive people consider different types of personal data to be.
ESOMAR Telephone and Internet Coverage around the World 2016T.S. Lim
This document provides an overview of telephone and internet coverage around the world based on data from various sources. It finds that:
- For the European Union and United States, coverage data comes from robust surveys like Eurobarometer and US government sources, providing reliable estimates of penetration levels.
- For some other countries, estimates come from the Google Connected Consumer Survey, but these data have limitations as they relate to indeterminate populations and mobile phones only.
- Overall, telephone coverage looks adequate in most countries, while internet coverage is approaching universal levels in many markets but still faces issues that require more work.
- The document aims to help researchers design better international studies by understanding coverage levels and the optimal research
This document summarizes the key findings of the 2013 Reuters Institute Digital News Report, which tracked news consumption across multiple countries. Some of the main findings include:
1) Tablet and mobile usage for accessing news has grown substantially since the previous year, with tablet usage doubling in many countries and over 40% of users in some countries accessing news on smartphones weekly.
2) Approximately one-third of users now get news from at least two different devices, indicating a trend toward multi-platform news consumption.
3) However, the pace of this digital transition varies significantly between countries, with Germany and France still showing stronger allegiance to traditional media platforms than countries like the US and Japan.
This document summarizes the key findings of the 2013 Reuters Institute Digital News Report. Some of the main findings include:
- Tablet and mobile usage for accessing news has grown substantially since the previous year, with tablet usage doubling in many countries.
- One-third of respondents now get news from at least two devices, indicating a trend toward multi-platform news consumption.
- However, the pace of change varies between countries, with Germany and France still showing stronger allegiance to traditional media platforms.
- Traditional news brands continue to attract large online audiences in many countries, though "pure players" have more success in places like the US and Japan.
GRBN Trust and Personal Data Survey Report - Part 2 - Regions and countries -...Andrew Cannon
The report deep dives into the results from GRBN's 24 country survey on Trust & Personal Data, detailing the findings by region (Americas, APAC and Europe) and country
COVID-19 data configuration and statistical analysisAnshJAIN50
This document analyzes factors that influence the spread of COVID-19 in different countries. It compares the total cases and deaths in 3 more economically developed countries (MEDCs) - Italy, Japan, China - and 3 less economically developed countries (LEDCs) - Israel, India, Indonesia. By modeling the data with different functions, the author finds that a Verhulst function best represents the data, predicting a plateau. Analysis of total tests and positive cases shows Japan having the highest continued spread. Considering population density helps explain differences in case numbers between countries like Italy and China. The document evaluates how factors like population density, healthcare quality, and testing rates impact the spread of COVID-19 in different nations.
Proposed requirements of the management dashboard at the national levelarmabadi
This document proposes requirements for a national-level Covid-19 management dashboard. It recommends a strategic dashboard containing high-level metrics and a tactical dashboard providing more up-to-date information. The suggested dashboard would include sections on the global Covid-19 situation and forecasts using statistical data and categorized news, as well as the internal situation and forecasts covering healthcare system status, the epidemic situation and forecasts, feedback on control measures, and the national macroeconomic outlook. Key performance indicators are outlined for each section. The goal is to help decision-makers track the epidemic and make data-driven decisions.
An analysis of consumer interest level data for online health information in ...IJMIT JOURNAL
This paper takes a unique look at the online consumer interest in health information and products during a unique moment - the COVID-19/coronavirus pandemic of 2020. The present research examines Glimpse’s data on consumer interest levels in a wide variety of health topics, products, and services during March 2020, the month when the COVID-19 outbreak brought much of American life and the U.S. economy to a standstill. Through an analysis of the trend data on the health information sought out online during this critical period, the author provides insights into the consumer behavior demonstrated by Americans in light of the pandemic.
AN ANALYSIS OF CONSUMER INTEREST LEVEL DATA FOR ONLINE HEALTH INFORMATION IN ...IJMIT JOURNAL
This paper takes a unique look at the online consumer interest in health information and products during a unique moment - the COVID-19/coronavirus pandemic of 2020. The present research examines Glimpse’s data on consumer interest levels in a wide variety of health topics, products, and services during March 2020, the month when the COVID-19 outbreak brought much of American life and the U.S. economy to a standstill. Through an analysis of the trend data on the health information sought out online during this critical period, the author provides insights into the consumer behavior demonstrated by Americans in light of the pandemic.
Monitoring and Understanding the Co-Spread of COVID-19 Misinformation and Fac...Gregoire Burel
Monitoring and Understanding the Co-Spread of COVID-19 Misinformation and Fact-checks”. Using more than 3 years of data collected from Twitter and fact-checking organizations and a combination of spread variance analysis, impulse response modeling, and causal analysis, we will highlight the weak causal relationships between the spread of misinformation and fact-checks and discuss what topics are the less likely to be affected by fact-checks. We will also show how the proposed observatory can be used for tracking demographics, fact-checks, and topics over time.
EUBrasilCloudFORUM actively participated at Beyon2020 event. Professor Sergio Takeo Kofuji from the University of São Paulo (USP) discussed the importance of Open Data for cities.
The Beyond 2020 event took place at "Centro de Convenções de Pernambuco" from the 27th to the 29th of July, focusing on how urban innovation ecosystems can support citizens and what the future will look like for Cities, Politics, Citizens, Local Development, and Tourism, based on the use of Open data Platforms.
A look at two different Datasets (infection data & mobility data to make some predictions about Corona Virus. The main takeaways:
1. Without a vaccine Corona is here to stay for 18 months till herd immunity. We need to have cyclical lockdowns of 2 weeks lockdown 6 weeks opening.
2. The structure of a city dictates whether a lockdown works or not. Rural and Nature heavy cities like Utah can't follow the same strategy like NY or Manhattan.
The document summarizes research on COVID-19 cases, testing, and mortality. It notes that while the total number of cases is unknown, confirmed cases provide a minimum count. Testing is important to understand the outbreak but many countries lack testing capacity. The number of confirmed deaths is known but the final mortality rate requires knowing all case outcomes. South Korea has performed the most tests per capita while many countries, including the US, have low testing.
What does “BIG DATA” mean for official statistics?Vincenzo Patruno
In our modern world more and more data are generated on the web and produced by sensors in the ever growing number of electronic devices surrounding us. The amount of data and the frequency at which they are produced have led to the concept of 'Big data'. Big data is characterized as data sets of increasing volume, velocity and variety; the 3 V's. Big data is often largely unstructured, meaning that it has no pre-defined data model and/or does not fit well into conventional relational databases.
GIVING UP PRIVACY FOR SECURITY: A SURVEY ON PRIVACY TRADE-OFF DURING PANDEMIC...ijcisjournal
While the COVID-19 pandemic continues to be as complex as ever, the collection and exchange of data in the light of fighting coronavirus poses a major challenge for privacy systems around the globe. The disease’s size and magnitude are not uncommon but it appears to be at the point of hysteria surrounding it. Consequently, in a very short time, extreme measures for dealing with the situation appear to have become
the norm. Any such actions affect the privacy of individuals in particular. In some cases, there is intensive monitoring of the whole population while the medical data of those diagnosed with the virus is commonly circulated through institutions and nations. This may well be in the interest of saving the world from a deadly disease, but is it appropriate and right? Although creative solutions have been implemented in many countries to address the issue, proponents of privacy are concerned that technologies will eventually erode privacy, while regulators and privacy supporters are worried about what kind of impact this could bring. While that tension has always been present, privacy has been thrown into sharp relief by the sheer urgency
of containing an exponentially spreading virus. The essence of this dilemma indicates that establishing the right equilibrium will be the best solution. The jurisprudence concerning cases regarding the willingness of public officials to interfere with the constitutional right to privacy in the interests of national security or public health has repeatedly proven that a reasonable balance can be reached.
This document discusses using fuzzy clustering techniques with big data to diagnose diseases, specifically focusing on diabetes. It first provides background on big data in healthcare and challenges in managing large, diverse clinical datasets. It then discusses fuzzy logic and how it can help handle uncertainty in clinical data. The proposed approach uses fuzzy subtractive clustering on a clinical diabetes database to create a compact fuzzy model and increase prediction accuracy for diagnosing diabetes. The outcomes indicate this integrated method can effectively diagnose diabetes from clinical big data.
The Enchantment and Shadows_ Unveiling the Mysteries of Magic and Black Magic...Phoenix O
This manual will guide you through basic skills and tasks to help you get started with various aspects of Magic. Each section is designed to be easy to follow, with step-by-step instructions.
A375 Example Taste the taste of the Lord, the taste of the Lord The taste of...franktsao4
It seems that current missionary work requires spending a lot of money, preparing a lot of materials, and traveling to far away places, so that it feels like missionary work. But what was the result they brought back? It's just a lot of photos of activities, fun eating, drinking and some playing games. And then we have to do the same thing next year, never ending. The church once mentioned that a certain missionary would go to the field where she used to work before the end of his life. It seemed that if she had not gone, no one would be willing to go. The reason why these missionary work is so difficult is that no one obeys God’s words, and the Bible is not the main content during missionary work, because in the eyes of those who do not obey God’s words, the Bible is just words and cannot be connected with life, so Reading out God's words is boring because it doesn't have any life experience, so it cannot be connected with human life. I will give a few examples in the hope that this situation can be changed. A375
The forces involved in this witchcraft spell will re-establish the loving bond between you and help to build a strong, loving relationship from which to start anew. Despite any previous hardships or problems, the spell work will re-establish the strong bonds of friendship and love upon which the marriage and relationship originated. Have faith, these stop divorce and stop separation spells are extremely powerful and will reconnect you and your partner in a strong and harmonious relationship.
My ritual will not only stop separation and divorce, but rebuild a strong bond between you and your partner that is based on truth, honesty, and unconditional love. For an even stronger effect, you may want to consider using the Eternal Love Bond spell to ensure your relationship and love will last through all tests of time. If you have not yet determined if your partner is considering separation or divorce, but are aware of rifts in the relationship, try the Love Spells to remove problems in a relationship or marriage. Keep in mind that all my love spells are 100% customized and that you'll only need 1 spell to address all problems/wishes.
Save your marriage from divorce & make your relationship stronger using anti divorce spells to make him or her fall back in love with you. End your marriage if you are no longer in love with your husband or wife. Permanently end your marriage using divorce spells that work fast. Protect your marriage from divorce using love spells to boost commitment, love & bind your hearts together for a stronger marriage that will last. Get your ex lover who has remarried using divorce spells to break up a couple & make your ex lost lover come back to you permanently.
Visit https://www.profbalaj.com/love-spells-loves-spells-that-work/
Call/WhatsApp +27836633417 for more info.
The Hope of Salvation - Jude 1:24-25 - MessageCole Hartman
Jude gives us hope at the end of a dark letter. In a dark world like today, we need the light of Christ to shine brighter and brighter. Jude shows us where to fix our focus so we can be filled with God's goodness and glory. Join us to explore this incredible passage.
A Free eBook ~ Valuable LIFE Lessons to Learn ( 5 Sets of Presentations)...OH TEIK BIN
A free eBook comprising 5 sets of PowerPoint presentations of meaningful stories /Inspirational pieces that teach important Dhamma/Life lessons. For reflection and practice to develop the mind to grow in love, compassion and wisdom. The texts are in English and Chinese.
My other free eBooks can be obtained from the following Links:
https://www.slideshare.net/ohteikbin/presentations
https://www.slideshare.net/ohteikbin/documents
Sanatan Vastu | Experience Great Living | Vastu ExpertSanatan Vastu
Santan Vastu Provides Vedic astrology courses & Vastu remedies, If you are searching Vastu for home, Vastu for kitchen, Vastu for house, Vastu for Office & Factory. Best Vastu in Bahadurgarh. Best Vastu in Delhi NCR
The Book of Ruth is included in the third division, or the Writings, of the Hebrew Bible. In most Christian canons it is treated as one of the historical books and placed between Judges and 1 Samuel.
Why is this So? ~ Do Seek to KNOW (English & Chinese).pptxOH TEIK BIN
A PowerPoint Presentation based on the Dhamma teaching of Kamma-Vipaka (Intentional Actions-Ripening Effects).
A Presentation for developing morality, concentration and wisdom and to spur us to practice the Dhamma diligently.
The texts are in English and Chinese.
coronavirus-case-tracking
May 31, 2020
1 The Data Science Pipeline: COVID-19 Case Tracking
Philip Tian, Jonathan Lin, David Ahmed
1.1 Introduction
We will introduce the data science pipeline by analyzing data from the (currently ongoing) COVID-
19 pandemic in the United States. COVID-19, commonly referred to as the coronavirus, is the disease
caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2); it first appeared in
the continental United States in early 2020. A comprehensive list of known and documented symptoms
can be found here. As of May 15, 2020, there were around 4.5 million confirmed cases worldwide,
and the World Health Organization has classified the crisis as a pandemic.
The COVID-19 crisis has caused many temporary economic and social changes. For example, most
states in the United States have issued stay-at-home orders, with guidelines on when and why one
may leave home. As a result, normal work activity has stopped in many sectors, and countries have
been struggling with sudden drops in economic productivity (documented in news stories like this
one or this one). These economic effects have a large impact on well-being all across the world:
unemployment is growing in several countries as a direct consequence of the pandemic and of
governmental policies like stay-at-home orders and the closure of “non-essential businesses”.
Given this balance between maintaining local or national economic health and preventing the
spread of a dangerous disease, policymakers face difficult questions about the current state of
affairs:

- How long should we maintain a stay-at-home order nationally/locally?
- Are other states/countries handling the situation “better”? What are they doing differently?
- What does “better” even mean? Can we quantify these things when we are making policy decisions?
- How will cases grow from today? Can our current infrastructure (hospitals) handle this growth?

Faced with a variety of policy decisions that might literally cost lives, policymakers are walking
a tightrope, and in this age of information it is especially important that the decision finally
made is close to optimal. But how does one know whether or not a decision is optimal? Such a
question can be answered using the techniques of data science.
1.1.1 The Importance of Data Science During the COVID-19 Crisis
As the total number of COVID-19 cases increases in the USA and worldwide, it seems almost too
easy for the mainstream media to capitalize on new cases to churn out news stories. An increase
in cases (perhaps coupled with some official or expert statement) is a news story waiting to
happen. For example, here, here, and here are all news stories found simply by searching the
internet for the phrase “increase in cases”. With this influx of information, there are certain
questions we should ask:

- Can we trust that the data being reported is accurate and correct?
- What exactly does 1000 (or 2000, or 500) cases daily mean in the context of a certain state/country/county?
- To what extent can we predict how the crisis (especially the number of cases) evolves over time?
- Is it possible to make policy or economic decisions based on such analysis, and should we make these decisions?
It is precisely the field of data science, with its techniques and methods, that allows us to
conduct an accurate analysis of the data and produce better answers to the questions above. By
extracting, manipulating, and analyzing sets of data, we ensure that policymakers are far better
informed about the status of the crisis locally and nationally than they would be with day-by-day
information alone, playing it by ear, so to speak. Gathering such information is critical,
especially in a time when reliable information seems hard to find.
In order to even start considering questions like those posed above, we must work through the
main stages of the data science pipeline, listed below.

1. Data Collection: data relevant to the project is found and collected.
2. Data Manipulation and Cleaning: data is manipulated into a format suitable for analysis.
3. Initial Data Visualization (or exploratory data analysis): numerical data is graphed to spot trends.
4. Modeling and Prediction: various methods are used to make predictions about future data or the general population.
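As a toy illustration of these four stages, the sketch below walks through collection, cleaning, exploration, and a naive prediction in a few lines of pandas. The tiny dataframe is a synthetic stand-in, not the project's real data, and the "model" is deliberately simplistic; the real pipeline below uses downloaded CSVs and proper statistical tools.

```python
import pandas as pd

# Stage 1 (collection): a tiny synthetic stand-in for a daily case file;
# the real project reads CSVs from the COVID Tracking Project instead.
raw = pd.DataFrame({
    "date": ["20200401", "20200402", "20200403"],
    "positive": [100, 140, 200],  # cumulative confirmed cases (made up)
})

# Stage 2 (cleaning): parse the dates and sort chronologically.
raw["date"] = pd.to_datetime(raw["date"], format="%Y%m%d")
clean = raw.sort_values("date").reset_index(drop=True)

# Stage 3 (exploration): derive daily new cases from the cumulative count.
clean["new_cases"] = clean["positive"].diff()

# Stage 4 (modeling): a naive one-step forecast that simply repeats the
# last observed daily increase; real models come later in the pipeline.
forecast = clean["positive"].iloc[-1] + clean["new_cases"].iloc[-1]
print(forecast)  # 260.0
```

Even this toy version shows why the stages are ordered: the prediction in stage 4 is only as good as the cleaned, well-understood series produced by stages 2 and 3.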
In our project, we will illustrate the data science pipeline with respect to the COVID-19
crisis, walking through (with working code) the basic aspects of the pipeline as described above,
starting from the beginning. First we will discuss the data and where we got it, along with its
background and an assessment of its accuracy. Then we will manipulate and modify the data for our
purposes. Finally, we will use the data we obtained to make predictions about the future.
1.2 Data Collection
For this project, we would like to collect data about total cases in the United States at the
local level, i.e., for states and counties. We obtained data from the sources described below.
1.2.1 The COVID Tracking Project
The COVID Tracking Project is a collective volunteer effort to document data about the crisis day
by day. Their data collection claims to be comprehensive, with data from every state and most US
Territories. Their data, as one of only comprehensive sources of data on cases in the US, is being
used by many news outlets and academic experts.
We will be using their data on the day-by-day number of cases in the states and in the US as a
whole. In the code below, this data is contained in the csv files Data/CTPStates-historical.csv,
Data/CTPTestingMaryland.csv, and Data/CTPUS-historical.csv: the first contains the data for the
states, the second the number of tests given out in Maryland, and the last the overall US data.
These spreadsheets give us the case-over-time data we need. The data is updated daily and can be
found here in this spreadsheet.
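Because the project updates daily, a quick sanity check after downloading is to confirm how fresh the file is. The sketch below uses an inline synthetic stand-in for Data/CTPStates-historical.csv (the case counts are made up); the %Y%m%d date format matches the parsing used in the Setup section later.

```python
import pandas as pd
from io import StringIO

# Synthetic two-row stand-in for Data/CTPStates-historical.csv; the real
# notebook reads the downloaded file with pd.read_csv instead.
csv_text = StringIO("date,state,positive\n20200514,MD,36000\n20200515,MD,37000\n")
states = pd.read_csv(csv_text, dtype={"date": str})
states["date"] = pd.to_datetime(states["date"], format="%Y%m%d")

# Freshness check: the most recent date in the file tells us how current
# our download is relative to the project's daily updates.
print(states["date"].max().date())  # 2020-05-15
```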
1.2.2 COVID-19 Data from JHU
Johns Hopkins University has a nice GUI which displays current case data from around the world.
It can be found here. It is a visually striking application, and readers are encouraged to go and
play around with it. But where does this application pull data from? It pulls from a GitHub
repository maintained by JHU for the purpose of keeping the application up to date. The data
itself is more specific than that of the COVID Tracking Project, in that it tracks cases and
deaths by county rather than state-wide. For example, in Maryland, one might notice that the
majority of cases arise in Montgomery County and Prince George's County, with fewer cases overall
in neighboring counties (though Baltimore County does not trail far behind). The data we obtained
from JHU is packaged into the files Data/CasesDeathsCounty.csv and
Data/CountyConfirmedCases.csv.
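A claim like "most Maryland cases arise in Montgomery and Prince George's counties" can be checked directly by ranking counties on confirmed counts. The sketch below uses a small synthetic frame with made-up numbers; the column names (state, county_name, confirmed) match the real county file shown later in the notebook.

```python
import pandas as pd

# Synthetic stand-in for the JHU county file (all counts are illustrative);
# the real notebook loads Data/CountyConfirmedCases.csv instead.
county_confirmed = pd.DataFrame({
    "state": ["Maryland"] * 4,
    "county_name": ["Montgomery", "Prince George's", "Baltimore", "Garrett"],
    "confirmed": [8500, 9200, 4051, 50],
})

# Restrict to Maryland, then rank counties by confirmed cases to see
# where the outbreak is concentrated.
md = county_confirmed.query('state == "Maryland"')
top = md.sort_values("confirmed", ascending=False)
print(top["county_name"].iloc[0])  # Prince George's
```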
1.2.3 Date of Stay-at-Home Information
We can find the date each state established a stay-at-home order through this CNN article, which
gives detailed information on when each US state issued its order.
Limitations of the data gathered Though, as highlighted above, there is plenty of data on total
COVID-19 cases over time, both nationally and locally, one must be aware of the limitations that
such convenient data carries.
The primary concern when working with this data is accuracy. If a policymaker were to use this
data and its trends to make important decisions about the crisis, the first questions they should
ask are: Is this data reliable? Can I trust that the data values are accurately measured? This
concern about accuracy is especially important when considering stay-at-home orders, shutdowns,
and other impactful decisions.
The main limitation of large datasets that aggregate measurements from many locations, such as
the COVID Tracking Project and JHU datasets, is that verifying accuracy requires checking many,
many sources. For the COVID Tracking Project, that means at least one source for each state and
territory; for JHU, at least one source for each county. The latter is seemingly even worse, as a
lot more can go wrong.
Concerns about the COVID Tracking Project data The COVID Tracking Project dataset pulls data
from the relevant state and territory government health services. This means that the reliability
of this data starts and ends with the accuracy of state government reporting, and it is a
difficult job to figure out how each state measures its data and whether or not it is accurate.
Concerns about the JHU dataset As of May 16, the GitHub page for the JHU dataset has over 1300
reported issues. Some of these are complete non-issues, but others report supposedly serious
problems with the data: for example, this one claims that the applet is not reporting the case
numbers for Nepal correctly. Many of the issues are encoding-related, having to do with state or
county codes. Hopefully these will not pose much of a problem for this project.
For the purposes of our project, we need not concern ourselves too deeply with the reliability of
the data collection: it is not our job, and besides, we do not have the means to verify it
ourselves. However, if we were in a position of more influence, able to affect policy decisions,
then weighing these issues is something we would absolutely have to do. Hopefully we have made
the point that in many cases the hardest part of the data science pipeline is verifying
that the data to be analyzed is as accurate as possible. For the purposes of continuing
our project, we will assume that it is.
2 Setup
Here we set up all the libraries we will be using, listed below:

- pandas will be used for data manipulation. We will use it to format our raw data into tables.
- plotnine is a Python library similar to ggplot2 in R. We will use it to graph our data.
- numpy is a scientific computing library.
- statsmodels is a statistical library that we will use for our data analysis.
[29]: import pandas as pd
from plotnine import *
import numpy as np
import statsmodels.formula.api as sm
import warnings
warnings.filterwarnings("ignore")
2.1 Data Manipulation
Here, this code manipulates the data into a format amenable to analysis. First we extract the infor-
mation from the .csv files into pandas dataframes using the read_csv command. Using some filters
we filter the relevant information into two data tables, maryland_deaths and maryland_confirmed.
[30]: counties = set(["Allegany", "Anne Arundel", "Baltimore", "Calvert",
                      "Caroline", "Carroll", "Cecil", "Charles", "Dorchester",
                      "Frederick", "Garrett", "Harford", "Howard", "Kent",
                      "Montgomery", "Prince George's", "Queen Anne's",
                      "St. Mary's", "Somerset", "Talbot", "Washington",
                      "Wicomico", "Worcester", "Baltimore City"])
      county_deaths = pd.read_csv("Data/CasesDeathsCounty.csv")
      county_confirmed = pd.read_csv("Data/CountyConfirmedCases.csv")
      testing = pd.read_csv("Data/CTPTestingMaryland.csv")
      testing['date'] = pd.to_datetime(testing['date'], format='%Y-%m-%d')
      # Keep Maryland rows for known counties within the study window.
      maryland_deaths = (county_deaths
          .loc[county_deaths["state"] == "Maryland"]
          .loc[county_deaths["location_name"].isin(counties)]
          .loc[county_deaths["date"].between("04/01/2020", "05/12/2020")]
          .reset_index(drop=True))
      maryland_confirmed = (county_confirmed
          .loc[county_confirmed["county_name"].isin(counties)]
          .query('state == "Maryland"')
          .reset_index(drop=True))
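The chained .loc filters above can also be written as a single combined boolean mask, which is often easier to read. A minimal sketch on a hypothetical toy frame (the column names mirror the real CSVs, but the rows here are invented for illustration):

```python
import pandas as pd

# Toy stand-in for county_deaths; the real frame comes from CasesDeathsCounty.csv.
df = pd.DataFrame({
    "state": ["Maryland", "Maryland", "Virginia"],
    "location_name": ["Allegany", "Unassigned", "Fairfax"],
    "date": pd.to_datetime(["2020-04-05", "2020-04-05", "2020-04-05"]),
})
counties = {"Allegany", "Anne Arundel"}

# One combined mask is equivalent to chaining three separate .loc calls.
mask = (
    (df["state"] == "Maryland")
    & df["location_name"].isin(counties)
    & df["date"].between("2020-04-01", "2020-05-12")
)
filtered = df.loc[mask].reset_index(drop=True)
print(filtered)
```

Combining the conditions with `&` evaluates all three masks once, whereas chained `.loc` calls build an intermediate frame at each step.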
[31]: maryland_confirmed.head(10)
[31]: last_update state county_name county_name_long
0 2020-05-12 21:32:28 Maryland Allegany Allegany, Maryland, US
1 2020-05-12 21:32:28 Maryland Anne Arundel Anne Arundel, Maryland, US
2 2020-05-12 21:32:28 Maryland Baltimore Baltimore, Maryland, US
3 2020-05-12 21:32:28 Maryland Calvert Calvert, Maryland, US
4 2020-05-12 21:32:28 Maryland Caroline Caroline, Maryland, US
5 2020-05-12 21:32:28 Maryland Carroll Carroll, Maryland, US
6 2020-05-12 21:32:28 Maryland Cecil Cecil, Maryland, US
7 2020-05-12 21:32:28 Maryland Charles Charles, Maryland, US
8 2020-05-12 21:32:28 Maryland Dorchester Dorchester, Maryland, US
9 2020-05-12 21:32:28 Maryland Frederick Frederick, Maryland, US
fips_code lat lon NCHS_urbanization total_population
0 24001.0 39.623576 -78.692805 Small metro 71977.0
1 24003.0 39.006702 -76.603293 Large fringe metro 567696.0
2 24005.0 39.457847 -76.629120 Large fringe metro 827625.0
3 24009.0 38.539616 -76.568206 Large fringe metro 91082.0
4 24011.0 38.871723 -75.829042 Non-core 32875.0
5 24013.0 39.564536 -77.023737 Large fringe metro 167522.0
6 24015.0 39.566477 -75.946274 Large fringe metro 102517.0
7 24017.0 38.510923 -76.985807 Large fringe metro 157671.0
8 24019.0 38.454135 -76.027524 Micropolitan 32261.0
9 24021.0 39.472966 -77.399994 Large fringe metro 248472.0
confirmed confirmed_per_100000 deaths deaths_per_100000
0 148 205.62 13 18.06
1 2520 443.90 127 22.37
2 4051 489.47 204 24.65
3 211 231.66 13 14.27
4 174 529.28 0 0.00
5 589 351.60 60 35.82
6 270 263.37 15 14.63
7 761 482.65 55 34.88
8 102 316.17 2 6.20
9 1282 515.95 77 30.99
[32]: maryland_overall = pd.read_csv("Data/CTPStates-historical.csv").query('state == "MD"').reset_index()
      maryland_overall['date'] = pd.to_datetime(maryland_overall['date'], format='%Y%m%d')
      maryland_overall = maryland_overall.sort_values('date')
      maryland_overall = maryland_overall.merge(testing, left_on='date', right_on='date', how='inner')
      maryland_overall['positiveRatio'] = maryland_overall['positive'] / maryland_overall['cumulative_total_people_tested']
      maryland_overall = maryland_overall[['date', 'positive', 'positiveIncrease', 'positiveRatio', 'cumulative_total_people_tested']]
      maryland_overall.head(10)
[32]: date positive positiveIncrease positiveRatio
0 2020-03-05 0.0 NaN 0.000000
1 2020-03-06 3.0 3.0 0.103448
2 2020-03-07 3.0 0.0 0.068182
3 2020-03-08 3.0 0.0 0.054545
4 2020-03-09 5.0 2.0 0.064103
5 2020-03-10 6.0 1.0 0.063158
6 2020-03-11 9.0 3.0 0.087379
7 2020-03-12 12.0 3.0 0.113208
8 2020-03-13 17.0 5.0 0.153153
9 2020-03-14 26.0 9.0 0.216667
cumulative_total_people_tested
0 17
1 29
2 44
3 55
4 78
5 95
6 103
7 106
8 111
9 120
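The merge-then-divide pattern above can be sketched on miniature stand-ins for the two frames (the rows below are hypothetical, but the column names match the real data):

```python
import pandas as pd

# Hypothetical miniature versions of the two frames being merged.
cases = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-05", "2020-03-06"]),
    "positive": [0.0, 3.0],
})
tests = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-05", "2020-03-06", "2020-03-07"]),
    "cumulative_total_people_tested": [17, 29, 44],
})

# An inner join on date keeps only days present in both sources;
# the positivity ratio is then cumulative positives / cumulative tests.
merged = cases.merge(tests, on="date", how="inner")
merged["positiveRatio"] = merged["positive"] / merged["cumulative_total_people_tested"]
print(merged)
```

The inner join silently drops 2020-03-07, which appears only in the testing frame; this is the same behavior `how='inner'` gives in the cell above.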
2.2 Exploratory Data Analysis
The simplest way we can view the data is by looking at the number of positive tested cases over
time. This will give us a basic understanding of the spread of COVID-19.
[33]: (ggplot(maryland_overall,aes(y='positive',x='date'))
+ geom_point() + ggtitle("Number of Confirmed Cases in MD")
+ xlab("Date") + ylab("Cases"))
[33]: <ggplot: (146476666316)>
We can see that the number of cases is rapidly increasing, though the curve seems to be stabilizing
into a more linear shape as time goes on.
Another interesting view is the positive increase per day, essentially a graph of the derivative at
each day. This will give us an idea of how the rate of transmission is changing.
[34]: (ggplot(maryland_overall,aes(y='positiveIncrease',x='date'))
+ geom_point() + geom_smooth(method='lm')
+ ggtitle("Increase in confirmed cases in MD"))
[34]: <ggplot: (146483016147)>
We see that the rate of transmission still seems to be on the rise, despite quarantine and
stay-at-home orders.
We can also view the spread of COVID-19 through the ratio of tests that return positive for the
virus against the total number of tests given. It should be noted that this data is not truly
indicative of the true rate of transmission of the virus, as we imagine only people who are
exhibiting symptoms will go out to be tested. Nevertheless, this graph will show us some
interesting results.
[35]: (ggplot(maryland_overall,aes(y='positiveRatio',x='date'))
+ geom_point() + geom_smooth(method='lm')
+ ggtitle("Ratio of positive cases in MD"))
[35]: <ggplot: (146482781353)>
Here, the ratio of cases has a sudden discontinuity between the dates of 3/26 and 3/27. The reason
is that, prior to 3/27, only positive tests were tracked in the data. This skews the ratio heavily
toward positive cases and creates an improbable curve in the graph. In this case, to see the change
in the overall ratio, we will limit the graph to dates after 3/27. The COVID Tracking Project
corroborates this discontinuity: under the MD row in "States" they record that Maryland did not
report negative cases between 3/12 and 3/28 (a one-day discrepancy between this plot and that
record). Below you can find a plot where the values before 3/28 are thrown out and a linear
regression model is fit.
[36]: temp = maryland_overall.query('date > "2020-03-27"')
(ggplot(temp,aes(y='positiveRatio',x='date'))
+ geom_point() + geom_smooth(method='lm')
+ ggtitle("Ratio of positive cases in MD"))
[36]: <ggplot: (146473328399)>
Here is a facet plot of the cumulative cases for all counties in Maryland. From the chart it is
clear that some counties have far larger case growth rates than others (which is to be expected,
as only a few counties are very populous).
[37]: (ggplot(maryland_deaths, aes(x="date", y="cumulative_cases"))
+ geom_point()
+ facet_wrap('~location_name', ncol=4)
+ theme(axis_text_x = element_text(angle=90), figure_size=(30,30)))
[37]: <ggplot: (146482755697)>
This is a similar plot, but restricted to the 8 most populous counties (Montgomery, Prince George's,
Baltimore, Baltimore City, Howard, Anne Arundel, Frederick, Harford) as listed here. As population
likely affects transmission rate, it may be useful to isolate these.
[38]: # Referencing a Python list with @ avoids the quoting problem that the
      # apostrophe in "Prince George's" would cause inside a query string.
      populous = ["Montgomery", "Prince George's", "Baltimore", "Baltimore City",
                  "Frederick", "Howard", "Anne Arundel", "Harford"]
      temp = maryland_deaths.query('location_name in @populous')
      (ggplot(temp, aes(x="date", y="cumulative_cases"))
       + geom_point()
       + facet_wrap('~location_name', ncol=4)
       + theme(axis_text_x = element_text(angle=90), figure_size=(70,35)))
[38]: <ggplot: (-9223371890378682827)>
2.3 Hypothesis Testing and Machine Learning
Now we will perform some hypothesis testing to see if quarantine has actually changed the
transmission rate of COVID-19. To do so, we will fit linear regressions to the rate of increasing
positive cases before and during the quarantine. Our null hypothesis is that there is no difference
between the rates of increase; our alternative hypothesis is that there is some difference.
[39]: # Convert dates to day counts (via Julian dates) so they can be used as a
      # numeric regressor; .copy() avoids modifying a view of maryland_overall.
      before_df = maryland_overall.query('date < "2020-04-14"').copy()
      before_df['jDate'] = before_df['date'].apply(lambda d: d.to_julian_date() - 2458912.5)
      before_df['status'] = "0"

      after_df = maryland_overall.query('date >= "2020-04-14"').copy()
      after_df['jDate'] = after_df['date'].apply(lambda d: d.to_julian_date() - 2458952.5)
      after_df['status'] = "1"
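The fitting cell itself is not preserved in this extract, but the summary that follows lists the terms Intercept, status[T.1], jDate, and jDate:status[T.1], which correspond to an interaction formula of the form positive ~ jDate * status. A minimal sketch of that kind of fit on synthetic data (the slopes 2 and 5 below are invented purely for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

rng = np.random.default_rng(0)
jdate = np.arange(40, dtype=float)

# Synthetic series: slope 2 before the cutoff, slope 5 after, plus small noise.
before = pd.DataFrame({"jDate": jdate,
                       "positive": 2 * jdate + rng.normal(0, 0.1, 40),
                       "status": "0"})
after = pd.DataFrame({"jDate": jdate,
                      "positive": 5 * jdate + rng.normal(0, 0.1, 40),
                      "status": "1"})
df = pd.concat([before, after], ignore_index=True)

# jDate * status expands to jDate + status + jDate:status, so the interaction
# coefficient estimates the CHANGE in slope between the two periods.
fit = sm.ols("positive ~ jDate * status", data=df).fit()
print(fit.params)
```

On this synthetic data the jDate coefficient recovers the "before" slope (~2) and the jDate:status[T.1] coefficient recovers the slope difference (~3), mirroring how the real table should be read.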
Df Residuals:       64     BIC:     1164.
Df Model:           3
Covariance Type:    nonrobust
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
Intercept         -2068.0115    368.746     -5.608      0.000   -2804.667   -1331.356
status[T.1]        9435.4242    577.425     16.341      0.000    8281.886    1.06e+04
jDate               191.6871     15.674     12.230      0.000     160.376     222.999
jDate:status[T.1]   711.5184     31.022     22.936      0.000     649.546     773.491
==============================================================================
Omnibus:                6.598   Durbin-Watson:           0.111
Prob(Omnibus):          0.037   Jarque-Bera (JB):        6.084
Skew:                   0.722   Prob(JB):                0.0478
Kurtosis:               3.243   Cond. No.                99.8
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
"""
The above plot and table show the information for a linear regression. The table shows us that
the rate of transmission prior to the quarantine was ~191 new confirmed cases per day (the jDate
coefficient), whereas after quarantine began and the incubation period passed, the slope increased
by ~712 (the jDate:status interaction term), to roughly 903 new cases per day. We can also see
that the p-value for each of these slopes is lower than our designated α of 0.05, so we can say
that these slopes are reasonable. We also see that the p-value for the difference between the two
slopes is effectively 0, meaning that we reject our null hypothesis that the rate stayed the same
throughout quarantine.
Something we should note is that the rate actually increased drastically after quarantine (and the
incubation period), which is not at all what we expected. It does make sense for viruses,
especially ones as contagious as COVID-19, to spread proportionally to the number of positive
cases, as more cases means a higher rate of transmission. Additionally, there are many other
factors at play, such as differences between states, urbanization, and availability of testing,
as well as other reasons that are likely outside the scope of our understanding.
2.4 Other Resources
As COVID-19 is an ongoing pandemic, you should strive to stay informed about the virus in general.
Every major news source, and most minor sources as well, will have ongoing updates concerning the
virus. As our analysis was centered solely on the case in Maryland, you should check your local
and state government webpages for their stance and laws concerning the situation. For more
in-depth news and articles concerning COVID, you should follow the World Health Organization
website for scholarly articles and global news. To view data about the pandemic, we recommend the
data provided by the World Health Organization, as well as the COVID Tracking Project and the data
gathered from Johns Hopkins University, both of which were used in our analysis. COVID-19 is a
serious threat across the globe, so the more informed you are and the more involved you are with
the data, the better your decision making will be on how to stay safe and healthy.