This document summarizes a study that analyzed the relationship between internet use, homicide rates, and population sizes in countries from 1990-2014. The researchers combined data from UN databases on these factors into a single dataset. Their analysis in Weka found that population best predicted homicide rates, but further analysis in R found no significant correlation between internet use and homicide rates. Scatter plots suggested homicide variability decreases as internet use increases above 5%, but the relationship differed by region. The researchers concluded there is no direct correlation between internet use and homicide rates globally.
This paper focuses on finding spatial and temporal criminal hotspots. It analyses two different real-world crime datasets, for Denver, CO and Los Angeles, CA, and provides a comparison between the two through a statistical analysis supported by several graphs. It then explains how we applied the Apriori algorithm to produce interesting frequent patterns for criminal hotspots, and shows how we used a Decision Tree classifier and a Naïve Bayesian classifier to predict potential crime types. To further analyse the crime datasets, the paper presents a study that combines our findings on the Denver crime dataset with its demographic information in order to capture the factors that might affect the safety of neighborhoods. The results of this work could be used to raise people's awareness of dangerous locations and to help agencies predict future crimes at a specific location within a particular time.
The document describes the Twitter ISA (Intelligent Synthesis and Real Time Response) system. The Twitter ISA analyzes social media streams in real-time to identify events. It uses machine learning models to classify tweets by topic, like identifying traffic-related tweets. The system also evaluates different event detection techniques using sentiment analysis, active users, and social graphs. Additionally, it develops methods to distinguish event hashtags from noisy meme hashtags using classifiers. Evaluations show the Twitter ISA can accurately perform these tasks but that a 1% sample stream has limitations compared to the 10% garden hose sample for applications needing more detail.
An Initial Homophily Indicator to Reinforce Context-Aware Semantic Computing (Alejandro Rivero)
This document describes research on using social network analysis and homophily indicators to improve context-aware computing systems. Homophily is the tendency for similar people to connect in social networks. The researchers propose a new "homophily indicator" to measure homophily in networks that overcomes limitations of existing indicators. They test their indicator on a dataset of smartphone interactions between high school students, running experiments with different parameters. The results show their indicator performs better than alternatives in interpreting levels of homophily in networks. Future work will focus on extending the indicator to account for time periods and using it to develop context inference models.
IJERA (International Journal of Engineering Research and Applications) is an international, online, ... peer-reviewed journal. For more details or to submit your article, please visit www.ijera.com
Querylog-based Assessment of Retrievability Bias in a Large Newspaper Corpus (Myriam Traub)
The document discusses sources of bias in search results from a large newspaper archive and methods for quantifying this bias. It finds significant retrievability bias based on Lorenz curves and Gini coefficients, with many documents never retrieved. Certain document features like date, size and type correlate with lower retrievability scores. Real user queries exhibit more bias than simulated queries. Quantifying and understanding biases can help provide more representative search results.
Grupo Herdez is a 102-year-old Mexican company that produces food and beverages. To better understand customer needs and reduce procurement costs, Grupo Herdez implemented a strategic sourcing framework using SAP Ariba Sourcing. In just 4 years, this led to market leadership growth and significant procurement cost savings, exceeding initial estimates.
This document summarizes the evolution of the digital data universe from 2008 to 2013, based on IDC and EMC reports. It describes how the amount of digital data stored worldwide has multiplied exponentially over the last decade due to the growth of digital devices, social networks, and sensors. It also introduces the concepts of "Big Data" and open data, and reviews pioneering government initiatives to publish public data.
This document describes the different types of polarity that can exist in the international system: unipolarity, bipolarity, multipolarity, and apolarity. It also discusses tripolarity, in which three major poles of power exist. It explains that in some periods after World War II, Europe had a tripolar configuration, with the United States, the United Kingdom, and the Soviet Union as the three main poles. It also mentions that today, at the global level, the United States, Russia, and China can be considered the three major poles.
The document summarizes the first tourism and hospitality MOOC offered by USI called "eTourism: Communication Perspectives". It provides details on the history and growth of MOOCs, an analysis of existing tourism and hospitality MOOCs, and the process undertaken by USI to develop their pilot MOOC. This included selecting a partner platform, creating content, promoting the course, delivering the course, and evaluating its performance using the Kirkpatrick model. Details are given on the course curriculum, participation rates, and plans for a second round of the MOOC.
Tuquito is an open-source operating system that offers free alternatives to Windows and Office, such as a web browser, an office suite, multimedia applications, and instant messaging. It can be downloaded as an ISO file and tried without installing, or installed on the computer. It includes an intuitive desktop called Gnome and access to applications through a central menu organized by categories. It also provides technical support through forums, a wiki, social networks, and IRC, and has been used in education initiatives with computers.
ParlBench: a SPARQL-benchmark for electronic publishing applications (Tatiana Tarasova)
Slides from the workshop on Benchmarking RDF Systems, co-located with the Extended Semantic Web Conference 2013. The presentation is about ongoing work on building a benchmark for electronic publishing applications. The benchmark provides real-world data sets, the Dutch parliamentary proceedings, and a set of analytical SPARQL queries built on top of these data sets. The queries were grouped into micro-benchmarks according to their analytical aims. This allows one to better analyse the behavior of RDF stores with respect to the particular SPARQL feature used in a micro-benchmark/query.
Preliminary results of running the benchmark on the Virtuoso native RDF store are presented, as well as references to the on-line material including the data sets, queries and the scripts that were used to obtain the results.
This cross-curricular project aims to promote values through sports activities in order to establish a harmonious environment in the student community of Plantel 29 San José del Rincón. Behaviors lacking in values have been observed among the students, so the project seeks to put values such as respect and tolerance into practice through sports activities such as indoor soccer and athletics. The project includes various activities in different subjects to achieve its objectives in a cross-curricular way, and the development of values will be evaluated.
PPT on 50 Ayurvedic Drugs by Dr. Siba Prasad Rout, IPGT (sibaprasad Rout)
This document summarizes 50 Ayurvedic drugs based on their clinical pharmacological properties or "karmas" as described in classical Ayurvedic texts. It provides the Sanskrit name, English name, family, and key synonyms for each drug derived from descriptions of their specific karmas or actions in major Ayurvedic texts. For some drugs, it highlights key chemical constituents and references modern research that helps validate their traditional uses. The drugs are categorized based on their actions on different body systems, such as the digestive, respiratory, cardiovascular and nervous systems.
Histology within the GI tract - from cheek to cheek (meducationdotnet)
The document provides an overview of the histology of the alimentary system from mouth to anus. It describes the general structure of the GI tract, which consists of four layers: mucosa, submucosa, muscularis propria, and adventitia. It then details the histology and features of specific areas of the GI tract, including the mouth, tongue, esophagus, stomach (body and fundus, pylorus), small intestine (duodenum, jejunum, ileum) and colon. Common pathological changes in different areas are also mentioned, such as Barrett's esophagus, Helicobacter pylori infection, celiac disease, and Crohn's disease.
1.1.9 The Angara system and drainage pipes (Igor Golovin)
The system is designed for routing air-conditioning, heating, and water-supply lines. The ducts can be installed in residential, office, and industrial premises, as well as on building facades. Lines are run either along walls, in wall-mounted ducts, or along the floor, in skirting-type ducts. The ducts blend harmoniously into interiors and are designed for long-term use. One feature that distinguishes these professional ducts from ordinary electrical ducts is a special design with a rounded cover that wraps around the duct on three sides. This design makes the system easier to install and lets it fit neatly into any interior, since there are no gaps on the outer surface of the duct. The range also includes a set of specialized accessories that make both installation and subsequent use of the system convenient.
Crime Data Analysis and Prediction for the City of Los Angeles (Heta Parekh)
This document analyzes crime data from Los Angeles from 2010-2020 to identify trends, predict future crime rates, and make recommendations to law enforcement. Key findings include:
- Crime rates have generally declined over the past decade but dropped significantly in 2020 due to the pandemic.
- Robbery, burglary, and vandalism are the most common crimes.
- Areas with lower median household incomes tend to have higher crime rates.
- Females are consistently the most impacted victims of crime over the past 10 years.
- Southwest LA and other areas have been identified as "hot spots" for criminal activity.
Predictive analysis indicates crime rates will continue increasing post-lockdown in
The document analyzes crime data from Chicago between 2001 and present to help the Chicago Police Department predict and prevent crime. Random forest and naive bayes classification models were used to predict the probability of different crime types occurring in specific police beats. Clustering analysis found that most arrests occurred during nighttime and summer months, and that homicides, robberies, and burglaries decreased between 2001-2008 but increased in 2014. The analysis can help police allocate resources more effectively based on predicted crime types and locations.
Massively Parallel Simulations of Spread of Infectious Diseases over Realisti... (Subhajit Sahu)
Highlighted notes while preparing for project on Computational Epidemics:
Massively Parallel Simulations of Spread of Infectious Diseases over Realistic Social Networks
Abhinav Bhatele, Jae-Seung Yeom, Nikhil Jain, Chris J. Kuhlman, Yarden Livnat, Keith R. Bisset, Laxmikant V. Kale, Madhav V. Marathe
Controlling the spread of infectious diseases in large populations is an important societal challenge. Mathematically, the problem is best captured as a certain class of reaction-diffusion processes (referred to as contagion processes) over appropriate synthesized interaction networks. Agent-based models have been successfully used in the recent past to study such contagion processes. We describe EpiSimdemics, a highly scalable, parallel code written in Charm++ that uses agent-based modeling to simulate disease spread over large, realistic, co-evolving interaction networks. We present a new parallel implementation of EpiSimdemics that achieves unprecedented strong and weak scaling on different architectures — Blue Waters, Cori and Mira. EpiSimdemics achieves five times greater speedup than the second fastest parallel code in this field. This unprecedented scaling is an important step to support the long-term vision of real-time epidemic science. Finally, we demonstrate the capabilities of EpiSimdemics by simulating the spread of influenza over a realistic synthetic social contact network spanning the continental United States (∼280 million nodes and 5.8 billion social contacts).
Massively Parallel Simulations of Spread of Infectious Diseases over Realisti... (Subhajit Sahu)
Highlighted notes while studying for project work:
Massively Parallel Simulations of Spread of Infectious Diseases over Realistic Social Networks
Abhinav Bhatele†, Jae-Seung Yeom†, Nikhil Jain†, Chris J. Kuhlman∗, Yarden Livnat‡, Keith R. Bisset∗, Laxmikant V. Kale§, Madhav V. Marathe∗
†Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, California 94551 USA
∗Biocomplexity Institute & Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061 USA
‡Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, Utah 84112 USA
§Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801 USA
E-mail: †{bhatele, yeom2, nikhil}@llnl.gov, ∗{ckuhlman, kbisset, mmarathe}@vbi.vt.edu
Abstract—Controlling the spread of infectious diseases in large populations is an important societal challenge. Mathematically, the problem is best captured as a certain class of reaction-diffusion processes (referred to as contagion processes) over appropriate synthesized interaction networks. Agent-based models have been successfully used in the recent past to study such contagion processes. We describe EpiSimdemics, a highly scalable, parallel code written in Charm++ that uses agent-based modeling to simulate disease spread over large, realistic, co-evolving interaction networks. We present a new parallel implementation of EpiSimdemics that achieves unprecedented strong and weak scaling on different architectures — Blue Waters, Cori and Mira. EpiSimdemics achieves five times greater speedup than the second fastest parallel code in this field. This unprecedented scaling is an important step to support the long-term vision of real-time epidemic science. Finally, we demonstrate the capabilities of EpiSimdemics by simulating the spread of influenza over a realistic synthetic social contact network spanning the continental United States (∼280 million nodes and 5.8 billion social contacts).
Using deep learning and Google Street View to estimate the demographic makeup of neighborhoods across the United States (eraser Juan José Calderón)
Timnit Gebru, Jonathan Krause, Yilun Wang, Duyun Chen, Jia Deng, Erez Lieberman Aiden, and Li Fei-Fei
Sentiment Analysis and Geographical Analysis for Enhancing Security (Sangeetha Mam)
This document discusses using sentiment analysis and geographic analysis to enhance security by predicting crime rates. It proposes collecting tweets from areas known for high crime and analyzing the sentiment to identify "red alert" neighborhoods in real-time. It also involves analyzing historical crime data using kernel density estimation to identify long-term crime hotspots. By combining real-time tweet sentiment analysis with historical crime patterns, the goal is to more accurately predict crime occurrences and notify citizens. The paper reviews literature on using machine learning techniques like sentiment analysis and kernel density estimation for crime prediction and monitoring social media for risky situations.
BIG DATA | How to explain it & how to use it for your career? (Tuan Yang)
If you ask people what BIG DATA is they often say it is about a lot of data. But the world has ALWAYS had a lot of data. It is about datafication – a word so new even spellcheck functions don’t know it is a real word!
Learn more about:
» How BIG DATA changes career paths of even the most unsuspecting?
» How BIG DATA changes the way business decisions are made?
» How BIG DATA changes who makes those decisions & the reshuffle of the balance of power it causes?
» What BIG DATA skills can you bring to the office tomorrow to increase your value to the firm?
Analysis of Crime Big Data using MapReduce (Kaushik Rajan)
Analyzed Crime Big data of Washington DC to solve the following business queries:
> Which hour has the highest crime count?
> Which shift has the highest crime count?
> Year wise crime count
> Hour wise crime count
> Crime count by an offense
> Average of Shift wise crime count
The data was initially stored in MySQL and was then moved to HDFS using Sqoop, from where 4 MapReduce operations were done using Java in the Eclipse IDE. The outputs of the queries were then moved to HBase using Sqoop. Two more MapReduce operations were done using Pig, the output of which was also moved to HBase using Sqoop. All the outputs were then moved to the local system and visualized using RStudio and Tableau.
Tools used:
> MySQL, HDFS and HBase to store the data
> Sqoop to move the data from one database to another
> Java (Eclipse IDE) and Pig to run the MapReduce queries
> RStudio for data pre-processing and visualization
> Tableau for visualization
> LaTeX for documentation
Building Proxy Indicators of National Wellbeing with Postal Data - Project Ov... (UN Global Pulse)
This study investigated using data from international postal flows and other global networks as proxy indicators for national socioeconomic metrics. Electronic postal records from 2010-2014 involving 187 countries were analyzed. Connectivity measures from these networks were strongly correlated with indicators like GDP, HDI, and poverty rate. Combining these network data into a multiplex model further improved correlations and generated multidimensional connectivity indicators. This demonstrated new approaches for approximating standard socioeconomic benchmarks in a global, real-time manner using alternative data sources like postal and digital network flows.
Generation of Synthetic Population using Markov Chain Monte Carlo Simulation ... (IJCI JOURNAL)
Activity-based travel demand models are widely used in transportation planning to predict future demand for transportation. Disaggregate-level data for the entire population is required as input to these models, including household-level and person-level attributes for the entire study area. These data are usually collected by the population census, but are rarely available due to confidentiality reasons. Hence, as a viable alternative, population synthesis techniques are used to supplement the microdata. An attempt has been made in this study to generate a synthetic population using the Markov Chain Monte Carlo simulation method and to compare it with the conventional method. Thiruvananthapuram Corporation in Kerala was selected as the study area, and sample data were collected by household survey. The algorithm for population synthesis was coded in C++. The methodology was validated using 16 percent of the collected data. The prediction accuracy of the method was compared with the conventional method and was found to be better.
GIS is a discipline that heavily relies on data. In this presentation we highlight all the geospatial data sources for crime mapping.
Visit https://expertwritinghelp.com/gis-assignment-help/ for quality gis assignment aid
John Mitchell, Mazala Moody, Professor Brown, Math 36, April 1.docx (jesssueann)
John Mitchell
Mazala Moody
Professor Brown
Math 36
April 1st, 2020
Final Project Script
Crime. We’ve all heard about it on the news or on social media. And what we hear about the most—
despite the fact that it is by far not the most common kind of crime—is violent crime, defined by the
FBI as consisting of four offenses: murder and non-negligent manslaughter, forcible rape, robbery, and
aggravated assault. But exactly how common is violent crime across the country? Do certain areas
experience more violent crime? And if so, why? What factors are linked to violent crime, and why?
What is the link between violent crime, poverty, and education? Stay tuned to find out!
In terms of total violent crime, there were an estimated 1,203,564 violent crimes committed nationwide in 2011, with the majority, 62.4%, being instances of aggravated assault.
(I plan to get clearer images)
But where did those crimes take place? Did certain areas experience more violent crime than others? If
so, why? First, as we can see, many of the major clusters of violent crime are located in major cities,
including New York, Los Angeles, and Chicago. If those three cities sound familiar, they also happen
to be the three largest cities in the United States. So violent crime has a definite correlation with large
population clusters, but is this simply because there are more people around to commit crimes, or are
there other factors at work here? Let’s look closer.
If we look at the numbers of families with income below poverty level, what do we see? The clusters
match up very closely with those of violent crime, with the largest numbers centered again on major
cities such as New York, Los Angeles, and Chicago. So as we can see, there is a very close correlation
between there are a lot of people living in poverty, and where a lot of people are committing violent
crimes. Note that these are raw numbers, not percentages, but they can still give us a valuable picture of
where the most crime is occurring, and why that might be.
But let’s look closer still. When we look at which areas have the most people 25 years of age or older
who have completed less than high school, the map again shows us massive clusters of people in the
big cities again. Los Angeles County features a whopping 23.73% of people 25 years of age or older
having completed less than high school. Bronx County, New York, is even worse, with a staggering
30.71 percent. What else do these areas have lots of? You guessed it, violent crime!
So it is clear that violent crime is linked very closely with poverty, and perhaps even more closely with
a lack of education. So what should be done to fix the problem? Well, to start with, increasing funding
and access to education has been shown many times to decrease people’s chances of living in poverty
and of committing violent—and other types of—crime.
(im trying to find more reliable data that compares education, pov ...
City Data Dating: emerging affinities between diverse urban datasets (Gloria Re Calegari)
Cities are complex environments in which digital technologies are more and more pervasive; this digitization of the urban space has led to a rich ecosystem of data producers and data consumers. Moreover, heterogeneous sources differ in terms of data complexity, spatio-temporal resolution and curation/maintenance costs. Do those diverse urban sources reflect the same picture of the city? Do distinct perspectives share some commonalities?
We present our data analytics empirical experiments on a set of urban sources related to the city of Milano; our investigation is aimed at discovering “affinities” between datasets by means of different quantitative and qualitative correlation analyses. We also explore the influence of spatial resolution and data complexity on the dependence strength between heterogeneous urban sources, to pave the way to a meaningful information fusion.
This document analyzes gun-related crime data using big data tools like Apache Hive and Pig. It summarizes the deadliest US mass shootings from 2016 to 2015. It then outlines the tools, data specifications, and workflow used to analyze gun sales rates, gun ownership rates, and crime rates over 2014-2015. Visualizations created in Excel, Tableau and 3D maps show trends in gun crimes in different areas for those years. In conclusion, it finds higher gun crime in central LA, guns comprising 28% of total crimes in 2015, and areas with higher income reporting less gun crimes in New York. Suggestions include using financial stability to predict gun crime likelihood.
The document provides an overview of sentiment analysis and summarizes the current approaches used. It discusses how machine learning classifiers like Naive Bayes can be used for sentiment classification of texts, treating it as a two-class text classification problem. It also mentions the use of natural language processing techniques. The current system discussed will use machine learning and NLP for sentiment analysis of tweets, training classifiers on labeled tweet data to classify the polarity of new tweets.
Helping Chicago Communities Identify Subjects Who Are Likely to be Involved i... (Brendan Sigale)
- The document describes a project to analyze Chicago Police Department data to develop a model that predicts an individual's risk level of being involved in a shooting based on various factors.
- An artificial neural network model was found to best predict the risk score, achieving an R2 of 0.9234 and mean average error of 10.583.
- K-means clustering identified 3 clusters that characterize individuals based on attributes like gender, race, age, and criminal history with drugs and weapons.
This document proposes using a network science approach to analyze crime data and forecast crime occurrences. It describes plotting crime incident data from two days as networks and analyzing the networks to understand crime patterns and predict if crime will increase or decrease at certain locations. The document outlines collecting and preprocessing crime datasets, defining nodes and edges to create networks, visualizing the networks in Gephi, analyzing the networks to draw conclusions, and discusses challenges and the potential for future improvement and expansion of the approach.
Dataset from the National Institute of Justice about crimes in San Francisco. Network analysis is applied after calculating distances between different crime points, treated as nodes of a city.
1. The document discusses using call detail record (CDR) data to study how mobile phone users manage their social contacts over time and characterize or predict social turnover.
2. By detecting new and old social relationships from CDRs that show communication patterns and frequencies between users, the author aims to analyze how users' social networks evolve and change.
3. The author proposes studying properties like the distribution of inter-event times between calls to the same contact and how this distribution depends on relationship longevity to provide insights into social turnover.
Social and economical networks from (big-)data - Esteban Moro II
Cat Videos Save Lives
CS 105 Final Project
Kelsey Borovinsky (klboro)
Megan Fantes (mfantes)
Rebecca Jahnke (rsjahnke)
Victor Kholod (vkholod)
Cat Videos Save Lives
A Study of the Relationship Between Internet Use and Homicide Rates Across the Globe
1. Introduction
Our project interprets internet use, homicide rates and population size for countries around the
world from 1990-2014 to see what, if any, relationships exist between these factors. The central,
initial question that piqued our curiosity was whether internet use is linked to homicide rates.
Perhaps high internet use signified a more highly educated population, and so there would be
less crime (or maybe people were spending more time watching cat videos than plotting a
crime). Or, perhaps more internet use would mean more access to violent content and forums
for violent communities, resulting in higher homicide rates. We set out to build a model that
would allow us to determine what, if any, link exists.
Ultimately, when we ran our data in Weka, we found that the highest indicator of homicide rate
was population. This makes sense: as population rises, you’d expect homicide to rise
proportionally.
However, since this model did not answer our central curiosity as to whether there was a link
between internet use and homicide rates, we also ran our data in R, another statistical tool. R
presented us with a model linking higher internet use with lower homicide rates, consistent with our
theory that high internet use indicates a more educated culture with less homicide.
However, the p-value of this model was quite high, indicating that this apparent link,
and the theory behind it, is not statistically significant.
2. Dataset Description
We denormalized three tables from the United Nations’ collection of databases to create our
dataset.
The first table we pulled from detailed the percent of individuals using the internet in each
country for each year from 1990 to 2014
(http://data.un.org/Data.aspx?d=ITU&f=ind1Code%3aI99H).
The second table we pulled from gave homicide rate for each country in each year from 1990 to
2014 (http://data.un.org/Data.aspx?d=UNODC&f=tableCode%3a1).
Finally, the third table gave population size for each country from 1979 to 2014
(http://data.un.org/Data.aspx?d=POP&f=tableCode%3a22).
We combined data from all three data tables into one table describing the rate of internet use,
the homicide count, the homicide rate and the population size for each country in each year
from 1990 to 2014.
Key attributes present in the table we’ve created from these three datasets are population size
per country, count (representing the number of homicides per country), homicide rate per
country, and internet use (quantified as the percent of individuals using the internet per country).
The primary key was the combination of (Country, Year): a country appears multiple times, once for
each year, but the combination of country and year is unique (all data for a given country in a given
year is in one record). See the table below for more information on each attribute:
Country (Primary Key) - Name of the country (string)
Year (Primary Key) - Year that the population, homicide count, homicide rate, and internet use figures refer to (string)
Population - An integer representing the population size of the country
Homicide Count - An integer representing the number of homicides in the country
Homicide Rate - A value representing the homicide rate for the country, based on the number of homicides and the population
Internet Use - An integer representing the percent of individuals using the internet in the country
Our final relational table contained the year, country, population, internet use percent, homicide
count and homicide rate.
3. Data Preparation
To begin our data preparation, we downloaded the internet, homicide, and population tables
from the UN database website. However, the UN website only allows users to download
100,000 rows of a table at a time. The internet and homicide tables were small enough to
download in their entirety, but the population table contained more than 1.5 million rows, so we
had to download the table 100,000 rows at a time.
After downloading all of the 100,000-row population sub-tables, we had to “clean” the population
data and combine the sub-tables into one large table. The original table from the UN website
was so large because it broke down the population of each country for each year into many
different categories (total population, number of men, number of women, number of people
living in urban/rural areas, population by age, etc.). We only needed the total population of each
country in each year, so we wrote a Python script to extract only the total population values. This
script read all CSV files containing the population sub-tables and printed the total populations to
a single output file [1].
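A minimal sketch of what this cleaning step looks like is shown below (the actual script is CodeToCleanPop.py; the file pattern and column names such as "Country or Area", "Area", "Sex", "Age", and "Value" are assumptions about the layout of the downloaded UN POP sub-tables):

```python
import csv
import glob

# Sketch of the cleaning step: keep only the total-population rows from each
# downloaded 100,000-row sub-table and write them to a single output file.
# Column names are assumptions about the UN POP table layout.
with open("population_totals.csv", "w", newline="") as out_file:
    writer = csv.writer(out_file)
    writer.writerow(["Country", "Year", "Population"])
    for path in sorted(glob.glob("population_subtable_*.csv")):
        with open(path, newline="") as in_file:
            for row in csv.DictReader(in_file):
                # Skip the breakdowns by sex, age, and urban/rural area.
                if (row.get("Area") == "Total"
                        and row.get("Sex") == "Both Sexes"
                        and row.get("Age") == "Total"):
                    writer.writerow([row["Country or Area"], row["Year"], row["Value"]])
```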
Once the population data was cleaned, we uploaded the population, internet, and homicide tables to
SQLite, creating a SQLite database that we could access through Python.
Once the tables were uploaded to SQLite, we wrote a Python script to denormalize the tables,
i.e. combine them into one relational table. Our script connected to the database through the sqlite3
module, executed a join command, and printed the resulting table to an output file [2]. The output table
contained attributes for Country, Year, Population, Percent of Population Using the Internet,
Homicide Count per 100,000 People, and Homicide Rate per 100,000 People. After cleaning
and combining all of our data, we ended with 731 data points.
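The denormalization itself amounts to a three-way join on country and year. A sketch of the idea is below (the actual script is CodeToJoinTables.py; the database, table, and column names here are illustrative, not our real schema):

```python
import csv
import sqlite3

# Sketch of the denormalization step: join the three uploaded tables on
# (country, year) and dump the combined relation to a CSV file.
conn = sqlite3.connect("un_data.db")  # illustrative database name
rows = conn.execute("""
    SELECT p.country, p.year, p.population,
           i.internet_pct, h.homicide_count, h.homicide_rate
    FROM population AS p
    JOIN internet AS i ON i.country = p.country AND i.year = p.year
    JOIN homicide AS h ON h.country = p.country AND h.year = p.year
""").fetchall()

with open("combined.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Country", "Year", "Population",
                     "Internet Use", "Homicide Count", "Homicide Rate"])
    writer.writerows(rows)
conn.close()
```

Because an inner join keeps only the (country, year) pairs present in all three tables, the combined table is much smaller than the raw downloads.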
For our data analysis, we were curious about the change in internet use, homicide rate, and
population from year to year. We planned to conduct numeric estimation using the numeric
change in internet use, homicide, and population, so we needed to calculate the change in each
attribute from year to year. We also planned to conduct classification learning on indicators for
internet use, homicide rate, and population -- 1 for increase, -1 for decrease, 0 for no change --
to see if there were any relationships simply between changes in internet use, homicide, and
population without knowing specific numbers. We wrote a Python script to calculate the average
change in each attribute from year to year and determine the correct indicator [3]. Our script made
pairwise comparisons between sequential years for each country, calculated averages and
indicators, and printed the resulting table to an output file.
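A sketch of that computation is below (the actual script is CodeToAdd_Indicators_Averages.py; the record layout is assumed, and whether the indicator is taken per year-pair or from the per-country average is a detail we gloss over here):

```python
# `records` is assumed to be the combined table as a list of dicts keyed by
# "Country", "Year", "Population", "Internet Use", and "Homicide Count".

ATTRS = ("Population", "Internet Use", "Homicide Count")

def sign(delta):
    """Indicator: 1 for an increase, -1 for a decrease, 0 for no change."""
    return (delta > 0) - (delta < 0)

def average_changes(records):
    by_country = {}
    for r in records:
        by_country.setdefault(r["Country"], []).append(r)
    result = []
    for country, rows in by_country.items():
        rows.sort(key=lambda r: int(r["Year"]))
        out = {"Country": country}
        for attr in ATTRS:
            # Pairwise comparison of each year with the one before it.
            deltas = [float(curr[attr]) - float(prev[attr])
                      for prev, curr in zip(rows, rows[1:])]
            avg = sum(deltas) / len(deltas) if deltas else 0.0
            out["Avg. Change in " + attr] = avg
            out["Indicator for " + attr] = sign(avg)
        result.append(out)
    return result
```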
Finally, we wanted to add a column for the region each country is in to see if there were trends
in different areas of the world. We found a list of each country and its region [4], created a table in
Excel, and wrote a Python script to join our table of internet use, homicide, and population with
the table indicating region [5] and print it to an output file [6].
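A sketch of the region join is below (the actual script is CodeToAddRegion.py; the file and column names are illustrative):

```python
import csv

# Sketch of the region join: load the country-to-region lookup exported from
# Excel, then attach a Region column to every row of our combined table.
with open("regions.csv", newline="") as f:
    region_of = {row["Country"]: row["Region"] for row in csv.DictReader(f)}

with open("changes.csv", newline="") as f_in, \
        open("changes_with_region.csv", "w", newline="") as f_out:
    reader = csv.DictReader(f_in)
    writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames + ["Region"])
    writer.writeheader()
    for row in reader:
        row["Region"] = region_of.get(row["Country"], "Unknown")
        writer.writerow(row)
```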
For our data analysis, we planned to use both categorical and numeric algorithms in Weka. In
Excel, we created two separate tables: one table with region and the categorical indicators for
change in population, internet use, and homicide count per 100,000 people [7], and one table with
the numeric changes in population, internet use, and homicide count per 100,000 people [8]. Once
these tables were uploaded to Weka, we split each of them into training and test sets using an
80/20 split.
4. Data Analysis
To analyze our categorical data, we first conducted a 1R analysis on the indicator variable for
change in homicide count per 100,000 people, which chose the single attribute that best
predicted the change in homicide. The 1R algorithm acts as a baseline test, providing a baseline
accuracy against which we can compare the accuracy of the rest of our categorical tests. If the
accuracy of a more complicated algorithm is lower than that of 1R, then we know the more complicated
algorithm is not efficient or worthwhile for our data. After running the 1R algorithm, we conducted
a J48 analysis on our categorical attributes, which created a decision tree from the attributes to
best predict the change in homicide count.
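For intuition, here is a minimal sketch in Python of the rule-selection logic behind 1R (we used Weka's implementation, not this code; the function and variable names are our own):

```python
from collections import Counter, defaultdict

# 1R idea: for each attribute, build one rule per attribute value (predict the
# majority class for that value), then keep the attribute whose rules make the
# fewest errors on the training data.
def one_r(rows, attributes, target):
    best_attr, best_rules, best_errors = None, None, None
    for attr in attributes:
        value_counts = defaultdict(Counter)
        for row in rows:
            value_counts[row[attr]][row[target]] += 1
        rules = {value: counts.most_common(1)[0][0]
                 for value, counts in value_counts.items()}
        errors = sum(1 for row in rows if rules[row[attr]] != row[target])
        if best_errors is None or errors < best_errors:
            best_attr, best_rules, best_errors = attr, rules, errors
    accuracy = 1 - best_errors / len(rows)
    return best_attr, best_rules, accuracy
```

Run over our categorical table, this kind of procedure is what selects region as the single best predictor, the baseline reported in the Results section.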
To further explore the relationships between our attributes, we ran a numeric estimation
algorithm on our numeric attributes (average change in population, average change in internet use) to
find the best prediction model for change in homicide count. The numeric estimation algorithm
performs a linear regression on the data and outputs a formula for the regression. We used the
M5 method, which removes the least significant attributes from the model, to create our model.
Weka does not give significance values for its linear models, nor does it allow us to create a
model for our original question: are internet use and homicide count correlated? We decided to
use the R statistical computing package to find significance values and independently create our
own linear models. Once we imported our data set into R, we could write our own formulas for
linear models because the R interface is much more similar to Python than Weka, in that we can
type commands into the console to get output. R gives a much more detailed summary of the
model, its accuracy, and its significance, giving us even better insight into the true relationship
between our attributes than numeric estimation in Weka.
5. Results
The 1R algorithm chose region as the most accurate indicator of change in homicide count, with
55% accuracy [9]. When we ran the 1R model on the test set, the model still yielded 55%
When we ran the 1R model on the test set, the model still yielded 55%
accuracy, indicating that the 1R model did not overfit the training data and generalized well.
The J48 algorithm created a decision tree that predicted change in homicide count directly from
region, like the 1R model, for all regions except Asia [10]. If a country is in Asia, the decision tree
If a country is in Asia, the decision tree
then looks at the indicator for change in internet use in order to predict change in homicide rate.
The decision tree had 52% accuracy [11], which is lower than the accuracy of 1R and indicates
which is lower than the accuracy of 1R and indicates
that the simpler 1R model is sufficient to describe the data. When we ran the J48 model on the
test set, the model yielded 54% accuracy, which is higher than the training accuracy and
indicates that the model did not overfit the training data and generalizes well.
When we conducted our numeric estimation test, with the M5 method, the algorithm created the
following formula:
Avg. Change in Hom. Count = -12.0074 + 0.0001 * Avg. Change in Pop.
This formula indicates that Average Change in Homicide Count is more significantly correlated
with Average Change in Population than Average Change in Internet Use, because the M5
method removed the Change in Internet Use variable. The positive coefficient for the Average
Change in Population variable indicates that homicide and population are positively correlated,
i.e. as population increases, homicide increases. Note that this makes intuitive sense - as
population rises, you’d expect homicide to rise proportionally.
We found the significance of our linear model by running a regression with the same formula in
R [12]. The significance of a linear model is given as a p-value, and a very significant model has a
very low p-value. (Roughly, the p-value is the probability of seeing an association at least this strong
purely by chance if there were really no relationship, so the lower the p-value, the stronger the
evidence for the relationship.) In general, a p-value less than 0.05 indicates a significant model, and
a p-value less than 0.001 indicates a very significant model. When we ran our model in R, we found
that its p-value is less than 0.001, meaning change in homicide count is significantly correlated with
change in population.
However, the correlation between homicide and population size was not our research question.
We wanted to know the correlation between homicide and internet use, so we ran a regression
on the data predicting homicide from internet use [13].
R produced the following formula:
Avg. Change in Hom. Count = 0.5490 + (-0.4601) * Avg. Change in Internet Use
The negative coefficient for the Average Change in Internet Use variable indicates that homicide
and internet use are negatively correlated, i.e. as internet use increases, homicide decreases.
However, the p-value for this model is 0.825, which is very high and indicates that our model is
not statistically significant. Thus we cannot conclude that homicide and internet use are correlated.
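As a rough illustration of the kind of check R performs here (this is not the R session we actually ran, which is summarized in the appendix; the Python/scipy version below is an assumed equivalent, and the names are our own):

```python
from scipy import stats

# Illustration only: an ordinary least-squares fit with its p-value.
# x = average change in internet use, y = average change in homicide count,
# assumed to be parallel lists built from our combined table.
def regress(x, y):
    result = stats.linregress(x, y)
    print(f"intercept = {result.intercept:.4f}")
    print(f"slope     = {result.slope:.4f}")
    print(f"p-value   = {result.pvalue:.3f}")
    return result
```

A p-value like the 0.825 reported in footnote [13] comes out of exactly this kind of summary, and is what tells us that the fitted negative slope cannot be distinguished from zero.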
Even though we found that homicide and internet use are not significantly correlated, we still
wanted to fully understand any relationship they may have and give our original research
question due diligence. We began by making a scatter plot of homicide count vs. internet use:
When we looked at the graph, we noticed that there were a few extreme outliers in the data.
Such outliers prevent us from understanding the relationship between the majority of the data
points, because they stretch the scale of the graph and squish the rest of the data points
together. We were interested to see the graph with the outliers removed, so we removed all
data points that were more than 3 standard deviations away from the mean of either the
average change in homicide count or the average change in internet use. We removed
20 outliers and created a new scatter plot.
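A minimal sketch of that 3-standard-deviation rule, assuming the two columns are held as a list of (x, y) pairs:

```python
import statistics

# Drop any point more than k standard deviations from the mean on either axis
# (x = average change in internet use, y = average change in homicide count).
def remove_outliers(points, k=3.0):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_mean, x_sd = statistics.mean(xs), statistics.stdev(xs)
    y_mean, y_sd = statistics.mean(ys), statistics.stdev(ys)
    return [(x, y) for x, y in points
            if abs(x - x_mean) <= k * x_sd and abs(y - y_mean) <= k * y_sd]
```

Applied to our data, this rule is what removed the 20 outliers mentioned above.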
In the graph with outliers removed, we can more easily see general trends in the data. From the
new graph, we see that lower values in change in internet use have much higher variability in
change in homicide count, and higher values in change in internet use have less variability in
change in homicide count (i.e. are more concentrated around zero).
Data points with change in internet use less than 5% have the highest variability in change in
homicide rate. This suggests that countries with a slower rate of increase in internet use are
much less predictable. In general, countries that are slower to advance in technology are more
unstable, so their social behavior (like tendency to commit murder) is more erratic. Data points
with change in internet use greater than 5% have low variability in change in homicide count.
This suggests that countries with a higher rate of increase in internet use are more predictable.
Countries that are rapidly advancing in technology are also likely rapidly modernizing and
becoming more organized. Having a more organized society that is more globally connected
through technology yields behavior that is much less erratic and much less likely to drastically
change over time.
While we could not quantify the correlation between homicide and internet use with a formula,
a simple scatter plot of the data revealed likely relationships between the variables.
Our scatter plot only reveals global trends. We were curious if there was a difference in the
relationship between homicide and internet use between the different regions of the world [14]. We
created a bar graph comparing change in internet use from 1990 - 2014 and the change in
homicide rate from 1990 - 2014:
We decided to compare homicide rates instead of homicide counts so that the units of change
(percent of population) are the same for both homicide rate and internet use (and as homicide
rate tended to follow the trends of homicide count in our data, the switch between rate and
count was inconsequential). From our bar graph, we see that internet use increases for every
region, as expected. However, the change in homicide rate is very different for every region. In
South America, for example, the average change in homicide rate from 1990 - 2014 significantly
decreases. If we had run a linear regression on just South America, we might have found more
significant evidence that an increase in internet use is correlated with a decrease in homicide
rate (or homicide count). Looking back, we can also see from this graph why it made intuitive
sense for 1R to have chosen region; the change in homicide rate varies from region to region,
so region is indicative of what that change will look like. Future studies on the relationship
between homicide and internet use could analyze different regions in more detail and create
linear models for each individual region. Perhaps such a future study will find significant
evidence for vastly different trends in each region, and the variability between regions is the
reason for the lack of significant evidence in the global trends of our study.
6. Conclusions
We set out to see if there was a correlation between Internet use and homicide rates throughout
the world from 1990-2014. After running our data on global populations, homicide counts, and
internet rates through various data mining algorithms in Weka, we found that the greatest
numeric indicator of homicide count was population. This made sense considering if population
increased, then homicide counts should increase as well. To answer our initial question about
the correlation between homicide and internet use, we ran our data in R, and discovered that
when internet use increased, homicide counts decreased. We cannot make this claim with
confidence, though, because the high p-value indicates that the relationship is not statistically
significant. Thus we conclude that there is no direct correlation linking internet usage with homicide counts.
7. Appendix
[1] see CodeToCleanPop.py
[2] see CodeToJoinTables.py
[3] see CodeToAdd_Indicators_Averages.py
[4] on www.internetworldstats.com
[5] see CodeToAddRegion.py
[6] see Full Table with Regions
[7] see Classification Table
[8] see Numeric Table
[9] Confusion matrix for the 1R model (rows: actual value, columns: predicted value):

            Predicted -1   Predicted 0   Predicted 1
Actual -1            242             0            55
Actual  0             21             0             6
Actual  1            179             0            82

[10] Decision Tree (figure not reproduced here)
[11] Confusion matrix for the J48 model (rows: actual value, columns: predicted value):

            Predicted -1   Predicted 0   Predicted 1
Actual -1            178             0           119
Actual  0             11             0            16
Actual  1            137             0           124

[12] Abbreviated R output:

    Coefficients           Estimate
    (Intercept)            -1.201e+01
    Avg. Change in Pop.     1.350e-04
    p-value: 3.022e-11

[13] Abbreviated R output:

    Coefficients                   Estimate
    (Intercept)                     0.5490
    Avg. Change in Internet Use    -0.4601
    p-value: 0.825

[14] see CodeToCalcAvgs.py (includes SQL command to calculate averages)