Crime is a grave problem that affects all countries in the world. The level of crime in a country strongly affects its economic growth and its citizens' quality of life. In this paper, we survey trends in supervised and unsupervised machine learning methods for crime pattern analysis. We use a spatiotemporal dataset of crimes in San Francisco, CA to demonstrate some of these strategies for crime analysis. We apply classification models, namely Logistic Regression, Random Forest, Gradient Boosting, and Naive Bayes, to predict crime types such as Larceny and Theft, and propose model optimization strategies. Further, we use a graph-based unsupervised machine learning technique, core-periphery structures, to analyze how crime behavior evolves over time. These methods can be generalized for use in different counties and can be greatly helpful in planning police task forces for law enforcement and crime prevention.
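As an illustrative sketch of the Naive Bayes classification step named above (the district and hour features and the records below are invented toy data, not the San Francisco dataset):

```python
from collections import Counter, defaultdict

def train_nb(records):
    """records: list of (features_dict, label). Returns class priors and per-feature value counts."""
    priors = Counter(label for _, label in records)
    counts = defaultdict(Counter)          # (feature, label) -> Counter of values
    for feats, label in records:
        for feat, value in feats.items():
            counts[(feat, label)][value] += 1
    return priors, counts

def predict_nb(priors, counts, feats):
    """Pick the label maximizing prior * product of smoothed likelihoods."""
    total = sum(priors.values())
    best, best_p = None, -1.0
    for label, n in priors.items():
        p = n / total
        for feat, value in feats.items():
            c = counts[(feat, label)]
            # additive smoothing so unseen feature values do not zero the product
            p *= (c[value] + 1) / (n + len(c) + 1)
        if p > best_p:
            best, best_p = label, p
    return best

data = [
    ({"district": "Mission", "hour": "night"}, "LARCENY/THEFT"),
    ({"district": "Mission", "hour": "night"}, "LARCENY/THEFT"),
    ({"district": "Tenderloin", "hour": "day"}, "ASSAULT"),
    ({"district": "Tenderloin", "hour": "night"}, "ASSAULT"),
]
priors, counts = train_nb(data)
print(predict_nb(priors, counts, {"district": "Mission", "hour": "night"}))
```

A library implementation (e.g. scikit-learn's `CategoricalNB`) would be used in practice; the sketch only shows the prior-times-likelihood structure of the model.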
Crime rate analysis using k-NN in Python
This document summarizes a crime analysis project conducted by a university team. The team analyzed multiple data sets to build models predicting crime rates from factors such as population, weather, daylight hours, and economic indicators. They created binary, numeric, and crime-ratio models and found the crime-ratio model the most statistically significant. Their analyses found that crime rates generally increased during daylight saving time and rose slightly with higher temperatures. The best model could predict monthly crime totals by city for most crime types, except rare crimes such as homicide and sexual assault.
Crime Pattern Detection using K-Means Clustering (Reuben George)
Crime pattern detection uses data mining techniques like clustering to analyze crime data and identify patterns. This involves plotting past crimes geographically, clustering similar crimes to detect sprees, and analyzing the results to draw conclusions. It helps improve crime solving by learning from history and preempting future crimes. The method augments detectives' work but has limitations, such as relying on data quality. Overall, crime pattern detection aids operational efficiency and enhances resolution rates by optimizing resource deployment based on observed crime trends.
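The clustering step described above can be sketched with a minimal k-means over incident coordinates (the two synthetic "sprees" below are invented points, not real geocoded crimes):

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm over (x, y) tuples: assign to nearest center, recompute means."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2)
            clusters[i].append(p)
        new = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new == centers:          # converged: assignments stopped changing
            break
        centers = new
    return centers, clusters

# two synthetic incident groups: one near (0, 0), one near (10, 10)
pts = [(0.1, 0.2), (0.0, -0.1), (0.2, 0.0), (10.1, 9.9), (9.8, 10.2), (10.0, 10.0)]
centers, clusters = kmeans(pts, k=2)
print(sorted(len(c) for c in clusters))   # two clusters of three incidents each
```

Plotting the resulting cluster memberships on a map is the "plotting past crimes geographically" step; the analyst then inspects each cluster for spree-like similarity.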
Predictive analysis of crime forecasting (Frank Smilda)
This document discusses various methods for predictive crime mapping, beginning with simply using past crime "hot spots" as a predictor of future hot spots. While this approach has limited accuracy over short periods, past hot spots can predict up to 90% of future crime over longer periods like a year. The document then reviews more sophisticated predictive modeling methods and the role of geographic information systems in developing spatial models to predict crime.
Crime analysis mapping, intrusion detection using data mining (Venkat Projects)
Data mining plays a key role in crime analysis. Previous research papers mention many different algorithms, among them the virtual identifier (VID), pruning strategies, support vector machines, and the Apriori algorithm. VID is used to find the relation between a record and its virtual identifier. The Apriori algorithm supports the fuzzy association rules algorithm, which takes around six hundred seconds to detect a mail bomb attack. In this research paper, we identified crime mapping analysis based on the KNN (k-nearest neighbor) and ANN (artificial neural network) algorithms to simplify this process. Crime mapping is conducted and funded by the Office of Community Oriented Policing Services (COPS). Evidence-based research helps in analyzing crimes: we calculate the crime rate from previous data using data mining techniques, and crime analysis combines quantitative and qualitative data with analytic techniques to resolve cases. For public safety purposes, crime mapping is an essential research area, since data mining techniques can identify the zones where crimes occur most frequently. In crime analysis mapping, we follow these steps to reduce the crime rate: 1) collect crime data, 2) group the data, 3) cluster it, and 4) forecast from it. Crime analysis with crime mapping helps in understanding the concepts and practice of crime analysis, assists police, and supports the reduction and prevention of crime and disorder.
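The KNN-based forecasting idea named above can be sketched as a nearest-neighbor vote over past incidents (the history records below are invented toy data):

```python
from collections import Counter

def knn_predict(history, point, k=3):
    """history: list of ((x, y), crime_type). Majority vote among the k nearest incidents."""
    nearest = sorted(
        history,
        key=lambda rec: (rec[0][0] - point[0]) ** 2 + (rec[0][1] - point[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

history = [
    ((1.0, 1.0), "burglary"), ((1.2, 0.9), "burglary"), ((0.8, 1.1), "burglary"),
    ((5.0, 5.0), "vandalism"), ((5.1, 4.9), "vandalism"),
]
print(knn_predict(history, (1.1, 1.0)))   # nearest incidents are burglaries
```

Real crime mapping would use great-circle distance on latitude/longitude and far more features; the squared Euclidean distance here is a simplification.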
Crime Analytics: Analysis of crimes through newspaper articles (Chamath Sajeewa)
Crime analysis is one of the most important activities of the majority of intelligence and law enforcement organizations all over the world. Generally, they collect domestic and foreign crime-related data (intelligence) to prevent future attacks and to utilize a limited number of law enforcement resources in an optimal manner. A major challenge faced by most law enforcement and intelligence organizations is efficiently and accurately analyzing the growing volumes of crime-related data. The vast geographical diversity and the complexity of crime patterns have made the analysis and recording of crime data more difficult. Data mining is a powerful tool that can be used effectively for analyzing large databases and deriving important analytical results. This paper presents an intelligent crime analysis system designed to overcome the above-mentioned problems. The proposed system is a web-based system comprising crime analysis techniques such as hotspot detection, crime comparison, and crime pattern visualization, and it provides a rich yet simple environment that can be used effectively for crime analysis processes.
Crime Data Analysis and Prediction for the city of Los Angeles (Heta Parekh)
This document analyzes crime data from Los Angeles from 2010-2020 to identify trends, predict future crime rates, and make recommendations to law enforcement. Key findings include:
- Crime rates have generally declined over the past decade but dropped significantly in 2020 due to the pandemic.
- Robbery, burglary, and vandalism are the most common crimes.
- Areas with lower median household incomes tend to have higher crime rates.
- Females are consistently the most impacted victims of crime over the past 10 years.
- Southwest LA and other areas have been identified as "hot spots" for criminal activity.
Predictive analysis indicates crime rates will continue increasing post-lockdown.
IRJET - Crime Analysis using Data Mining and Data Analytics (IRJET Journal)
This document discusses using data mining and analytics techniques to analyze crime data and predict crime rates. It proposes using linear regression on crime data from the Indian government to predict future crime occurrences and identify high-risk regions. The system would analyze factors like crime type, offender age, month, and year to build a regression model. This model could then predict crime rates and indicate whether a region is high or low risk for criminal activity. Graphs and tables would visualize the predictions to help law enforcement allocate resources. The goal is to help reduce crime and increase public safety by identifying patterns in historical crime data.
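The regression idea described above can be sketched with a one-variable ordinary-least-squares fit (the year/count series below is synthetic, not Indian government data):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

years  = [2015, 2016, 2017, 2018, 2019]
counts = [100, 110, 120, 130, 140]           # perfectly linear toy series
slope, intercept = fit_line(years, counts)
print(round(slope * 2020 + intercept))       # extrapolated 2020 count: 150
```

A multi-factor model over crime type, offender age, month, and year would use multiple regression (e.g. a design matrix and `numpy.linalg.lstsq`); the single-predictor case shows the fitting-and-extrapolating mechanics.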
An Intelligence Analysis of Crime Data for Law Enforcement Using Data Mining (Waqas Tariq)
The concern about national security has increased significantly since the 26/11 attacks in Mumbai, India. However, information and technology overload hinders the effective analysis of criminal and terrorist activities. Data mining applied in the context of law enforcement and intelligence analysis holds the promise of alleviating this problem. In this paper we use a clustering/classification-based model to anticipate crime trends. The data mining techniques are used to analyze city crime data from the Tamil Nadu Police Department. The results of this data mining could potentially be used to lessen, and even prevent, crime in the forthcoming years.
Crime Data Analysis, Visualization and Prediction using Data Mining (Anavadya Shibu)
This paper presents a general idea of the model of data mining techniques and diverse crimes. It also provides an inclusive survey of competent and valuable data mining techniques for crime data analysis. The objective of the data mining is to recognize patterns in criminal behavior in order to predict and anticipate criminal activity and prevent it. This project implements data mining techniques such as KNN, text clustering, and IR-tree for investigating crime data sets and sorting out the accessible problems. The collective knowledge of various data mining algorithms tends to afford an enhanced, integrated, and precise result for crime prediction in the banking sector. Our law enforcement organizations need to be adequately outfitted to defeat and prevent crime. This project is developed using Java as the front end and MySQL as the back end. Supporting applications such as Sunset and NetBeans are used to make the portal more interactive.
This document discusses using machine learning techniques like clustering and decision trees to analyze crime data from Chicago between 2014-2016. It aims to identify crime hot spots and patterns to help police allocate resources more efficiently. The document applies k-means clustering to crime data grouped by location and type, identifying a "vice" cluster with crimes like prostitution and drugs in two adjacent wards. It suggests police could use temporal and hourly crime patterns from the analysis to optimize staff scheduling and deployment. The document also discusses using decision trees and k-nearest neighbors algorithms on the crime data supplemented with temperature and unemployment data to further explore crime patterns.
This document outlines a project to analyze crime and census data in London. It describes a multi-phase approach including: 1) loading and visualizing crime data, 2) adding census data to the model and performing clustering and regression analysis, and 3) using the results to inform data mining. Key analysis techniques include k-means clustering of census variables to categorize areas, linear regression of census factors on crime types, and decision tree analysis using both crime and census data. The goal is to understand how socioeconomic factors relate to crime levels and types in different parts of London.
This document discusses using data mining techniques for crime analysis and prediction. It describes collecting unstructured crime data from various sources and storing it in a NoSQL database. Classification algorithms such as Naive Bayes are used to classify crime reports, the Apriori algorithm identifies patterns in past crimes, and a decision tree is used for prediction. Visualization tools such as heat maps, graphs, and Neo4j display crime patterns, rates, and profiles over time and across locations. Future work involves using these techniques for criminal profiling.
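The Apriori pattern-mining step mentioned above can be sketched as level-wise frequent-itemset search (the toy "reports" below are invented attribute sets, not real crime records):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return every itemset appearing in at least min_support transactions."""
    current = {frozenset([i]) for t in transactions for i in t}   # size-1 candidates
    freq = {}
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        freq.update(level)
        # candidate generation: union pairs of surviving sets that are one item larger
        keys = list(level)
        size = len(keys[0]) + 1 if keys else 0
        current = {a | b for a, b in combinations(keys, 2) if len(a | b) == size}
    return freq

reports = [
    frozenset({"night", "downtown", "theft"}),
    frozenset({"night", "downtown", "theft"}),
    frozenset({"day", "suburb", "burglary"}),
    frozenset({"night", "downtown", "assault"}),
]
freq = apriori(reports, min_support=3)
print(frozenset({"night", "downtown"}) in freq)   # the pair co-occurs 3 times
```

The anti-monotone pruning (only unions of frequent sets become candidates) is what makes Apriori tractable on large report databases.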
This document discusses crime analysis and its applications in community-oriented policing. Crime analysis involves understanding crime patterns through statistical analysis and crime mapping to identify problems and potential solutions. It helps police departments target areas with high crime rates or unusual increases in crime. Crime analysis also examines relationships between crimes in terms of time, location, offender characteristics, and causal factors to aid investigations of serial crimes and displacement. The core functions of law enforcement like prevention, investigation, and apprehension can be enhanced through crime analysis.
A Survey on Data Mining Techniques for Crime Hotspots Prediction (IJSRD)
A crime is an act against the laws of a country or region. The technique used to find areas on a map with high crime intensity is known as crime hotspot prediction. It uses crime data, including each area's crime rate, to predict future locations with high crime intensity. The motivation of crime hotspot prediction is to raise people's awareness of dangerous locations during certain time periods, and it can help allocate police resources to create a safe environment. The paper presents a survey of different data mining techniques for crime hotspot prediction.
Compstat strategic police management for effective crime deterrence in new yo... (Frank Smilda)
This document provides an overview of Compstat, a police management model pioneered in the New York Police Department in 1994. It discusses key aspects of the Compstat model including its emphasis on timely and accurate intelligence, rapid deployment of resources, effective tactical strategies, and constant evaluation. The document also examines research on Compstat and its implementation in various police departments in the US. It analyzes how Compstat aims to create a data-driven and collaborative approach to crime reduction through middle manager accountability and information sharing.
News document analysis by using a proficient algorithm (IJERA Editor)
Analyzing news articles has been an emerging research topic in the past few years. Newspapers discuss various types of news (political, education, employment, sports, agriculture, crime, medicine, business, etc.) at different levels: international, national, state, and district. Within these articles, crime coverage plays a major role, because one crime leads to many other crimes and affects many other lives. In India, Madurai is an important place with many historical monuments, and it is a sensitive place. This paper analyzes the crimes which occurred in 2015 in and around Madurai; the analysis helps the police department reduce the occurrence of crime in the future. The proposed system uses a Support Vector Machine (SVM) to classify documents effectively. News documents are preprocessed using pruning and stemming. From the stemmed words, the informative words are selected and weighted using feature selection methods such as Term Frequency-Inverse Document Frequency (TF-IDF) and chi-square, which return a high-dimensional vector space; this is reduced to a low dimension using Latent Semantic Analysis (LSA). The cosine similarity between the key document and each news document is computed, and based on this value the news documents are labeled as crime or non-crime. Some of the documents are used to train the SVM classifier, and others to test the performance of the developed system. A comparative study shows that the proposed approach improves classification accuracy.
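The TF-IDF weighting and cosine-similarity labeling steps described above can be sketched as follows (the three toy documents are invented, and the stemming, chi-square, and LSA steps are omitted):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse {word: tf * idf} vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(w for toks in tokenized for w in set(toks))   # document frequency
    n = len(docs)
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return vecs

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "police arrest suspect in robbery case",      # key document (crime)
    "robbery suspect held after police chase",
    "local school wins cricket tournament final",
]
key, *rest = tfidf_vectors(docs)
sims = [cosine(key, v) for v in rest]
print(sims[0] > sims[1])    # the crime story scores higher than the sports story
```

Documents scoring above a chosen similarity threshold would be labeled "crime" and fed to the SVM training step.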
Crime Analysis & Prediction System is a system to analyze and detect crime hotspots and predict crime.
It collects data from various sources: crime data from open-data sites, US census data, social media, and traffic and weather data.
It leverages Microsoft's Azure cloud and on-premises technologies for back-end processing, with desktop-based visualization tools.
Viterbi optimization for crime detection and identification (TELKOMNIKA JOURNAL)
In this paper, we introduce two types of hybridization. The first contribution is the hybridization of the Viterbi algorithm with Baum-Welch in order to predict crime locations. The second contribution considers optimization based on a decision tree (DT) in combination with the Viterbi algorithm for criminal identification, using Iraq and India crime datasets. This work builds on our previous work [1]. The main goal is to improve the model both in running time and in accuracy, and the obtained results show that both goals were achieved in an efficient way.
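The Viterbi step named above can be sketched on a toy hidden Markov model in which hidden states are districts and observations are report types (all probabilities below are invented, not fitted by Baum-Welch):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observation list."""
    # best[s] = (probability of best path ending in s, that path)
    best = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        best = {
            s: max(
                ((p * trans_p[prev][s] * emit_p[s][o], path + [s])
                 for prev, (p, path) in best.items()),
                key=lambda t: t[0],
            )
            for s in states
        }
    prob, path = max(best.values(), key=lambda t: t[0])
    return path

states = ["district_A", "district_B"]
start  = {"district_A": 0.6, "district_B": 0.4}
trans  = {"district_A": {"district_A": 0.7, "district_B": 0.3},
          "district_B": {"district_A": 0.4, "district_B": 0.6}}
emit   = {"district_A": {"theft": 0.8, "assault": 0.2},
          "district_B": {"theft": 0.1, "assault": 0.9}}
print(viterbi(["theft", "theft", "assault"], states, start, trans, emit))
```

Baum-Welch would estimate `start`, `trans`, and `emit` from unlabeled sequences; Viterbi then decodes the most probable location sequence under those parameters.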
ANALYSIS OF ROADWAY FATAL ACCIDENTS USING ENSEMBLE-BASED META-CLASSIFIERS (ijaia)
In the past decades, a lot of effort has been put into roadway traffic safety. With the help of data mining, analysis of roadway traffic data is much needed to understand the factors related to fatal accidents. This paper analyzes the Fatality Analysis Reporting System (FARS) dataset using several data mining algorithms. We compare the performance of four meta-classifiers and four data-oriented techniques known for their ability to handle imbalanced datasets, all based on the Random Forest classifier. We also study the effect of applying several feature selection algorithms, including PSO, Cuckoo, Bat, and Tabu, on the accuracy and efficiency of classification. The empirical results show that the Threshold Selector meta-classifier combined with over-sampling techniques was very satisfactory: the proposed technique attained a mean overall accuracy of 91% and a balanced accuracy between 96% and 99% using 7-15 features instead of the 50 original features.
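The over-sampling idea mentioned above can be sketched as random minority-class duplication (the rows below are invented toy pairs, not FARS records):

```python
import random
from collections import Counter

def oversample(rows, seed=0):
    """rows: list of (features, label). Duplicate minority rows until classes balance."""
    rng = random.Random(seed)
    by_label = {}
    for feats, label in rows:
        by_label.setdefault(label, []).append((feats, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))  # sample with replacement
    return balanced

rows = [([0.1], "non_fatal")] * 9 + [([0.9], "fatal")]
balanced = oversample(rows)
print(Counter(label for _, label in balanced))   # 9 of each class
```

More sophisticated variants (e.g. SMOTE) synthesize new minority points rather than duplicating them; random duplication is the simplest baseline the classifier can then be trained on.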
According to the Nilson Report, global credit and debit card fraud resulted in losses of $24.71 billion in 2016, 72% of which were borne by card issuers. Card-issuing companies are therefore eager to predict fraud in real time, and in advance, to reduce their losses and protect their revenue. The goal of the project is to provide fraud analytics for card issuers to predict fraud in real time and in advance. By building a supervised fraud prediction model, we aim to capture the maximum number of real frauds while limiting mis-flagged frauds, in order to achieve a win-win situation that both maximizes ROI and maintains customer satisfaction.
Propose Data Mining AR-GA Model to Advance Crime analysis (IOSR Journals)
This document proposes a data mining model to advance crime analysis using association rule (AR) and genetic algorithm (GA). The model has three correlated dimensions: a crime dataset, criminal dataset, and geo-crime dataset. AR will be applied to each dataset separately to extract patterns, then GA will be used to mix the resulting ARs and exploit relationships across the three dimensions. This is intended to help detect universal crime patterns and speed up the crime solving process. The model was applied to real crime data from a sheriff's office and validated. Privacy-preserving techniques are also suggested to hide sensitive rules from appearing in the results.
Physical and Cyber Crime Detection using Digital Forensic Approach: A Complet... (IJARIIT)
Crime is a general phenomenon that has grown significantly in the past few years, and the use of technology is important to ease the work of investigative agencies. Crime investigation analysis is a branch of data mining that plays a crucial role in predicting and learning about criminals. In our paper, we have planned an integrated version for physical crime as well as cybercrime analysis. Our approach uses data mining techniques for crime detection and criminal identification for physical crimes, and digitized forensic tools (DFT) for evaluating cybercrimes. The presented tool, named the Comparative Digital Forensic Process Tool (CDFPT), is based on a digital forensic model whose stages are named the Comparative Digital Forensic Process Model (CDFPM). The primary step includes accepting the case details, categorizing the case as physical crime or cybercrime, and storing the data in the corresponding databases. For physical crime analysis we have used the k-means clustering algorithm to form crime clusters, and the k-means results are made more useful through the GMAPI, which provides an advanced and user-friendly visual aid for tracing the region of a crime. We have applied KNN for criminal identification by examining past crimes and finding similar ones that match the current crime; if no past record is found, the new crime sample is added to the crime dataset. With the advancement of the web, networks have become much more complicated, and attack methods even more so. For cybercrime analysis, we detect attacks executed on a host system by an outsider, using assorted digitized forensic tools to provide information security and generating reports for any event that may need investigation. Our digitized technique aids the development of society by helping investigative agencies to follow a custom-built investigative technique in crime analysis and criminal identification, rather than manually searching the database to analyze criminal activities, and as a result helps them combat crime.
Survey on Crime Interpretation and Forecasting Using Machine Learning (IRJET Journal)
This document discusses using machine learning algorithms to analyze crime data and predict crime patterns in Bangalore, India. It first provides background on the increasing issue of crime and the importance of understanding crime patterns, then reviews related work applying clustering, classification, and other algorithms to crime data from various locations. Next, it discusses motivations for using machine learning to predict crimes in advance, and compares studies that have used techniques such as k-means clustering, decision trees, Naive Bayes, and random forests on crime data, evaluating these techniques and their limitations in accurately analyzing crime patterns and predicting future crimes. Finally, the document proposes using these machine learning methods and data mining approaches on crime data from Bangalore to help law enforcement agencies.
Predictive Modeling for Topographical Analysis of Crime Rate (IRJET Journal)
1) The document discusses using machine learning techniques to predict crime patterns and types based on historical crime data.
2) It proposes using a random forest classification algorithm to analyze crime data and predict the type of crime that may occur in a particular area.
3) The random forest algorithm would be trained on a dataset containing information about past crimes like date, location, and crime type to make predictions about future crimes.
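The bagging idea behind such a random forest can be sketched with a toy ensemble of bootstrapped one-level decision stumps over a single invented hour-of-day feature (a real system would use a library forest over date, location, and type features):

```python
import random
from collections import Counter

def train_stump(sample):
    """Pick the threshold on the single feature that best splits the bootstrap sample."""
    best = None
    for x, _ in sample:
        left  = Counter(l for v, l in sample if v <= x)
        right = Counter(l for v, l in sample if v > x)
        score = max(left.values(), default=0) + max(right.values(), default=0)
        if best is None or score > best[0]:
            l_lab = left.most_common(1)[0][0] if left else None
            r_lab = right.most_common(1)[0][0] if right else l_lab
            best = (score, x, l_lab, r_lab)
    _, thr, l_lab, r_lab = best
    return lambda v: l_lab if v <= thr else r_lab

def train_forest(data, n_trees=25, seed=0):
    """Bagging: each stump trains on a bootstrap resample; predictions are majority vote."""
    rng = random.Random(seed)
    trees = [train_stump(rng.choices(data, k=len(data))) for _ in range(n_trees)]
    return lambda v: Counter(t(v) for t in trees).most_common(1)[0][0]

# hour of day -> crime type (synthetic): night hours skew to burglary
data = [(2, "burglary"), (3, "burglary"), (1, "burglary"),
        (14, "theft"), (15, "theft"), (16, "theft")]
predict = train_forest(data)
print(predict(2), predict(15))
```

A production random forest would also subsample features at each split and grow full trees; the stump ensemble only shows the bootstrap-and-vote structure.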
This paper focuses on finding spatial and temporal criminal hotspots. It analyzes two real-world crime datasets, for Denver, CO and Los Angeles, CA, and compares the two through a statistical analysis supported by several graphs. It then explains how we applied the Apriori algorithm to produce interesting frequent patterns for criminal hotspots, and shows how we used a Decision Tree classifier and a Naive Bayesian classifier to predict potential crime types. To further analyze the crime datasets, the paper combines our findings on the Denver crime dataset with its demographic information to capture the factors that might affect the safety of neighborhoods. The results of this solution could be used to raise people's awareness of dangerous locations and to help agencies predict future crimes at a specific location within a particular time.
Using Data Mining Techniques to Analyze Crime Pattern (Zakaria Zubi)
Our proposed model will be able to extract crime patterns by using association rule mining and clustering to classify crime records on the basis of the values of crime attributes.
Abstract : Crime prediction is a topic of significant research across the fields of criminology, data mining, city planning, law enforcement, and political science. Crime patterns exist on a spatial level; these patterns can be grouped geographically by physical location, and analyzed contextually based on the region
in which crime occurs. This paper proposes a mechanism to parameterize street-level crime, localize crime hotspots, identify correlations between spatiotemporal crime patterns and social trends, and analyze the resulting data for the purposes of knowledge discovery and anomaly detection. The subject of this study is the county of Merseyside in the United Kingdom, over a span of 21 months beginning in December 2010 (monthly) through August 2012. Several types of crime are analyzed in this dataset, including Burglary and Antisocial Behavior. Through this analysis, several interesting findings are drawn about crime in Merseyside, including: hotspots with steadily increasing crime levels, hotspots with unstable crime levels, synchronous changes in crime trends throughout Merseyside as a whole, individual months in which certain hotspots behaved anomalously, and a strong correlation between crime hotspot locations and borough/postal code locations. We believe that this type of statistical and correlative analysis of crime patterns will help law enforcement agencies predict criminal activity, allocate resources, and promote community awareness to reduce overall crime rates.
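The hotspot-localization step described above can be sketched by bucketing incident coordinates into grid cells and flagging dense cells (the coordinates and the 0.5-unit cell size are toy choices, not the Merseyside parameterization):

```python
from collections import Counter

def hotspots(points, cell=0.5, threshold=3):
    """Count incidents per grid cell; return cells at or above the threshold."""
    counts = Counter((int(x // cell), int(y // cell)) for x, y in points)
    return {c for c, n in counts.items() if n >= threshold}

incidents = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (0.3, 0.4),   # dense cell
             (2.0, 2.0), (5.1, 5.2)]                            # scattered
print(hotspots(incidents))    # {(0, 0)}: four incidents share one cell
```

Tracking these per-cell counts month by month is what allows the kinds of findings reported above: steadily increasing hotspots, unstable hotspots, and months in which a cell behaves anomalously.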
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
Crime Analysis based on Historical and Transportation Data (Valerii Klymchuk)
Contains experimental results based on real crime data from an urban city. Our statistics reveal seasonality in crime patterns, accompanying predictive machine learning models that assess the risk of crime. Moreover, this work discusses the implementation and design of a prototype cloud-based crime analytics dashboard.
An Intelligence Analysis of Crime Data for Law Enforcement Using Data Mining - Waqas Tariq
The concern about national security has increased significantly since the 26/11 attacks in Mumbai, India. However, information and technology overload hinders the effective analysis of criminal and terrorist activities. Data mining applied in the context of law enforcement and intelligence analysis holds the promise of alleviating this problem. In this paper we use a clustering/classification-based model to anticipate crime trends. Data mining techniques are used to analyze city crime data from the Tamil Nadu Police Department. The results of this data mining could potentially be used to lessen, and even prevent, crime in the forthcoming years.
Crime Data Analysis, Visualization and Prediction using Data Mining - Anavadya Shibu
This paper presents a general idea of the model of data mining techniques and diverse crimes. It also provides an inclusive survey of competent and valuable data mining techniques for crime data analysis. The objective of the data mining is to recognize patterns in criminal behavior in order to anticipate criminal activity, predict crime, and prevent it. This project implements novel data mining techniques such as KNN, text clustering, and IR-tree for investigating crime data sets and sorting out the accessible problems. The collective knowledge of various data mining algorithms tends to afford an enhanced, integrated, and precise result for crime prediction in the banking sector. Our law enforcement organizations need to be adequately equipped to defeat and prevent crime. This project is developed using Java as the front-end and MySQL as the back-end. Supporting applications like Sunset and NetBeans are used to make the portal more interactive.
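As a sketch of the KNN step named in this abstract, here is a minimal nearest-neighbour classifier over toy (hour, latitude, longitude) features; the data points and crime labels are invented for illustration, not drawn from any real dataset.

```python
# Minimal k-nearest-neighbours sketch for crime-type classification.
# Feature vectors are (hour of day, latitude, longitude); data is illustrative.
from collections import Counter
import math

def knn_predict(train, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train, labels))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

train = [(22, 41.88, -87.63), (23, 41.89, -87.62),   # late-night incidents
         (9, 41.75, -87.55), (10, 41.76, -87.56)]    # daytime incidents
labels = ["robbery", "robbery", "theft", "theft"]
print(knn_predict(train, labels, (21, 41.88, -87.63)))  # -> robbery
```

In a real system the features would need scaling before distance computation; here the toy values make the neighbourhood structure obvious.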
This document discusses using machine learning techniques like clustering and decision trees to analyze crime data from Chicago between 2014-2016. It aims to identify crime hot spots and patterns to help police allocate resources more efficiently. The document applies k-means clustering to crime data grouped by location and type, identifying a "vice" cluster with crimes like prostitution and drugs in two adjacent wards. It suggests police could use temporal and hourly crime patterns from the analysis to optimize staff scheduling and deployment. The document also discusses using decision trees and k-nearest neighbors algorithms on the crime data supplemented with temperature and unemployment data to further explore crime patterns.
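The k-means grouping described above can be sketched in plain Python; the fixed initial centroids and the toy (latitude, longitude) points are illustrative stand-ins for the Chicago data, and a real run would pick initial centroids randomly.

```python
# Toy k-means sketch over incident coordinates; fixed initial centroids
# keep the example deterministic.
import math

def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[i].append(p)
        # update step: each centroid moves to the mean of its cluster
        centroids = [
            tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else cen
            for pts, cen in zip(clusters, centroids)
        ]
    return centroids, clusters

pts = [(41.88, -87.63), (41.89, -87.62), (41.75, -87.55), (41.76, -87.56)]
cents, cls = kmeans(pts, centroids=[(41.88, -87.63), (41.75, -87.55)])
```

The two downtown-like points and the two southern points end up in separate clusters, which is exactly the kind of spatial grouping the "vice" cluster analysis relies on.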
This document outlines a project to analyze crime and census data in London. It describes a multi-phase approach including: 1) loading and visualizing crime data, 2) adding census data to the model and performing clustering and regression analysis, and 3) using the results to inform data mining. Key analysis techniques include k-means clustering of census variables to categorize areas, linear regression of census factors on crime types, and decision tree analysis using both crime and census data. The goal is to understand how socioeconomic factors relate to crime levels and types in different parts of London.
This document discusses using data mining techniques for crime analysis and prediction. It describes collecting unstructured crime data from various sources and storing it in a NoSQL database. Classification algorithms like Naive Bayes are used to classify crime reports. Apriori algorithm identifies patterns in past crimes. A decision tree is used for prediction. Visualization tools like heat maps, graphs and Neo4j are used to display crime patterns, rates and profiles over time and locations. Future work involves using these techniques for criminal profiling.
This document discusses crime analysis and its applications in community-oriented policing. Crime analysis involves understanding crime patterns through statistical analysis and crime mapping to identify problems and potential solutions. It helps police departments target areas with high crime rates or unusual increases in crime. Crime analysis also examines relationships between crimes in terms of time, location, offender characteristics, and causal factors to aid investigations of serial crimes and displacement. The core functions of law enforcement like prevention, investigation, and apprehension can be enhanced through crime analysis.
A Survey on Data Mining Techniques for Crime Hotspots Prediction - IJSRD
A crime is an act that violates the laws of a country or region. Crime hotspot prediction is the technique used to find areas on a map with high crime intensity. It uses crime data, including the crime rate per area, to predict future locations with high crime intensity. The motivation for crime hotspot prediction is to raise people's awareness of dangerous locations in certain time periods, and it can help with police resource allocation to create a safe environment. The paper presents a survey of different data mining techniques for crime hotspot prediction.
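A common baseline behind hotspot prediction is simply binning incidents onto a spatial grid and flagging dense cells; the cell size and count threshold below are illustrative choices, not taken from the surveyed papers.

```python
# Grid-binning baseline for hotspot detection: flag cells whose incident
# counts exceed a threshold. Coordinates are illustrative.
from collections import Counter

def hotspots(incidents, cell=0.01, min_count=3):
    """Return grid-cell ids containing at least `min_count` incidents."""
    counts = Counter((round(lat / cell), round(lon / cell))
                     for lat, lon in incidents)
    return {cell_id for cell_id, n in counts.items() if n >= min_count}

incidents = [(41.881, -87.631), (41.882, -87.632), (41.883, -87.633),
             (41.750, -87.550)]
print(hotspots(incidents))  # -> {(4188, -8763)}
```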
Compstat strategic police management for effective crime deterrence in new yo... - Frank Smilda
This document provides an overview of Compstat, a police management model pioneered in the New York Police Department in 1994. It discusses key aspects of the Compstat model including its emphasis on timely and accurate intelligence, rapid deployment of resources, effective tactical strategies, and constant evaluation. The document also examines research on Compstat and its implementation in various police departments in the US. It analyzes how Compstat aims to create a data-driven and collaborative approach to crime reduction through middle manager accountability and information sharing.
News document analysis by using a proficient algorithm - IJERA Editor
Analyzing news articles has become an emerging research topic in the past few years. Newspapers discuss many types of news (political, education, employment, sports, agriculture, crime, medicine, business, etc.) at the international, national, state, and district levels. Among these, crime reporting plays a major role, because one crime leads to many other crimes and affects many other lives. In India, Madurai is an important place with many historical monuments, and it is a sensitive area. This paper analyzes the crimes that occurred in and around Madurai in 2015; the analysis helps the police department reduce the occurrence of crime in the future. The proposed system uses a Support Vector Machine (SVM) to classify documents effectively. News documents are preprocessed using pruning and stemming. From the stemmed words, informative words are selected and weighted using feature selection methods such as Term Frequency-Inverse Document Frequency (TF-IDF) and chi-square, which yield a high-dimensional vector space; this is reduced to a lower dimension using Latent Semantic Analysis (LSA). The cosine similarity between the key document and each news document is computed, and based on this value the news documents are labeled as crime or non-crime. Some of the documents are used to train the SVM classifier, and the rest are used to test the performance of the developed system. A comparative study shows that the proposed approach improves classification accuracy.
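The TF-IDF weighting and cosine-similarity labeling steps described above can be sketched in plain Python; the toy documents, the key-document terms, and the similarity threshold are invented for illustration and omit the stemming, LSA, and SVM stages.

```python
# TF-IDF vectors plus cosine similarity against a "key document",
# labeling toy news snippets as crime vs non-crime.
import math
from collections import Counter

def tfidf(doc, corpus):
    """Term frequency times log inverse document frequency over a small corpus."""
    n = len(corpus)
    tf = Counter(doc.split())
    return {w: tf[w] * math.log(n / sum(w in d.split() for d in corpus))
            for w in tf}

def cosine(a, b):
    num = sum(a[w] * b.get(w, 0.0) for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

docs = ["robbery reported near market",
        "festival crowd near market",
        "police arrest robbery suspect"]
corpus = docs + ["robbery arrest"]          # key document joins the corpus
key = tfidf("robbery arrest", corpus)
labels = ["crime" if cosine(tfidf(d, corpus), key) > 0.05 else "non-crime"
          for d in docs]
print(labels)  # -> ['crime', 'non-crime', 'crime']
```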
Crime Analysis & Prediction System is a system to analyze and detect crime hotspots and predict crime.
It collects data from various sources: crime data from OpenData sites, US census data, social media, and traffic and weather data.
It leverages Microsoft's Azure cloud and on-premises technologies for back-end processing, with desktop-based visualization tools.
Viterbi optimization for crime detection and identification - TELKOMNIKA JOURNAL
In this paper, we introduce two types of hybridization. The first contribution is a hybridization of the Viterbi algorithm and Baum-Welch in order to predict crime locations. The second contribution considers optimization based on a decision tree (DT) in combination with the Viterbi algorithm for criminal identification, using Iraq and India crime datasets. This work builds on our previous work [1]. The main goal is to reduce the model's running time and improve its accuracy, and the obtained results show that both goals were achieved efficiently.
ANALYSIS OF ROADWAY FATAL ACCIDENTS USING ENSEMBLE-BASED META-CLASSIFIERS - ijaia
In the past decades, a lot of effort has been put into roadway traffic safety. With the help of data mining, analysis of roadway traffic data is much needed to understand the factors related to fatal accidents. This paper analyses the Fatality Analysis Reporting System (FARS) dataset using several data mining algorithms. Here, we compare the performance of four meta-classifiers and four data-oriented techniques known for their ability to handle imbalanced datasets, all based on the Random Forest classifier. We also study the effect of applying several feature selection algorithms, including PSO, Cuckoo, Bat, and Tabu, on improving the accuracy and efficiency of classification. The empirical results show that the Threshold Selector meta-classifier combined with over-sampling techniques was very satisfactory, attaining a mean overall accuracy of 91% and a balanced accuracy between 96% and 99% using 7-15 features instead of the 50 original features.
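One of the data-oriented imbalance techniques mentioned above, random over-sampling, can be sketched as follows; the records and the label-extraction function are illustrative, not the FARS schema.

```python
# Random over-sampling: duplicate minority-class rows (sampled with
# replacement) until every class matches the majority-class size.
import random

def oversample(rows, label_of):
    by_class = {}
    for r in rows:
        by_class.setdefault(label_of(r), []).append(r)
    target = max(len(v) for v in by_class.values())
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    balanced = []
    for group in by_class.values():
        balanced += group + rng.choices(group, k=target - len(group))
    return balanced

rows = [("fatal", 1), ("non-fatal", 2), ("non-fatal", 3), ("non-fatal", 4)]
balanced = oversample(rows, label_of=lambda r: r[0])
```

After balancing, any classifier trained on `balanced` sees equally many fatal and non-fatal examples, which is the effect the paper pairs with its meta-classifier.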
According to the Nilson Report, global credit and debit card fraud resulted in losses of $24.71 billion in 2016, 72% of which were borne by the card issuers. Therefore, card-issuing companies are eager to predict fraud in real time and in advance to reduce their losses and protect their revenue. The goal of this project is to provide fraud analytics for credit card issuers to predict fraud in real time and in advance. By building a supervised fraud prediction model, we aim to capture the maximum number of real frauds while limiting mis-flagged transactions, achieving a win-win that both maximizes ROI and maintains customer satisfaction.
Propose Data Mining AR-GA Model to Advance Crime analysis - IOSR Journals
This document proposes a data mining model to advance crime analysis using association rule (AR) and genetic algorithm (GA). The model has three correlated dimensions: a crime dataset, criminal dataset, and geo-crime dataset. AR will be applied to each dataset separately to extract patterns, then GA will be used to mix the resulting ARs and exploit relationships across the three dimensions. This is intended to help detect universal crime patterns and speed up the crime solving process. The model was applied to real crime data from a sheriff's office and validated. Privacy-preserving techniques are also suggested to hide sensitive rules from appearing in the results.
Physical and Cyber Crime Detection using Digital Forensic Approach: A Complet... - IJARIIT
Criminal activity is a general phenomenon that has extended significantly in the past few years, and the use of technology is important to ease the work of investigative agencies. Crime investigation analysis is an area in which data mining plays a crucial role in predicting and learning about criminals. In our paper, we propose an integrated approach for both physical crime and cybercrime analysis. Our approach uses data mining techniques for crime detection and criminal identification in physical crimes, and digital forensic tools (DFT) for evaluating cybercrimes. The presented tool, named the Comparative Digital Forensic Process Tool (CDFPT), is based on a digital forensic model whose stages form the Comparative Digital Forensic Process Model (CDFPM). The first step involves accepting the case details, categorizing the case as physical crime or cybercrime, and storing the data in the corresponding databases. For physical crime analysis we use the k-means clustering algorithm to build crime clusters; the k-means results are made more useful through GMAPI, which provides an advanced and user-friendly visual aid for tracing the region of the crime. We apply KNN for criminal identification by examining past crimes and finding similar ones that match the current crime; if no past record is found, the new crime sample is added to the crime dataset. With the growth of the web, networks have become much more complex and attack methods even more so. For cybercrime analysis, we detect attacks executed on a host system by an outsider, using assorted digital forensic tools to support information security by generating reports for any event that may need investigation. Our digitized approach aids society by helping investigative agencies follow a custom-built investigative technique for crime analysis and criminal identification, rather than manually searching the database to analyze criminal activities, and thereby facilitates combating crime.
Survey on Crime Interpretation and Forecasting Using Machine Learning - IRJET Journal
This document discusses using machine learning algorithms to analyze crime data and predict crime patterns in Bangalore, India. It first provides background on the increasing issue of crime and the importance of understanding crime patterns. It then reviews related work applying clustering, classification, and other algorithms to crime data from various locations, and discusses motivations for using machine learning to predict crimes in advance. The paper compares studies that have used techniques such as k-means clustering, decision trees, Naive Bayes, and random forests on crime data, and evaluates these techniques and their limitations in accurately analyzing crime patterns and predicting future crimes. Finally, the document proposes applying these machine learning and data mining approaches to crime data from Bangalore to assist law enforcement agencies.
Predictive Modeling for Topographical Analysis of Crime Rate - IRJET Journal
1) The document discusses using machine learning techniques to predict crime patterns and types based on historical crime data.
2) It proposes using a random forest classification algorithm to analyze crime data and predict the type of crime that may occur in a particular area.
3) The random forest algorithm would be trained on a dataset containing information about past crimes like date, location, and crime type to make predictions about future crimes.
This paper focuses on finding spatial and temporal criminal hotspots. It analyzes two different real-world crime datasets, for Denver, CO and Los Angeles, CA, and provides a comparison between the two through a statistical analysis supported by several graphs. It then explains how we applied the Apriori algorithm to produce interesting frequent patterns for criminal hotspots. In addition, the paper shows how we used a Decision Tree classifier and a Naive Bayes classifier to predict potential crime types. To further analyze the crime datasets, the paper combines our findings from the Denver dataset with its demographic information in order to capture the factors that might affect neighborhood safety. The results could be used to raise people's awareness of dangerous locations and to help agencies predict future crimes in a specific location within a particular time.
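The frequent-pattern step can be illustrated with a brute-force Apriori-style count (the level-wise candidate pruning of full Apriori is omitted for brevity); the incident attributes below are invented.

```python
# Brute-force frequent-itemset counting over toy incident records:
# every itemset up to max_size is counted, then filtered by support.
from itertools import combinations
from collections import Counter

def frequent_itemsets(transactions, min_support=2, max_size=2):
    counts = Counter()
    for t in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(t), size):
                counts[combo] += 1
    return {c: n for c, n in counts.items() if n >= min_support}

incidents = [{"downtown", "night", "theft"},
             {"downtown", "night", "assault"},
             {"suburb", "day", "theft"}]
patterns = frequent_itemsets(incidents)
print(patterns[("downtown", "night")])  # -> 2
```

The recovered pattern ("downtown", "night") is exactly the kind of location-time association that marks a hotspot.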
Using Data Mining Techniques to Analyze Crime Pattern - Zakaria Zubi
Our proposed model will be able to extract crime patterns by using association rule mining and clustering to classify crime records on the basis of the values of crime attributes.
Data mining and machine learning have become a vital part of crime detection and prevention. In this research, we use WEKA, an open-source data mining tool, to conduct a comparative study between the violent crime patterns in the Communities and Crime Unnormalized Dataset from the University of California, Irvine repository and actual crime statistics for the state of Mississippi provided by neighborhoodscout.com. We implemented the Linear Regression, Additive Regression, and Decision Stump algorithms on the Communities and Crime Dataset using the same finite set of features. Overall, linear regression performed best among the three algorithms. The scope of this project is to show how effective and accurate the machine learning algorithms used in data mining analysis can be at predicting violent crime patterns.
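For a single feature, the linear regression step reduces to ordinary least squares, which can be sketched without any toolkit; the (x, y) pairs below are made-up illustration, not the Communities and Crime data.

```python
# Ordinary least squares for one predictor: closed-form slope and intercept.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

xs = [5, 10, 15, 20]       # hypothetical predictor values
ys = [100, 210, 290, 410]  # hypothetical crime counts
slope, intercept = fit_line(xs, ys)
print(slope)  # -> 20.2
```

WEKA's LinearRegression generalizes this to many predictors, but the fitting criterion (minimizing squared error) is the same.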
Investigating crimes using text mining and network analysis - ZhongLI28
Crime analysis mapping, intrusion detection using data mining - Venkat Projects
This document discusses using data mining techniques like K-nearest neighbors and artificial neural networks for crime analysis mapping and intrusion detection. It proposes collecting crime data, grouping it, clustering it, and forecasting it to help reduce crime rates. The existing system relies on manual storage and analysis of crime data. The proposed system would use data mining tools like artificial neural networks and knowledge discovery in databases to automatically analyze crime data collected from police to help identify patterns and solve cases. It outlines collecting both supervised and unsupervised crime data, identifying unnecessary data, and using tools like SAM to train on supervised data and solve other cases.
The document compares different machine learning algorithms for predicting crime hotspots using historical crime data from 2015-2018 in a large coastal city in China. It finds that an LSTM model outperformed KNN, random forest, SVM, naive Bayes and CNN when using historical crime data alone. The LSTM model was further improved by including additional built environment data on points of interest and road network density as covariates. The study demonstrates that machine learning, especially LSTM, can effectively predict crime hotspots and that incorporating related environmental factors into models enhances predictive power over using crime history alone.
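To make the LSTM mechanism concrete, here is one cell step in plain Python with scalar states; a real hotspot model would learn these weights from the crime series, and they are set to arbitrary small values here purely for illustration.

```python
# One LSTM cell step with scalar input and state, showing the gating
# arithmetic that lets the model carry information across time steps.
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def lstm_step(x, h, c, w):
    """x: input, h/c: previous hidden/cell state, w: per-gate (wx, wh, b)."""
    f = sigmoid(w["f"][0] * x + w["f"][1] * h + w["f"][2])   # forget gate
    i = sigmoid(w["i"][0] * x + w["i"][1] * h + w["i"][2])   # input gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h + w["o"][2])   # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h + w["g"][2]) # candidate state
    c = f * c + i * g
    return math.tanh(c) * o, c

w = {k: (0.5, 0.1, 0.0) for k in "fiog"}  # arbitrary illustrative weights
h = c = 0.0
for x in [1.0, 0.0, 1.0]:  # toy weekly crime counts, scaled to [0, 1]
    h, c = lstm_step(x, h, c, w)
```

The forget gate is what lets the model weigh recent weeks against older history, which is why LSTMs suit the sequential crime-count data described above.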
The International Journal of Engineering and Science (IJES) - theijes
The International Journal of Engineering & Science aims to provide a platform for researchers, engineers, scientists, and educators to publish their original research results, exchange new ideas, and disseminate information on innovative designs, engineering experiences, and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
San Francisco Crime Analysis Classification Kaggle contest - Sameer Darekar
This document presents a project to analyze and predict crime in San Francisco using data mining techniques. The objectives are to analyze the spatial and temporal relationships of crime, predict the category of crime in a location based on variables like location and date, and suggest safest paths between places. The authors describe using naive Bayes, decision tree, random forest and support vector machine classifiers on a dataset of over 878,000 crime incidents to classify crimes and identify patterns. Cross-validation is used to evaluate the classifiers on the training data. The results are intended to help the police department understand crime patterns and deploy resources more efficiently.
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING - IJDKP
With a substantial increase in crime across the globe, there is a need for analysing the crime data to lower the crime rate. This helps the police and citizens to take necessary actions and solve the crimes faster. In this paper, data mining techniques are applied to crime data for predicting features that affect the high crime rate. Supervised learning uses data sets to train, test and get desired results on them whereas Unsupervised learning divides an inconsistent, unstructured data into classes or clusters. Decision trees, Naïve Bayes and Regression are some of the supervised learning methods in data mining and machine learning on previously collected data and thus used for predicting the features responsible for causing crime in a region or locality. Based on the rankings of the features, the Crimes Record Bureau and Police Department can take necessary actions to decrease the probability of occurrence of the crime.
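The Naive Bayes method named above can be sketched as a tiny categorical classifier with Laplace smoothing; the (district, time-of-day) records and crime-type labels are invented for illustration.

```python
# Categorical naive Bayes with Laplace smoothing over toy crime records.
from collections import Counter, defaultdict

def train_nb(rows):
    priors = Counter(label for _, label in rows)
    cond = defaultdict(Counter)
    for feats, label in rows:
        for j, v in enumerate(feats):
            cond[label][(j, v)] += 1
    return priors, cond

def predict_nb(priors, cond, feats):
    def score(label):
        p = priors[label] / sum(priors.values())
        for j, v in enumerate(feats):
            # add-one smoothing so unseen feature values never zero out a class
            p *= (cond[label][(j, v)] + 1) / (priors[label] + 2)
        return p
    return max(priors, key=score)

rows = [(("downtown", "night"), "robbery"),
        (("downtown", "night"), "robbery"),
        (("suburb", "day"), "burglary"),
        (("suburb", "day"), "burglary")]
priors, cond = train_nb(rows)
print(predict_nb(priors, cond, ("downtown", "night")))  # -> robbery
```

Feature-ranking approaches mentioned in the abstract would then ask which features (here, district vs time of day) shift these class scores the most.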
Crime forecasting system for soic final - Twaha Minai
The document proposes developing a Crime Forecasting System (CFS) to help reduce crime rates in Karachi, Pakistan. The CFS would collect and analyze crime incident data to help law enforcement agencies predict crime patterns and prevent crimes. It would provide a user-friendly interface with crime reports, maps showing crime locations, and statistical analysis. The goal is to address current issues like a lack of data sharing between agencies, inefficient resource allocation, inconsistent data storage, and slow response times. The CFS could help cut crime rates by enabling more proactive policing strategies.
PhotoQR: A Novel ID Card with an Encoded View - gerogepatton
There is increasing interest in developing techniques to identify and assess data that allow easy, continuous access to resources, services, or places requiring thorough ID control. Usually, several kinds of documents are mandatory to grant access to these resources. To avoid forgeries without the need for extra credentials, a new system, named photoQR, is proposed here. The system is based on an ID card carrying two objects: a picture of the holder (pre-processed via blur and/or swirl techniques) and a QR code containing embedded data related to the picture. The idea is that the picture and the QR code can assess each other via a hash value stored in the QR: the QR without the picture cannot be assessed, and vice versa. An open-source prototype of the photoQR system has been implemented in Python and can be used in both offline and real-time environments, effectively combining security concepts and image processing algorithms for data assessment.
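The mutual-assessment idea can be sketched with a plain hash check, using hashlib as a stand-in for the full photoQR pipeline; the byte strings below are placeholders for real image data, and the payload layout is a hypothetical simplification.

```python
# Sketch: the QR payload carries a hash of the picture, so either object
# can validate the other. Byte strings stand in for real images.
import hashlib

def make_qr_payload(picture_bytes, holder_data):
    return {"data": holder_data,
            "pic_hash": hashlib.sha256(picture_bytes).hexdigest()}

def verify(picture_bytes, payload):
    return hashlib.sha256(picture_bytes).hexdigest() == payload["pic_hash"]

pic = b"\x89PNG...blurred-portrait"        # placeholder image bytes
payload = make_qr_payload(pic, {"name": "A. Holder"})
print(verify(pic, payload))                # -> True
print(verify(b"tampered image", payload))  # -> False
```

Any tampering with either the picture or the QR payload breaks the hash match, which is the forgery resistance the abstract describes.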
12th International Conference of Artificial Intelligence and Fuzzy Logic (AI ... - gerogepatton
The 12th International Conference of Artificial Intelligence and Fuzzy Logic (AI & FL 2024) provides a peer-reviewed forum for researchers who address this issue to present their work. Authors are solicited to contribute to the conference by submitting articles that illustrate research results, projects, surveying works, and industrial experiences describing significant advances in the following areas, though submissions are not limited to these topics.
International Journal of Artificial Intelligence & Applications (IJAIA) - gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bimonthly open-access peer-reviewed journal that publishes articles contributing new results in all areas of Artificial Intelligence and its applications. It is an international journal intended for professionals and researchers in all fields of AI, including programmers and software and hardware manufacturers. The journal also publishes special issues on emerging areas of Artificial Intelligence and its applications.
International Conference on NLP, Artificial Intelligence, Machine Learning an... - gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL - gerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3) is a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities. Because the interconnection of these networks makes them vulnerable to a variety of cyberattacks, robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation. To address this, the paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion detection in smart grids, combining a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). We employed a recent intrusion detection dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to train and test our model. Our experiments show that the CNN-LSTM method detects smart grid intrusions much better than other deep learning classifiers, improving accuracy, precision, recall, and F1 score and achieving a high detection accuracy of 99.50%.
10th International Conference on Artificial Intelligence and Applications (AI... - gerogepatton
The 10th International Conference on Artificial Intelligence and Applications (AI 2024) will provide an excellent international forum for sharing knowledge and results in the theory, methodology, and applications of Artificial Intelligence. The Conference looks for significant contributions to all major fields of Artificial Intelligence and Soft Computing, in both theoretical and practical aspects. The aim of the Conference is to provide a platform for researchers and practitioners from both academia and industry to meet and share cutting-edge developments in the field.
Immunizing Image Classifiers Against Localized Adversary Attacks - gerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks (CNNs), to adversarial attacks and presents a proactive training technique designed to counter them. We introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations. When combined with 3D convolution and deep curriculum learning optimization (CLO), it significantly improves the immunity of models against localized universal attacks, by up to 40%. We evaluate the proposed approach using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10 and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showing accuracy improvements over previous techniques. The results indicate that the combination of volumetric input and curriculum learning holds significant promise for mitigating adversarial attacks without requiring adversarial training.
May 2024 - Top 10 Read Articles in Artificial Intelligence and Applications (... - gerogepatton
3rd International Conference on Artificial Intelligence Advances (AIAD 2024)gerogepatton
3rd International Conference on Artificial Intelligence Advances (AIAD 2024) will act as a major forum for the presentation of innovative ideas, approaches, developments, and research projects in the area advanced Artificial Intelligence. It will also serve to facilitate the exchange of information between researchers and industry professionals to discuss the latest issues and advancement in the research area. Core areas of AI and advanced multi-disciplinary and its applications will be covered during the conferences.
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Artificial Intelligence & Applications (IJAIA). It is an international journal intended for professionals and researchers in all fields of AI for researchers, programmers, and software and hardware manufacturers. The journal also aims to publish new attempts in the form of special issues on emerging areas in Artificial Intelligence and applications.
Information Extraction from Product Labels: A Machine Vision Approachgerogepatton
This research tackles the challenge of manual data extraction from product labels by employing a blend of
computer vision and Natural Language Processing (NLP). We introduce an enhanced model that combines
Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) in a Convolutional
Recurrent Neural Network (CRNN) for reliable text recognition. Our model is further refined by
incorporating the Tesseract OCR engine, enhancing its applicability in Optical Character Recognition
(OCR) tasks. The methodology is augmented by NLP techniques and extended through the Open Food
Facts API (Application Programming Interface) for database population and text-only label prediction.
The CRNN model is trained on encoded labels and evaluated for accuracy on a dedicated test set.
Importantly, our approach enables visually impaired individuals to access essential information on
product labels, such as directions and ingredients. Overall, the study highlights the efficacy of deep
learning and OCR in automating label extraction and recognition.
10th International Conference on Artificial Intelligence and Applications (AI...gerogepatton
10th International Conference on Artificial Intelligence and Applications (AI 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of Artificial Intelligence and its applications. The Conference looks for significant contributions to all major fields of the Artificial Intelligence, Soft Computing in theoretical and practical aspects. The aim of the Conference is to provide a platform to the researchers and practitioners from both academia as well as industry to meet and share cutting-edge development in the field.
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Artificial Intelligence & Applications (IJAIA). It is an international journal intended for professionals and researchers in all fields of AI for researchers, programmers, and software and hardware manufacturers. The journal also aims to publish new attempts in the form of special issues on emerging areas in Artificial Intelligence and applications.
Research on Fuzzy C- Clustering Recursive Genetic Algorithm based on Cloud Co...gerogepatton
Aiming at the problems of poor local search ability and precocious convergence of fuzzy C-cluster
recursive genetic algorithm (FOLD++), a new fuzzy C-cluster recursive genetic algorithm based on
Bayesian function adaptation search (TS) was proposed by incorporating the idea of Bayesian function
adaptation search into fuzzy C-cluster recursive genetic algorithm. The new algorithm combines the
advantages of FOLD++ and TS. In the early stage of optimization, fuzzy C-cluster recursive genetic
algorithm is used to get a good initial value, and the individual extreme value pbest is put into Bayesian
function adaptation table. In the late stage of optimization, when the searching ability of fuzzy C-cluster
recursive genetic is weakened, the short term memory function of Bayesian function adaptation table in
Bayesian function adaptation search algorithm is utilized. Make it jump out of the local optimal solution,
and allow bad solutions to be accepted during the search. The improved algorithm is applied to function
optimization, and the simulation results show that the calculation accuracy and stability of the algorithm
are improved, and the effectiveness of the improved algorithm is verified
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Artificial Intelligence & Applications (IJAIA). It is an international journal intended for professionals and researchers in all fields of AI for researchers, programmers, and software and hardware manufacturers. The journal also aims to publish new attempts in the form of special issues on emerging areas in Artificial Intelligence and applications.
10th International Conference on Artificial Intelligence and Soft Computing (...gerogepatton
10th International Conference on Artificial Intelligence and Soft Computing (AIS 2024) will
provide an excellent international forum for sharing knowledge and results in theory, methodology, and
applications of Artificial Intelligence, Soft Computing. The Conference looks for significant
contributions to all major fields of the Artificial Intelligence, Soft Computing in theoretical and practical
aspects. The aim of the Conference is to provide a platform to the researchers and practitioners from
both academia as well as industry to meet and share cutting-edge development in the field.
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
Employee attrition refers to the decrease in staff numbers within an organization due to various reasons.
As it has a negative impact on long-term growth objectives and workplace productivity, firms have
recognized it as a significant concern. To address this issue, organizations are increasingly turning to
machine-learning approaches to forecast employee attrition rates. This topic has gained significant
attention from researchers, especially in recent times. Several studies have applied various machinelearning methods to predict employee attrition, producing different resultsdepending on the employed
methods, factors, and datasets. However, there has been no comprehensive comparative review of multiple
studies applying machine-learning models to predict employee attrition to date. Therefore, this study aims
to fill this gap by providing an overview of research conducted on applying machine learning to predict
employee attrition from 2019 to February 2024. A literature review of relevant studies was conducted,
summarized, and classified. Most studies agree on conducting comparative experiments with multiple
predictive models to determine the most effective one.From this literature survey, the RF algorithm and
XGB ensemble method are repeatedly the best-performing, outperforming many other algorithms.
Additionally, the application of deep learning to employee attrition prediction issues also shows promise.
While there are discrepancies in the datasets used in previous studies, it is notable that the dataset
provided by IBM is the most widely utilized. This study serves as a concise review for new researchers,
facilitating their understanding of the primary techniques employed in predicting employee attrition and
highlighting recent research trends in this field. Furthermore, it provides organizations with insight into
the prominent factors affecting employee attrition, as identified by studies, enabling them to implement
solutions aimed at reducing attrition rates.
10th International Conference on Artificial Intelligence and Applications (AI...gerogepatton
10th International Conference on Artificial Intelligence and Applications (AIFU 2024) is a forum for presenting new advances and research results in the fields of Artificial Intelligence. The conference will bring together leading researchers, engineers and scientists in the domain of interest from around the world. The scope of the conference covers all theoretical and practical aspects of the Artificial Intelligence.
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Artificial Intelligence & Applications (IJAIA). It is an international journal intended for professionals and researchers in all fields of AI for researchers, programmers, and software and hardware manufacturers. The journal also aims to publish new attempts in the form of special issues on emerging areas in Artificial Intelligence and applications.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
DOI: 10.5121/ijaia.2021.12106
SUPERVISED AND UNSUPERVISED MACHINE LEARNING METHODOLOGIES FOR CRIME PATTERN ANALYSIS
Divya Sardana1, Shruti Marwaha2 and Raj Bhatnagar3
1 Teradata Corp., Santa Clara, CA 95054, USA
2 Stanford University, Palo Alto, CA 94305, USA
3 University of Cincinnati, Cincinnati, OH 45219, USA
ABSTRACT
Crime is a grave problem that affects all countries in the world. The level of crime in a country has a
big impact on its economic growth and on the quality of life of its citizens. In this paper, we provide a
survey of trends in supervised and unsupervised machine learning methods used for crime pattern analysis.
We use a spatio-temporal dataset of crimes in San Francisco, CA to demonstrate some of these strategies
for crime analysis. We use classification models, namely Logistic Regression, Random Forest, Gradient
Boosting and Naive Bayes, to predict crime types such as Larceny and Theft, and propose model
optimization strategies. Further, we use a graph-based unsupervised machine learning technique called
core-periphery structures to analyze how crime behavior evolves over time. These methods can be
generalized for use in different counties and can be greatly helpful in planning police task forces for
law enforcement and crime prevention.
KEYWORDS
Crime pattern analysis, Machine Learning, Supervised Models, Unsupervised methods
1. INTRODUCTION
Crime is a major problem that affects all parts of the world. Many different factors have an effect on
the rate of crime, such as education, income level, location, time of the year, climate, and employment
conditions [1]. There are different types of crime, such as vehicle theft, bribery, extortion, and
terrorism. Depending upon the nature of the crime, each type has its own specific characteristics, which
must be investigated in order to mitigate it. Further, some types of crimes exhibit similarities in their
nature, such as the time of day or the specific locations at which they occur. A study of these crime
characteristics and similarities can be immensely helpful to law enforcement agencies in developing a
better understanding of crimes, resolving them, and controlling their frequency of occurrence.
Many counties have made their crime incident reports freely available online these days [2]. This
has led to a rise of interest in the research community to discover methodologies to aid in crime
pattern analysis. Several data mining and statistical analysis techniques have been researched as
well as utilized by the law enforcement agencies to help in the identification and investigation of
crimes to make cities and counties safe.
The process of crime analysis using Machine Learning involves the use of both supervised as
well as unsupervised models to gain insights from both structured and unstructured data. In this
paper we classify the currently available machine learning techniques to analyze crimes as
belonging to supervised or unsupervised methodologies. Next, we demonstrate the use of both
supervised as well as unsupervised machine learning methodologies to analyze a spatio-temporal
dataset of crimes in San Francisco (SF).
For supervised learning, we use classification techniques, namely, Logistic Regression, Random
Forest, Gradient Boosting classifier and Naive Bayes model to predict the type of crime based
upon features such as crime location and time. We illustrate the process we use in data cleaning,
feature extraction, model building and evaluation. Furthermore, we improve our model performance by
grouping crime types into three super classes, namely, infractions, misdemeanors and felonies.
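The supervised pipeline described above can be sketched as follows, using scikit-learn. The snippet trains the four classifiers named in this paper and compares them with F1 score and log loss; the features (hour, day of week, label-encoded zip code) mirror the paper's setup, but the data below is a synthetic stand-in, not the actual SF crime dataset.

```python
# Sketch of the four-classifier comparison; synthetic data, not the SF dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, log_loss

rng = np.random.default_rng(0)
n = 1000
# Synthetic spatio-temporal features: hour of day, day of week, encoded zip code
X = np.column_stack([
    rng.integers(0, 24, n),   # hour of occurrence
    rng.integers(0, 7, n),    # day of week
    rng.integers(0, 50, n),   # zip code (label-encoded)
])
y = rng.integers(0, 3, n)     # three stand-in crime classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # F1 and log loss are preferred over accuracy for imbalanced class distributions
    scores[name] = (
        f1_score(y_te, model.predict(X_te), average="weighted"),
        log_loss(y_te, model.predict_proba(X_te)),
    )
for name, (f1, ll) in scores.items():
    print(f"{name}: F1={f1:.3f}, logloss={ll:.3f}")
```

On real, imbalanced data the ensemble models would typically be tuned further (tree depth, learning rate), as suggested in Section 4.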
For unsupervised learning, we use an unsupervised graph algorithm to find core periphery
structures to analyze SF crime data. Specifically, in a temporal dataset of crimes, core periphery
structures help us to study relationships between very dense nodes which lie in core clusters
surrounded by sparse periphery nodes. Further, the way these relationships change over time
reveal many interesting patterns in the crime datasets. We demonstrate use cases where core
periphery structures are a better suited unsupervised machine learning methodology over
clustering for analyzing patterns in dense graphs originating from crime datasets.
The rest of the paper is organized as follows. In section 2, we provide a survey of trends in
supervised and unsupervised machine learning techniques used in literature for the analysis of
crime datasets. In section 3, we provide a description of the SF crime dataset. Next, we outline
the supervised learning approaches that we have used to analyze SF crime dataset in section 4.
Here, we describe our methodology used along with evaluation results and suggested
optimization techniques. In section 5, we describe the use of an unsupervised machine learning
methodology, namely, core periphery structures to help understand the evolution of crimes over
years. Finally, we provide a conclusion of our paper in section 6.
2. SURVEY OF TRENDS OF SUPERVISED AND UNSUPERVISED MACHINE
LEARNING ALGORITHMS FOR CRIME ANALYSIS
In literature, many machine learning strategies have been used to analyze crime data with case
studies using different crime datasets. In this section we provide a survey of trends of machine
learning approaches used for crime analysis. We also provide a brief summary of the machine
learning methods that we implement for the analysis of SF crime dataset.
Crimes can be classified into different categories, such as, violent crimes (e.g., murder), traffic
violence, sexual assault and cyber-crimes. Depending upon the nature of the crime, different
machine learning technologies are suitable to study those crimes [2]. Both supervised as well as
unsupervised machine learning methods have been used in literature for the analysis of crime
datasets.
First, we provide a survey of supervised machine learning methods that have been used in
literature for crime analysis. In [3], a crime hotspot prediction algorithm was developed using
Linear Discriminant Analysis (LDA) and K Nearest Neighbor (KNN). Cesario et al. [4]
developed an Auto-Regressive Integrative Moving Average model (ARIMA) to build a predictive
model for crime trend forecasting in urban populations. In [5], the authors used two classification
algorithms, namely, K-nearest Neighbor (KNN) and boosted decision tree to analyze a
Vancouver Police department crime dataset. In [6], Edoka used different classification models,
such as Logistic Regression, K Nearest Neighbors and XGBoost to classify the types of crimes in
a dataset of crimes from the Chicago crime portal. In [7], the authors build an ensemble learning model
to predict spatial occurrences of crimes of different types. Cichosz [8] used point-of-interest-based
data (such as bus stops, cinema halls, etc.) from geographical information systems to build
a crime risk prediction model for urban areas.
In this paper, we use four classification methodologies, namely, logistic regression, Random
Forest, Gradient Boosting classifier and Naive Bayes model to predict the crime category in SF
crime dataset. Logistic regression and Naive Bayes algorithms are used to build baseline models.
Next, ensemble learning based models, Gradient Boosting and Random Forest are used to build
models which combine output of multiple classifiers. Such classifiers are known to reduce
variance of the final model and require fine parameter tuning [9]. We provide a comparative
analysis of these models using evaluation measures called F1 Measure and logloss ratio [10]. We
prefer these measures over accuracy as our evaluation metrics because of the imbalanced class
distribution in the dataset. Further, we suggest optimizations in the modeling process to
improve the achieved accuracy. The prediction of crime types using classification techniques for
SF crime dataset has been studied before in [11], [12], [13], [14], [15] and [16]. Our classification
approach is unique in the way we extract the zip code feature used for model building from latitude and
longitude. Zip code combines two features into one and is much easier to interpret for county police
officers. Further, we suggest modeling optimizations by grouping crime types into
three super classes (Infractions, Misdemeanors and Felonies). This is a standard crime grouping
used in criminal law [17].
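The super-class grouping can be sketched as a simple category-to-class lookup. The specific assignments and the fallback class below are illustrative assumptions for demonstration, not the exact mapping used in this paper.

```python
# Illustrative grouping of fine-grained crime categories into the three
# super classes from criminal law. The assignments below are assumptions,
# not the paper's exact mapping.
SUPER_CLASS = {
    "LARCENY/THEFT": "Felony",
    "ASSAULT": "Felony",
    "VANDALISM": "Misdemeanor",
    "TRESPASS": "Misdemeanor",
    "DRUNKENNESS": "Infraction",
    "LOITERING": "Infraction",
}

def to_super_class(category: str) -> str:
    """Map a raw crime category to one of the three super classes."""
    # Fallback class for unseen categories is itself an assumption.
    return SUPER_CLASS.get(category.upper(), "Misdemeanor")

labels = ["Larceny/Theft", "Loitering", "Vandalism"]
print([to_super_class(c) for c in labels])  # → ['Felony', 'Infraction', 'Misdemeanor']
```

Collapsing dozens of raw categories into three targets reduces class imbalance, which is one reason the grouped model performs better.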
Next, we provide a survey of unsupervised methodologies used in literature for crime pattern
analysis. Sharma et al. [18] used K-means clustering to identify patterns in cyber-crime. They
used text mining-based approaches to transform data from web pages into features to be used for
clustering. Kumar and Toshniwal [19] used K-modes clustering algorithm to group road
accidents occurring in Dehradun (India) into segments. They further used association rule mining
to link the cause of accidents along with the identified clusters. Joshi et al. [20] used K-means
clustering on a dataset of crimes from New South Wales, Australia to identify cities with high
crime rates. In [21], the authors used fuzzy c-means algorithm to identify potential crime
locations for different cognizable crimes such as burglary, robbery and theft.
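A minimal sketch of the K-means approach surveyed above: cluster incident coordinates to surface spatial hotspots. The points below are synthetic draws around two made-up hotspot centers, not real crime records.

```python
# K-means on (latitude, longitude) pairs to find spatial crime hotspots.
# Coordinates are synthetic samples around two invented hotspot centers.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
hotspot_a = rng.normal(loc=[37.77, -122.42], scale=0.01, size=(100, 2))
hotspot_b = rng.normal(loc=[37.72, -122.38], scale=0.01, size=(100, 2))
coords = np.vstack([hotspot_a, hotspot_b])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(coords)
# Each cluster center approximates one hotspot; labels assign incidents to them
print(np.round(km.cluster_centers_, 2))
```

In practice the number of clusters would be chosen with a criterion such as the elbow method rather than fixed in advance.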
While different clustering-based approaches have been used to analyze crime patterns in
literature, another unsupervised technique called core periphery structures has not yet been fully
utilized to analyze crime datasets. Core-periphery structures are suited to applications where the
entities and relationships in a crime dataset can be expressed as a network or a graph. They help
extract very dense core clusters surrounded by sparse periphery clusters. For example, for the crime
type of corruption, a network of all emails exchanged between the involved entities could be studied
using core periphery structures to find out the chief (core) players involved in corruption
surrounded by their cohorts.
In literature, it has been demonstrated that in many crime networks, core periphery structures
naturally exist, consisting of a core of nodes densely connected to one another and a surrounding
sparse periphery [22]. In [23], Xu and Chen provide a topological analysis of a terrorist network
to show the presence of a core group consisting of Osama Bin Laden and his closest personnel
who issue orders to people in the rest of the network. In [24], the authors analyzed the structure
of a drug trafficking mafia organization in Southern Calabria, Italy. Using graph centrality
measures, they identified a few most central players that had a more active role in criminal
activities than other subjects in the network. A core periphery-based model is used in [25] to
study a Czech political corruption network. The authors demonstrate different types of ties
between network entities depending upon their position in the core or periphery groups. In [26],
4. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
86
the authors provide a core periphery analysis of the email network of people involved in the
corruption activity that led to the collapse of Enron Corporation.
Several other network-based crime analysis techniques have been proposed in the literature. In [27], the authors use a graph-based link prediction algorithm to identify missing links in a criminal network constructed from a dataset for an Italian criminal case against a mafia. Budur et al. [28] developed a hybrid approach combining Gradient Boosted Machine (GBM) models with a weighted PageRank model to identify hidden links in crime networks.
In this paper, we illustrate the importance of core periphery structures in identifying patterns in
criminal networks that vary over time and space. We further demonstrate that in some situations
core periphery structures are better suited than clustering algorithms as the choice of
unsupervised learning technique in the investigation of crime datasets.
In a nutshell, the trend we observe is that supervised learning approaches are better suited for tasks such as predicting the crime type, time, or geographical location of crimes drawn from multiple crime types. Unsupervised learning approaches are more applicable to crime types whose actors or players are connected by explicit links or relationships, such as email or verbal exchanges. Examples include crime types such as corruption, terrorism, fraud, prostitution and kidnapping.
3. SAN FRANCISCO CRIME DATASET
We use a dataset of crimes that took place in San Francisco, CA from 2003 to 2015. The San Francisco Police Department has made its crime records public at [29]. Further, a part of the city's crime data was released as a Kaggle dataset [30] as part of an open competition to predict crime types occurring at different places in the city. The original source of this dataset is San Francisco's Open Data platform [31].
3.1. Dataset Description and Feature Extraction
The SF crime dataset varies both spatially and temporally. It documents crimes that occurred in different geographical locations of SF between 2003 and 2015, with details about the hour and day of the week of each occurrence. Each crime is annotated with a crime category, such as Embezzlement or Larceny. The dataset contains 878,049 crime incidents (rows); a snapshot is provided in figure 1. Each row comprises the columns: timestamp, day of the week, description of crime, latitude, longitude, crime type, police district, how the incident was resolved, and address of the incident. From the timestamp feature Dates, we extracted the derived features hour, month and year for each row, which are easier to interpret in the modelling algorithms. Further, we used the latitude and longitude information to determine the zip code corresponding to each crime incident, using the ZipCodeDatabase class from the pyzipcode [32] Python package. This feature extraction process is visualized in figure 2.
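As an illustration, the timestamp-derived features can be sketched with the standard library alone. This is a minimal sketch, not the code used in the paper; the timestamp format shown is an assumption about the raw Dates column, and the zip-code lookup via pyzipcode is a separate step not shown here.

```python
from datetime import datetime

def extract_time_features(timestamp, fmt="%Y-%m-%d %H:%M:%S"):
    """Derive the hour, month and year features from a raw Dates string."""
    t = datetime.strptime(timestamp, fmt)
    return {"hour": t.hour, "month": t.month, "year": t.year}

# One hypothetical incident timestamp from the dataset's 2003-2015 window:
print(extract_time_features("2015-05-13 23:53:00"))
# {'hour': 23, 'month': 5, 'year': 2015}
```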
Figure 1. Snapshot of SF crime dataset
Figure 2. Feature Extraction Process from the SF Crime dataset
In figure 3, we visually present the 39 crime categories that occur in the dataset. In figures 4, 5
and 6, we demonstrate the spatial (over zip code) as well as temporal (over years and over hours)
variation in the SF crime dataset.
Given the spatial and temporal variation in the dataset, we decided to use the features Hour,
Month, Year, Days of Week, Police District, Zip code for our supervised and unsupervised
learning tasks.
Figure 3. 39 different crime categories in the SF crime dataset
Figure 4. Spatial variation (over zip codes) in the SF crime dataset
Figure 5. Temporal variation (over years) in the SF crime dataset
Figure 6. Temporal variation (over hours) in the SF crime dataset
4. SUPERVISED LEARNING METHODOLOGIES TO ANALYZE SF CRIME
DATASET
A major challenge in using supervised machine learning techniques for crime prediction is building scalable and efficient models that make accurate and timely predictions. With this in mind, we trained several classification models on the SF crime dataset and performed a comparative analysis of these models to choose the one that best fits the analysis task.
For all the models built, we used the features Hour, Month, Year, Days of Week, Police District,
and Zip code. This enables us to capture both the spatial as well as temporal variation present in
the dataset. The goal of modelling is to predict the crime type, with the column “Category” in the input dataset serving as the target variable. Next, we describe our modelling approaches and methodology in detail.
4.1. Model Building
We used four classification approaches for our model building task, namely, Logistic Regression,
Random Forest, Gradient Boosting and Naive Bayes classifier [9]. In figure 7, we describe the
whole workflow used in the modelling process. We started our analysis with data exploration and feature selection as described in section 3. We converted all categorical variables, such as “day of the week”, to dummy variables using one-hot encoding. Next, we performed 10-fold cross-validation on the training set while simultaneously tuning the parameters of the input models. All the classification algorithms treat this task as a multi-class classification problem. First, we built baseline models using Naïve Bayes and Logistic Regression. Next, we built ensemble models using Random Forest and Gradient Boosting, more powerful techniques that combine multiple classifiers to generate the output and reduce variance. These two algorithms require some intricate parameter tuning: we tuned the parameters “max tree depth” and “number of trees” by trying different permutations and choosing the values that gave the best F1 measure and logloss (both measures are described in detail in section 4.2). The developed models were then used to generate predictions for the validation set and were finally evaluated using the measures described in the next section.
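The categorical-encoding step above can be sketched in plain Python; this is a minimal illustration of one-hot encoding, not the exact implementation used in our experiments.

```python
def one_hot_encode(values, categories=None):
    """Convert a categorical column into 0/1 dummy vectors, one per category."""
    if categories is None:
        categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows

cats, rows = one_hot_encode(["Monday", "Friday", "Monday"])
print(cats)  # ['Friday', 'Monday']
print(rows)  # [[0, 1], [1, 0], [0, 1]]
```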
Figure 7. Workflow used in Model Building task for SF crime category classification
4.2. Model Evaluation
For evaluating the models built, we used two evaluation metrics, namely, F1 measure and
Logloss [10]. Let TP, FP, TN, FN denote the True Positives, False Positives, True Negatives and
False Negatives respectively in the confusion matrix constructed for the classification problem.
The evaluation measure Accuracy is formulated as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
For a dataset with an imbalanced class distribution, like the current SF crime dataset, a model that assigns every data point to the majority class will show a misleadingly high accuracy. For this reason, we do not use accuracy as an evaluation measure here. Instead, we use the F1 measure, formulated as:

Precision = TP / (TP + FP),  Recall = TP / (TP + FN)

F1 = 2 × (Precision × Recall) / (Precision + Recall)
F1 measure uses a harmonic mean of precision and recall and ensures that both are balanced in
the output. A high F1 measure indicates that the model achieves both low false positives and low
false negatives. In cases where class prediction is based upon a class probability, the logloss measure is preferred over both Accuracy and F1 measure, because logloss is a probabilistic measure that penalizes predictions which are less certain. It is formulated as:

logloss = −(1/N) · Σ_{i=1..N} Σ_{j=1..M} y_ij · log(p_ij)
In the above formula, yij is a binary indicator variable which denotes whether sample i belongs to
class j or not and pij indicates the probability of sample i belonging to class j. M is the total
number of classes and N is the total number of samples. The lower the value of logloss, the more
accurate the model. Given the imbalanced nature of classes in the input SF crime dataset, using the F1 measure and logloss score ensures that we do not rely solely on accuracy, which can be misleading in such scenarios.
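The two metrics can be computed directly from their standard definitions; the following is a plain-Python sketch, not the evaluation code used in our experiments.

```python
import math

def macro_f1(y_true, y_pred, classes):
    """Macro-averaged F1: mean over classes of 2*TP / (2*TP + FP + FN)."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(classes)

def log_loss(y_true, probs, classes, eps=1e-15):
    """Multiclass logloss: -(1/N) * sum_i log p_i(true class of sample i)."""
    total = 0.0
    for t, p in zip(y_true, probs):
        total += math.log(max(p[classes.index(t)], eps))  # clamp to avoid log(0)
    return -total / len(y_true)

# A perfectly uncertain 50/50 prediction costs log(2) ~= 0.693 per sample:
print(log_loss(["a", "b"], [[0.5, 0.5], [0.5, 0.5]], ["a", "b"]))
```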
For all the models, we used 10-fold cross validation. The modelling results obtained for different
classification techniques on the validation set are presented in table 1. Both Gradient Boosting and Logistic Regression obtain the best overall values for the F1 measure and Logloss score. We chose Logistic Regression as our model of choice because of its faster runtime compared to the Gradient Boosting model. We submitted this Logistic Regression based model to Kaggle and obtained rank 752 out of 2239 total submissions (as of October 2016).
Table 1. Model Evaluation results for SF crime category classification.
Algorithm F1 Measure Logloss
Logistic Regression 0.154 2.53041
Random Forest 0.190 16.7994
Gradient Boosting 0.151 2.53044
Naïve Bayes 0.154 2.8516
4.3. Model Optimization
We re-performed the model building task after grouping crime categories into three super classes.
It is common practice in criminal law to classify crimes into three categories, namely, infractions, misdemeanors and felonies [15]. Infractions are the mildest crimes, more serious crimes are labeled misdemeanors, and the most serious crimes are labeled felonies. The constructed crime subgroups are displayed in table 2.
After this categorization, we repeated the modeling and validation for our best performing model, Logistic Regression, and found that the F1 score increased to 0.59 while the logloss reduced to 0.67. These results are very encouraging: if law enforcement agencies can identify the patterns of locations and times that are more prone to the crime super classes (infractions, misdemeanors or felonies), this will be very useful for taking concrete steps in planning police task forces for law enforcement and crime prevention.
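The regrouping itself reduces to a category lookup. A sketch follows using an illustrative subset of the Table 2 mapping; a complete dictionary would cover all 39 categories.

```python
# Illustrative subset of the Table 2 grouping of crime categories into
# super classes; the full mapping would list all 39 categories.
SUPER_CLASS = {
    "LOITERING": "Infraction",
    "DRUNKENNESS": "Misdemeanor",
    "PROSTITUTION": "Misdemeanor",
    "WARRANTS": "Misdemeanor",
    "ASSAULT": "Felony",
    "LARCENY/THEFT": "Felony",
    "VEHICLE THEFT": "Felony",
}

def to_super_class(category):
    """Replace the raw Category label with its three-way super-class target."""
    return SUPER_CLASS[category]

print(to_super_class("LARCENY/THEFT"))  # Felony
```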
5. UNSUPERVISED LEARNING METHODOLOGIES TO ANALYZE SF CRIME
DATASET
5.1. Motivation
Given the raw data of crimes in SF (latitude and longitude), we used contour or density plots to
study the spread of different crime types. Two of these plots are presented in figures 8 and 9 for
crime types larceny/theft and prostitution. This illustrates that crimes belonging to different
categories are localized to specific geographical areas in SF. We further plotted the normalized
crime counts for the top 10 crimes to observe preliminary trends in how they vary with time
(figures 10 and 11). This spatial and temporal variation of crimes motivated us to study the
similarities between crimes of different categories, to aid in understanding the modus operandi of these crimes over time and space. We used two graph-based unsupervised learning methods,
called clustering and core periphery structures to analyze a network of crimes built from the SF
crime dataset. We illustrate that in this case, core periphery structures are more suited for analysis
of dense graphs that vary over time. Next we describe our methodology used for SF crime
network construction and the clustering and core periphery results for this network.
Table 2. SF crime category Super Classes.
Crime Super Class Crime Types in this Super Class
Infraction LOITERING
Misdemeanor BAD CHECKS, BRIBERY, DISORDERLY CONDUCT,
DRUG/NARCOTIC, DRIVING UNDER THE INFLUENCE,
DRUNKENNESS, EMBEZZLEMENT, FAMILY OFFENSES,
FORGERY/COUNTERFEITING, GAMBLING, LIQUOR LAWS,
MISSING PERSON, NON-CRIMINAL, OTHER OFFENSES,
PORNOGRAPHY/OBSCENE MAT, PROSTITUTION, RECOVERED
VEHICLE, RUNAWAY, SECONDARY CODES, STOLEN PROPERTY,
SUICIDE, SUSPICIOUS OCC, TREA, TRESPASS, WARRANTS
Felony ARSON, ASSAULT, BURGLARY, EXTORTION, FRAUD,
KIDNAPPING, LARCENY/THEFT, ROBBERY, SEX OFFENSES
FORCIBLE, SEX OFFENSES NON-FORCIBLE, VANDALISM,
VEHICLE THEFT, WEAPON LAWS
Figure 8. Contour plot for spatial crime density distribution of crime type Larceny/Theft
Figure 9. Contour plot for spatial crime density distribution of crime type Prostitution
Figure 10. Variation of normalized crime count for top 10 crimes with years
Figure 11. Variation of normalized crime count for top 10 crimes with months
5.2. SF Crime Network Construction
Using the raw SF crime dataset, we constructed a graph G = (V, E) containing a vertex
corresponding to each of the 39 crime categories. The edge weights among different crime types
were constructed as the Tanimoto similarity [33] calculated between crime types using their
geographical proximity in terms of shared zip codes. Tanimoto similarity generalizes both cosine similarity and the Jaccard coefficient: it applies to vectors whose attributes may or may not be binary, and for binary vectors it reduces to the Jaccard coefficient. Given two crime types expressed as zip-code vectors A and B, the Tanimoto similarity between them is formalized as:

T(A, B) = (A · B) / (|A|² + |B|² − A · B)
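A direct implementation of this similarity is straightforward; the sketch below assumes binary zip-code incidence vectors for the example, though the function accepts arbitrary real-valued vectors.

```python
def tanimoto(a, b):
    """Tanimoto similarity T(A, B) = A.B / (|A|^2 + |B|^2 - A.B).
    Reduces to the Jaccard coefficient when the vectors are binary."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0

# Two crime types observed over four zip codes (1 = crime occurred there):
print(tanimoto([1, 1, 0, 1], [1, 0, 1, 1]))  # 0.5 = Jaccard: 2 shared / 4 total
```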
We used a cutoff of 0.2 as the minimum similarity level required between any two crime
categories to establish an edge between the two. This led to the construction of a weighted graph
of crime types with weights as Tanimoto similarities. Further, one graph was constructed for each
of the years 2005, 2010 and 2015 to analyze temporal variation of crime trends. In table 3, we list
the global graph clustering coefficient [34] of each of these constructed graphs for the three
years. The global clustering coefficient is the ratio of the number of closed triplets to the number of all triplets in the graph; it measures graph density and lies between 0 and 1. The high clustering coefficients for all three years indicate that these graphs are very dense.
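The coefficient reported in Table 3 can be computed directly from an adjacency structure. The following is a sketch of the closed-triplets / all-triplets definition on an unweighted graph, not the implementation from [34].

```python
from itertools import combinations

def global_clustering_coefficient(adj):
    """Ratio of closed triplets to all triplets; adj maps node -> set of neighbours."""
    closed = total = 0
    for node, nbrs in adj.items():
        for u, v in combinations(sorted(nbrs), 2):  # each neighbour pair is a triplet centred at node
            total += 1
            if v in adj[u]:                         # the triplet closes into a triangle
                closed += 1
    return closed / total if total else 0.0

# A triangle a-b-c with a pendant node d attached to c:
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(global_clustering_coefficient(graph))  # 0.6: 3 of 5 triplets are closed
```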
Table 3. Global graph clustering coefficients of the SF crime graphs.
Year for which Graph is constructed Graph clustering coefficient
2005 0.943
2010 0.962
2015 0.849
5.3. Clustering Results for SF Crime Network
We used a state-of-the-art graph clustering algorithm called ClusterONE [35] to find clusters in
all the three SF Crime graphs for years 2005, 2010 and 2015. ClusterONE uses a greedy growth-
based algorithm to grow clusters from seeds. We ran this algorithm with its default set of
parameters. The clustering results are shown in figure 12. For the years 2005 and 2010, all nodes are clustered into one big cluster. For the year 2015, all crime categories except Sex Offences Non-Forcible, Bad Checks, Prostitution and Extortion form one big cluster. This behavior is due to the high density, or cliquish nature, of the graphs. In all these graphs, edge width is drawn proportional to similarity value. Note that edge similarity varies among different crime types even when they are heavily connected topologically.
Figure 12. Clustering results for the graph of SF crime types for the years 2005, 2010 and 2015 using
ClusterONE algorithm
Most of the graph clustering algorithms including ClusterONE [35] are good at identifying
clusters which have high topological density in the graph and give less importance to the edge
weight-based density. Thus, as the next step, we used a graph core periphery finding algorithm
called CP-MKNN [36] [37] which uses both topological density as well as edge weight density in
a graph to extract very dense core clusters surrounded by sparse periphery clusters. A core-
periphery algorithm based on the extension of ClusterONE clustering algorithm has been
proposed in [38], however, we decided to use the algorithm CP-MKNN as our choice of core
periphery algorithm because as demonstrated in [37], it is better than [38] in separating very
dense cores from their sparser surroundings. Further, there is an advantage in using core
periphery structures over simple clusters when studying graphs that vary over time, because we can track how nodes move between cores and peripheries. This movement
can help us study change dynamics, which in the present case of crime analysis means understanding how crime patterns evolve over time.
5.4. Core Periphery Structures for SF Crime Network
In figure 13, we demonstrate the core periphery structures obtained by CP-MKNN [36] [37] for
the SF crime graphs for the years 2005, 2010 and 2015. We set input parameter K = 5.
Figure 13. Core Periphery structure results for the graph of SF crime types for the years 2005, 2010 and
2015 using CP-MKNN algorithm
Crimes like Vehicle Theft, Robbery and Fraud are always in the core cluster for the three years
2005, 2010 and 2015. Similarly, crimes like gambling and bribery are always in the periphery for
all three years. Some crime types move between the core and periphery clusters over the years. For example, Family Offences is a peripheral crime in 2005 and 2015 but a core crime in 2010, while Disorderly Conduct was in the periphery in 2005 but part of the core cluster in 2010 and 2015.
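This kind of movement analysis reduces to set operations over the yearly core memberships. A sketch follows using a handful of the categories named above; the yearly sets are illustrative, not the full CP-MKNN output.

```python
# Illustrative yearly core memberships for a few categories in the spirit
# of figure 13; the real analysis would use the full CP-MKNN core sets.
core_by_year = {
    2005: {"VEHICLE THEFT", "ROBBERY", "FRAUD"},
    2010: {"VEHICLE THEFT", "ROBBERY", "FRAUD", "FAMILY OFFENSES", "DISORDERLY CONDUCT"},
    2015: {"VEHICLE THEFT", "ROBBERY", "FRAUD", "DISORDERLY CONDUCT"},
}

def core_dynamics(core_by_year):
    """Split crime types into those always in the core and those that move."""
    always_core = set.intersection(*core_by_year.values())
    movers = set.union(*core_by_year.values()) - always_core
    return always_core, movers

stable, movers = core_dynamics(core_by_year)
print(sorted(stable))  # ['FRAUD', 'ROBBERY', 'VEHICLE THEFT']
print(sorted(movers))  # ['DISORDERLY CONDUCT', 'FAMILY OFFENSES']
```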
This type of rich information about how crime patterns of different types of crimes change over
time with respect to each other can be highly beneficial for the law enforcement agencies to plan
their task forces for tackling anticipated crimes in different locations. Further, this information can also improve awareness among the general public about the changing nature of crimes in their neighborhoods and help them stay safe.
6. CONCLUSION
In this paper, we provide an extensive survey of trends of both supervised as well as unsupervised
machine learning techniques that exist for the analysis of crimes. We use a dataset of crimes in
San Francisco to build supervised and unsupervised learning strategies to extract useful patterns
and trends from it.
Specifically, for supervised algorithms, we proposed a novel way of feature extraction and built
models using Logistic Regression, Gradient Boosting, Naive Bayes and Random Forest. Based
upon model evaluation results using F1 measure and Logloss score, Gradient Boosting and
Logistic Regression performed the best amongst all models. Moreover, Logistic Regression ran
much faster than the Gradient Boosting model. We further optimized this model by grouping
crime categories into super classes: “Infraction”, “Misdemeanor” and “Felony”. Our modelling
and optimization workflow can be used as a general framework to predict crime categories or super classes in crime incident reports from various counties. This will be highly beneficial to law enforcement agencies in performing a risk analysis of the locations and times that are more prone to crimes of different types.
In terms of unsupervised learning methodologies, we illustrated the use of core periphery
structures to analyze patterns in crimes that vary over time and space. We provided examples of
how certain crimes move between cores and peripheries over time while some crime types
remain stable over years. Such a behavioral study of crimes can be extremely beneficial to the
police to plan their task forces based upon the changing nature of crimes.
In a nutshell, we provide a demonstration of both supervised and unsupervised strategies for the analysis of crimes in San Francisco. Our modelling and optimization workflow can be used as a general framework to predict crime categories and analyze crime behaviors using crime incident reports from counties all over the world. Our approaches can be immensely helpful in planning crime prevention measures and raising awareness among the citizens of a county.
As a future research direction for the unsupervised approach, we could compute a more refined distance between two crime types that takes into account the actual distance in miles between the zip codes where the crimes occur. This would improve the accuracy of the patterns found using clustering and core-periphery structures, and would further help in fine-tuning the crime clusters, which in turn could be used as super classes in the supervised learning optimization approach to improve its accuracy and applicability.
REFERENCES
[1] H. Adel, M. Salheen, and R. Mahmoud, “Crime in relation to urban design. Case study: the greater
Cairo region,” Ain Shams Eng. J., vol. 7, no. 3, pp. 925-938, 2016.
[2] S. Prabakaran, and S. Mitra, “Survey of analysis of crime detection techniques using data mining and
machine learning,” J. Physics: Conference Series, vol. 1000, no. 1, 2018.
[3] Q. Zhang, P. Yuan, Q. Zhou, and Z. Yang, “Mixed spatial-temporal characteristics based crime hot
spots prediction,” in IEEE 20th Intl. Conf. on Comput. Supported Cooperative Work in Des.
(CSCWD), 2016, pp. 97-101.
[4] E. Cesario, C. Catlett, and D. Talia, “Forecasting crimes using autoregressive models,” in IEEE 14th
Intl Conf on Dependable, Autonomic and Secure Computing, 14th Intl Conf on Pervasive Intelligence
and Computing, 2nd Intl Conf on Big Data Intelligence and Computing and Cyber Science and
Technology Congress (DASC/PiCom/DataCom/CyberSciTech), 2016, pp. 795-802.
[5] S. Kim, P. Joshi, P.S. Kalsi, and P. Taheri, “Crime analysis through machine learning,” in IEEE 9th
Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON),
2018, pp. 415-420.
[6] N.O. Edoka, “Crime Incidents Classification Using Supervised Machine Learning Techniques:
Chicago,” PhD. Diss., Dublin, National College of Ireland, 2020.
[7] Y. Lamari, B. Freskura, A. Abdessamad, S. Eichberg, and S. de Bonviller, “Predicting Spatial Crime
Occurrences through an Efficient Ensemble-Learning Model.” ISPRS Int. Journal of Geo-Info, vol. 9,
no. 11, 2020.
[8] P. Cichosz, “Urban Crime Risk Prediction Using Point of Interest Data,” ISPRS Int. J. Geo-Info., vol.
9, no. 7, 2020.
[9] J. Friedman, T. Hastie, and R. Tibshirani, The elements of statistical learning, vol. 1, no. 10, New
York: Springer series in statistics, 2001.
[10] K. Ramasubramanian, and A. Singh, "Machine Learning Model Evaluation." Machine Learning
Using R, Apress, Berkeley, CA, pp. 425-464, 2017.
[11] M. Vaquero Barnadas, “Machine learning applied to Crime Prediction,” BS thesis, Universitat
Politcnica de Catalunya, 2016.
[12] A. Chandrasekar, A. Sunder Raj, and P. Kumar, “Crime prediction and classification in San Francisco City,” URL http://cs229.stanford.edu/proj2015/228_report.pdf, 2015.
[13] I. Pradhan, “Exploratory Data Analysis and Crime Prediction in San Francisco,” MS Project, San
Jose State University, USA, 2018.
[14] S. T. Ang, W. Wang, and S. Chyou, “San Francisco crime classification,” UCSD Project Report URL
https://cseweb.ucsd.edu/classes/wi15/cse255-a/reports/fa15/037.pdf, 2015.
[15] Y. Abouelnaga, “San Francisco crime classification,” arXiv preprint arXiv:1607.03626, 2016.
[16] X. Wu, “An informative and predictive analysis of the San Francisco police department crime data,”
Ph.D. Diss., UCLA, 2016.
[17] “Classifications of Crime”, FindLaw, URL https://criminal.findlaw.com/criminal-law-
basics/classifications-of-crimes.html
[18] A. Sharma, and S. Sharma, “An intelligent analysis of web Crime Data using Data Mining,” Int. J.
Eng. and Innov. Tech. (IJEIT), vol. 2, no. 3, 2012.
[19] S. Kumar, and D. Toshniwal, “A Data Mining framework to analyze Road Accident Data,” J. Big
Data, vol. 2, no. 1, pp. 26, 2015.
[20] A. Joshi, A. Sai Sabitha, and T. Choudhury, “Crime analysis using K- means clustering,” in 3rd Int.
Conf. on Comp. Intell. and Net. (CINE), pp. 33-39, 2017.
[21] B. Sivanagaleela, and S. Rajesh, “Crime Analysis and Prediction Using Fuzzy C-Means Algorithm,”
in 3rd Int. Conf. on Trends in Electronics and Informatics (ICOEI), 2019, pp. 595-599.
[22] C. Morselli, Inside criminal networks, vol. 8, New York, Springer, 2009.
[23] J. Xu, and H. Chen, “The topology of dark networks,” Communications of the ACM, vol. 51, no. 10,
pp. 58-65, 2008.
[24] F. Calderoni, “The structure of Drug Trafficking mafias: the Ndrangheta and cocaine,” Crime, law
and social change, vol. 58, no. 3, pp. 321-349, 2012.
[25] T. Divik, J. K. Dijkstra, and T. A. B. Snijders, “Structure, multiplexity, and centrality in a corruption
network: the Czech Rath affair,” Trends in Organized Crime, vol. 22, no. 3, pp. 274-297, 2019.
[26] A. Kantamneni, “Identifying Communities as Core-Periphery Structures in Evolving Networks,” MS
Thesis University of Cincinnati, 2016.
[27] G. Berlusconi, F. Calderoni, N. Parolini, M. Verani, and C. Piccardi, “Link prediction in criminal networks: A tool for criminal intelligence analysis,” PloS one, vol. 11, no. 4, 2016.
[28] E. Budur, S. Lee, and V. S. Kong, “Structural analysis of Criminal Network and predicting hidden
links using Machine Learning,” URL arXiv:1507.05739, 2015.
[29] “San Francisco Crime Data and Arrest Information”, Civic Hub, URL
https://www.civichub.us/ca/san-francisco/gov/police-department/crime-data.
[30] “San Francisco Crime Classification”, Kaggle, URL https://www.kaggle.com/c/sf-crime.
[31] “Police Department Incident Reports”, DataSF, URL https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-Historical-2003/tmnf-yvry/.
[32] N. Van Gheem, “Pyzipcode 1.0. 2016," URL https://pypi.org/project/pyzipcode/, 2016.
[33] A.A. Goshtasby, "Similarity and dissimilarity measures," Image registration. Springer, London, pp. 7-
66, 2012.
[34] A. Barrat, M. Barthelemy, R. Pastor-Satorras and A. Vespignani, "The architecture of complex
weighted networks," in Proc. Nat. Acad. Sci., 2004, vol. 101, no. 11, pp. 3747-3752.
[35] T. Nepusz, H. Yu, and A. Paccanaro, “Detecting overlapping protein complexes in Protein-Protein
Interaction Networks,” Nature Methods, vol. 9, pp. 471-472, 2012.
[36] D. Sardana, “Analysis of Meso-scale Structures in Weighted Graphs,” PhD diss., University of
Cincinnati, 2017.
[37] D. Sardana and R. Bhatnagar, “Graph Algorithm to find Core Periphery structures using Mutual K-
Nearest Neighbors” Int. J. Art. Intell. (IJAIA): in press, 2021.
[38] D. Sardana and R. Bhatnagar, “Core Periphery structures in weighted graphs using Greedy growth,”
in IEEE/WIC/ACM International Conference on Web Intelligence, 2016, pp. 1-8.
AUTHORS
Divya Sardana: Dr. Divya Sardana is a Senior Data Scientist at Teradata Corp.,
Santa Clara, CA, USA. She has a PhD in Computer Science from the University of
Cincinnati, OH, USA. Her research targets the development of scalable machine learning
and graph algorithms for the analysis of complex datasets in interdisciplinary
domains of data science such as Bioinformatics, Sociology and Smart
Manufacturing.
Shruti Marwaha: Dr Shruti Marwaha is a Research Engineer in the Department of
Cardiovascular Medicine. She received her PhD in Systems Biology and
Physiology from University of Cincinnati in 2015 and has been working as a
Bioinformatician with Stanford Center for Undiagnosed Diseases since 2016. Her
primary role involves analysis and management of genomics and immunology
data. She focuses on benchmarking variant callers, optimizing tools for variant
prioritization and analysis of single cell immune data.
Raj Bhatnagar: Dr. Raj Bhatnagar is a Professor at the department of EECS at
University of Cincinnati, Cincinnati, OH, USA. His research interests encompass
developing algorithms for data mining and pattern recognition problems in various
domains including Bioinformatics, Geographic Information Systems,
Manufacturing, and Business.