This video tutorial teaches how to implement a data science project using R.
YouTube video: https://goo.gl/GZmJpC | PowerPoint slides: https://goo.gl/EQPdZE | R scripts & datasets: https://goo.gl/wRFohE
Implementing a Data Science Project (Python Version) Part 1 - Dr Sulaimon Afolabi
This teaches how to implement a data science project using Python.
You can watch the YouTube video via this link: https://goo.gl/Mi4aJH
Jupyter notebook: https://goo.gl/AxRMe3
This paper aims to discover frequent patterns using data grids in the WEKA 3.8 environment. Workload imbalance occurs due to the dynamic nature of grid computing; hence, data grids are used for the creation and validation of data. Association rules are used to extract useful information from large databases. In this paper the researchers generate the best rules using WEKA 3.8 for better performance, and WEKA 3.8 is used to implement the various algorithms.
Data is increasing day by day, and most of it is stored in a database after manual transformations and derivations. Scientists can use data-intensive applications to study and understand the behaviour of a complex system. In a data-intensive application, a scientific model processes raw data products to produce new data products, with data collected from various sources such as physical, geological, environmental, chemical and biological ones. Based on the generated output, it is important to be able to trace an output data product back to its source values if that particular output seems to have an unexpected value. Data provenance helps scientists investigate the origin of an unexpected value. In this paper our aim is to find the reason behind an unexpected value in a database using query inversion, and we propose some hypotheses for constructing an inverse query for complex aggregation functions and multiple-relationship (join, set operation) functions.
DATA @ NFLX (Tableau Conference 2014 Presentation) - Blake Irvine
I presented this at a 2014 Tableau Conference session with Albert Wong.
Netflix relies on data to make decisions ranging from buying and recommending content, to improving the streaming experience on devices.
This presentation shares our Big Data analytics architecture and the tools used to make data accessible throughout our business, focusing on how Tableau fits into our organization and why it aligns well with our culture.
Microsoft Azure Data Fundamentals (DP-900) Practice Tests 2022 - SkillCertProExams
• For a full set of 450+ questions, go to
https://skillcertpro.com/product/microsoft-azure-data-fundamentals-dp-900-exam-questions/
• SkillCertPro offers detailed explanations for each question, which helps you understand the concepts better.
• It is recommended to score above 85% in SkillCertPro exams before attempting the real exam.
• SkillCertPro updates exam questions every 2 weeks.
• You will get lifetime access and lifetime free updates.
• SkillCertPro assures a 100% pass guarantee on the first attempt.
Machine Learning for Sensor Data Analytics - MATLAB ISRAEL
In this presentation we will see how to do Machine Learning in the MATLAB environment. We will present several built-in capabilities and apps that make the machine learning process faster and more efficient, such as the Classification Learner, the Regression Learner and Bayesian Optimization. Based on data obtained from smartphone sensors, we will build a classification system that identifies the activity the user is performing: walking, climbing stairs, lying down, etc.
Data Science Training | Data Science Tutorial | Data Science Certification - Edureka!
This Edureka Data Science Training will help you understand what Data Science is, and you will learn about the different Data Science components and concepts. This tutorial is ideal for beginners as well as professionals who want to learn or brush up on their Data Science concepts. Below are the topics covered in this tutorial:
1. What is Data Science?
2. Job Roles in Data Science
3. Components of Data Science
4. Concepts of Statistics
5. Power of Data Visualization
6. Introduction to Machine Learning using R
7. Supervised & Unsupervised Learning
8. Classification, Clustering & Recommenders
9. Text Mining & Time Series
10. Deep Learning
To take a structured training on Data Science, you can check complete details of our Data Science Certification Training course here: https://goo.gl/OCfxP2
Introduction to Data Science, Prerequisites (tidyverse), Import Data (readr), Data Tidying (tidyr):
pivot_longer(), pivot_wider(), separate(), unite(), Data Transformation (dplyr - Grammar of Manipulation): arrange(), filter(),
select(), mutate(), summarise()
Data Visualization (ggplot - Grammar of Graphics): Column Chart, Stacked Column Graph, Bar Graph, Line Graph, Dual Axis Chart, Area Chart, Pie Chart, Heat Map, Scatter Chart, Bubble Chart
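The tidyr and dplyr verbs listed above can be sketched on toy data (a hypothetical example, not taken from the course materials):

```r
# A minimal sketch of the verbs listed above, using made-up data.
library(tidyr)
library(dplyr)

sales <- data.frame(
  region = c("North", "South"),
  y2015  = c(100, 150),
  y2016  = c(120, 160)
)

# pivot_longer(): reshape from wide to long
long <- pivot_longer(sales, cols = c(y2015, y2016),
                     names_to = "year", values_to = "amount")

# dplyr verbs: filter rows, sort, derive a column, aggregate
long %>%
  filter(amount > 100) %>%
  arrange(desc(amount)) %>%
  mutate(amount_k = amount / 1000) %>%
  summarise(total = sum(amount))
```

The same `long` data frame can then be fed to ggplot2 for any of the chart types listed.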
Agile Data Science 2.0 covers the theory and practice of applying agile methods to data science, the practice of applied analytics research. The book takes the stance that data products are the preferred output format for data science teams seeking to effect change in an organization. Accordingly, we show how to "get meta" to enable agility by building applications that describe the applied research process itself. Then we show how to use big data tools to iteratively build, deploy and refine analytics applications. Tracking data-product development through the five stages of the "data value pyramid", we show you how to build applications from conception through development and deployment to iterative improvement. Application development is a fundamental skill for a data scientist, and by publishing your data science work as a web application, you can effect maximal change within your organization.
Technologies covered include Python, Apache Spark (Spark MLlib, Spark Streaming), Apache Kafka, MongoDB, ElasticSearch and Apache Airflow.
Development of a Deep Learning people detection algorithm applied to UAV video stream for Search&Rescue applications.
In the Alps, mountain accidents and missing-person incidents occur very often, with peaks in the summer months due to increased tourist arrivals. Typically, the missing person is located through calls, coordinated with the help of fellow hikers. When this is not possible, the mission focuses on locating the injured or missing person as quickly as possible to increase the chances of survival. Traditional methods often turn out to be slow and require a huge amount of resources to scan the largest portion of the area in the shortest time. For this reason, the focus in recent years has been on experimenting with drones to support Search & Rescue operations. At present, the Alpine and Speleological Rescue Corps of South Tyrol has been experimentally adopting drones to assist in the search for missing persons. In particular, the unit is equipped with a MAVTech Q4X drone with a high-resolution RGB 30x camera. This device allows the surveillance of a very wide area from a different perspective, and also enables monitoring of impervious and harsh locations. However, the huge variety and complexity of scenarios makes manual recognition difficult for the operator, requiring high concentration, especially in long-lasting tasks, with negative consequences on target detection. In this context, an automatic Deep Learning algorithm that aids the operator in this work can be very effective. The proposed solution, developed in collaboration with the University of Trento, is based on YOLOv3, a one-stage object detector that has exhibited good performance on many datasets in the literature. The results show that the AI algorithm is capable of learning effectively from the training sets. A solution was subsequently implemented directly on the drone, taking advantage of a small computation architecture (NVIDIA Jetson Nano).
This architecture was chosen because of its very small size and weight, and its ability to provide GPU acceleration with very low power consumption. To summarize, the proposed system processes frames from the UAV payload camera and sends any detection to the ground. The detections are shown on the monitor of a MAVTech PCS (Payload Control System), on which the payload video stream used by the camera operator during search operations is displayed.
Pragmatic South African Strategies in the Era of Artificial IntelligenceDr Sulaimon Afolabi
Pragmatic South African Strategies in the Era of Artificial Intelligence: Adaptable by other African Nations. Excerpts from the Report of the Presidential Commission on the Fourth Industrial Revolution.
This material teaches object detection with deep learning by Dr Sulaimon Afolabi.
1. Understanding object detection
2. Variants of object detection
3. Use cases of object detection
4. Object detection APIs
5. Object detection implementation steps
To learn more about Africa4AI go to
Africa4AI.com
More Related Content
Similar to Implementing a data science project (R Version) Part1
Exploiting Artificial Intelligence for Empowering Researchers and Faculty - Dr. Vinod Kumar Kanvaria
Exploiting Artificial Intelligence for Empowering Researchers and Faculty,
International FDP on Fundamentals of Research in Social Sciences
at Integral University, Lucknow, 06.06.2024
By Dr. Vinod Kumar Kanvaria
Executive Directors Chat: Leveraging AI for Diversity, Equity, and Inclusion - TechSoup
Let’s explore the intersection of technology and equity in the final session of our DEI series. Discover how AI tools, like ChatGPT, can be used to support and enhance your nonprofit's DEI initiatives. Participants will gain insights into practical AI applications and get tips for leveraging technology to advance their DEI goals.
Safalta Digital Marketing Institute in Noida provides comprehensive courses covering a wide range of digital marketing components, including search engine optimization (SEO), digital communication marketing, pay-per-click (PPC) advertising, content marketing, web analytics, and more. These courses are designed for students seeking a thorough understanding of digital marketing strategies. The institute is a first choice for young individuals and students looking to start careers in digital marketing, offering specialized courses and certification
for beginners, with in-depth training in areas such as SEO, digital communication marketing, and PPC in Noida. After finishing the programme, students receive certifications recognised by top universities, setting a strong foundation for a successful career in digital marketing.
This presentation was provided by Steph Pollock of The American Psychological Association’s Journals Program, and Damita Snow, of The American Society of Civil Engineers (ASCE), for the initial session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session One: 'Setting Expectations: a DEIA Primer,' was held June 6, 2024.
Strategies for Effective Upskilling is a presentation by Chinwendu Peace in a Your Skill Boost Masterclass organised by the Excellence Foundation for South Sudan on 8th and 9th June 2024, from 1 PM to 3 PM each day.
A review of the growth of the Israel Genealogy Research Association Database Collection over the last 12 months. Our collection has now passed the 3 million mark and is still growing. See which archives have contributed the most, the different types of records we have, and which years have had records added. You can also see what we have planned for the future.
The Simplified Electron and Muon Model, Oscillating Spacetime: The Foundation... - RitikBhardwaj56
Discover the Simplified Electron and Muon Model: A New Wave-Based Approach to Understanding Particles delves into a groundbreaking theory that presents electrons and muons as rotating soliton waves within oscillating spacetime. Geared towards students, researchers, and science buffs, this book breaks down complex ideas into simple explanations. It covers topics such as electron waves, temporal dynamics, and the implications of this model on particle physics. With clear illustrations and easy-to-follow explanations, readers will gain a new outlook on the universe's fundamental nature.
Macroeconomics- Movie Location
This will be used as part of your Personal Professional Portfolio once graded.
Objective:
Prepare a presentation or a paper using research, basic comparative analysis, data organization and application of economic information. You will make an informed assessment of an economic climate outside of the United States to accomplish an entertainment industry objective.
A workshop hosted by the South African Journal of Science aimed at postgraduate students and early career researchers with little or no experience in writing and publishing journal articles.
Thinking of getting a dog? Be aware that breeds like Pit Bulls, Rottweilers, and German Shepherds can be loyal and dangerous. Proper training and socialization are crucial to preventing aggressive behaviors. Ensure safety by understanding their needs and always supervising interactions. Stay safe, and enjoy your furry friends!
Introduction to AI for Nonprofits with Tapp Network - TechSoup
Dive into the world of AI! Experts Jon Hill and Tareq Monaur will guide you through AI's role in enhancing nonprofit websites and basic marketing strategies, making it easy to understand and apply.
Implementing a data science project (R Version) Part1
1. Ahmad B. Abdullahi, Ahmed Olanrewaju, Bilikisu Aderinto, Akinyomade Owolabi,
Olalekan Olapeju, Kamoldeen Abiona, Oluwabunmi Ogunnowo, Busayo Coker
Version
2. Ahmad Bello Abdullahi, Ahmed Olanrewaju, Bilikisu Aderinto,
Olalekan Olapeju, Kamoldeen Abiona, Oluwabunmi Ogunnowo, Busayo Coker,
Akinyomade Owolabi
Meteorologist, Nigerian Meteorological Agency (NiMet), Abuja
Senior Systems Analyst, Management Information Systems Unit, University of Ibadan, Ibadan
Head of Operations, Pakino Nigeria Ltd
Principal Consultant, Cheetahsoft Consulting Limited, Abuja
System Engineer, Computer Warehouse Group (CWG)
Assistant Superintendent of Corps II, Nigeria Security & Civil Defence Corps
Education Officer I (Mathematics & Further Mathematics), Lagos Education District IV
Programmes Officer, New Nigeria Foundation
to TEAM
3. Arthur Samuel (1959):
Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.
8. Project Description & Checklist
The Description
To use machine learning
techniques to perform
exploratory and predictive
analyses on crime data.
9. Project Description, Resources & Checklist
The Datasets
Dataset A: Data on crime reported across the country and the respective police stations (2015/2016).
Dataset B: Data on the names of police stations and the population that falls under their jurisdiction.
Dataset C: Data on the location (i.e. geographical coordinates) of the police stations across the country.
Dataset D: Additional data (to be sourced later).
10. Project Description & Checklist
Checklist
Checklist 1
Is it a supervised, unsupervised or reinforcement machine
learning project?
13. Supervised Learning
Outcome feature is known.
Task driven.
Fits data.
Its goal is to predict values in continuous (regression) or categorical (classification) format.
Unsupervised Learning
Outcome feature is unknown.
Data driven.
Clusters data.
Its goal is to find patterns (clustering) in the data.
Reinforcement Learning
Outcome feature is unknown.
Circumstance driven.
Decides on data.
Its goal is to learn how to decide under a given circumstance.
14. Supervised Learning: Labelled Data
Id | Province | Police Station | Population | Burglary (Label)
AB123 | Gauteng | Dunnottar | 10479 | 141
AB123 | North West | Mmabatho | 134138 | 773

Id | Province | Police Station | Population | Frequent Crime (Label)
AB123 | Gauteng | Dunnottar | 10479 | Burglary
AB123 | North West | Mmabatho | 134138 | Arson
15. Unsupervised Learning: Unlabelled Data
Id | Province | Police Station | Population | Burglary | Crime Type
AB123 | Gauteng | Dunnottar | 10479 | 141 | Burglary
AB123 | North West | Mmabatho | 134138 | 773 | Arson
16. Project Description & Checklist
Checklist
Checklist 1
Checklist 2
Is it a supervised or unsupervised machine learning project?
Is it a classification or regression task?
17. Supervised Learning: Labelled Data
Regression (the values are continuous):
Id | Province | Police Station | Population | Burglary
AB123 | Gauteng | Dunnottar | 10479 | 141
AB123 | North West | Mmabatho | 134138 | 773

Classification (the values are categorical):
Id | Province | Police Station | Population | Frequent Crime
AB123 | Gauteng | Dunnottar | 10479 | Burglary
AB123 | North West | Mmabatho | 134138 | Arson
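In R terms, the regression/classification distinction above comes down to the type of the target column (a toy sketch, not the slide's dataset):

```r
# Continuous target (e.g. burglary counts) -> regression
y_reg <- c(141, 773)

# Categorical target (e.g. most frequent crime) -> classification
y_cls <- factor(c("Burglary", "Arson"))

is.numeric(y_reg)  # a numeric target implies a regression task
is.factor(y_cls)   # a factor target implies a classification task
```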
18. Project Description, Resources & Checklist
Checklist
Checklist 1
Checklist 2
Is it a supervised, unsupervised or reinforcement machine
learning project?
Is it a classification or regression task?
Checklist 3 Identify the target feature or features to be clustered
Checklist 4 Can I get extra data or features to boost my project?
19. Project Description, Resources & Checklist
Checklist 5
Checklist 6
What are the available solutions to the problem?
How do I intend to measure the performance of my model?
Checklist 7 How will my solution be deployed and utilised?
Checklist
22. Data Loading, Merging & Visualisation
Data Loading Checklist
Data Location: Where is the dataset located? Computer | Server | Web | Cloud.
Data Form: What form is the dataset in? Alpha-Numeric | Text | Image | Audio | Video.
Data Size: How big is the dataset? Is the size in kilobytes, megabytes, gigabytes or terabytes?
Data Flow: Is it real-time data? Does it come as a stream or in batches?
Analysis Platform: Can I analyse it on my computer, or do I need to engage the services of a cloud computing provider, e.g. Microsoft Azure, Amazon Web Services (AWS), Google Cloud, etc.?
23. Data Loading, Merging & Visualisation
Data Loading Steps
Step 1: Start RStudio (Start Menu > RStudio)
It is assumed that you have already installed RStudio.
24. - 26. [RStudio screenshots]
27. This pane is for writing code.
This pane is also for writing code.
This pane shows the loaded data.
This pane is for packages, plots, etc.
28. Data Loading, Merging & Visualisation
Data Loading Steps
Step 2: Install the necessary R packages
install.packages("dplyr")
install.packages("pastecs")
install.packages("ggplot2")
Step 3: Load the packages
library("dplyr")
library("pastecs")
library("ggplot2")
Step 4: Set the working directory
setwd("C:/Project_Analytics/SA_Crime_Analysis")
Step 5: Load the data
Dataset_A <- read.csv("dataset/Dataset_A.csv")
29. Data Loading, Merging & Visualisation
Project Data Loading
Dataset A - Crime Reported and Police Station
The dataset is in csv (comma delimited) format.
# Loading the dataset
Dataset_A <- read.csv("Dataset_A.csv")
# Viewing the top 6 records
head(Dataset_A)
# Sorting the records using 'Police_Station'
Dataset_A[order(Dataset_A$Police_Station),]
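The base-R sort above can equivalently be written with dplyr's arrange(); a sketch on a hypothetical stand-in for Dataset_A, since the real CSV is not bundled here:

```r
library(dplyr)

# Hypothetical miniature Dataset_A (column names follow the slides)
Dataset_A <- data.frame(
  Province         = c("Gauteng", "Eastern Cape"),
  Police_Station   = c("Dunnottar", "Aberdeen"),
  Period_2015_2016 = c(141, 51)
)

# Sort records by Police_Station, ascending
sorted <- arrange(Dataset_A, Police_Station)
head(sorted)
```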
31. Data Loading, Merging & Visualisation
Reshaping the dataset - Dataset A

Long Format:
Province | Police_Station | Crime_Category | Period_2015_2016
Eastern Cape | Aberdeen | All theft not mentioned elsewhere | 51
Eastern Cape | Aberdeen | Theft out of or from motor vehicle | 7
Eastern Cape | Aberdeen | Theft of motor vehicle and motorcycle | 2
Eastern Cape | Aberdeen | Stock-theft | 20

Wide Format:
Province | Police_Station | All theft not mentioned elsewhere | Theft out of or from motor vehicle | Theft of motor vehicle and motorcycle | Stock-theft
Eastern Cape | Aberdeen | 51 | 7 | 2 | 20
32. Data Loading, Merging & Visualisation
Project Data Loading - Dataset A
Reshaping (Pivoting) the dataset from "long" to "wide" format
library("tidyr")  # spread() is provided by the tidyr package
Dataset_A_Wide <- spread(Dataset_A, Crime_Category, Period_2015_2016)
# Viewing the top 5 records
head(Dataset_A_Wide, n=5)
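Note that spread() has since been superseded in tidyr by pivot_wider(); an equivalent sketch on toy rows (the real Dataset_A is not included here):

```r
library(tidyr)

# Toy stand-in for the long-format crime data shown on the slide
Dataset_A <- data.frame(
  Province         = "Eastern Cape",
  Police_Station   = "Aberdeen",
  Crime_Category   = c("Stock-theft", "Theft out of or from motor vehicle"),
  Period_2015_2016 = c(20, 7)
)

# pivot_wider() is the modern replacement for spread():
# each Crime_Category becomes its own column
Dataset_A_Wide <- pivot_wider(Dataset_A,
                              names_from  = Crime_Category,
                              values_from = Period_2015_2016)
```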
33. Data Loading, Merging & Visualisation
Project Data Loading
DatasetA
Viewing the properties of the reshaped dataset
str(Dataset_A_Wide)
34. Data Loading, Merging & Visualisation
Project Data Loading - Dataset A
Check the dataset for duplicates
This is a major checklist before merging this dataset with the other datasets.
length(duplicated(Dataset_A_Wide$Police_Station))
[1] 1143
NB: length(duplicated(...)) returns the total record count, not the number of duplicates; use sum(duplicated(Dataset_A_Wide$Police_Station)) to count actual duplicates.
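A quick toy illustration of the difference between counting records and counting duplicates (hypothetical station names):

```r
stations <- c("Aberdeen", "Dunnottar", "Aberdeen")

length(duplicated(stations))  # 3: total records, whatever the data
sum(duplicated(stations))     # 1: records that repeat an earlier value
any(duplicated(stations))     # TRUE: at least one duplicate exists
```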
35. Data Loading, Merging & Visualisation
Project Data Loading
Dataset B - Police Stations and the Population that they Cover
The dataset is in xlsx (MS Excel) format.
# Install and load the library
install.packages("xlsx")
library("xlsx")
# Load the dataset
Dataset_B <- read.xlsx("Dataset_B.xlsx", sheetIndex = 1)
# Sort the dataset
Dataset_B[order(Dataset_B$Police_Station),]
# Viewing the top 5 records
head(Dataset_B, n = 5)
  Police_Station population_estimate
1 ABERDEEN 9866.916
2 ACORNHOEK 127623.360
3 ACTONVILLE 52830.848
4 ADDO 20938.325
5 ADELAIDE 13587.573
NB: You need to install Java and set JAVA_HOME for the xlsx package to work. Download Java via the following link:
http://www.oracle.com/technetwork/java/javase/downloads/jdk9-downloads-3848520.html
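If installing Java is a hurdle, the readxl package reads .xlsx files with no Java dependency. This is an alternative to the slide's xlsx-based approach, not what the presenter used:

```r
# install.packages("readxl")  # one-off installation
library(readxl)

# read_excel() plays the role of read.xlsx(); the read is guarded
# so this sketch does not fail when the workbook is absent
if (file.exists("Dataset_B.xlsx")) {
  Dataset_B <- read_excel("Dataset_B.xlsx", sheet = 1)
  head(Dataset_B, n = 5)
}
```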
36. Data Loading, Merging & Visualisation
Project Data Loading - Dataset B
Viewing the attributes of the features
str(Dataset_B)
Check the dataset for duplicates
length(duplicated(Dataset_B$Police_Station))
[1] 1140
37. Data Loading, Merging & Visualisation
Project Data Loading
Dataset C - Police Stations and their Geo-Coordinates
The dataset is in tsv (tab delimited) format.
# Load the dataset
Dataset_C <- read.table("Dataset_C.tsv", header=TRUE, sep='\t')
# Sort the dataset
Dataset_C[order(Dataset_C$Police_Station),]
# Viewing the top 6 records
head(Dataset_C)
38. Data Loading, Merging & Visualisation
Project Data Loading - Dataset C
Viewing the attributes of the features
str(Dataset_C)
Check the dataset for duplicates
length(duplicated(Dataset_C$Police_Station))
39. Dataset A: Total Records = 1143; Features: Province, Police_Station, +27 features
Dataset B: Total Records = 1140; Features: Police_Station, population_estimate
Dataset C: Total Records = 1142; Features: Police_Station, LongitudeY, LatitudeX
40. Data Loading, Merging & Visualisation
Datasets Merging
Dataset A (1143 records): Province, Police_Station, Crime_Category, Period_2015_2016
Dataset B (1140 records): Police_Station, population_estimate
Dataset C (1142 records): Police_Station, LongitudeY, LatitudeX
All three datasets share the Police_Station feature, which serves as the merge key.
41. Data Loading, Merging & Visualisation
Datasets Merging
Merging Dataset A & B
Note: Dataset A contains more records than Dataset B. Hence, Dataset A is the universal (left) dataset.
paste("Size of Dataset A_Wide =", nrow(Dataset_A_Wide))
paste("Size of Dataset B =", nrow(Dataset_B))
paste("Size of Dataset C =", nrow(Dataset_C))
Size of Dataset A_Wide = 1143
Size of Dataset B = 1140
#Left Join
Dataset_A_B <- left_join(Dataset_A_Wide, Dataset_B, by="Police_Station")
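After a left join, rows of the left table with no partner in the right table get NA values; dplyr's anti_join() is a quick way to list which police stations failed to match. A sketch with made-up station names, since the real datasets are not included here:

```r
library(dplyr)

# Hypothetical miniature versions of the two tables being joined
Dataset_A_Wide <- data.frame(
  Police_Station = c("Aberdeen", "Dunnottar", "Mmabatho"),
  Burglary       = c(12, 141, 773)
)
Dataset_B <- data.frame(
  Police_Station      = c("Aberdeen", "Dunnottar"),
  population_estimate = c(9866.916, 10479)
)

# Left join keeps every row of Dataset_A_Wide; unmatched rows get NA
Dataset_A_B <- left_join(Dataset_A_Wide, Dataset_B, by = "Police_Station")

# anti_join() lists the stations in A that found no partner in B
unmatched <- anti_join(Dataset_A_Wide, Dataset_B, by = "Police_Station")
```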
42. Data Loading, Merging & Visualisation
Datasets Merging
Merging Dataset A_B with Dataset C
paste("Size of Dataset A_B =", nrow(Dataset_A_B))
paste("Size of Dataset C =", nrow(Dataset_C))
Size of Dataset A_B = 1143
Size of Dataset C = 1142
#Left Join
Dataset_A_B_C <- left_join(Dataset_A_B, Dataset_C, by="Police_Station")
43. Please subscribe to my YouTube channel for the other versions, and like the video on LinkedIn and YouTube.