Air Pollution in Nova Scotia: Analysis and Predictions
Carlo Carandang Zhengyang Ma
Navjot Kaur Sahil Behl
Funto Akinyemi Maroof Rahman
Air Pollution in NS: Problem
Air Pollution
● Detrimental to health (asthma, breathing problems)
● Detrimental to environment
● Detrimental to NS economy
Air Pollution in NS: Business Case
● Recent push for green and smart cities
● NS economy based on tourism and nature
● To carry out these green initiatives, Policy Makers need:
○ accurate and convenient access to current and future air pollution conditions
○ to highlight pollution problem areas
○ to measure the effects of interventions to reduce pollution
○ to predict future pollution problems
2018 Open Data Hackathon
● Analyzed air particulate pollution datasets from NS Open Data Portal
● Produced a time series prediction from a concatenated table of the datasets
R Shiny App
https://sahil4.shinyapps.io/nsopendata/
Further Analysis of Air Pollution Data
● After Hackathon, wanted to further analyze the data
● Built a Data Warehouse in Oracle DB for Business Intelligence
● Cleaned, extracted, and transformed the data
● Loaded the data into a data warehouse
○ Dimension Tables loading into a Fact Table
● Fact table connected directly to Tableau for visualization
Technical: ETL and Data Warehouse
Technologies
Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production
Oracle SQL Developer
SQL*Loader
MS Excel
Cloud Dataprep
Tableau
Draw.io
Data Selection: Motivation
● We chose these datasets because they contained enough data for us to extract
insights into a business problem
● When searching, we found that many of the available open datasets were sparse
or not fully populated, and were also small (few rows and columns)
● We ended up extracting the air particulate pollution datasets from the Nova
Scotia Open Data Portal, as there were several datasets on air particulate
matter from all around the Province
● These datasets also gave us a chance to extract multiple datasets and
perform ETL and analytics on them.
Obtaining the Data
To obtain the data, the following procedure was followed:
1. Go to https://data.novascotia.ca
2. Click the ‘Data Catalogue’ button
3. In the search bar, search for ‘pollution air particulate’
4. You will see 30 results: sort by ‘Most Relevant’
5. You want the first 12 results (ignore the 11th result, which is a lookup table)
This will leave you with 11 csv files for analysis.
csv Description: 44 MB; 643,453 rows total
File Size Rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Kentville_BAM.csv 498 KB 8,785 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Halifax_BAM.csv 4.7 MB 69,958 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Port Hawkesbury_BAM.csv 4.1 MB 55,559 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Sable_Island_BAM.csv 4.6 MB 64,496 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Pictou_BAM.csv 6.8 MB 104,033 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Sydney_BAM.csv 3.4 MB 50,752 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Lake_Major_TEOM.csv 4 MB 60,691 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Halifax_TEOM.csv 790 KB 12,494 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Aylesford_BAM.csv 4.7 MB 68,025 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Lake_Major_BAM.csv 6.7 MB 96,433 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Sydney_TEOM.csv 3.8 MB 60,514 rows
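The sizes and row counts listed above can be re-checked after download. A minimal Python sketch, assuming the csv files sit together in one local folder and that each file starts with a single header row; the helper names `count_rows` and `summarize` are hypothetical and not part of the original project:

```python
import csv
from pathlib import Path


def count_rows(csv_path):
    """Return the number of data rows in a csv file, excluding the header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (assumed to be present)
        return sum(1 for _ in reader)


def summarize(folder):
    """Map each csv file name in `folder` to (size in bytes, data row count)."""
    return {
        path.name: (path.stat().st_size, count_rows(path))
        for path in sorted(Path(folder).glob("*.csv"))
    }
```

Calling `summarize` on the download folder gives one entry per file, which can be compared line by line with the list above. Whether the counts on the slide include the header row is not stated, so a difference of one per file would not necessarily indicate a problem.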
Data Description
Column      Sample Data             Description
Date_Time   07/24/1998 03:00:00 PM  date-time stamp
Pollutant   PM2.5                   fine particulate matter (diameter of 2.5 microns or less)
Unit        µg/m3                   micrograms per cubic metre
Station     Sydney                  location of instrument
Instrument  TEOM                    instrument type
Average     24                      average concentration
Time period coverage: 2001-01-01 to 2016-12-31
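The Date_Time column is stored as text and needs parsing before any time series work. A small Python sketch, assuming every value follows the pattern of the sample shown above (month/day/year with a 12-hour clock); the format string is inferred from that single sample and the function name is hypothetical:

```python
from datetime import datetime

# Format inferred from the sample value "07/24/1998 03:00:00 PM".
DATE_TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_date_time(text):
    """Parse a Date_Time string from the csv files into a datetime object."""
    return datetime.strptime(text.strip(), DATE_TIME_FORMAT)


print(parse_date_time("07/24/1998 03:00:00 PM").isoformat())
```

`strptime` raises `ValueError` for any value that does not match, which is a convenient way to spot rows in a different format.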
Cleaning
We preprocessed the data in Cloud Dataprep, uploading the 11 csv files that
were extracted from the Nova Scotia Open Data Portal.
Import the Data and Add to Flow
Recipe to Remove Rows with Null Values
Remove Rows with Null Values
Remove Rows with ‘InVld’
Remove Rows with ‘NoData’
Remove Rows with ‘<Samp’
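The recipe steps above were built in Cloud Dataprep, and the recipe itself is not reproduced in the slides. As a rough equivalent, the same four filters can be sketched in Python. This is a swapped-in illustration rather than the authors' recipe, and it assumes the flag values may appear in any column:

```python
import csv
import io

# Flag values named on the slides; any other non-empty value is kept.
INVALID_FLAGS = {"InVld", "NoData", "<Samp"}


def is_clean(row):
    """Return True if no field is missing, empty, or an invalid-data flag."""
    for value in row.values():
        if not isinstance(value, str):  # missing or surplus fields in the row
            return False
        if value.strip() == "" or value.strip() in INVALID_FLAGS:
            return False
    return True


def clean_rows(csv_text):
    """Return the clean rows of a csv document given as a string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if is_clean(row)]
```

Each surviving row is a dictionary keyed by the csv header, so the result can be written back out with `csv.DictWriter` before loading.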
Extraction: csv to staging tables
-- Extract csv to staging tables via Excel Macro using SQL Loader.
-- First, create upload table:
CREATE TABLE "DWUSER"."UPLOAD_TABLE"
( "DATE_TIME" VARCHAR2(250 BYTE),
"POLLUTANT" VARCHAR2(250 BYTE),
"UNIT" VARCHAR2(250 BYTE),
"STATION" VARCHAR2(250 BYTE),
"INSTRUMENT" VARCHAR2(250 BYTE),
"AVERAGE" FLOAT(126) ) ;
Extraction: csv to staging tables
Next, use an Excel macro to load the csv files via SQL*Loader:
Extraction: csv to staging tables
-- Run SQL*Loader via control files, in conjunction with Excel Macros:
LOAD DATA
INFILE *
APPEND
INTO TABLE HLFX_BAM_MONTHLY_AVG
Fields terminated by '~'
Trailing Nullcols
(DATE_TIME,POLLUTANT,STATION,AVERAGE)
BEGINDATA
9-Jan-2008~PM2.5~Halifax~4.814259
-- Load all 11 csv files via this method
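The Excel macro that produced these control files is not shown in the slides. The Python sketch below is a stand-in, not the authors' macro: it assembles a control file with the same clauses as the example above and inlines the records after BEGINDATA. The function name and its parameters are hypothetical.

```python
def build_control_file(table, columns, records, delimiter="~"):
    """Return SQL*Loader control file text with records inlined after BEGINDATA."""
    lines = [
        "LOAD DATA",
        "INFILE *",
        "APPEND",
        f"INTO TABLE {table}",
        f"Fields terminated by '{delimiter}'",
        "Trailing Nullcols",
        "(" + ",".join(columns) + ")",
        "BEGINDATA",
    ]
    for record in records:
        fields = [str(field) for field in record]
        # A delimiter inside a field would shift every later column.
        if any(delimiter in field for field in fields):
            raise ValueError(f"field contains the delimiter {delimiter!r}: {fields}")
        lines.append(delimiter.join(fields))
    return "\n".join(lines) + "\n"


print(build_control_file(
    "HLFX_BAM_MONTHLY_AVG",
    ["DATE_TIME", "POLLUTANT", "STATION", "AVERAGE"],
    [("9-Jan-2008", "PM2.5", "Halifax", 4.814259)],
))
```

The generated text can be saved with a .ctl extension and passed to SQL*Loader in the usual way.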
Testing
Log file of initial extraction of csv files into Oracle DB table:
SQL*Loader: Release 11.2.0.2.0 - Production on Fri Mar 9 14:05:54 2018
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
Control File: C:JUNKINTREVERSE.CTL
Data File: C:JUNKINTREVERSE.CTL
Bad File: C:JUNKINTREVERSE.bad
Discard File: none specified
(Allow all discards)
Number to load: ALL
Number to skip: 0
Errors allowed: 50
Bind array: 64 rows, maximum of 256000 bytes
Continuation: none specified
Path used: Conventional
Testing: continued
Table HLFX_BAM_MONTHLY_AVG, loaded from every logical record.
Insert option in effect for this table: APPEND
TRAILING NULLCOLS option in effect
Column Name Position Len Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
DATE_TIME FIRST * ~ CHARACTER
POLLUTANT NEXT * ~ CHARACTER
STATION NEXT * ~ CHARACTER
AVERAGE NEXT * ~ CHARACTER
Table HLFX_BAM_MONTHLY_AVG:
132 Rows successfully loaded.
0 Rows not loaded due to data errors.
0 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
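With 11 files to load, reading each log by hand becomes tedious. A short Python sketch that pulls the loaded and rejected counts out of a log, assuming the summary lines are worded as in the log above; the function name is hypothetical:

```python
import re

LOADED = re.compile(r"^\s*(\d+) Rows? successfully loaded\.", re.MULTILINE)
REJECTED = re.compile(r"^\s*(\d+) Rows? not loaded due to data errors\.", re.MULTILINE)


def load_summary(log_text):
    """Return (rows loaded, rows rejected for data errors) from a SQL*Loader log."""
    loaded = LOADED.search(log_text)
    rejected = REJECTED.search(log_text)
    if loaded is None or rejected is None:
        raise ValueError("log does not contain the expected summary lines")
    return int(loaded.group(1)), int(rejected.group(1))
```

A load can then be treated as passing when the rejected count is zero and the loaded count matches the number of records in the control file.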
Transformation
-- Upload table transformed into tables partitioned by Station (location):
CREATE TABLE "DWUSER"."EVTB_HALIFAX_UPLOAD"
( "DATE_TIME" TIMESTAMP (6),
"POLLUTANT" VARCHAR2(250 BYTE),
"UNIT" VARCHAR2(250 BYTE),
"STATION" VARCHAR2(250 BYTE),
"INSTRUMENT" VARCHAR2(250 BYTE),
"AVERAGE" FLOAT(126));
-- Repeat this for the 7 other Station Locations
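The slides show only the target table definition, not the statement that fills it. The Python sketch below illustrates the two changes this step implies: routing each upload row to a per-station table and converting DATE_TIME from text to a timestamp. The naming rule is an assumption modelled on EVTB_HALIFAX_UPLOAD, and the date format is inferred from the sample data; neither is confirmed by the source.

```python
from collections import defaultdict
from datetime import datetime

DATE_TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # inferred from the sample data


def staging_table_name(station):
    """Hypothetical naming rule modelled on EVTB_HALIFAX_UPLOAD."""
    return "EVTB_" + station.strip().upper().replace(" ", "_") + "_UPLOAD"


def split_by_station(rows):
    """Group upload rows by target table, converting DATE_TIME text to datetime."""
    tables = defaultdict(list)
    for row in rows:
        converted = dict(row)  # copy so the caller's row is left untouched
        converted["DATE_TIME"] = datetime.strptime(row["DATE_TIME"], DATE_TIME_FORMAT)
        tables[staging_table_name(row["STATION"])].append(converted)
    return dict(tables)
```

Each key of the returned dictionary corresponds to one station table, and its list holds the rows destined for it.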
Load Dim Tables and Fact Table
CREATE TABLE "DWUSER"."EVTB_FACT"
( "TIME_ID" VARCHAR2(250 BYTE),
"DATE_ID" VARCHAR2(250 BYTE),
"STATION_ID" VARCHAR2(250 BYTE),
"INSTRUMENT_ID" VARCHAR2(250 BYTE),
"AVERAGE" FLOAT(126),
"ID" VARCHAR2(250 BYTE),
CONSTRAINT "PK_FACT" PRIMARY KEY ("ID"));
Load Dim Tables and Fact Table
CREATE TABLE "DWUSER"."EVTB_DIM_STATION"
( "STATION_ID" VARCHAR2(250 BYTE),
"STATION" VARCHAR2(250 BYTE),
"DESCRIPTION" VARCHAR2(250 BYTE),
CONSTRAINT "PK_STATION" PRIMARY KEY ("STATION_ID"));
-- Repeat this for DIM_DATE, DIM_INSTRUMENT, and DIM_TIME
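The slides give the table definitions but not the load statements. To illustrate how dimension tables feed a fact table, here is a reduced sketch in Python using an in-memory SQLite database instead of Oracle. It keeps only the station and instrument dimensions, uses integer keys rather than the VARCHAR2 keys above, and all table and function names in it are hypothetical.

```python
import sqlite3


def build_star(rows):
    """Load (station, instrument, average) tuples into a small in-memory star schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE dim_station (station_id INTEGER PRIMARY KEY, station TEXT UNIQUE);
        CREATE TABLE dim_instrument (instrument_id INTEGER PRIMARY KEY, instrument TEXT UNIQUE);
        CREATE TABLE fact (
            id INTEGER PRIMARY KEY,
            station_id INTEGER REFERENCES dim_station (station_id),
            instrument_id INTEGER REFERENCES dim_instrument (instrument_id),
            average REAL
        );
        """
    )
    for station, instrument, average in rows:
        # Insert each dimension value once; repeats are ignored.
        db.execute("INSERT OR IGNORE INTO dim_station (station) VALUES (?)", (station,))
        db.execute("INSERT OR IGNORE INTO dim_instrument (instrument) VALUES (?)", (instrument,))
        # Resolve the surrogate keys and store the measurement in the fact table.
        db.execute(
            """
            INSERT INTO fact (station_id, instrument_id, average)
            SELECT s.station_id, i.instrument_id, ?
            FROM dim_station s, dim_instrument i
            WHERE s.station = ? AND i.instrument = ?
            """,
            (average, station, instrument),
        )
    db.commit()
    return db
```

A reporting tool would then join the fact table back to its dimensions, for example to average the readings per station.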
Visualization
Fact table and Dim tables from Oracle database connected directly to Tableau for
visualization

Data Analysis Project: Stroke Prediction
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdf
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 

Air Pollution in NS: Analysis and Predictions

  • 1. Air Pollution in Nova Scotia: Analysis and Predictions. Carlo Carandang, Zhengyang Ma, Navjot Kaur, Sahil Behl, Funto Akinyemi, Maroof Rahman
  • 2. Air Pollution in NS: Problem Air Pollution ● Detrimental to health (asthma, breathing problems) ● Detrimental to environment ● Detrimental to NS economy
  • 3. Air Pollution in NS: Business Case ● Recent push for green and smart cities ● NS economy based on tourism and nature ● To carry out these green initiatives, policy makers need: ○ accurate and convenient access to current and future air pollution conditions ○ to highlight pollution problem areas ○ to measure the effects of interventions to reduce pollution ○ to predict future pollution problems
  • 4. 2018 Open Data Hackathon ● Analyzed air particulate pollution datasets from NS Open Data Portal ● Produced time series prediction on concatenated table from the datasets
  • 6. Further Analysis of Air Pollution Data ● After Hackathon, wanted to further analyze the data ● Built a Data Warehouse in Oracle DB for Business Intelligence ● Cleaned, extracted, and transformed the data ● Loaded the data into a data warehouse ○ Dimension Tables loading into a Fact Table ● Fact table connected directly to Tableau for visualization
  • 19. Technical: ETL and Data Warehouse
  • 20. Technologies: Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production; Oracle SQL Developer; SQL*Loader; MS Excel; Cloud Dataprep; Tableau; Draw.io
  • 22. Data Selection: Motivation ● We chose these datasets because they contained enough data for us to extract insights into a business problem ● When searching, we found that many of the available open datasets were sparse or not fully populated, and were also small (few rows and columns) ● We ended up extracting the air particulate pollution datasets from the Nova Scotia Open Data Portal, as there were several datasets on air particulate matter from all around the Province ● These datasets also gave us a chance to extract multiple datasets and perform ETL and analytics on them.
  • 23. Obtaining the Data. To obtain the data, the following procedure was followed:
      1. Go to https://data.novascotia.ca
      2. Click on the ‘Data Catalogue’ button
      3. In the search bar, search for ‘pollution air particulate’
      4. You will see 30 results: sort by ‘Most Relevant’
      5. You want the first 12 results (ignore the 11th result, which is a lookup table)
      This will leave you with 11 csv files for analysis.
  • 24. csv Description: 44 MB; 643,453 rows total
      File | Size | Rows
      Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Kentville_BAM.csv | 498 KB | 8,785
      Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Halifax_BAM.csv | 4.7 MB | 69,958
      Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Port Hawkesbury_BAM.csv | 4.1 MB | 55,559
      Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Sable_Island_BAM.csv | 4.6 MB | 64,496
      Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Pictou_BAM.csv | 6.8 MB | 104,033
      Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Sydney_BAM.csv | 3.4 MB | 50,752
      Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Lake_Major_TEOM.csv | 4 MB | 60,691
      Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Halifax_TEOM.csv | 790 KB | 12,494
      Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Aylesford_BAM.csv | 4.7 MB | 68,025
      Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Lake_Major_BAM.csv | 6.7 MB | 96,433
      Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Sydney_TEOM.csv | 3.8 MB | 60,514
  • 25. Data Description
      Column | Sample Data | Description
      Date_Time | 07/24/1998 03:00:00 PM | date-time stamp
      Pollutant | PM2.5 | fine particulate matter (microns)
      Unit | µg/m3 | micrograms per cubic metre
      Station | Sydney | location of instrument
      Instrument | TEOM | instrument type
      Average | 24 | average concentration
      Time period coverage: 2001-01-01 to 2016-12-31
  • 27. Cleaning: We preprocessed the data in Cloud Dataprep, uploading the 11 csv files that were extracted from the Nova Scotia Open Data Portal.
  • 28. Import the Data and Add to Flow
  • 29. Recipe to Remove Rows with Null Values
  • 30. Remove Rows with Null Values
  • 31. Remove Rows with ‘InVld’
  • 32. Remove Rows with ‘NoData’
  • 33. Remove Rows with ‘<Samp’
  • 35. Extraction: csv to staging tables
      -- Extract csv to staging tables via Excel Macro using SQL Loader.
      -- First, create upload table:
      CREATE TABLE "DWUSER"."UPLOAD_TABLE" (
        "DATE_TIME" VARCHAR2(250 BYTE),
        "POLLUTANT" VARCHAR2(250 BYTE),
        "UNIT" VARCHAR2(250 BYTE),
        "STATION" VARCHAR2(250 BYTE),
        "INSTRUMENT" VARCHAR2(250 BYTE),
        "AVERAGE" FLOAT(126)
      );
  • 36. Extraction: csv to staging tables. Next, use an Excel macro to load the csv files via SQL*Loader:
  • 37. Extraction: csv to staging tables
      -- Run SQL*Loader via control files, in conjunction with Excel Macros:
      LOAD DATA
      INFILE *
      APPEND INTO TABLE HLFX_BAM_MONTHLY_AVG
      Fields terminated by '~'
      Trailing Nullcols
      (DATE_TIME,POLLUTANT,STATION,AVERAGE)
      BEGINDATA
      9-Jan-2008~PM2.5~Halifax~4.814259
      -- Load all 11 csv files via this method
  • 38. Testing. Log file of initial extraction of csv files into Oracle DB table:
      SQL*Loader: Release 11.2.0.2.0 - Production on Fri Mar 9 14:05:54 2018
      Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
      Control File: C:JUNKINTREVERSE.CTL
      Data File: C:JUNKINTREVERSE.CTL
      Bad File: C:JUNKINTREVERSE.bad
      Discard File: none specified
      (Allow all discards)
      Number to load: ALL
      Number to skip: 0
      Errors allowed: 50
      Bind array: 64 rows, maximum of 256000 bytes
      Continuation: none specified
      Path used: Conventional
  • 39. Testing: continued
      Table HLFX_BAM_MONTHLY_AVG, loaded from every logical record.
      Insert option in effect for this table: APPEND
      TRAILING NULLCOLS option in effect
      Column Name                    Position   Len   Term Encl Datatype
      ------------------------------ ---------- ----- ---- ---- ---------------------
      DATE_TIME                      FIRST      *     ~         CHARACTER
      POLLUTANT                      NEXT       *     ~         CHARACTER
      STATION                        NEXT       *     ~         CHARACTER
      AVERAGE                        NEXT       *     ~         CHARACTER
      Table HLFX_BAM_MONTHLY_AVG:
        132 Rows successfully loaded.
        0 Rows not loaded due to data errors.
        0 Rows not loaded because all WHEN clauses were failed.
        0 Rows not loaded because all fields were null.
  • 43. Transformation
      -- Upload table transformed into tables partitioned by Station (location):
      CREATE TABLE "DWUSER"."EVTB_HALIFAX_UPLOAD" (
        "DATE_TIME" TIMESTAMP (6),
        "POLLUTANT" VARCHAR2(250 BYTE),
        "UNIT" VARCHAR2(250 BYTE),
        "STATION" VARCHAR2(250 BYTE),
        "INSTRUMENT" VARCHAR2(250 BYTE),
        "AVERAGE" FLOAT(126)
      );
      -- Repeat this for the 7 other Station Locations
  • 46. Load Dim Tables and Fact Table
      CREATE TABLE "DWUSER"."EVTB_FACT" (
        "TIME_ID" VARCHAR2(250 BYTE),
        "DATE_ID" VARCHAR2(250 BYTE),
        "STATION_ID" VARCHAR2(250 BYTE),
        "INSTRUMENT_ID" VARCHAR2(250 BYTE),
        "AVERAGE" FLOAT(126),
        "ID" VARCHAR2(250 BYTE),
        CONSTRAINT "PK_FACT" PRIMARY KEY ("ID")
      );
  • 47. Load Dim Tables and Fact Table
      CREATE TABLE "DWUSER"."EVTB_DIM_STATION" (
        "STATION_ID" VARCHAR2(250 BYTE),
        "STATION" VARCHAR2(250 BYTE),
        "DESCRIPTION" VARCHAR2(250 BYTE),
        CONSTRAINT "PK_STATION" PRIMARY KEY ("STATION_ID")
      );
      -- Repeat this for DIM_DATE, DIM_INSTRUMENT, and DIM_TIME
  • 49. Visualization Fact table and Dim tables from Oracle database connected directly to Tableau for visualization
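
The cleaning rules from slides 29 to 33 and the text-to-timestamp conversion implied by slide 43 can be sketched in a few lines of Python. This is only an illustration: the team did this work in Cloud Dataprep and Oracle, not Python. The function name `clean_rows` is hypothetical, the date format is inferred from the single sample value on slide 25, and the sketch assumes a flag value ('InVld', 'NoData', '<Samp') or an empty field anywhere in a row is enough to drop that row. Only the first sample row comes from the slides; the others are made up to exercise each rule.

```python
import csv
import io
from datetime import datetime

# Flag values removed on slides 31-33. Rows with empty fields are dropped
# as well (slides 29-30). Assumption: a flag in any column invalidates the row.
INVALID_FLAGS = {"InVld", "NoData", "<Samp"}

# Assumed from the sample value on slide 25: 07/24/1998 03:00:00 PM
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"

# First data row is the slide 25 sample; the remaining rows are invented.
SAMPLE_CSV = """Date_Time,Pollutant,Unit,Station,Instrument,Average
07/24/1998 03:00:00 PM,PM2.5,µg/m3,Sydney,TEOM,24
07/24/1998 04:00:00 PM,PM2.5,µg/m3,Sydney,TEOM,InVld
07/24/1998 05:00:00 PM,PM2.5,µg/m3,Sydney,TEOM,NoData
07/24/1998 06:00:00 PM,PM2.5,µg/m3,Sydney,TEOM,<Samp
07/24/1998 07:00:00 PM,PM2.5,µg/m3,Sydney,TEOM,
"""


def clean_rows(csv_text):
    """Drop null/flagged rows, then parse Date_Time and cast Average to float."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        values = [(value or "").strip() for value in row.values()]
        # Skip rows that contain an empty field or one of the flag values.
        if any(value == "" or value in INVALID_FLAGS for value in values):
            continue
        row["Date_Time"] = datetime.strptime(row["Date_Time"].strip(), DATE_FORMAT)
        row["Average"] = float(row["Average"])
        cleaned.append(row)
    return cleaned


rows = clean_rows(SAMPLE_CSV)
print(len(rows), rows[0]["Date_Time"].isoformat(), rows[0]["Average"])
```

Of the five sample rows, only the first survives, with its timestamp parsed to a `datetime` and its average cast to a float. Parsing the date at this stage mirrors the change from `VARCHAR2` in `UPLOAD_TABLE` to `TIMESTAMP (6)` in the per-station tables.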