Banana Team
Monthly Challenge
Air Pollution case
How are we going to present?
• The steps we have taken so far
• The approach we chose for each step
• Why?
• How? (the technical part)
Step 1 – Import the data
• Import all the datasets and look at the classes
• Load the map of Sofia to have a look (package “ggmap”)
• Left-join the metadata with the EEA data
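The left join in the last bullet can be sketched as follows. The team worked in R; this is a plain-Python illustration only, and the field names (`station`, `p10`, `lat`, `lon`), the sample values, and the choice of the EEA measurements as the left table are all assumptions, not taken from the slides.

```python
from collections import defaultdict


def left_join(left_rows, right_rows, key):
    """Keep every row of left_rows; attach fields from matching right_rows.

    Left rows without a match get None for every right-hand field.
    """
    index = defaultdict(list)
    for row in right_rows:
        index[row[key]].append(row)
    right_fields = sorted({f for row in right_rows for f in row if f != key})

    joined = []
    for row in left_rows:
        matches = index.get(row[key]) or [{}]  # [{}] stands for "no match"
        for match in matches:
            merged = dict(row)
            for field in right_fields:
                merged[field] = match.get(field)
            joined.append(merged)
    return joined


# Hypothetical sample data: measurements on the left, station metadata on the right.
eea = [
    {"station": "STA-1", "time": "2018-01-01", "p10": 62.0},
    {"station": "STA-2", "time": "2018-01-01", "p10": 41.5},
    {"station": "STA-9", "time": "2018-01-01", "p10": 30.0},
]
metadata = [
    {"station": "STA-1", "lat": 42.70, "lon": 23.32},
    {"station": "STA-2", "lat": 42.65, "lon": 23.38},
]

joined = left_join(eea, metadata, "station")
```

A left join keeps measurements whose station has no metadata (here `STA-9`), which makes missing coordinates visible instead of silently dropping rows.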
Step 2 – Dealing with the official measurements data
• Split the EEA data by measurement interval: hourly vs. daily
• Interpolate the missing P10 measurements using the na.kalman method from the imputeTS package
• Check the stations on the map and where P10 is over 50
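To show what filling gaps in a measurement series looks like, here is a minimal Python sketch. It uses plain linear interpolation as a stand-in, not the Kalman-based imputation from imputeTS that the slide names, so the filled values would differ from the team's. The sample series is invented; the threshold of 50 is the one mentioned on the slide.

```python
def interpolate_gaps(values):
    """Fill None entries by linear interpolation between known neighbours.

    Gaps at the start or end are filled with the nearest known value.
    This is a simple stand-in, not Kalman smoothing.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return filled

    # Leading and trailing gaps: carry the nearest known value outward.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]

    # Interior gaps: straight line between the surrounding known points.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical daily P10 series with missing days marked as None.
daily_p10 = [40.0, None, None, 70.0, 55.0, None]
filled = interpolate_gaps(daily_p10)

# Positions where the filled series is over the slide's threshold of 50.
over_limit = [i for i, v in enumerate(filled) if v > 50]
```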
Step 3 – Dealing with the citizen data
• Clean the data:
o Remove records without a geohash
o Keep only the records in Sofia
o Remove duplicates by geohash and time, taking the mean value
o Basic stats
o Remove mismeasurements
o What about the P10 measurements?
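Two of the cleaning steps above, dropping records without a geohash and collapsing duplicates by geohash and time to their mean, can be sketched in Python as below. The record layout (`geohash`, `time`, `p10`) and the sample values are hypothetical; the Sofia filter and the mismeasurement rules are left out because the slides do not specify them.

```python
from collections import defaultdict
from statistics import mean


def clean_citizen_records(records):
    """Drop records with no geohash, then average duplicates per (geohash, time)."""
    groups = defaultdict(list)
    for rec in records:
        if not rec.get("geohash"):
            continue  # no location, so the record cannot be placed on the map
        groups[(rec["geohash"], rec["time"])].append(rec["p10"])

    return [
        {"geohash": geohash, "time": time, "p10": mean(values)}
        for (geohash, time), values in sorted(groups.items())
    ]


# Hypothetical sample: one duplicated reading and one record without a geohash.
records = [
    {"geohash": "aaaa1", "time": "2018-01-01T10", "p10": 20.0},
    {"geohash": "aaaa1", "time": "2018-01-01T10", "p10": 30.0},
    {"geohash": "", "time": "2018-01-01T10", "p10": 99.0},
    {"geohash": "bbbb2", "time": "2018-01-01T11", "p10": 15.0},
]

cleaned = clean_citizen_records(records)
```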
Step 4 – Clustering
• First try: k-means with 15 clusters
• Look at the mean P10
• Re-clustering
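A minimal sketch of the first two bullets, in Python rather than the team's R: plain k-means (Lloyd's algorithm) followed by the mean P10 per cluster. The slides do not say which features were clustered; 2-D coordinates are assumed here, the data is invented, and k is 2 instead of 15 to keep the example small.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centres."""
    centres = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centre.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centre
                centres[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centres


# Hypothetical sensor positions and their P10 readings.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
p10 = [10.0, 50.0, 20.0, 60.0, 30.0, 70.0]

labels, centres = kmeans(points, k=2)

# Mean P10 per cluster.
mean_p10 = {}
for c in sorted(set(labels)):
    values = [v for v, lab in zip(p10, labels) if lab == c]
    mean_p10[c] = sum(values) / len(values)
```

Seeding with the first k points keeps the run deterministic; a real analysis would more likely use random or k-means++ initialisation and compare several runs before re-clustering.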
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 0 21.63583 16.93738 10.33807 14.38976 31.32357 18.35733 24.68484 16.94701 13.45075 21.0008 19.82045 31.77396 13.46899 16.24168
2 21.63583 0 25.97364 19.77782 16.85756 45.06835 27.13509 13.92111 14.59554 16.11776 9.288892 31.58869 50.05801 10.16962 27.00873
3 16.93738 25.97364 0 14.14331 13.37783 25.97717 9.780476 31.29258 17.91099 15.01486 25.56202 14.19198 28.37356 9.888615 11.32324
4 10.33807 19.77782 14.14331 0 10.00163 31.33025 16.06305 24.82776 13.11336 9.919847 19.05987 18.8565 31.28343 14.82302 13.82602
5 14.38976 16.85756 13.37783 10.00163 0 33.73205 16.2738 23.15428 9.026164 7.731286 16.09706 20.48184 35.4638 6.021929 15.32186
6 31.32357 45.06835 25.97717 31.33025 33.73205 0 27.85597 47.90453 36.44666 35.81961 45.70246 26.07021 30.99674 24.73839 25.60471
7 18.35733 27.13509 9.780476 16.06305 16.2738 27.85597 0 31.90716 20.3111 16.95018 26.84071 15.10771 29.36166 8.734842 12.66861
8 24.68484 13.92111 31.29258 24.82776 23.15428 47.90453 31.90716 0 21.10077 21.95312 13.51846 35.76466 54.16118 10.00073 31.62236
9 16.94701 14.59554 17.91099 13.11336 9.026164 36.44666 20.3111 21.10077 0 11.85946 14.36245 24.69296 39.84126 6.502456 19.50443
10 13.45075 16.11776 15.01486 9.919847 7.731286 35.81961 16.95018 21.95312 11.85946 0 14.85418 20.44611 36.36994 6.799423 16.2648
11 21.0008 9.288892 25.56202 19.05987 16.09706 45.70246 26.84071 13.51846 14.36245 14.85418 0 31.18561 50.07563 7.678595 26.74282
12 19.82045 31.58869 14.19198 18.8565 20.48184 26.07021 15.10771 35.76466 24.69296 20.44611 31.18561 0 26.10775 12.07865 13.47317
13 31.77396 50.05801 28.37356 31.28343 35.4638 30.99674 29.36166 54.16118 39.84126 36.36994 50.07563 26.10775 0 14.60216 25.03165
14 13.46899 10.16962 9.888615 14.82302 6.021929 24.73839 8.734842 10.00073 6.502456 6.799423 7.678595 12.07865 14.60216 0 10.58576
15 16.24168 27.00873 11.32324 13.82602 15.32186 25.60471 12.66861 31.62236 19.50443 16.2648 26.74282 13.47317 25.03165 10.58576 0
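The slide does not say how this 15 × 15 matrix was computed. If it holds pairwise distances between the 15 clusters (an assumption), a matrix with the same shape, symmetric with a zero diagonal, could be produced as in this Python sketch. The three sample centres are invented.

```python
import math


def pairwise_distances(points):
    """Return the full symmetric matrix of Euclidean distances between points."""
    return [[math.dist(a, b) for b in points] for a in points]


# Hypothetical cluster centres.
centres = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
dist = pairwise_distances(centres)
```

Small entries in such a matrix point to clusters that are close together and are candidates for merging in a re-clustering pass.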
What will the model be?
What conclusions did we reach?
Banana Team
Thank you for your attention!

Air sofia case solution - Team Banana (1st speakers)
