Citizen Science, VGI, Geo-CrowdSourcing, Big Geo Data:
how they matter to the FOSS4G Community
Maria Antonia Brovelli
Politecnico di Milano – Como Campus
Dept. of Civil and Environmental Engineering (DICA) - Italy
K Seoul Hotel
Grand Ballroom B
18 September 2015
POLITECNICO DI MILANO
GEOlab - COMO Campus
Citizen science
Citizen science: scientific work
undertaken by members of the
general public, often in
collaboration with or under the
direction of professional scientists
and scientific institutions.
June 2014
It is a fairly new name but an old practice
Volunteered Geographic Information
The name was coined in 2007 (Goodchild),
but it was already a real practice.
Geopaparazzi. Because not all paparazzis are evil!
Volunteered Geographic Information
OSM streets and buildings, Milan and Seoul, 2006-2015
OpenStreetMap at FOSS4G Europe 2015
Peter Mooney, Luca Delucchi and Marco Minghini
Checking the quality
Comparison of OpenStreetMap and authoritative road
network datasets
http://131.175.143.84/WPS/
Indoor Mapping at FOSS4G Europe
Nicola Dorigatti and
Ludovico Biagi
http://www.i-locate.eu/the-open-source-toolkit/
Emotional Maps: participatory sensing
How do we feel the city?
Photo by Thomas Leuthard
Haosheng Huang
Emotional Mapping at FOSS4G Europe 2015
Haosheng Huang and Eleonora Ciceri
Available at Google Play or Apple Store: emomap
Download the code:
https://github.com/cartogroup/emomap
Other participatory sensing applications
Architectural barriers
Cultural elements
Street furniture
Biodiversity
OSAKA Bike Parking Report
Available at Google Play: ODK Collect
To connect ODK Collect to the ODK Aggregate
server:
• hit the Options device button, then select General
Settings
• select Configure platform settings and insert:
– URL: http://131.175.143.49/ODKAggregate
– Username: maria - Password: osaka2015
2D viewer
http://geomobile.como.polimi.it/Osaka/
Policrowd2.0 Osaka
http://viaregina2.como.polimi.it/Osaka/
http://geomobile.como.polimi.it/policrowd2.0/
Architecture (Policrowd2.0)
Volunteer thinking
Vijay Charan Venkatachalam and Irene Celino
http://bit.ly/foss4game
DUSAF (Lombardy)
1. Look at the pixel within the blue border
2. For that pixel, choose the most suitable land cover category
3. Watch the time!
4. Win scores and badges and beat your friends!
Geocrowdsourced information
DATA AND INFORMATION
BELONG TO PEOPLE!
Geo Big Data: Satellites
Geo Big Data: Sensors
Sensors are everywhere and they are the electronic skin of the Earth
Citizens as sensors
Example 1: EarthServer
● Agile Analytics on 1+ Petabyte space/time datacubes
  • Earth Science (3D sat image timeseries, 4D weather); Planetary Science
● Open standards, open source
  • OGC WCS + WCPS, integrated data/metadata search
  • rasdaman + NASA WorldWind + more
● Intercontinental initiative: EU+US+AUS
● www.earthserver.eu
EarthServer: Datacubes at Your Fingertips
Peter Baumann et al., Jacobs Univ. Bremen - Germany
Example 2: support for massive datasets in GRASS
Moderate-resolution Imaging Spectroradiometer (MODIS) Land Surface Temperature:
Data from 2000-today
4 Earth coverages per day
Processing of 17,000 maps of 415 million pixels each (250 m size)
● In total 300 nodes with 600 GB RAM
● 132 TB of raw disk space, XFS, GlusterFS
● Circa 2 Tflops
● Scientific Linux operating system, blades headless
● Queue system for job management (Grid Engine), used for GRASS jobs
● Computational time for all data: 1 month with LST-algorithm V2.0
● Computational time for one LST day: 3 hours on 2 nodes
Markus Neteler et al., FEM - Italy
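As a rough sense of scale, the figures quoted on this slide can be multiplied out. The sketch below uses only the slide's numbers; the one assumption (marked in the code) is that "1 month" of processing is taken as 30 days.

```python
# Figures quoted on the slide.
N_MAPS = 17_000                 # reconstructed LST maps
PIXELS_PER_MAP = 415_000_000    # pixels per map at 250 m resolution

# Assumption (not on the slide): "1 month" is taken as 30 days.
PROCESSING_DAYS = 30

# Total number of pixel values across the whole time series.
total_pixels = N_MAPS * PIXELS_PER_MAP

# Average number of maps that must be completed per day of cluster time.
maps_per_day = N_MAPS / PROCESSING_DAYS

print(f"total pixels: {total_pixels:.3e}")
print(f"maps per processing day: {maps_per_day:.0f}")
```

Under these assumptions the series holds roughly seven trillion pixel values, which is why job scheduling and storage throughput matter as much as raw compute.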
Geo Big Data: social media
http://www.internetlivestats.com/
9,890 Tweets sent in 1 second
2,528 Instagram photos uploaded in 1 second
2,153 Tumblr posts in 1 second
1,843 Skype calls in 1 second
29,290 GB of Internet traffic in 1 second
50,232 Google searches in 1 second
106,299 YouTube videos viewed in 1 second
2,420,172 Emails sent in 1 second
Geo Big Data: the example of Milano GRID
● Two months of data, with a temporal step of 10 minutes
● Grid of 100 x 100 cells with size = 235 m
https://dandelion.eu/datamine/open-big-data/
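The grid description translates directly into a few numbers worth having at hand. The sketch below derives them from the slide's figures. Two things are assumptions and are marked as such in the code: "two months" is taken as 61 days, and the cell numbering used by `cell_to_row_col` (1-based, row-major) is hypothetical and should be checked against the dataset documentation.

```python
# Figures quoted on the slide.
CELLS_PER_SIDE = 100
CELL_SIZE_M = 235
STEP_MINUTES = 10

# Assumption (not on the slide): "two months" is taken as 61 days.
DAYS = 61

n_cells = CELLS_PER_SIDE ** 2                      # cells in the grid
side_km = CELLS_PER_SIDE * CELL_SIZE_M / 1000      # extent of one grid side
steps_per_day = 24 * 60 // STEP_MINUTES            # time slots per day
n_steps = steps_per_day * DAYS                     # time slots in the dataset
max_records = n_cells * n_steps                    # upper bound per activity type


def cell_to_row_col(cell_id):
    """Map a cell id to (row, col), both 0-based.

    Hypothetical numbering: ids run from 1 to n_cells in row-major order.
    """
    if not 1 <= cell_id <= n_cells:
        raise ValueError(f"cell_id must be in 1..{n_cells}")
    index = cell_id - 1
    return index // CELLS_PER_SIDE, index % CELLS_PER_SIDE


print(n_cells, side_km, steps_per_day, n_steps, max_records)
print(cell_to_row_col(101))
```

The upper bound on records per activity type shows why a datacube or array store is a natural fit for this kind of data.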
Geo Big Data: Milano GRID
● Received SMS: a Call Detail Record (CDR) is generated each time a user receives an SMS
● Sent SMS: a CDR is generated each time a user sends an SMS
● Incoming Calls: a CDR is generated each time a user receives a call
● Outgoing Calls: a CDR is generated each time a user issues a call
● Internet connections: a CDR is generated each time
  – a user starts an internet connection
  – a user ends an internet connection
  – during the same connection one of the following limits is reached:
    • 15 minutes from the last generated CDR
    • 5 MB from the last generated CDR
● Geolocalized Tweets (anonymized Twitter users)
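The rule for Internet connections is the only one with internal state, so a small simulation may help. This is a minimal sketch, not the operator's actual logic: the function name `internet_cdrs`, the sampled input format, and the reading of 5 MB as 5 * 2**20 bytes are all assumptions made for illustration. A real system would emit a record at the instant a limit is reached, whereas this sketch only checks the limits at the sample times it is given.

```python
LIMIT_SECONDS = 15 * 60          # 15 minutes from the last generated CDR
LIMIT_BYTES = 5 * 1024 * 1024    # assumption: 5 MB read as 5 * 2**20 bytes


def internet_cdrs(samples):
    """Return the CDRs generated for one internet connection.

    samples: time-ordered list of (time_in_seconds, cumulative_bytes);
    the first sample is the connection start, the last one its end.
    Returns a list of (time_in_seconds, reason) tuples.
    """
    if not samples:
        return []
    start_time, start_bytes = samples[0]
    cdrs = [(start_time, "start")]
    last_time, last_bytes = start_time, start_bytes
    # Intermediate samples: check both limits against the last CDR.
    for time, total_bytes in samples[1:-1]:
        if time - last_time >= LIMIT_SECONDS:
            cdrs.append((time, "time limit"))
            last_time, last_bytes = time, total_bytes
        elif total_bytes - last_bytes >= LIMIT_BYTES:
            cdrs.append((time, "volume limit"))
            last_time, last_bytes = time, total_bytes
    if len(samples) > 1:
        cdrs.append((samples[-1][0], "end"))
    return cdrs


MB = 1024 * 1024
example = [(0, 0), (300, 6 * MB), (600, 7 * MB), (1300, 8 * MB), (1400, 8 * MB)]
for record in internet_cdrs(example):
    print(record)
```

Because both limits are measured from the last generated CDR, a long or heavy connection contributes many records, which is what makes Internet activity the densest layer of the dataset.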
Sensing the City - 1
Students: Emanuele Mariani, Jacopo Mossina;
Supervisors: Giorgio Zamboni, Maria A Brovelli
http://131.175.143.84/BGDV/
Sensing the City - 2
Student: Anna Trofimova
Supervisors: Carolina Arias, Maria A Brovelli
http://landcover.como.polimi.it/socialmedia_rasdaman/
Sensing the City - 2
Filtering with date and land coverage classes
Sensing the City - 3
NetCDF
Carolina Arias, Maria A Brovelli,
Simone Corti, Giorgio Zamboni
http://landcover.como.polimi.it/BigNetCDF/
EST-WA
GeoForAll Geocrowd
Geocrowdsourcing CitizenScience FOSS4G
Peter Mooney & Maria A Brovelli
http://www.geoforall.org/
http://wiki.osgeo.org/wiki/Geocrowdsourcing_CitizenScience_FOSS4G
Thanks to all the people on my team contributing to these
topics: Carolina Arias, Ludovico Biagi, Marco Brambilla,
Eleonora Ciceri, Simone Corti, Eylul Candan Kilsedar,
Emanuele Mariani, Marco Minghini, Monia Molinari,
Jacopo Mossina, Gabriele Prestifilippo, Anna Trofimova,
Vijay Charan Venkatachalam, Giorgio Zamboni
Thanks to Peter Baumann, Irene Celino, Luca Delucchi,
Nicola Dorigatti, Haosheng Huang, Hayashi Hirofumi,
Yoshida Daisuke, Peter Mooney, Markus Neteler,
Venkatesh Raghavan
maria.brovelli@polimi.it
Thanks for your attention!
Thanks to the local organizing committee.
You are great!

Editor's Notes

  1. Each Earth Science partner will host datacubes of at least 1 PB, consisting of 3D satellite image time series and 4D weather data. The Mars service (Moon and others to be added) naturally holds less data: 20+ TB. Following the "Any query, any time" principle, requests are distributed in clouds and across data centers. This is a brand-new feature of the open-source rasdaman community edition, contributed by the company rasdaman GmbH, to be released next week. It has been tested with 1,000+ Amazon nodes.
  2. Slide 1: Support for massive spatial datasets in GRASS GIS. GRASS GIS 7 has been notably improved for the processing of massive spatial datasets, including improvements to the vector and raster data engines.
     Slide 2: EuroLST: MODIS LST daily time series. Temperature is a main driver of most ecological processes, and temperature time series provide key environmental indicators for various applications and research fields. High spatial and temporal resolution is crucial for detailed analyses. The problem with optical and thermal remote sensing is clouds, which often obscure parts or even the entirety of satellite images (see upper left map, showing the mosaic of land surface temperature (LST) data for a particular day in Europe). The team of Markus Neteler at Fondazione Edmund Mach in Italy developed a new method to reconstruct the complete LST time series at continental scale from the daily LST products of MODIS, flown onboard NASA's Terra and Aqua satellites. These two satellites generate four Earth coverages per day. The reconstruction, i.e. gap filling of the data, is done by outlier detection, then a temporal gap filling (upper right picture) that tries to find pixels in time by looking back and forth for a few days. This is followed by a multiple regression with auxiliary maps in order to generate a synthetic LST map for each satellite overpass (lower right image). Everything is done in GRASS GIS 7; about 5 billion pixels are involved in each map reconstruction. Eventually, both temporary results (upper and lower right images) are merged into the final LST map (lower left image). In total, 17,000 daily LST maps of 415 million pixels each have been generated and then aggregated to daily averages and monthly, seasonal and annual means.
     Slide 3: EuroLST: MODIS LST daily time series. Example of a single reconstructed mosaic out of the 17,000 maps. .... as per slide ...
     Slide 4: EuroLST: MODIS LST daily time series. Another example: the hot summer of 2003 in Europe is still reflected in the overly warm big lakes in January 2004. The temperature excess leads to better overwintering conditions for parasites and disease vectors, which are studied by the team at Fondazione Edmund Mach.
     Slide 5: BIOCLIM from reconstructed MODIS LST at 250m pixel resolution. Example of BIOCLIM data derived from the daily reconstructed LST data. These BIOCLIM-like European LST maps, following the “Bioclim” definition (Hutchinson et al., 2009), were derived from 10 years of reconstructed MODIS LST and are provided as GeoTIFF files at 250m pixel resolution in EU LAEA projection. The data are available online at http://gis.cri.fmach.it/eurolst-bioclim/
     Slide 6: FEM-GIS Cluster overview. This machine is used for the MODIS LST processing. The I/O load is so high that the internal 10Gb/s network cannot handle more than 20 jobs in parallel (since each job processes more than 5 billion pixels). GRASS GIS 7 has been optimized to handle this amount of data easily.
     Slide 7: FEM-GIS Cluster architecture. The FEM-GIS Cluster has been built up over several years, which explains its heterogeneous structure. In the lower right part there is a distributed low-cost storage system consisting of 8 microservers with 4 disks each, which are seen as a single storage volume through the GlusterFS file system.
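The temporal gap-filling step described in the notes for Slide 2 ("looking back and forth for a few days") can be illustrated for a single pixel. This is a simplified sketch and not the EuroLST implementation: the function name `fill_temporal_gaps`, the `max_offset` parameter, and the choice to average the neighbours found at the nearest offset are assumptions made for illustration. The outlier detection and the multiple regression with auxiliary maps are not shown.

```python
def fill_temporal_gaps(series, max_offset=3):
    """Fill cloud gaps in one pixel's daily time series.

    series: list of floats, with None where the pixel was obscured.
    For each gap, look back and forth up to max_offset days and use the
    mean of the valid values found at the nearest offset (assumption).
    Gaps with no valid neighbour in that window stay None.
    """
    filled = list(series)
    n = len(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        for offset in range(1, max_offset + 1):
            # Only original observations are used, never filled values.
            candidates = [
                series[j]
                for j in (i - offset, i + offset)
                if 0 <= j < n and series[j] is not None
            ]
            if candidates:
                filled[i] = sum(candidates) / len(candidates)
                break
    return filled


pixel = [10.0, None, 14.0, None, None, None, None, 20.0]
print(fill_temporal_gaps(pixel, max_offset=1))
```

Gaps that remain after this step are the ones a spatial or regression-based step would have to cover, which matches the notes' description of a synthetic map being generated for each overpass.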