This document discusses methods for accessing, processing, and extracting knowledge from geo-referenced human activity data. It describes challenges in modeling geospatial data from different sources and accessing data through spatial hierarchy models. It also covers processing paradigms for knowledge extraction, including spatial workflow patterns and temporal dynamics in communities drawn from data sources like tweets. Visualization and interaction techniques are discussed, including a move toward 3D web-based visualization using technologies like WebGL. Feature extraction from the data is highlighted as a source of risk-assessment knowledge.
The document discusses storing terrestrial LiDAR data in a spatial database framework. It describes setting up a PostgreSQL database with a PostGIS extension to store LiDAR point cloud data in a hierarchical folder structure based on survey dates and locations. Issues with large data uploads are addressed through experiments comparing the PostgreSQL COPY method to the pg_bulkload method, finding pg_bulkload significantly faster for importing large LiDAR datasets. The spatial database allows efficient querying of LiDAR data by location or other attributes.
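The COPY path in such a pipeline expects rows in a simple text format. As a rough sketch (the column layout, coordinate values, and function name below are invented for illustration; the abstract itself only reports that pg_bulkload outperformed COPY), LiDAR points can be serialized like this before being streamed to PostgreSQL:

```python
# Sketch: format LiDAR points as tab-separated rows for PostgreSQL's
# COPY ... FROM STDIN (hypothetical table schema: x, y, z, intensity).

def points_to_copy_rows(points):
    """Convert (x, y, z, intensity) tuples into COPY-compatible text rows."""
    lines = []
    for x, y, z, intensity in points:
        lines.append(f"{x:.3f}\t{y:.3f}\t{z:.3f}\t{intensity}")
    return "\n".join(lines) + "\n"

cloud = [(601234.125, 733456.500, 12.034, 180),
         (601234.750, 733456.875, 12.041, 175)]
copy_text = points_to_copy_rows(cloud)
# With psycopg2, text like this can be streamed via
#   cur.copy_expert("COPY lidar_points (x, y, z, intensity) FROM STDIN", buf)
print(copy_text, end="")
```

pg_bulkload bypasses parts of this per-row path (it writes closer to the data files), which is consistent with the speed-up the experiments report.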
The document discusses feature extraction from lidar data, including road extraction and roadside feature extraction. It outlines algorithms for extracting road edges with over 90% accuracy, and detecting poles, trees, and other roadside features in a fully automated manner. Ongoing work focuses on improving pole extraction and developing classifiers for different feature types like signs and light posts.
This document provides guidance on how to give effective presentations. It emphasizes that audiences are often bored by typical presentations and suggests developing skills in areas like visual presenting, storytelling, preparation and simplification. The document encourages focusing presentations on scope and depth rather than details, using repetition to reinforce key points, and incorporating images, whitespace and consistency in design. Presenters are advised to think about the audience and their needs or resistance rather than just the presentation content or tools.
The document discusses multi-thematic spatial databases for efficiently storing, accessing, processing, and visualizing large volumes of geospatial data from multiple sources and sensors. It describes experience with designing databases to handle terabytes of temporal, multi-sensor data using spatial indexing. The goals are a unified approach for multi-thematic data storage, efficient data handling, and enabling searches across time, space and attributes while incorporating visualizations.
The document discusses LiDAR processing for road network asset inventory. It outlines an algorithm developed for extracting road edges from LiDAR point clouds without manual input. It also discusses using the extracted road edges to develop a road surface extraction algorithm. Pole detection and extraction methods are also examined. The goal is to develop automated feature extraction from mobile mapping LiDAR and image data for road inventory purposes.
With the global drive towards Building Information Modelling (BIM) compliance gathering pace, we have seen an increased requirement for highly accurate and detailed geospatial data within utilities and engineering projects. This has resulted in a corresponding increase in the application of LiDAR technology within these sectors.
This document discusses digital hologram image processing techniques. It begins with an introduction to digital holography and why image processing is needed to extract 3D information from digital holograms. Key topics covered include reconstructing digital holograms, focusing and segmentation techniques, and removing unwanted twin images and other artifacts from reconstructions. The document provides an overview of recording digital holograms and sources of error, as well as outlining various image processing approaches that can be applied.
A web platform and a methodology to promote a collaborative development of co... (damarcant)
This document proposes a web platform called Context Cloud and an associated methodology called Situation-Driven Development to facilitate collaborative development of context-aware systems. Context Cloud allows domain experts and programmers to define contexts, situations, and rules through a visual interface to detect situations and adapt systems. An evaluation found the platform and methodology eased collaboration and enabled non-programmers to develop context-aware systems.
Presentation as held at the "Workshop on Knowledge Evolution and Ontology Dynamics" co-located with ISWC 2011. Related to the paper http://ceur-ws.org/Vol-784/evodyn1.pdf
Situation driven development: a methodology for the development of context-aw... (damarcant)
This document proposes a methodology called Situation-Driven Development for creating context-aware systems. It involves domain experts and programmers collaborating using a platform called Context Cloud. Context Cloud allows defining context information and situations through a web interface. It then generates outputs to adapt a system's behavior based on the identified situation. An evaluation found the methodology and platform made developing context-aware systems quicker and easier, and facilitated collaborative work between technical and domain experts.
Advancements In Visualization Of Remotely Sensed 3D Data (Merrick & Company)
The document discusses advancements in visualizing remotely sensed 3D data. It describes historical methods for managing large volumes of 3D point data, such as LiDAR and photogrammetry data, which included rasterizing, loading entirely into memory, decimation, and streaming from disk. A new advanced spatial indexing solution is introduced that allows for real-time rendering of massive 3D point clouds with unlimited size, compressed data, and low computer resource usage while preserving the full resolution and detail of the original data.
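The core idea behind spatial indexing of point data can be illustrated with a deliberately minimal uniform-grid index (the class, cell size, and method names below are invented; the product described above would use more elaborate structures such as octrees with level-of-detail):

```python
# Minimal sketch of a uniform-grid spatial index for 3D points.
# Points are bucketed by integer cell coordinates so that spatial
# queries only touch one bucket instead of scanning every point.
from collections import defaultdict

class GridIndex:
    def __init__(self, cell=10.0):
        self.cell = cell
        self.cells = defaultdict(list)

    def _key(self, x, y, z):
        c = self.cell
        return (int(x // c), int(y // c), int(z // c))

    def insert(self, x, y, z):
        self.cells[self._key(x, y, z)].append((x, y, z))

    def query_cell(self, x, y, z):
        """Return all points sharing the cell that contains (x, y, z)."""
        return self.cells[self._key(x, y, z)]

idx = GridIndex(cell=5.0)
for p in [(1, 1, 1), (2, 3, 4), (12, 1, 1)]:
    idx.insert(*p)
print(len(idx.query_cell(0, 0, 0)))  # -> 2 (two points share the origin cell)
```

The same bucketing principle, applied hierarchically and combined with compression, is what makes real-time rendering of massive clouds feasible.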
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp... (Tulipp.Eu)
- Computer vision has improved with more data and processing power, but global scene understanding remains challenging.
- The document proposes a multidisciplinary approach combining CNNs and human visual cognition to better model scene understanding, with the goal of applications like autonomous vehicles.
- It describes experiments observing how humans and primates recognize scenes to inform modeling, incorporating global and local descriptors with relationships. This approach aims to advance scene understanding capabilities.
Geospatial Rectification of Web Transactions and Data SecurityPhoenix TS
Presentation from Mr. Tim Loomis, a Senior Systems Engineer at the National Oceanic and Atmospheric Administration (NOAA).
He will address the implications of managing imagery data that has geographical components per pixel, and open a broader discussion on what the move toward geospatial rectification of web transactions means for security and data management, at our Meetup on February 10th - http://www.meetup.com/Tech-Roots/events/219644408/
Large Scale Data Mining using Genetics-Based Machine Learning (jaumebp)
This document discusses techniques for large scale data mining using genetics-based machine learning. It begins by defining what "large scale" means in the context of data mining, including datasets with many records, high dimensionality, class imbalance, and many classes. It then discusses how evolutionary algorithms are naturally parallel and suited for large scale problems. The challenges of data mining at large scales are outlined, particularly related to data handling and representation. Finally, the document introduces several kaleidoscopic techniques for large scale data mining using genetic-based machine learning, including efficiency enhancement techniques like windowing, exploiting regularities in the data, fitness surrogates, and hybrid methods, as well as hardware acceleration techniques and parallelization models.
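The windowing technique mentioned above can be sketched as evaluating fitness on a rotating slice of the training set each generation, in the spirit of incremental-learning schemes from the GBML literature (the data, rule, and function names below are invented for illustration):

```python
# Sketch of windowing as an efficiency-enhancement technique:
# each generation sees only one disjoint slice of the training data,
# cutting fitness-evaluation cost roughly by the number of windows.
def windows(data, n_windows):
    """Split data into n_windows roughly equal, disjoint slices."""
    size = len(data) // n_windows
    return [data[i * size:(i + 1) * size] for i in range(n_windows)]

def fitness(rule, window):
    """Toy fitness: fraction of window examples the rule classifies right."""
    correct = sum(1 for x, label in window if rule(x) == label)
    return correct / len(window)

data = [(x, x >= 5) for x in range(10)]   # toy labelled examples
rule = lambda x: x >= 5                   # a perfect candidate rule
ws = windows(data, n_windows=2)
for gen in range(4):
    w = ws[gen % len(ws)]                 # rotate windows per generation
    assert fitness(rule, w) == 1.0
```

Because each window is a sample of the full set, per-generation fitness is noisier, which is the usual trade-off windowing accepts for speed.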
3D Objects in Wat Makutkasattriyaram's e-Museum: Progress, Experiences, and A... (Rachabodin Suwannakanthi)
Presentation on "3D Objects in Wat Makutkasattriyaram's e-Museum: Progress, Experiences, and Applied Technologies" at the PNC 2006 conference, Seoul, South Korea.
Context is everything, from the clothing you choose in the morning to the dinner menu you plan based on available ingredients and time. The word on the street is that DITA maps are the express context designed to drive builds for particular deliverables and conditionality for DITA topics. That is partly true, but it is not the whole story.
For one thing, maps are far more versatile than just as build directives. Moreover, DITA topic processing can get its cues from contexts other than maps. And therein hangs the premise of Going Mapless.
To get our own context for this presentation, we start with a quick review of the original architectural definition of DITA and then trace the popular information architectures and tools that have grown up with the standard as we currently know it. Then Don introduces some scenarios where DITA could be useful if freed from the prevailing map-driven processing paradigm, and he walks you through some available methods and solutions for using DITA in these unconventional ways.
This presentation was given at Information Development World on October 2, 2015.
The document discusses various aspects of designing effective learning objects (LOs) for teaching mathematics at the secondary school level. It covers topics like LO metadata, instructional design approaches, storyboarding, feedback mechanisms, usability testing, and technical considerations. The key goals of LOs are to transition students from passive to active learning, bridge the digital divide, and close gaps in understanding through interactive, personalized instructional experiences.
This document discusses collaborative filtering and recommender systems. It begins with an overview of non-relational databases and graph databases. It then discusses collaborative filtering, including calculating similarity scores between users or items, predicting ratings for unseen items, and making recommendations. Specific methods discussed include Euclidean distance, Pearson correlation, and user-based filtering. The goal of collaborative filtering is to increase sales, market share, and targeted advertising by making personalized recommendations to users.
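As a concrete illustration of the similarity computation the summary mentions, here is a minimal user-based sketch using Pearson correlation (the ratings matrix is invented; Euclidean distance would slot into the same structure):

```python
# User-based collaborative filtering: score how similarly two users rate
# the items they have both rated, then weight neighbours' ratings by that
# similarity when recommending unseen items.
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the items two users have both rated."""
    shared = [i for i in a if i in b]
    n = len(shared)
    if n == 0:
        return 0.0
    sa = sum(a[i] for i in shared)
    sb = sum(b[i] for i in shared)
    saa = sum(a[i] ** 2 for i in shared)
    sbb = sum(b[i] ** 2 for i in shared)
    sab = sum(a[i] * b[i] for i in shared)
    num = sab - sa * sb / n
    den = sqrt((saa - sa ** 2 / n) * (sbb - sb ** 2 / n))
    return num / den if den else 0.0

ratings = {
    "alice": {"item1": 5.0, "item2": 3.0, "item3": 4.0},
    "bob":   {"item1": 4.0, "item2": 2.0, "item3": 5.0},
    "carol": {"item1": 1.0, "item2": 5.0},
}
sim_bob = pearson(ratings["alice"], ratings["bob"])      # positive: similar taste
sim_carol = pearson(ratings["alice"], ratings["carol"])  # -1.0: opposite taste
# A predicted rating for an unseen item is then a similarity-weighted
# average of the ratings given by positively correlated neighbours.
```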
Large Scale Data Mining using Genetics-Based Machine Learning (Xavier Llorà)
We are living in the petabyte era. We have larger and larger data to analyze, process, and transform into useful answers for domain experts. Robust data mining tools that can cope with petascale volumes and/or high dimensionality while producing human-understandable solutions are key in several domain areas. Genetics-based machine learning (GBML) techniques are perfect candidates for this task, among others, due to recent advances in representations, learning paradigms, and theoretical modeling. If evolutionary learning techniques aspire to be a relevant player in this context, they need the capacity to process these vast amounts of data, and to process them within reasonable time. Moreover, massive computation cycles are getting cheaper every day, giving researchers access to unprecedented degrees of parallelization. Several topics are interlaced in these two requirements: (1) having the proper learning paradigms and knowledge representations, (2) understanding them and knowing when they are suitable for the problem at hand, (3) using efficiency enhancement techniques, and (4) transforming and visualizing the produced solutions to give back as much insight as possible to the domain experts, to name a few.
This tutorial tries to answer these questions, following a roadmap that starts with what "large" means and why large is a challenge for GBML methods. Afterwards, we discuss the different facets in which we can overcome this challenge: efficiency enhancement techniques, representations able to cope with high-dimensionality spaces, and scalability of learning paradigms. We also review a topic interlaced with all of them: how we can model the scalability of the components of our GBML systems to better engineer them and get the best performance out of them on large datasets. The roadmap continues with examples of real applications of GBML systems and finishes with an analysis of further directions.
This document discusses how GIS organizations can maximize the benefits of LiDAR data. It describes what LiDAR is, different LiDAR systems, and challenges of working with large LiDAR datasets. Traditionally, LiDAR data was managed on a project-by-project basis, but an enterprise GIS workflow allows data to be accessible for multiple applications and users. Example applications shown include forestry, natural resource management, energy, and emergency management. The presentation concludes that cloud computing, web services, and open data sharing are enabling easier and more collaborative use of LiDAR data within GIS.
The document discusses principles of computer vision and its applications. It is a lecture by Dr. Vanessa Camilleri from the University of Malta on computer vision fundamentals and techniques. The key topics covered include object detection methods, stages of computer vision like image acquisition and processing, and examples of computer vision applications in various domains like manufacturing, healthcare, transportation and more.
Big Data Analysis: Deciphering the Haystack (Srinath Perera)
A primary outcome of Big Data is deriving useful and actionable insights from large or challenging data collections. The goal is to run the transformations from data, to information, to knowledge, and finally to insights. This ranges from calculating simple analytics like mean, max, and median, to deriving an overall understanding of the data by building models, and finally to deriving predictions from the data. In some cases we can afford to wait while data is collected and processed; in other cases we need the outputs right away. MapReduce has been the de facto standard for data processing, and we will start our discussion from there. However, that is only one side of the problem. Other technologies like Apache Spark and Apache Drill are gaining ground, along with real-time processing technologies like Stream Processing and Complex Event Processing. Finally, there is a lot of work on porting decision technologies like machine learning into the big data landscape. This talk discusses big data processing in general and looks at each of these technologies, comparing and contrasting them.
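The "simple analytics first" idea can be shown in miniature: per-partition partial aggregates play the role of the map phase, and combining them plays the role of the reduce phase (pure-Python stand-in for MapReduce; the partitioning is illustrative):

```python
# Mean and max decompose cleanly into per-partition partials that a
# reduce step can combine; median does not, which is why it needs a
# sort/shuffle or an approximate sketch in a real MapReduce job.
from statistics import median

partitions = [[3.0, 9.0, 1.0], [7.0, 5.0], [4.0]]

# "Map" phase: per-partition partial aggregates (sum, count, max)
partials = [(sum(p), len(p), max(p)) for p in partitions]

# "Reduce" phase: combine partials into global statistics
total = sum(s for s, _, _ in partials)
count = sum(c for _, c, _ in partials)
mean = total / count
maximum = max(m for _, _, m in partials)

# Median over all values (requires seeing the whole dataset)
med = median(x for p in partitions for x in p)

print(mean, maximum, med)  # mean ≈ 4.83, max 9.0, median 4.5
```

The distinction between decomposable and non-decomposable statistics is one reason the "simple" analytics tier is not uniformly simple at scale.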
Combining Data Mining and Machine Learning for Effective User Profiling (CodePolitan)
This slide deck was presented by Anne Regina at the Seminar & Workshop on the Introduction & Potential of Big Data & Machine Learning, organized by KUDIO on May 14, 2016.
Presentation at Southern California Code Camp, July 2013, in San Diego. This talk presents basic concepts from the world of big data and data science, with a focus on relational databases, NoSQL, MapReduce, machine learning, and data visualization, along with demos of MapReduce in action and Pig on Hadoop. The purpose of this presentation is to get you familiar with the terminology and concepts of data science, and to whet your appetite for further exploration into the world of big data. This presentation is adapted from a Coursera online course with a similar title and scope.
The document describes the Social Informatics Data Grid (SIDGrid), which aims to:
1) Integrate heterogeneous datasets over time, place, and type through a shared data and service interface and common problems/theories.
2) Develop tools for collecting, storing, retrieving, annotating, and analyzing synchronized multi-modal data on computational grids.
3) Provide an architecture that streams video, audio, and time-series data across distributed datasets using time-alignment, database, and grid-computing standards, with search and analysis tools for browsing over 4,000 projects containing various media files.
Project Matsu aimed to provide persistent data resources and elastic computing for disaster relief by making imagery available for processing using large-scale cloud computing. It evaluated three approaches: 1) Using Hadoop and MapReduce to split images and process parts in parallel; 2) Using Hadoop streaming with Python to preprocess images into a single file and process line-by-line; and 3) Using the Sector distributed file system to keep images together on nodes and applying user-defined functions to process images without splitting. The goal was to enable change detection on images from different times to assist relief workers.
LinkedIn is a large professional social network with 50 million users from around the world. It faces big data challenges at scale, such as caching a user's third degree network of up to 20 million connections and performing searches across 50 million user profiles. LinkedIn uses Hadoop and other scalable architectures like distributed search engines and custom graph engines to solve these problems. Hadoop provides a scalable framework to process massive amounts of user data across thousands of nodes through its MapReduce programming model and HDFS distributed file system.
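The "third degree network" computation mentioned above reduces to a depth-bounded breadth-first search; here is a toy sketch (the graph and function names are invented, and LinkedIn's actual graph engine is of course far more elaborate):

```python
# Depth-bounded BFS: collect every user reachable from `start` within
# max_depth hops. The size of this set for three hops is what makes
# caching a third-degree network expensive at LinkedIn's scale.
from collections import deque

def within_degrees(graph, start, max_depth=3):
    """Return all nodes reachable from `start` in at most `max_depth` hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    reachable = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                reachable.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return reachable

graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
print(sorted(within_degrees(graph, "a")))  # ['b', 'c', 'd'] ('e' is 4 hops away)
```

With an average fan-out of f, the frontier grows roughly as f³ by the third hop, which is how a single user's cached network can reach millions of entries.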
Similar to Geo-referenced human-activity-data; access, processing and knowledge extraction (20)
Geo-referenced human-activity-data; access, processing and knowledge extraction
1. Geo-Referenced Human-Activity-Data;
Access, Processing and Knowledge
Extraction
Paul Lewis
(paul.lewis@nuim.ie)
Dr. Conor McElhinney,
Dr. Alexei Pozdnoukhov,
Dr. Christian Kaiser,
Fergal Walsh
Tuesday 31st May 2011
University of Bremen
2. Outline
• Geospatial Data Accessibility
• Modelling Challenges
• Spatial Hierarchy Model
• Access Process Examples
• Processing Paradigms and Knowledge Extraction
• Knowledge Extraction Decision Processes
• Spatial Workflow Patterns
• Temporal Dynamics in Communities
• Taking on the Tweets
• Feature extraction informs Risk Assessment Knowledge
• Data and Knowledge Visualisation
• Web Integration
• Urban Model Data Extraction
• Web geospatial knowledge extraction visualisation
• Wrap-Up
3. Data Geospatial-Accessibility
• Methodologies employed to enable and access the data’s geospatial
content – generate the geography then access the geography
1. Raw Data Access - Function of Data Source Complexity
• LiDAR/Imagery - 2D or 3D
• Web content – air quality sensors, weather measurements, VGI feeds
(e.g. Twitter), surveillance cameras
• Delivery modes: Push (SMS, web page, XML, video), Polling, Streaming
• Handled by data receivers, stream handlers, and crawlers
2. Creating the Geospatial Content
• MMS – GeoSpatial content is Inherent at very high resolution
• Geocrowd – semantics need to be well understood in a non-explicit
context, e.g. Twitter location(?)
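The push/polling/streaming access modes above can be sketched in a few lines. This is a minimal illustration, not the project's actual receiver code; `poll` and `stream` and the simulated sensor source are hypothetical names for the pattern.

```python
import time
from typing import Callable, Iterable, Iterator

def poll(fetch: Callable[[], list], interval: float, cycles: int) -> Iterator:
    """Polling receiver: repeatedly ask the source for any new records."""
    for _ in range(cycles):
        yield from fetch()       # whatever arrived since the last poll
        time.sleep(interval)

def stream(source: Iterable) -> Iterator:
    """Stream handler: records are pushed as they arrive; just consume them."""
    for record in source:
        yield record

# Simulated sensor: each fetch returns the readings since the previous call.
readings = iter([[21.5], [21.6, 21.7], []])
polled = list(poll(lambda: next(readings), 0.0, 3))
```

A real crawler or data receiver would wrap HTTP requests or a socket in place of the simulated source, but the control flow (pull on a schedule vs. consume as pushed) is the same.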
4. Accessing through Spatial Hierarchy Models
• Spatial Hierarchy Modelling
• MMS context uses a spatial extent modelling approach
• Geocrowd will define this process on a content type access model
Constrained Workflow – LiDAR folder hierarchy:
each survey folder (e.g. Survey 10 Apr, Survey 5 Dec, Survey 2 May)
holds Block 1, Block 2, Block 3, … Block N, with per-survey
metadata: geo bounds, date, processing done
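The hierarchy above can be modelled as records whose metadata drives access, so a query touches only the blocks whose bounds match. A minimal sketch, assuming illustrative names (`Block`, `blocks_covering`) not taken from the project:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Block:
    """One LiDAR block with its metadata: survey, geo bounds, date, processing state."""
    survey: str
    block_id: int
    bounds: tuple   # (min_x, min_y, max_x, max_y)
    survey_date: date
    processed: bool

def blocks_covering(blocks, x, y):
    """Use only the metadata to find blocks whose bounds contain a point."""
    return [b for b in blocks
            if b.bounds[0] <= x <= b.bounds[2]
            and b.bounds[1] <= y <= b.bounds[3]]

# Hypothetical surveys overlapping around (7, 7):
surveys = [
    Block("Survey 10 Apr", 1, (0, 0, 10, 10), date(2011, 4, 10), True),
    Block("Survey 5 Dec", 1, (5, 5, 15, 15), date(2010, 12, 5), False),
]
hits = blocks_covering(surveys, 7, 7)
```

In a spatial database the same bounds test would be an index-backed predicate, so the point clouds themselves are never scanned during lookup.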
5. Accessing through Spatial Hierarchy Models
• Optimising Data Accessibility in a circular data generation model
• Intelligent Query Access now enabled for
• Temporal
• Spatial
• Attributes
• etc……
Circular model: Acquire → Store Spatially → Data Model →
Visualisation Model → Query → Acquire
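The intelligent query access listed above combines temporal, spatial, and attribute predicates in one request. A minimal in-memory sketch of that combination, with hypothetical record fields and a `query` helper not taken from the project:

```python
from datetime import date

# Hypothetical stored records with a timestamp, a location, and attributes.
records = [
    {"t": date(2011, 4, 10), "x": 3.0,  "y": 4.0, "sensor": "lidar"},
    {"t": date(2010, 12, 5), "x": 12.0, "y": 1.0, "sensor": "camera"},
    {"t": date(2011, 5, 2),  "x": 2.5,  "y": 3.5, "sensor": "lidar"},
]

def query(recs, after, bbox, **attrs):
    """One call combining temporal (after), spatial (bbox), and attribute filters."""
    min_x, min_y, max_x, max_y = bbox
    return [r for r in recs
            if r["t"] >= after
            and min_x <= r["x"] <= max_x
            and min_y <= r["y"] <= max_y
            and all(r.get(k) == v for k, v in attrs.items())]

lidar_2011 = query(records, date(2011, 1, 1), (0, 0, 5, 5), sensor="lidar")
```

In the stored model the same three predicates would map onto a date index, a spatial index, and ordinary attribute columns, which is what makes the combined query cheap.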
9. Predictive Geospatial Data Access Modelling
• i2maps (Dr. Alexei Pozdnoukhov, NCG)
• Real Time Weather Prediction
10. Geospatial Data Processing
• Knowledge Extraction informs Decision Support Processes
• What does this mean in a processing context?
• A paradigm that is…?
• Centralised or Distributed
• Spatial, Temporal, …
• MMS context is constrained to static-data survey processing
• Temporal at best, and partially but (un)intentionally spatial
• Not collected independently of decision expectations
• High Level Decisions
• Alternative model approaches
• Geocrowd (Dictionary of Models)
11. Geospatial Data Processing
• Where we could go with this at a physical level
• CLOUD
• Distributed processing, parallelism, scalability, flexibility
• Parallelism
• SDBMS access takes 1 sec, but processing takes 60 sec
• Scalability
• Processing scales to data-model updating – weather, Twitter
• Storage model scales to data acquisition – LiDAR/imagery
• This enables a Spatially-lead Workflow model at a knowledge level
• Allows for fast information extraction
• Allows for future knowledge extraction
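Since retrieval (~1 sec) is much cheaper than processing (~60 sec) per block, the processing step is the part worth parallelising. A minimal sketch with a stand-in `process_block`; the worker-pool pattern, not the actual pipeline, is the point here:

```python
from concurrent.futures import ThreadPoolExecutor

def process_block(block_id):
    """Stand-in for expensive per-block processing of retrieved data."""
    return block_id, sum(range(block_id * 1000))  # placeholder computation

block_ids = [1, 2, 3, 4]

# Fetch is cheap, processing is expensive: run blocks through a worker pool.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(process_block, block_ids))
```

For genuinely CPU-bound point-cloud processing one would swap `ThreadPoolExecutor` for `ProcessPoolExecutor` or a distributed/cloud framework, which is the direction the slide proposes.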
12. Flows of calls form communities
North–South divide: typical destinations of calls from cells
14. Dynamics of links:
community tracking in time
• Time/Space Clustering of Mobile Communications Network Cells
(Fergal Walsh, NCG)
15. First steps: Twitter at NCG
Preliminary work on the content-rich data streams:
• Real-time Twitter feed is monitored
• Geo-referencing is done by tweet, user, or location
• Activity levels processed and visualised with heat maps
• Tags and messages are saved
Natural language processing:
• Some experience with the NER task using the NLTK and SENNA packages
• More work needed to extract message semantics and relate topics to
activities
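The activity heat maps mentioned above reduce to binning geo-referenced tweets into grid cells and counting. A minimal sketch; `heat_grid` and the sample coordinates are illustrative, not the project's code:

```python
from collections import Counter

def heat_grid(points, cell_size):
    """Bin (lon, lat) points into grid cells; the counts drive a heat map."""
    counts = Counter()
    for lon, lat in points:
        cell = (int(lon // cell_size), int(lat // cell_size))
        counts[cell] += 1
    return counts

# Hypothetical tweet coordinates as (lon, lat) pairs:
tweets = [(-6.60, 53.38), (-6.61, 53.39), (-6.25, 53.35)]
grid = heat_grid(tweets, 0.1)   # 0.1-degree cells
```

Rendering is then a matter of colouring each cell by its count; smaller `cell_size` trades smoothness for resolution.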
23. Visualisation and Interaction
• Where things are at…
• Desktop tools for visualisation are well defined, developed and
implemented
• Where things are going…
• Browser support boundaries constantly being expanded
• WebGL for 3D visualisation
• Is this the future?
• i2maps thinks so and will continue to implement this
paradigm
30. MMS GeoSpatial Data Framework
• Fully Interactive Browser Implementation for Geo-Referenced
Environment modelling data
• Access, Processing and Visualisation
31. To Wrap-Up
• MMS work completed in 1.5 years
• With 1.5 person-years of effort
• i2maps is a long-term open-source project.
Next releases: July 31st for the OSGeo LiveDVD, and a FOSS4G
workshop in Denver on September 13th.
Going Forward
Research problems we’d like to solve within Geocrowd at NCG:
1) Relate activity levels to content-rich data sources to enhance
interpretability
2) Make it computationally efficient and scalable (Internet-scale)