This document discusses EventShop, a system for real-time macro situation recognition from heterogeneous data streams. EventShop ingests data from sources such as sensors, social media, and satellites, represents spatial data as grid-based "E-mages", and applies a generic set of operators to detect situations. The document outlines EventShop's architecture, collaboration with NICT on tools such as EventWarehouse and Sticker, work on multi-granularity E-mages and predictive analytics, and applications for social life networks.
2. Social Life Network [Jain 2011]
[Diagram] Heterogeneous sources feed the network: the Web, location-based mobile applications, ongoing and archived database systems, satellite and cloud resources, environmental sensor devices, the Internet of Things, and social media — billions of geo-located, time-based devices in all. The Social Life Network connects experts, people, and governmental agencies for real-time information sharing and decision making in evolving situations.
3. Examples of (Specific) Systems in the SLN Approach
One-touch SOS spans both daily and emergency situations. A Social Life Network connects people to real-world resources effectively, efficiently, and promptly in given situations.
4. EventShop: Global Situation Detection
[Diagram] Data ingestion and aggregation draws on database systems, satellites, environmental sensor devices, social networks, and the Internet of Things; situation recognition then tracks an evolving global situation. A need-resource matcher and recommendation engine match needs against resources to produce actionable information.

Personal EventShop: Personal Situation Detection
[Diagram] Data ingestion from personal data sources (wearable sensors, calendar, location, ...) feeds personal situation recognition, tracking an evolving personal situation.
5. History of EventShop
• Built as part of the SLN framework
• An environment and visualization tool for analyzing heterogeneous data streams at macro scale
• Helps non-CS domain experts easily conduct experiments for detecting real-world situations
• Represents geo-spatial data in a grid structure called an E-mage
• Provides a generic set of operators for detecting situations
• Pioneers: Vivek Singh (MIT), Mingyan Gao (Google)
6. EventShop UI
[Screenshot] The dashboard relates space, time, situations, resources, and people.
Example notification / alert: "You are currently in an area where there is a high chance of flooding; these are the available shelters within 10 miles around you."
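A minimal sketch of how such an alert could be generated, assuming a hypothetical flood E-mage with a class_at(lat, lon) lookup and a list of shelters; none of these names are EventShop's actual API:

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points, in miles.
    r = 3958.8  # Earth radius in miles
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def flood_alert(user_lat, user_lon, flood_emage, shelters, radius_miles=10.0):
    # flood_emage.class_at(lat, lon) is a hypothetical lookup of the
    # situation class stored in the E-mage cell covering that location.
    if flood_emage.class_at(user_lat, user_lon) != "high-risk":
        return None
    nearby = sorted(
        (s for s in shelters
         if haversine_miles(user_lat, user_lon, s["lat"], s["lon"]) <= radius_miles),
        key=lambda s: haversine_miles(user_lat, user_lon, s["lat"], s["lon"]))
    return {"message": "High chance of flooding in your area.", "shelters": nearby}
```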
7. Current State and Next Steps
• Enhance EventShop architecture
• Collaboration research (with NICT):
  – Sticker 3D visualization tool
  – EventWarehouse
• Multi-granularity E-mage
• Predictive analytics
• SLN use case
8. EventShop Architecture
[Diagram] Raw events from physical data sources (e.g., sensor streams, geo-image streams) and logical data sources (e.g., preprocessed data streams, social media streams) enter through registered data sources; continuous queries are registered on the other side.
• Ingestor: data source parser, data adapter, and E-mage generator (with resolution mapper).
• Processing: EvShop storage, query parser, query rewriter, and the event stream processing executor, whose operators (ᴨ, µ, ...) reach live, archived, and situation streams through a data access manager.
• Output: situation E-mages for visualization (dashboard), an actuator (communication, action control) fed by an action parser, and event properties and other information (e.g., spatio-temporal patterns).
10. EvS Input Manager and External Event Preprocessing (EvWarehouse)
[Diagram] Three ingestion paths converge on E-mage streams:
• Real-time sensor streams that already arrive gridded (e.g., cloud satellite pictures) go through a 2D event model wrapper straight to the E-mage factory.
• Real-time sensor streams of point measurements (e.g., wind speed, traffic flow) go through a 1D event model wrapper, an STT-to-E-mage data adapter, and the E-mage generator.
• Raw social media streams (e.g., Twitter, news RSS feeds) act as near-real-time sensors: an event model wrapper plus topic event detection and abnormal event detection produce STT streams for E-mage generation.
Raw sensor streams (e.g., PM2.5 data) can also be preprocessed by EvWarehouse into "EventModel" streams (e.g., a sudden change in a data trend within a time window). Users register sources through the EventSource parser interface with an ES descriptor (Theme, AdapterType, SourceURL, TimeWindow, Parameters, InitialResolution, AggregationFunc, Metadata) and manage them via ES control (start/stop/view). Optional internal storage (E-mage store, STT store, metadata store) backs the real-time and near-real-time E-mage streams handed to the processing manager.
11. Stream Processing Engine
[Diagram] A query descriptor enters through the query parser interface, with query control (start/stop/view); the query rewriter produces an execution plan for the event stream executor. The operators manager supplies built-in and user-defined operators to operator nodes (ᴨ, µ, ...), each with its own data access. Inputs are real-time and near-real-time E-mage streams from the input manager plus archived E-mage streams from storage (AsterixDB, SciDB, MongoDB backing the E-mage, STT, and metadata stores); E-mage conversion applies an interpolation function to reach the final resolution. Operators store and retrieve parameters, and the engine emits situation streams.
12. Current State and Next Steps
• Enhance EventShop architecture
• Collaboration research (with NICT):
  – Sticker 3D visualization tool
  – EventWarehouse
• Multi-granularity E-mage
• Predictive analytics
• SLN use case
14. EvShop and EvWarehouse Interface
1. Retrieve the EventModel stream
   – Option 1: EvShop periodically sends a request to EvWH to access new events stored in the EventModel table (MPQL)
   – Option 2: EvWH pushes new events to EvShop (listener)
2. Access the EventModel stream's metadata
3. Create a new EventModel stream
15. Example of MPQL

SELECT MIN(observation), MAX(observation), SUM(observation), AVG(observation)
FROM LiveERestflCO2Sensor
GROUP BY
  TIME('2013-10-01T00:00:00', '2013-10-02T00:00:00', 12 HOUR),
  SPACE(130.0, 30.0, 140.0, 40.0, 5, 5)

SELECT observation
FROM STREAM LiveERestflCO2Sensor
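A minimal sketch of Option 1 from slide 14 (periodic pull), assuming a hypothetical HTTP endpoint on EvWarehouse that accepts MPQL; the URL and the request/response shapes are illustrative, not a documented interface:

```python
import time
import requests

EVWH_URL = "http://evwarehouse.example/mpql"     # hypothetical endpoint
QUERY = "SELECT observation FROM STREAM LiveERestflCO2Sensor"

def poll_eventmodel_stream(handle_event, period_sec=600):
    # Option 1: EvShop periodically asks EvWH for events newly stored
    # in the EventModel table, then hands them to the E-mage generator.
    last_ts = None
    while True:
        resp = requests.post(EVWH_URL, json={"query": QUERY, "since": last_ts})
        resp.raise_for_status()
        events = resp.json().get("events", [])
        for ev in events:
            handle_event(ev)
        if events:
            last_ts = events[-1]["timestamp"]
        time.sleep(period_sec)
```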
16. Current State and Next Steps
• Enhance EventShop architecture
• Collaboration research (with NICT):
  – Sticker 3D visualization tool
  – EventWarehouse
• Multi-granularity E-mage
• Predictive analytics
• SLN use case
17. Multi-Granularity E-mage
• Data is created and collected in different forms
• Different sensors cover different-sized spaces and produce data at different rates
• Data is produced and consumed at different spatial, temporal, and symbolic granularities
18. Pyramid of E-mage Resolution
Inspired by the most popular services (Google Maps, Bing Maps, and OGC WMTS), which standardize the granularity levels of the world map.

Level   Stel Size        Level   Stel Size
1       78 km            11      76 m
2       39 km            12      39 m
3       19.6 km          13      19 m
4       9.8 km           14      10 m
5       4.9 km           15      5 m
6       2.4 km           16      2.4 m
7       1.2 km           17      1.2 m
8       611 m            18      60 cm
9       306 m            19      30 cm
10      153 m            20      15 cm
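The table halves the Stel size at every level, so level n is roughly 78 km / 2^(n−1); a quick sanity check (the per-level values in the table are rounded):

```python
def stel_size_km(level):
    # Level 1 is 78 km; each subsequent level halves the Stel size.
    return 78.0 / 2 ** (level - 1)

for level in (1, 8, 14, 20):
    print(level, round(stel_size_km(level) * 1000, 1), "m")
# 1 -> 78000.0 m, 8 -> 609.4 m (table: 611 m),
# 14 -> 9.5 m (table: 10 m), 20 -> 0.1 m (table: 15 cm)
```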
19. Multi-Granularity E-mage
[Diagram] Along the timeline (t1..t4), data sources update at different rates: DS1 every 10 minutes, DS2 every 5 minutes, DS3 every 30 minutes, while the situation model is processed every 10 minutes.
E-mage spatial transformations fall into two main types:
1) Coarse2Fine: nearest-neighbor interpolation, linear interpolation, bilinear interpolation, and split uniform.
2) Fine2Coarse: summation, maximum value, minimum value, average, and majority.
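A minimal numpy sketch of both directions, using block aggregation for Fine2Coarse and nearest-neighbor replication for Coarse2Fine; the function names are illustrative, not EventShop's operators:

```python
import numpy as np

def fine2coarse(emage, factor, how="average"):
    # Aggregate each factor x factor block of fine cells into one coarse cell.
    h, w = emage.shape
    blocks = emage[:h - h % factor, :w - w % factor].reshape(
        h // factor, factor, w // factor, factor)
    reduce_fn = {"average": np.mean, "summation": np.sum,
                 "maximum": np.max, "minimum": np.min}[how]
    return reduce_fn(blocks, axis=(1, 3))

def coarse2fine(emage, factor):
    # Nearest-neighbor interpolation: replicate each coarse cell.
    return np.repeat(np.repeat(emage, factor, axis=0), factor, axis=1)

grid = np.arange(16.0).reshape(4, 4)
assert coarse2fine(fine2coarse(grid, 2), 2).shape == grid.shape
```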
20. Multi-Granularity E-mage
• How do we dynamically adjust to the appropriate granularity?
  – Guarantee the quality of the results
  – Handle data error propagation (uncertainty of the data streams, data loss during conversion, etc.)
  – Source selection
21. Rasterization Error Prediction
• A regression model captures the relationships between rasterization errors and their affecting factors:
  – The equal-area conversion (EAC) algorithm is used to rasterize vector polygons
  – Rasterization errors are calculated with the Error Evaluation Method Based on Grid Cells (EEM-BGC)
  – The factors include both the complexity of the polygons' perimeters (e.g., density of arc length (DA) and density of polygons (DP)) and the size of the grid cells (SG)

Relative area error = (area before conversion − area after conversion) / area before conversion

E = 58.499 + 0.418·DA + 0.31·DP + 9.456·ln(SG)
(for vector data of county-level boundaries of Beijing)

[Liao 2012] Error Prediction for Vector to Raster Conversion Based on Map Load and Cell Size
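The same model written as a function; note that the operators joining the terms were garbled in the source slide, so the plus signs below are an assumption, and the units follow the notes later in this document:

```python
import math

def predicted_relative_area_error(da, dp, sg):
    # E = 58.499 + 0.418*DA + 0.31*DP + 9.456*ln(SG)
    # The signs are an assumption; they were lost in extraction.
    # da: density of arc length (m per 0.001 km^2)
    # dp: density of polygons (n per 100 km^2)
    # sg: grid cell size (km)
    return 58.499 + 0.418 * da + 0.31 * dp + 9.456 * math.log(sg)
```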
22. Current State and Next Steps
• Enhance EventShop architecture
• Collaboration research (with NICT):
  – Sticker 3D visualization tool
  – EventWarehouse
• Multi-granularity E-mage
• Predictive analytics
• SLN use case
24. Current State and Next Steps
• Enhance EventShop architecture
• Collaboration research (with NICT):
  – Sticker 3D visualization tool
  – EventWarehouse
• Multi-granularity E-mage
• Predictive analytics
• SLN use case
25. SLN Use Case: Asthma/Allergy Application
[Diagram] A Personal EventShop (PESi) ingests and aggregates individual streams: calendar, FMB (the individual's feeling), accelerometer, location, fitness data (Nike, Fitbit), heart rate, movement, and food log; these drive personal situation recognition. EventShop (ES) ingests and aggregates macro streams: FMB (people's feelings and locations), ozone, CO2, SO2, PM2.5, pollen (tree, grass), air quality index, social media (news, tweets), and weather; these drive macro situation recognition and predictive analytics. The Asthma/Allergy app server combines the macro situation, the personal situation, and a persona built from data collection to produce need-and-resource recommendations.
The Web now carries an enormous volume of heterogeneous data, continuously reported by different sensors and humans from different locations. The Web has become a universal medium for data, information, and knowledge exchange. Real-world phenomena are now observed by multiple media streams available in real time over the Web, and increasingly the majority of these have space and time semantics.
We believe:
A significant fraction of the data regularly created consists of location-sensitive data streams.
Many emerging applications involve taking actions in real time and depend on emerging 'situations' and contexts.
Lots of data about emerging situations and contexts is already available, and more is becoming available.
Examples: disaster management (detect affected areas); health (environmental hazards).
What is the point of creating a generic framework?
Definitions of situations and events: a situation is an actionable abstraction of observed spatio-temporal descriptors.
Data sources: the Internet of Things, database systems, global sensors (field sensors), satellites, social networks, mobile applications.
IaaS is the most basic level of the cloud computing service models: it offers virtual (as well as physical) machines, servers, storage options, load balancers, networks, and more. PaaS is next in line, focusing more on operating systems, databases, web servers, development tools, etc.; this is where IT development happens.
Situation recognition is a central component of an SLN. Situations are the result of interactions among several related events, and events are the results of happenings that cause significant state changes. Based on the situation at a place, the system identifies needs and the resources available to satisfy those needs. Situation recognition always happens in the context of a specific application, and so do all other operations. The data sources used by the system are those publicly available or specifically made available in the context of the application. The system closes the loop by sending actuation or action information to the appropriate needs and resources as a result of the matching. An SLN system is shown in Figure 2.
Here, not only people but also other objects such as mobile applications (e.g., body activity monitors, if allowed by the owner), databases, and the Internet of Things (e.g., traffic sensors) observe, store, and report information about the state of entities in the world. In this setting, we conceive of a world where (a) a significant body of information today comes from sensors, (b) the number of sensors is huge and the number of events they generate is even larger, (c) a large fraction of data, both human- and device-generated, has associated locational information, (d) most situation and needs-assessment decisions are for controlling and managing real-time, evolving situations, and (e) to keep pace with the real-time nature of the problem space, planning and decision processes need to be viewed as a real-time control system that interoperates with the publish-subscribe and update-propagation model of standard social networks. A user update in a social network is analyzed to create a micro-event (or personal event), which is then fed to the situation recognizer.
The situation recognizer evaluates this micro-event with respect to other events from different sources and creates an action (e.g., a message, a recommendation, an alert) that goes back to the sender or to a potential resource that can service the needs of the original message sender.
Given geo-spatial continuity, we believe a spatial grid structure is naturally suitable for representing various geo-spatial data: each cell of the grid stores the value observed at the corresponding geo-location and in turn represents the evolving situation at that location in space. We adopt this grid structure and call it an E-mage (an event-data-based analog of an image) [19].
The E-mage generator transforms data from the data adapter into the E-mage representation; it is responsible for making this data directly available to the executor, for writing it to the recent buffer on disk, and for E-mage resolution mapping. Queries run in the executor and can access live data directly from the E-mage generator and historical data from disk. The data access method handles disk overload (data reduction, user-defined functions, etc.) and creates E-mage streams from disk, along with other information such as spatial and temporal patterns and other properties.
Query rewriter -> source selection
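A minimal sketch of the E-mage structure these notes describe: a fixed bounding box and a grid holding one aggregated value per cell (a running average here); the class is illustrative, not EventShop's implementation:

```python
import numpy as np

class Emage:
    def __init__(self, bbox, rows, cols):
        self.bbox = bbox                      # (min_lat, min_lon, max_lat, max_lon)
        self.grid = np.zeros((rows, cols))    # one observation value per cell
        self.counts = np.zeros((rows, cols), dtype=int)

    def _cell(self, lat, lon):
        # Map a geo-location to the row/column of the cell covering it.
        min_lat, min_lon, max_lat, max_lon = self.bbox
        r = int((lat - min_lat) / (max_lat - min_lat) * (self.grid.shape[0] - 1))
        c = int((lon - min_lon) / (max_lon - min_lon) * (self.grid.shape[1] - 1))
        return r, c

    def add_observation(self, lat, lon, value):
        # Aggregate by running mean; sum, max, etc. are equally possible.
        r, c = self._cell(lat, lon)
        self.counts[r, c] += 1
        self.grid[r, c] += (value - self.grid[r, c]) / self.counts[r, c]
```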
Depending on the application, the geospatial data streams used in models may be needed at different bounding boxes and resolutions. For instance, users studying traffic patterns near the Los Angeles area may require pollution data at the level of every 50 yards for every 30 seconds, while experts who study the climate of the US may need the same data for the whole country every 10 miles for every day.
I will only talk about operators over the grid data structure; tomorrow you will hear more from Ish about other models.
Stream processing -> tuple-based
GIS -> grid, graph, line
Operating on a grid array -> advantages, unique behaviors and characteristics
In physical sensor networks, sensors are built to observe the real-world environment; examples include space satellites, remote sensing, laser scanning, acoustic sensing, motion sensing, and camera sensing. Most of the information is a time series of measurements: a sensor reports a measurement over a given time period, while its coverage area is often fixed and promoted to the metadata. The measurement area can be represented in a variety of GIS structures, including points (latitude/longitude coordinates), vector polygons (regions), vector lines (arcs), and raster (grid) areas. In actuated networks, sensors report data only when they have been triggered or detect an event.
In logical sensor networks, geospatial data are generated from the cyber world to represent events in the real world. The data are reported mostly by humans via a variety of services such as location-based services, social network sensing (e.g., Twitter, Facebook, Flickr), statistical reports, and news. Since these data are naturally unstructured and can carry significant noise and missing values, extracting meaningful information from them is nontrivial. Many researchers have studied and contributed to this problem, including work on data mining, entity extraction, topic discovery, and sentiment analysis.
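As a toy illustration of this extraction step, a sketch that turns one geo-tagged tweet into an STT (Space-Time-Theme) tuple by keyword matching; real pipelines would use the topic and abnormal event detection from slide 10, and the field names here are assumptions:

```python
import re

FLU_WORDS = re.compile(r"\b(flu|influenza|fever)\b", re.IGNORECASE)

def tweet_to_stt(tweet, theme="flu"):
    # Keep only geo-tagged tweets that mention the theme; emit an STT
    # tuple with unit value, ready for aggregation into an E-mage.
    if tweet.get("lat") is None or not FLU_WORDS.search(tweet["text"]):
        return None
    return {"space": (tweet["lat"], tweet["lon"]),
            "time": tweet["created_at"],
            "theme": theme,
            "value": 1}
```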
Accessing External Data
Accessing Internal Data (via EvS Internal Storage)
First, the current version favors data input in grid or raster form. We realize that much data is created and collected in other forms, and that data is produced and consumed at different spatial, temporal, and symbolic granularities: different sensors cover different-sized spaces and produce data at different rates, and many data sources use geo-political concepts and relationships in producing and representing data. This means we must handle GIS structures as input as well as output while still doing most computations in grid format, which introduces many computational and quality-of-results issues.
In addition, the data streams used in these models come from data sources available on the Web, access to which usually suffers from various processing and network constraints. Reckless use of these data services without careful planning will eventually make it impossible for the system to access the data streams, and duplicate access to data sources and redundant computation can waste huge amounts of machine and network resources.
In GIS, the Open Geospatial Consortium (OGC) defines the Web Map Service (WMS) specification [1]. Every Stel at any detail level represents a single fixed ground location. The ground resolution indicates the distance on the ground represented by a single Stel; for example, at a ground resolution of 10 meters/Stel, each Stel represents a ground distance of 10 meters. The ground resolution varies depending on the level of detail and the latitude at which it is measured. The zoom level is normally just a presentation or visualization concept; I would like to bring this concept/standard into our computational framework to represent the real world in an E-mage.
The factors include the density of arc length, the density of polygons, and the size of the grid cells; the first two represent the complexity of the polygon.
Calculating EEM-BGC: the general rasterization errors, DAs, and DPs were calculated using the ArcGIS software, and the relationships were analyzed.
density of arc length (DA) -> m / 0.001 km^2
density of polygons (DP) -> n / 100 km^2
size of grid cells (SG) -> km
EventShop integrates heterogeneous real-time data streams and detects actionable situations. Situation recognition based on the latest observations alone is too late for taking appropriate actions to prevent critical events; it is important to guide people based on expected situations in the near future rather than on the situation that just occurred.
Secondly, it is good to recognize situations after they have happened, but it is much better if we can predict situations, even just a bit in advance, and act accordingly. This is the essence of closed-loop control systems, and applying predictive analytics in EventShop will increase its applicability. We address these issues in this proposal.
Real-time data streams are by nature time series. Autoregressive (AR), moving average (MA), and autoregressive moving average (ARMA) models [95] are often used for time-series prediction. A multivariate spatio-temporal autoregressive model takes into account both spatial and temporal correlations but requires estimating a large number of parameters; to reduce that number, [89] and [97] consider the spatial correlation among neighbors within a distance. EventShop involves complex relationships between a large number of data streams, so selecting the data streams correlated with the prediction target is very important for improving prediction accuracy. We propose to include feature selection in EventShop. There are many feature selection methods, such as the L1 norm, the L2 norm, group lasso, and so on. Here, we use Lasso, a penalized regression model (4), to efficiently solve the feature selection problem.
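A small scikit-learn sketch of this idea on synthetic data: each column of X is one candidate stream, y is the prediction target, and the streams to which Lasso assigns nonzero coefficients are the ones kept:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))            # 40 candidate data streams
y = 2.0 * X[:, 3] - 1.5 * X[:, 17] + rng.normal(scale=0.1, size=500)

model = Lasso(alpha=0.05).fit(X, y)
selected = np.flatnonzero(model.coef_)    # indices of streams Lasso kept
print(selected)                           # expect streams 3 and 17
```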
Data streams over the Web (e.g., tweets, weather.gov feeds) are translated into a unified format. Based on application logic, multiple spatio-temporal analysis operators are composed to form different situation recognition models. The uniform data streams are continuously fed into these models to detect and recognize real-time situations. The detected situations (e.g., a 'flu outbreak' in New England) can then be combined with user parameters (e.g., 'high temperature' and location). Again based on application logic, a situation-based controller is constructed to send personalized action alerts (e.g., 'Report to the CDC center on 4th Street') to each individual. In addition, analytic reports are available to a central analyst who can then take large-scale (state, nation, corporate, or worldwide) decisions.
There are two main components in the EventShop framework: the Data Ingestor and the Stream Processing Engine. A workflow from heterogeneous raw data streams to actionable situations is shown here. In the Data Ingestor, original raw spatio-temporal data from the Web are translated into the unified STT (Space-Time-Theme) format, along with their numeric values, using an appropriate data wrapper. Based on the user-defined spatio-temporal resolution, the system aggregates each STT stream to form an E-mage stream. These E-mage streams are then transferred to the Stream Processing Engine for processing. Based on the situation recognition model determined by the domain expert, appropriate operators are applied to the E-mage streams to detect situations. The final step is a segmentation operation that uses domain knowledge to assign an appropriate class to each pixel of the E-mage; this classification segments an E-mage into areas characterized by the situation there. Once we know the situation, appropriate actions can be taken.
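A minimal sketch of that segmentation step: thresholding each E-mage pixel into a situation class with domain-chosen cut-offs (the class names and thresholds are illustrative):

```python
import numpy as np

def segment_emage(emage, cuts=((0.8, "high"), (0.5, "medium"), (0.2, "low"))):
    # Assign each cell the class of the highest threshold it reaches;
    # everything below the lowest cut-off stays "normal".
    classes = np.full(emage.shape, "normal", dtype=object)
    for threshold, label in sorted(cuts):          # apply low -> high
        classes[emage >= threshold] = label
    return classes

# Example: a 2x2 flood-risk E-mage
print(segment_emage(np.array([[0.1, 0.3], [0.6, 0.9]])))
# [['normal' 'low'] ['medium' 'high']]
```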