Siripen Pongpaichet is a PhD candidate at UC Irvine studying complex event stream processing, multimedia information systems, and large-scale data management. Her research focuses on EventShop, an open-source platform for detecting situations from heterogeneous data streams. EventShop represents data spatially using "E-mages" and detects situations through stream-processing operators. It aims to bridge semantic concepts and low-level data to provide actionable information. Challenges include efficient data ingestion and developing a stream-processing engine that recognizes situations from data streams.
As we develop our crime analysis software, HunchLab, we are always on the lookout for ways of examining and improving data quality, as well as new academic research that shows promise to enhance crime analysis.
In this one-hour webinar, we first explain some of the ways we examine data quality when we utilize historic incident datasets for research and analysis and how you can use these techniques in your department. Then, we walk through a series of analytic techniques and practices that can help your department improve your crime analysis processes.
Deep Learning for Public Safety in Chicago and San Francisco - Sri Ambati
Presentation on Deep Learning for Public Safety using open data sets from the cities of San Francisco and Chicago.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Our goal is to create a web application that gives its users insight into crime in Chicago and its various aspects.
Our application will contain:
A search box / drop-down list where the user can select a district.
Geospatial analysis using ArcGIS maps and visualizations embedded into the web app, dynamically updated to show the most interesting patterns or heat maps for that district.
Statistical analysis and visualizations of historical data.
Prediction of the date of the next likely crime and its probability.
The Crime Analysis & Prediction System analyzes crime data, detects crime hotspots, and predicts crime.
It collects data from various sources: crime data from OpenData sites, US census data, social media, and traffic and weather data.
It leverages Microsoft's Azure cloud and on-premises technologies for back-end processing, and desktop-based visualization tools.
Overview from NASA's 2015 Space Apps Challenge, which is an Innovation Incubator program that allows NASA to test out new concepts and learn how citizens engage with NASA's open data.
Data Science is the new black! However, becoming a data scientist requires knowledge in various areas. This slide deck discusses what one should learn to become a data scientist.
How large-scale image analytics (near-real-time analysis of satellite images, machine learning) could help (re)insurers anticipate natural catastrophes and estimate damages more precisely.
Use Machine Learning to Get the Most out of Your Big Data Clusters - Databricks
Enterprises across all sectors have invested heavily in big data infrastructure (Hadoop, Impala, Spark, Kafka, etc.) to turn data into insights and business value. Clusters are getting bigger and more complex and employing more and more data scientists and engineers. As a result, it is increasingly challenging for DataOps teams to operate and maintain these clusters to meet business requirements and performance SLAs. For instance, a single SQL query may fail or take a long time to complete for various reasons (SQL-level inefficiencies, data skew, missing or stale statistics, pool-level resource configurations), and a resource-hogging query can impact the entire application stack on that cluster.
A critical capability for scaling application performance is cluster-wide tuning. Examples include tuning the default application configurations so that all applications benefit from the change, tuning pool-level resource allocations, and identifying wide-impact issues such as slow nodes and too many small files. Cluster-level tuning requires considering more factors and risks significantly worsening cluster performance; yet it is often done via trial and error with educated guesswork, if attempted at all.
We employ machine learning and AI techniques to make cluster-level tuning easier, more data-driven, and more accurate. This talk describes our methodology for learning from various sources of data (the workload, cluster and pool resources, the metastore, etc.) and providing recommendations for cluster defaults for application and pool resource configurations. We also present a case study in which a customer applied Unravel's tuning recommendations and achieved a 114% increase in the number of applications running per day while using 47% fewer vCore-hours and 15% fewer containers.
Speaker: Eric Chu
Multiple regression, COVID mobility, and Covid-19 policy recommendation - Kan Yuenyong
Multiple regression analysis of Covid-19 policy is a contemporary agenda. This work demonstrates how to use Python for data wrangling and R for statistical analysis, in a form suitable for publication in a standard academic journal. The model examines whether lockdown policy is relevant to controlling the Covid-19 outbreak.
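A multiple-regression fit of the kind described can be sketched in a few lines of Python. This is a generic, illustrative sketch with synthetic data, not code from the talk; the predictor names (mobility, stringency) are hypothetical stand-ins for the Covid-19 covariates.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Ordinary least squares with an intercept, via np.linalg.lstsq."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta  # [intercept, coef_1, ..., coef_k]

# Hypothetical illustration: response = 2 + 3*mobility - 1*stringency
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))      # columns: mobility, policy stringency
y = 2 + 3 * X[:, 0] - 1 * X[:, 1]  # noiseless synthetic response
beta = fit_multiple_regression(X, y)
```

In practice one would use statsmodels or R's `lm` to also obtain standard errors and p-values; the sketch only recovers the coefficients.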
Examples of Applied Semantic Technologies to Solve the Variety Challenge of Big Data: Application of the Semantic Sensor Network (SSN) Ontology
Pramod Anantharam - Kno.e.sis
AGENDA
* Demo 1: Use Amazon Rekognition API connected to a camera for near real-time image analyses.
* Demo 2: Install and extend Google TensorFlow to retrain a classifier to classify custom images.
* Demo 3: Use AutoML.
EDF2013: Selected Talk: Allan Hanbury: Algorithm any good? A Cloud-based Infrastructure for Evaluation on Big Data - European Data Forum
Selected Talk by Allan Hanbury, at the European Data Forum 2013, 10 April 2013 in Dublin, Ireland: Algorithm any good? A Cloud-based Infrastructure for Evaluation on Big Data
Data Data Everywhere: Not An Insight to Take Action Upon - Arun Kejariwal
The big data era is characterized by ever-increasing velocity and volume of data. Over the last two or three years, several talks at Velocity have explored how to analyze operations data at scale, focusing on anomaly detection, performance analysis, and capacity planning, to name a few topics. Knowledge sharing of the techniques for the aforementioned problems helps the community to build highly available, performant, and resilient systems.
A key aspect of operations data is that data may be missing—referred to as “holes”—in the time series. This may happen for a wide variety of reasons, including (but not limited to):
# Packets being dropped due to unresponsive downstream services
# A network hiccup
# Transient hardware or software failure
# An issue with the data collection service
“Holes” in a time series can skew data analysis, which in turn can materially impact decision making. Arun Kejariwal presents approaches for analyzing operations data in the presence of such holes: highlighting how missing data impacts common analyses such as anomaly detection and forecasting, discussing the implications of missing data for time series of different granularities (e.g., minutely and hourly), and exploring a gamut of techniques that can be used to address the missing-data issue (e.g., approximating the data using interpolation, regression, or ensemble methods). Arun then walks through how these techniques can be applied to real data.
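One of the simplest hole-filling techniques mentioned above, linear interpolation, can be sketched in plain Python. This is an illustrative sketch, not code from the talk; it assumes a regularly sampled series with `None` marking holes, and copies the nearest observation for leading/trailing holes.

```python
def fill_holes(series):
    """Linearly interpolate missing (None) values in a regularly sampled series.
    Leading/trailing holes are filled with the nearest observed value."""
    n = len(series)
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    out = list(series)
    for i in range(n):
        if out[i] is not None:
            continue
        prev = max((k for k in known if k < i), default=None)  # last observation before i
        nxt = min((k for k in known if k > i), default=None)   # first observation after i
        if prev is None:
            out[i] = series[nxt]
        elif nxt is None:
            out[i] = series[prev]
        else:
            w = (i - prev) / (nxt - prev)  # distance-based weight
            out[i] = series[prev] * (1 - w) + series[nxt] * w
    return out
```

Regression- or ensemble-based imputation, also mentioned in the talk description, would replace the weighted average with a model prediction.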
Architecture patterns using In-memory data systems - emmanuelbernard
You know about in-memory data systems like Redis, Infinispan, etc. But while they can do lots of things, it's hard to know when to use them.
This presentation describes when to use them and in which circumstances.
Disasters happen. We need to manage them to minimize the loss of life and property. Disaster management has received much attention but has not been touched much by the latest technology. This paper presents an approach to managing disasters using the latest and most popular technology. We are interested in building a community of researchers interested in developing such tools.
The Critical Role of Spatial Data in Today's Data Ecosystem - Safe Software
In today's data-driven landscape, integrating spatial data is becoming increasingly crucial for organizations aiming to harness the full potential of their data. Spatial data offers unique insights based on location, making it a fundamental component for addressing various challenges across different sectors, including urban planning, environmental sustainability, public health, and logistics.
Our webinar delves into the indispensable role of spatial data in data management and analysis. We'll showcase how omitting spatial data from your data strategy not only weakens your data infrastructure, but also limits the depth of your insights. Through real-world case studies, we'll highlight the transformative impact of spatial data, demonstrating its ability to uncover complex patterns, trends, and relationships.
Join us for this introductory-level webinar as we explore the critical importance of spatial data integration in driving strategic decision-making processes. By the end of the webinar, you'll gain a renewed perspective on how spatial data is essential for confronting and overcoming challenges across various domains.
Event Processing Using Semantic Web Technologies - Mikko Rinne
The presentation held at the public defence of my doctoral thesis at the department of computer science of Aalto University, Espoo, Finland on 1st of September 2017.
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Worlds - WSO2
In this webinar, Srinath Perera, director of research at WSO2, will discuss
Big data landscape: concepts, use cases, and technologies
Real-time analytics with WSO2 CEP
Batch analytics with WSO2 BAM
Combining batch and real-time analytics
Introducing WSO2 Machine Learner
Using synthetic data for computer vision model training - Unity Technologies
During this webinar Unity’s computer vision team provides an overview of computer vision, walks through current real-world data workflows, and explains why companies are moving toward synthetically generated data as an alternate data source for model training.
Watch the webinar: https://resources.unity.com/ai-ml/cv-webinar-dec-2021
CONFidence 2014: Davi Ottenheimer: Protecting big data at scale - PROIDEA
We are meant to measure and manage data with more precision than ever before using Big Data. But companies are getting Hadoopy often with little or no consideration of security. Are we taking on too much risk too fast? This session explains how best to handle the looming Big Data risk in any environment. Better predictions and more intelligent decisions are expected from our biggest data sets, yet do we really trust systems we secure the least? And do we really know why "learning" machines continue to make amusing and sometimes tragic mistakes? Infosec is in this game but with Big Data we appear to be waiting on the sidelines. What have we done about emerging vulnerabilities and threats to Hadoop as it leaves many of our traditional data paradigms behind? This presentation, based on the new book "Realities of Big Data Security" takes the audience through an overview of the hardest big data protection problem areas ahead and into our best solutions for the elephantine challenges here today.
A presentation on integrating real-time data with the cloud, with significant potential in industrial IT, real-time sensor information processing, and smart grids applied to various vertical industries. This is related to my blog post at www.cloudshoring.in.
CEP: Event-Decision Architecture for Predictive Business, July 2006 - Tim Bass
CEP: Event-Decision Architecture for Predictive Business, Centre for Strategic Infocomm Technologies (CSIT), Singapore, July 26, 2006. Tim Bass, CISSP, Principal Global Architect, Director, TIBCO Software Inc.
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices that have already converged can save iteration time. Skipping in-identical vertices (those with the same in-links) helps avoid duplicate computations and can thus also reduce iteration time. Road networks often have chains that can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes are easy to calculate; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in the computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
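One of the work-reduction ideas above, skipping vertices that have already converged, can be sketched as follows. This is a simplified illustration, not the STICD implementation: it freezes a vertex permanently once its rank change drops below `eps`, whereas more careful schemes may re-activate a vertex when its in-neighbours keep changing. It assumes `adj` maps every vertex to its out-neighbour list and the graph has no dangling nodes.

```python
def pagerank_skip_converged(adj, d=0.85, eps=1e-9, max_iter=100):
    """Power-iteration PageRank that stops recomputing converged vertices."""
    n = len(adj)
    outdeg = {u: len(vs) for u, vs in adj.items()}
    inv = {u: [] for u in adj}          # in-neighbour lists
    for u, vs in adj.items():
        for v in vs:
            inv[v].append(u)
    r = {u: 1.0 / n for u in adj}
    active = set(adj)                   # vertices still being updated
    for _ in range(max_iter):
        if not active:
            break
        nxt = dict(r)
        for v in active:
            nxt[v] = (1 - d) / n + d * sum(r[u] / outdeg[u] for u in inv[v])
        # freeze vertices whose rank changed by less than eps (approximation)
        active = {v for v in active if abs(nxt[v] - r[v]) >= eps}
        r = nxt
    return r
```

On graphs where most vertices converge early, the shrinking `active` set is what reduces per-iteration work.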
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravaganza - sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working with unstructured data. Speakers present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup and is sponsored by Zilliz, maintainers of Milvus.
Adjusting primitives for graph : SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms, like PageRank, commonly use Compressed Sparse Row (CSR), an adjacency-list-based graph representation.
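The CSR layout can be sketched in a few lines. This is a minimal illustrative Python version (the notes themselves target C++/CUDA, where the two arrays would be typed device buffers): the neighbours of vertex `v` occupy `edges[offsets[v]:offsets[v+1]]`.

```python
def to_csr(adj):
    """Build a CSR (offsets + edge-targets) view of an adjacency list."""
    offsets = [0]
    edges = []
    for v in range(len(adj)):
        edges.extend(adj[v])          # append v's neighbours contiguously
        offsets.append(len(edges))    # record where v's neighbour run ends
    return offsets, edges

offsets, edges = to_csr([[1, 2], [2], [0]])  # offsets=[0,2,3,4], edges=[1,2,2,0]
```

The contiguous edge array is what makes CSR cache- and GPU-friendly compared with per-vertex pointer-based lists.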
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Unleashing the Power of Data: Choosing a Trusted Analytics Platform - Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake - Walaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
1. UC IRVINE
Donald Bren School of Information and Computer Sciences
Siripen Pongpaichet
PhD Candidate,
Academic Advisor Prof. Ramesh Jain
Contact:
spongpai@uci.edu
Interest:
complex event stream processing,
multimedia information system,
large scale data management,
having fun doing research
2. Fundamental Problem
Web 1.0
Connecting People to Documents
Web 2.0
Connecting People to People
“Social Life Network”
Connecting Needs to Resources
Effectively, Efficiently, and Promptly
In given situations.
4. EventShop : Global Situation Detection
Situation
Recognition
Evolving Global Situation
Personal
Situation
Recognition
Personal EventShop
Evolving Personal Situation
Need- Resource Matcher
Recommendation
Engine
Persona Database
Resources
Needs
Data
Ingestion
Wearable Sensors
Calendar
Location….
Data Sources
….
Data
Ingestion
and
aggregation
Database Systems
Satellite
Environmental
Sensor Devices
Social Network
Internet of Things
Actionable Information
5. Big Challenges
• Data Ingestion to efficiently extract data from the Web and make it available for later computation is non-trivial.
• Stream Processing Engine to bridge the semantic gap between high-level concepts of situations and low-level data streams.
• Situation Recognition as the next step in concept recognition.
6. History of EventShop
• Built as part of the SLN framework
• Environment and visualization tool for analyzing heterogeneous data streams at macro scale
• Helps non-CS technical experts in various domains easily conduct experiments for detecting real-world situations
• Represents geo-spatial data in a grid structure called E-mage
• Generic set of operators for detecting situations
• Pioneers: Vivek Singh (Rutgers University), Mingyan Gao (Google), Ish Rishabh (Live Nation Entertainment)
6/6/2016 7
7. EventShop UI
11/13/2013 8
Example Notification / Alerts:
You are currently in an area where there is a high chance of flooding; these are the available shelters within 10 miles of you.
Space
Time Situation
Resources
People
8. Output / Ingestor
Data Source
Parser
Data Adapter
Emage
Generator
(+resolution mapper)
Processing
EvShop Storage
Query
Parser
Query
Rewriter
Event Stream Processing
Executor
Action Parser
Register Data Source Register Continuous Query
Situation
Emage
Visualization
(e.g., Sticker
from NICT)
Actuator
Communication
Action Control
Event Property &
Other Information
(e.g., spatio-temporal
pattern)
ᴨ
ᴨ
µ
Data Access Manager
Live Stream
Archived Stream
Situation Stream
EventShop
Architecture
Physical
Data Source
(e.g., sensor
streams, geo-image
streams)
Logical Data Source
(e.g., preprocessing
data streams, social
media streams)
Raw Event
EventWarehouse
NICT - Japan
9. EventShop Data Representation
• An STT observation is represented as:
STT = <latitude, longitude, timeStamp, theme, value>
e.g., Point(40,-76), TimeStamp(12-12-12 12:00:00PT), Shelter-Availability, 1600
• A flow of STTs becomes an STT Stream:
STT Stream = {STT0, ..., STTi, ...}
• An E-mage is represented as:
E-mage = <SW, NE, latUnit, longUnit, TimeStamp, Theme, 2D Grid>
e.g., SW(40,-125), NE(50,-115), 0.1 latUnit, 0.1 longUnit, TimeStamp(12-12-12 12:00:00PT), Shelter-Availability, [0, 0, 0, 1000, 2000, …]
• A flow of E-mages forms an E-mage Stream:
E-mage Stream = {E-mage0, ..., E-magei, ...}
• A cell together with its STT information is called a stel (spatio-temporal element):
stel = <SW, NE, latUnit, longUnit, timeStamp, theme, value>
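The representation above can be sketched as a small Python data structure: an E-mage as a bounded grid, with each STT observation accumulated into its covering cell (stel). This is an illustrative sketch, not EventShop code; summing values per cell is an assumed aggregation (EventShop supports others, such as count or max).

```python
from dataclasses import dataclass, field

@dataclass
class Emage:
    sw: tuple           # (lat, long) of the south-west corner
    ne: tuple           # (lat, long) of the north-east corner
    lat_unit: float
    long_unit: float
    theme: str
    grid: list = field(init=False)  # row-major 2D grid of stel values

    def __post_init__(self):
        self.rows = int(round((self.ne[0] - self.sw[0]) / self.lat_unit))
        self.cols = int(round((self.ne[1] - self.sw[1]) / self.long_unit))
        self.grid = [[0.0] * self.cols for _ in range(self.rows)]

    def add_stt(self, lat, lon, value):
        """Accumulate one STT observation into its covering cell (stel)."""
        i = int((lat - self.sw[0]) / self.lat_unit)
        j = int((lon - self.sw[1]) / self.long_unit)
        if 0 <= i < self.rows and 0 <= j < self.cols:
            self.grid[i][j] += value

# Using the slide's example frame: SW(40,-125), NE(50,-115), 0.1-degree cells
e = Emage((40, -125), (50, -115), 0.1, 0.1, "Shelter-Availability")
e.add_stt(40.05, -124.95, 1600)   # one shelter-availability observation
```

An E-mage stream would then be a sequence of such grids, one per time window.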
11. Input: EvWH
High change
PM2.5 Event
Input: Twitter
Allergy Event
Input: AirNow
PM2.5 Level
Input: AirNow
Air Quality Index
Raw
Allergy
Tweets
Count
#of
Tweets
PM2.5
Emage
AQI
Emage
Processing
C
A
S
Output: “Sticker” Allergy
Risk Level
Interactive MAP
Alert Message
via CPCC Apps
Email
Notification Situation
PM2.5
Change
Event
Properties
Segmentation: Threshold
Average
N Normalization N N
Correlation
Requirement of a unified Event Model
by UCI/NICT 14
14. Multi-Spatio-Temporal Bounding Boxes and Granularities
• A “Pyramid of E-mage” resolution is introduced to represent the real world in E-mages at different (zoom) levels.
• Each stel (a pixel in the E-mage) represents a single fixed ground location.
• Precision vs Computational Cost
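Building a coarser pyramid level can be sketched as block aggregation over the stel grid. This is an illustrative sketch, not EventShop code; the 2×2 blocks and the `max` aggregation are assumptions, and the precision/cost trade-off on the slide corresponds to choosing how many levels to coarsen.

```python
def coarsen(grid, factor=2, agg=max):
    """Build the next (coarser) pyramid level by aggregating factor x factor
    blocks of stels into one cell; `agg` picks the aggregation (max, sum, ...)."""
    rows, cols = len(grid), len(grid[0])
    return [[agg(grid[i2][j2]
                 for i2 in range(i, min(i + factor, rows))
                 for j2 in range(j, min(j + factor, cols)))
             for j in range(0, cols, factor)]
            for i in range(0, rows, factor)]
```

Coarser levels are cheaper to process per E-mage but locate a situation less precisely, which is exactly the precision-versus-computational-cost trade-off noted above.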
15. Rasterization and Error Propagation
• Data Error Factors:
– Uncertainty of the data stream
– Data loss during data aggregation
– Uncertainty during data conversion
– Data error during data conversion
• To design the situation recognition model, we need to find a new cost evaluation method that considers both data accuracy and computational cost.
16. Enrich Personalized Asthma Risk
• Predict air quality at air-quality measuring sites.
• Interpolate air quality at locations not covered by measuring sites.
• Predict personalized asthma risk by using EventShop and Personal EventShop.
18. EventShop : Global Situation Detection
Situation
Recognition
Evolving Global Situation
Personal
Situation
Recognition
Personal EventShop
Evolving Personal Situation
Need- Resource Matcher
Recommendation
Engine
Persona Database
Resources
Needs
Data
Ingestion
Wearable Sensors
Calendar
Location….
Data Sources
….
Data
Ingestion
and
aggregation
Database Systems
Satellite
Environmental
Sensor Devices
Social Network
Internet of Things
Actionable Information
19.
Calendar PESi
FMB (Individual’s Feeling)
Accelerometer
Location
Fitness Data
(Nike, Fitbit) Data
Ingestion &
Aggregation
Heart Rate
Location (Move)
Food Log
FMB
(People’s Feeling, Location)
ES
Ozone
CO2
SO2
PM 2.5
Pollen (Tree, Grass)
Air Quality Index
Data
Ingestion &
Aggregation
Social Media
(News, Tweets)
Weather
Macro
Situation Recognition
Predictive Analytics
Personal
Situation Recognition
Persona
Asthma Allergy App Server
Data Collection
Macro Situation / Personal Situation
Need and Resources
Recommendation
20. More +++
• Website
– http://eventshop:8004/sln
• Demo
– http://auge.ics.uci.edu/eventshop
• Open Source
– https://github.com/eventshop
• Collaborations
Editor's Notes
During the first generation of the Web (Web 1.0) in the 1990s, the focus was primarily on building the Web and making it accessible by connecting people to online documents. In the second generation (Web 2.0) in the 2000s, the growth of social networking sites, wikis, communication tools, and folksonomies brought a new experience to our society by connecting people to people. In addition, the emergence of the mobile Internet and mobile devices was a significant platform driving the adoption and growth of the Web. Effectively, the Web became a universal medium for data, information, and knowledge exchange. In the third generation (Web 3.0), innovation shifted toward upgrading the back-end Web infrastructure, making the Web more connected, more exposed, and more intelligent. The Web is transformed from a network of separately siloed applications and content repositories into a more seamless and interoperable whole. The Web can now be used to establish a new network called “Social Life Networks (SLN)” [11] by connecting people to real-world (life) resources for decision making at both individual and societal levels.
A mash-up is a website or application that combines content from more than one source into an integrated experience.
The data used in a mash-up is usually accessed via an API. The goal is to bring together in a single place data sources that tend to live in their own data silos.
These offer only visual integration and do not provide any sophisticated analysis capabilities.
Mobile apps allow users to contribute to society and share information with their loved ones in their network during a crisis situation.
Life360 lets families set up a private network; then, with the click of a button, they can let each other know where they are and whether they're safe.
You can also enable background tracking so everyone in your private network can continuously share their locations with one another. The app also has a panic alert feature you can activate to immediately contact family members via text, email and a voice call to give your location at the moment you need help. There are also options for regular feature phones. For family members without a phone, there is an additional GPS device that can be provided for a fee.
SOS+
Waze: crowd-sourcing that allows users to report the current situation. As you can see, these types of data include space, time, and theme.
None of them can really connect people to real-world resources based on a detected situation. Manually browsing the Web is time-consuming.
Comprehensive development tools and computational frameworks for effectively combining and processing these available heterogeneous streams are lacking.
Bring Predictive here.
The data on the Web are not only generated in different media formats (e.g., KML, JSON, image, table, and sensor signal); their properties are also very different (e.g., measuring weather vs. traffic).
Given the geo-spatial continuity, we believe that a spatial grid structure is naturally suitable for representing various geo-spatial data, where each cell of the grid stores the value of observations at the corresponding geo-location and in turn represents the evolving situation at that location in space. We adopt this grid structure and call it E-mage (an event-data-based analog of image) [19].
The Emage Generator transforms data from the Data Adapter into the E-mage representation and is responsible for making this data directly available to the executor, as well as writing it to the disk recent buffer and the E-mage resolution mapper.
The queries run in the executor and can access live data directly from the Emage Generator and historical data from the disk.
The data access method is used to handle disk overload (data reduction, user-defined functions, etc.) and to create an E-mage stream from disk, in addition to other information such as spatial and temporal patterns and other properties.
Query rewriter -> source selection
In physical sensor networks, sensors are built to observe the real-world environment; examples include space satellites, remote sensing, laser scanning, acoustic sensing, motion sensing, and camera sensing. Most of the information is time series of measurements. A sensor reports a measurement over a given time period, while its coverage area is often fixed and promoted to the metadata. The measurement area can be represented in a variety of GIS structures, including points (latitude/longitude coordinates), vector polygons (regions), vector lines (arcs), and raster (grid) areas. In an actuated network, sensors report data only when they have been triggered or detect an event.
In logical sensor networks, geospatial data are generated from the cyber world to represent events in the real world. The data are reported mostly by humans via a variety of services, such as location-based services, social network sensing (e.g., Twitter, Facebook, Flickr), statistical reports, and news. Since these data are naturally available in unstructured formats and can have significant noise and missing data, extracting meaningful information from them is non-trivial. Many researchers have studied and contributed to this aspect, including data mining, entity extraction, topic discovery, and sentiment analysis.
Accessing External Data
Accessing Internal Data (via EvS Internal Storage)
The building block of the 2D grid in an E-mage is a single cell.