Cities are composed of complex systems with physical, cyber, and social components. Current work on extracting and understanding city events mainly relies on technology-enabled infrastructure to observe and record events. In this work, we propose an approach that leverages citizen observations of various city systems and services, such as traffic, public transport, water supply, weather, sewage, and public safety, as a source of city events. We investigate the feasibility of using such textual streams for extracting city events from annotated text. We formalize the problem of annotating social streams such as microblogs as a sequence labeling problem. We present a novel training data creation process for training sequence labeling models. Our automatic training data creation process utilizes instance-level domain knowledge (e.g., locations in a city, possible event terms). We compare this automated annotation process to a state-of-the-art tool that needs manually created training data and show that it has comparable performance in annotation tasks. An aggregation algorithm is then presented for event extraction from the annotated text. We carry out a comprehensive evaluation of event annotation and event extraction on a real-world dataset consisting of event reports and tweets collected over four months from the San Francisco Bay Area. The evaluation results are promising and provide insights into the utility of social streams for extracting city events.
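As an illustration (not the paper's actual implementation), the automatic training data creation idea, using instance-level domain knowledge such as location and event-term gazetteers to produce BIO-style sequence labels, could be sketched as follows. The gazetteer entries and tag names are invented for this example.

```python
# Sketch: auto-annotating tweets with BIO labels from domain gazetteers.
# The gazetteers and label names below are hypothetical illustrations.
LOCATIONS = {"bay bridge", "i-880", "san jose"}
EVENT_TERMS = {"accident", "congestion", "flooding"}

def bio_annotate(tokens, gazetteer, tag):
    """Label maximal gazetteer matches with B-/I- tags, leave the rest 'O'."""
    labels = ["O"] * len(tokens)
    n, i = len(tokens), 0
    while i < n:
        matched = 0
        for j in range(n, i, -1):          # try the longest match first
            if " ".join(tokens[i:j]).lower() in gazetteer:
                matched = j - i
                break
        if matched:
            labels[i] = "B-" + tag
            for k in range(i + 1, i + matched):
                labels[k] = "I-" + tag
            i += matched
        else:
            i += 1
    return labels

def annotate(tokens):
    loc = bio_annotate(tokens, LOCATIONS, "LOC")
    evt = bio_annotate(tokens, EVENT_TERMS, "EVENT")
    # merge the two label sequences; fill 'O' slots of one with the other
    return [l if l != "O" else e for l, e in zip(loc, evt)]

tokens = "accident on bay bridge near toll plaza".split()
print(annotate(tokens))
# ['B-EVENT', 'O', 'B-LOC', 'I-LOC', 'O', 'O', 'O']
```

Labeled sequences produced this way could then serve as training data for a sequence labeling model such as a CRF, avoiding manual annotation.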
- The document describes a method for understanding city traffic dynamics by utilizing sensor data that measures average speed and link travel time, as well as textual data from tweets and official traffic reports.
- It builds statistical models to learn normal traffic patterns from historical sensor data and identifies anomalies, then correlates anomalies with relevant traffic events extracted from tweets and reports.
- The method was evaluated on data collected for the San Francisco Bay Area, and it was able to scale to large real-world datasets by exploiting the problem structure and using Apache Spark for distributed processing. Events extracted from social media provided complementary information to sensor data for explaining traffic anomalies.
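One simple way the anomaly-detection step could work (a sketch under assumptions; the slot granularity, z-score test, and threshold are illustrative, not the method's actual statistical models):

```python
# Sketch: flagging speed anomalies against a per-time-slot historical baseline.
from statistics import mean, stdev
from collections import defaultdict

def build_baseline(history):
    """history: list of (slot, speed_mph). Returns slot -> (mean, std)."""
    by_slot = defaultdict(list)
    for slot, speed in history:
        by_slot[slot].append(speed)
    return {s: (mean(v), stdev(v)) for s, v in by_slot.items() if len(v) > 1}

def is_anomaly(baseline, slot, speed, z_threshold=3.0):
    """True when the observed speed deviates strongly from the slot's norm."""
    mu, sigma = baseline[slot]
    return sigma > 0 and abs(speed - mu) / sigma > z_threshold

history = [("mon-08", s) for s in (62, 64, 63, 61, 65, 63)]
baseline = build_baseline(history)
print(is_anomaly(baseline, "mon-08", 25))   # True: severe slowdown
print(is_anomaly(baseline, "mon-08", 62))   # False: within normal range
```

Anomalies flagged this way would then be candidates for correlation with events extracted from tweets and official reports.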
There is a rapid intertwining of sensors and mobile devices into the fabric of our lives. This has resulted in unprecedented growth in the number of observations from the physical and social worlds reported in the cyber world. Sensing and computational components embedded in the physical world constitute a Cyber-Physical System (CPS). The current science of CPS has yet to effectively integrate citizen observations into CPS analysis. We demonstrate the role of citizen observations in CPS and propose a novel approach to perform a holistic analysis of machine and citizen sensor observations. Specifically, we demonstrate the complementary, corroborative, and timely aspects of citizen sensor observations compared to machine sensor observations in Physical-Cyber-Social (PCS) Systems.
Physical processes are inherently complex and embody uncertainties. They manifest as machine and citizen sensor observations in PCS Systems. We propose a generic framework to move from observations to decision-making and actions in PCS systems, consisting of: (a) PCS event extraction, (b) PCS event understanding, and (c) PCS action recommendation. We demonstrate the role of Probabilistic Graphical Models (PGMs) as a unified framework to deal with the uncertainty, complexity, and dynamism involved in translating observations into actions. Data-driven approaches alone are not guaranteed to synthesize PGMs that accurately reflect real-world dependencies. To overcome this limitation, we propose to empower PGMs with declarative domain knowledge. Specifically, we propose four techniques: (a) automatic creation of massive training data for Conditional Random Fields (CRFs) using domain knowledge of entities, used in PCS event extraction; (b) Bayesian Network structure refinement using causal knowledge from ConceptNet, used in PCS event understanding; (c) knowledge-driven piecewise linear approximation of nonlinear time-series dynamics using Linear Dynamical Systems (LDS), used in PCS event understanding; and (d) transformation of knowledge of goals and actions into a Markov Decision Process (MDP) model, used in PCS action recommendation.
We evaluate the benefits of the proposed techniques on real-world applications involving traffic analytics and Internet of Things (IoT).
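To make technique (d) concrete, here is a minimal value-iteration sketch on a toy MDP. The states, actions, transition probabilities, and rewards are invented for illustration and are not from the dissertation.

```python
# Sketch: value iteration on a toy traffic-control MDP.
# P[s][a] = list of (probability, next_state, reward) outcomes.
P = {
    "congested": {
        "reroute": [(0.7, "clear", 5.0), (0.3, "congested", -1.0)],
        "wait":    [(0.1, "clear", 5.0), (0.9, "congested", -1.0)],
    },
    "clear": {
        "wait":    [(1.0, "clear", 0.0)],
    },
}

def value_iteration(P, gamma=0.9, eps=1e-6):
    """Iterate Bellman optimality updates until values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

V = value_iteration(P)
# greedy policy with respect to the converged values
policy = {s: max(P[s], key=lambda a: sum(p * (r + 0.9 * V[s2])
                                         for p, s2, r in P[s][a]))
          for s in P}
print(policy["congested"])   # 'reroute'
```

In the dissertation's setting, the knowledge of goals and actions would determine the state space, action set, and rewards rather than hand-coded values as here.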
Social media provides a natural platform for the dynamic emergence of citizen (as) sensor communities, where citizens share information, express opinions, and engage in discussions. Often such an Online Citizen Sensor Community (CSC) has stated or implied goals related to the workflows of organizational actors with defined roles and responsibilities; for example, a community of crisis response volunteers informing the prioritization of responses to resource needs (e.g., medical) to assist the managers of crisis response organizations. However, in a CSC there are challenges related to information overload for organizational actors, including finding reliable information providers and finding actionable information from citizens. This threatens the awareness and articulation of workflows needed to enable cooperation between citizens and organizational actors. CSCs supported by Web 2.0 social media platforms offer new opportunities and pose new challenges. This work addresses issues of ambiguity in interpreting unconstrained natural language (e.g., ‘wanna help’ appearing both in messages asking for help and in messages offering help during crises), sparsity of user and group behaviors (e.g., expression of specific intent), and diversity of user demographics (e.g., medical or technical professionals) for interpreting the user-generated data of citizen sensors. Interdisciplinary research involving the social and computer sciences is essential to address these socio-technical issues in CSC and to give organizational actors better access to user-generated data at a higher level of information abstraction. This study presents a novel web information processing framework focused on actors and actions in cooperation, called Identify-Match-Engage (IME), which fuses top-down and bottom-up computing approaches to design a cooperative web information system between citizens and organizational actors. It includes a.)
identification of action-related seeking-offering intent behaviors from short, unstructured text documents using a classification model based on both declarative and statistical knowledge, b.) matching of seeking and offering intentions, and c.) engagement models of users and groups in the CSC to prioritize whom to engage, by modeling context with social theories using features of users, their generated content, and their dynamic network connections in user interaction networks. The results show a greater improvement in modeling efficiency from fusing top-down knowledge-driven and bottom-up data-driven approaches than from conventional bottom-up approaches alone for modeling intent and engagement. Applications of this work include use of the engagement interface tool during recent crises to enable efficient citizen engagement, spreading critical information about prioritized needs so that citizens donate only the required supplies. The engagement interface application also won the United Nations ICT agency ITU's Young Innovator 2014 award.
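A toy sketch of the declarative-plus-statistical idea for seeking/offering intent classification follows. The rule patterns, training messages, and labels are all invented for illustration; the actual IME models are far richer.

```python
# Sketch: declarative rules first, with a tiny naive Bayes statistical fallback
# for ambiguous phrases like "wanna help". All data here is hypothetical.
import re
import math
from collections import Counter

SEEK_RULES  = [r"\bneed\b", r"\blooking for\b", r"\banyone have\b"]
OFFER_RULES = [r"\bdonating\b", r"\bcan provide\b", r"\bwe have extra\b"]

def rule_intent(text):
    t = text.lower()
    if any(re.search(p, t) for p in SEEK_RULES):
        return "seeking"
    if any(re.search(p, t) for p in OFFER_RULES):
        return "offering"
    return None   # declarative knowledge is silent; fall back to statistics

TRAIN = [("wanna help find water for my family", "seeking"),
         ("wanna help hand out blankets tonight", "offering"),
         ("where can i get medicine", "seeking"),
         ("giving away food downtown", "offering")]

def train_nb(data):
    counts = {"seeking": Counter(), "offering": Counter()}
    for text, label in data:
        counts[label].update(text.split())
    return counts

def nb_intent(counts, text):
    vocab = set(counts["seeking"]) | set(counts["offering"])
    def score(label):
        c, total = counts[label], sum(counts[label].values())
        # add-one smoothed log-likelihood of the message under the class
        return sum(math.log((c[w] + 1) / (total + len(vocab)))
                   for w in text.lower().split())
    return max(("seeking", "offering"), key=score)

def classify(counts, text):
    return rule_intent(text) or nb_intent(counts, text)

counts = train_nb(TRAIN)
print(classify(counts, "we need medical supplies"))       # 'seeking' via rule
print(classify(counts, "giving away blankets downtown"))  # 'offering' via NB
```

The hybrid ordering (rules override, statistics fill the gaps) mirrors the fusion of top-down and bottom-up approaches the abstract describes.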
1) The document discusses a semantics-based approach to machine perception that uses semantic web technologies to derive abstractions from sensor data using background knowledge on the web.
2) It addresses three primary issues: annotation of sensor data, developing a semantic sensor web, and enabling semantic perception intelligence at the edge on resource-constrained devices.
3) The approach represents background knowledge and sensor observations using ontologies, and uses deductive and abductive reasoning over these representations to interpret sensor data at multiple levels of abstraction.
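As a rough illustration of abductive interpretation (not the paper's ontology or reasoner), abduction can be cast as finding a minimal set of explanations that covers the observed properties. The knowledge base below is made up for the example.

```python
# Sketch: abduction as minimal set cover over a tiny background knowledge base.
from itertools import combinations

KB = {  # candidate explanation -> observable properties it accounts for
    "flu":     {"fever", "cough", "fatigue"},
    "cold":    {"cough", "sneezing"},
    "allergy": {"sneezing", "itchy-eyes"},
}

def abduce(kb, observed):
    """Return the smallest set of explanations covering all observations."""
    names = list(kb)
    for size in range(1, len(names) + 1):      # prefer parsimonious answers
        for combo in combinations(names, size):
            covered = set().union(*(kb[c] for c in combo))
            if observed <= covered:
                return set(combo)
    return None   # observations not explainable from this knowledge base

print(abduce(KB, {"fever", "cough"}))              # {'flu'}
print(abduce(KB, {"fever", "cough", "sneezing"}))  # {'flu', 'cold'}
```

Deduction would run the other direction: from an assumed explanation down to the observations it predicts, which can then be checked against incoming sensor data.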
This tutorial presents tools and techniques for effectively utilizing the Internet of Things (IoT) for building advanced applications, including Physical-Cyber-Social (PCS) systems. The issues and challenges related to IoT, semantic data modelling, annotation, knowledge representation (e.g. modelling for constrained environments, complexity issues, and time/location dependency of data), integration, analysis, and reasoning will be discussed. The tutorial will describe recent developments on creating annotation models and semantic description frameworks for IoT data (such as the W3C Semantic Sensor Network ontology). A review of enabling technologies and common scenarios for IoT applications from the data and knowledge engineering point of view will be discussed. Information processing, reasoning, and knowledge extraction, along with existing solutions related to these topics, will be presented. The tutorial summarizes state-of-the-art research and developments on PCS systems, IoT-related ontology development, linked data, domain knowledge integration and management, querying large-scale IoT data, and AI applications for automated knowledge extraction from real-world data.
Related: Semantic Sensor Web: http://knoesis.org/projects/ssw
Physical-Cyber-Social Computing: http://wiki.knoesis.org/index.php/PCS
Presentation at the AAAI 2013 Fall Symposium on Semantics for Big Data, Arlington, Virginia, November 15-17, 2013
Additional related material at: http://wiki.knoesis.org/index.php/Smart_Data
Related paper at: http://www.knoesis.org/library/resource.php?id=1903
Abstract: We discuss the nature of Big Data and address the role of semantics in analyzing and processing Big Data that arises in the context of Physical-Cyber-Social Systems. We organize our research around the five Vs of Big Data, where four of the Vs are harnessed to produce the fifth V, value. To handle the challenge of Volume, we advocate semantic perception that can convert low-level observational data to higher-level abstractions more suitable for decision-making. To handle the challenge of Variety, we resort to the use of semantic models and annotations of data so that much of the intelligent processing can be done at a level independent of the heterogeneity of data formats and media. To handle the challenge of Velocity, we seek to use a continuous semantics capability to dynamically create event- or situation-specific models and recognize new concepts, entities, and facts. To handle Veracity, we explore the formalization of trust models and approaches to glean trustworthiness. These four Vs of Big Data are harnessed by semantics-empowered analytics to derive Value for supporting practical applications transcending the physical-cyber-social continuum.
Smart Data for you and me: Personalized and Actionable Physical Cyber Social ..., by Amit Sheth
Featured Keynote at Worldcomp'14, July 2014: http://www.world-academy-of-science.org/worldcomp14/ws/keynotes/keynote_sheth
Video of the talk at: http://youtu.be/2991W7OBLqU
Big Data has captured a lot of interest in industry, with the emphasis on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity, and their applications to drive value for businesses. Recently, there has been rapid growth in situations where a big data challenge relates to making individually relevant decisions. A key example is human health, fitness, and well-being. Consider, for instance, understanding the reasons for and avoiding an asthma attack based on Big Data in the form of personal health signals (e.g., physiological data measured by devices/sensors or Internet of Things around, on, and inside humans), public health signals (information coming from the healthcare system, such as hospital admissions), and population health signals (such as tweets by people related to asthma occurrences and allergens, Web services providing pollen and smog information, etc.). However, no individual has the ability to process all these data without the help of appropriate technology, and each human has a different set of relevant data!
In this talk, I will put forward the concept of Smart Data that is realized by extracting value from Big Data, to benefit not just large companies but each individual. If I am an asthma patient, for all the data relevant to me with the four V-challenges, what I care about is simply, “How is my current health, and what is the risk of having an asthma attack in my personal situation, especially if that risk has changed?” As I will show, Smart Data that gives such personalized and actionable information will need to utilize metadata, use domain-specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on ML and NLP.
For harnessing volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration. For Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to use dynamically created models of new objects, concepts, and relationships, using them to better understand new cues in the data that capture rapidly evolving events and situations.
Smart Data applications in development at Kno.e.sis come from the domains of personalized health, energy, disaster response, and smart city. I will present examples from a couple of these.
TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, ..., by Amit Sheth
Transforming big data into smart data involves deriving value from harnessing the volume, variety, and velocity of big data using semantics and the semantic web. This allows making sense of big data by providing actionable information that improves decision making. Examples discussed include a healthcare application called kHealth that uses personal sensor data along with population level data to provide personalized and timely health recommendations and interventions for conditions like asthma.
Multiple regression, covid mobility, and Covid-19 policy recommendation, by Kan Yuenyong
Multiple regression analysis and Covid-19 policy are the contemporary agenda. The work demonstrates how to use Python for data wrangling and R for statistical analysis, enabling publication in a standard academic journal. The model examines whether lockdown policy was relevant to controlling the Covid-19 outbreak.
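To illustrate the kind of multiple regression described, here is a sketch on synthetic data using numpy's least squares; the variable names, coefficients, and the lockdown indicator are invented for the example, not taken from the study.

```python
# Sketch: multiple regression of new cases on mobility and a lockdown dummy,
# on synthetic data, via ordinary least squares.
import numpy as np

rng = np.random.default_rng(0)
n = 100
mobility  = rng.uniform(20, 100, n)          # % of baseline mobility
lockdown  = (mobility < 50).astype(float)    # crude lockdown indicator
new_cases = 2.0 * mobility - 30.0 * lockdown + rng.normal(0, 5, n)

# design matrix with an intercept column
X = np.column_stack([np.ones(n), mobility, lockdown])
beta, *_ = np.linalg.lstsq(X, new_cases, rcond=None)
print(dict(zip(["intercept", "mobility", "lockdown"], beta.round(2))))
# the lockdown coefficient should come out near the true value of -30
```

In practice one would also report standard errors and p-values (e.g. via R or statsmodels) before drawing any policy conclusion about the lockdown coefficient.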
Presented at the Panel on
Sensor, Data, Analytics and Integration in Advanced Manufacturing, at the Connected Manufacturing track of Bosch-USA organized "Leveraging Public-Private Partnerships for Regional Growth Summit". Panel statement: Sensors, data and analytics are the core of any smart manufacturing system. What are the main challenges to create actionable outputs, replicate systems and scale efficiency gains across industries?
Moderator: Thomas Stiedl, Bosch
Panelists:
1. Amit Sheth, Wright State University
2. Howie Choset, Carnegie Mellon University
3. Nagi Gebraeel, Georgia Institute of Technology
4. Brian Anthony, Massachusetts Institute of Technology
5. Yarom Polsky, Oak Ridge National Laboratory
For in-depth look:
Smart IoT: IoT as a human agent, human extension, and human complement
http://amitsheth.blogspot.com/2015/03/smart-iot-iot-as-human-agent-human.html
Semantic Gateway: http://knoesis.org/library/resource.php?id=2154
SSN Ontology: http://knoesis.org/library/resource.php?id=1659
Applications of Multimodal Physical (IoT), Cyber and Social Data for Reliable and Actionable Insights: http://knoesis.org/library/resource.php?id=2018
Smart Data: Transforming Big Data into Smart Data...: http://wiki.knoesis.org/index.php/Smart_Data
Historic use of the term Smart Data (2004): http://www.scribd.com/doc/186588820
NG2S: A Study of Pro-Environmental Tipping Point via ABMs, by Kan Yuenyong
A study of tipping points: much less is known about the most efficient ways to reach such transitions, or how self-reinforcing systemic transformations might be instigated through policy. We employ an agent-based model to study the emergence of social tipping points through various feedback loops that have previously been identified as constituting an ecological approach to human behavior. Our model suggests that even a linear introduction of pro-environmental affordances (action opportunities) to a social system can have non-linear positive effects on the emergence of collective pro-environmental behavior patterns.
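A minimal threshold-style agent-based model can illustrate the claimed effect, that a linearly increasing affordance yields a non-linear adoption cascade. The thresholds, affordance schedule, and weights below are illustrative assumptions, not the study's parameters.

```python
# Sketch: a minimal ABM of a social tipping point. Agents adopt a behavior
# when the observed peer share plus an affordance boost exceeds their
# personal threshold (a Granovetter-style threshold model).
import random

def run(n_agents=500, steps=60, seed=1):
    random.seed(seed)
    thresholds = [random.random() for _ in range(n_agents)]  # peer share needed
    adopted = [False] * n_agents
    history = []
    for t in range(steps):
        affordance = t / steps              # linearly increasing affordances
        share = sum(adopted) / n_agents
        for i in range(n_agents):
            if not adopted[i] and share + 0.5 * affordance > thresholds[i]:
                adopted[i] = True
        history.append(sum(adopted) / n_agents)
    return history

h = run()
# adoption stays low early on, then cascades once a critical mass is reached
print(f"step 5: {h[5]:.2f}, step 15: {h[15]:.2f}, step 30: {h[30]:.2f}")
```

Because each new adopter raises the peer share seen by others, the linear affordance ramp feeds a self-reinforcing loop, which is the non-linear effect the abstract describes.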
Smart Data - How you and I will exploit Big Data for personalized digital hea..., by Amit Sheth
Amit Sheth's keynote at IEEE BigData 2014, Oct 29, 2014.
Abstract from:
http://cci.drexel.edu/bigdata/bigdata2014/keynotespeech.htm
Big Data has captured a lot of interest in industry, with the emphasis on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity, and their applications to drive value for businesses. Recently, there has been rapid growth in situations where a big data challenge relates to making individually relevant decisions. A key example is personalized digital health, which relates to making better decisions about our health, fitness, and well-being. Consider, for instance, understanding the reasons for and avoiding an asthma attack based on Big Data in the form of personal health signals (e.g., physiological data measured by devices/sensors or Internet of Things around, on, and inside humans), public health signals (e.g., information coming from the healthcare system, such as hospital admissions), and population health signals (such as tweets by people related to asthma occurrences and allergens, Web services providing pollen and smog information). However, no individual has the ability to process all these data without the help of appropriate technology, and each human has a different set of relevant data!
In this talk, I will describe Smart Data that is realized by extracting value from Big Data, to benefit not just large companies but each individual. If my child is an asthma patient, for all the data relevant to my child with the four V-challenges, what I care about is simply, “How is her current health, and what is the risk of her having an asthma attack in her current situation (now and today), especially if that risk has changed?” As I will show, Smart Data that gives such personalized and actionable information will need to utilize metadata, use domain-specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on ML and NLP. I will motivate the need for a synergistic combination of techniques similar to the close interworking of the top brain and the bottom brain in cognitive models.
For harnessing volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration. For Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to use dynamically created models of new objects, concepts, and relationships, using them to better understand new cues in the data that capture rapidly evolving events and situations.
Smart Data applications in development at Kno.e.sis come from the domains of personalized health, energy, disaster response, and smart city.
The document discusses the future of high performance computing (HPC). It covers several topics:
- Next generation HPC applications will involve larger problems in fields like disaster simulation, urban science, and data-intensive science. Projects like the Square Kilometer Array will generate exabytes of data daily.
- Hardware trends include using many-core processors, accelerators like GPUs, and heterogeneous computing with CPUs and GPUs. Future exascale systems may use conventional CPUs with GPUs or innovative architectures like Japan's Post-K system.
- The top supercomputers in the world currently include Summit, an IBM system at Oak Ridge combining Power9 CPUs and Nvidia Volta GPUs, and China's Sunway TaihuLight.
Knowledge Will Propel Machine Understanding of Big Data - Amit Sheth
1) Amit Sheth presented on how knowledge can help machines better understand big data.
2) He discussed challenges like understanding implicit entities, analyzing drug abuse forums, and understanding city traffic using sensors and text.
3) Sheth argued that knowledge graphs and ontologies can help interpret diverse data types and provide contextual understanding to help solve real-world problems.
This is a brief review of current multi-disciplinary and collaborative projects at Kno.e.sis led by Prof. Amit Sheth. They cover research in big social data, IoT, semantic web, semantic sensor web, health informatics, personalized digital health, social data for social good, smart city, crisis informatics, digital data for the material genome initiative, etc. Dec 2015 edition.
Monitoring world geopolitics through Big Data by Tomasa Rodrigo and Álvaro Or... - Big Data Spain
Data from the media allows us to enrich our analysis and to incorporate these insights into our models to capture nonlinear behaviour and feedback effects of human interaction, assessing their global impact on society and enabling us to construct fragility indices and early warning systems.
https://www.bigdataspain.org/2017/talk/monitoring-world-geopolitics-through-big-data
Big Data Spain 2017
16th - 17th November Kinépolis Madrid
Semantic, Cognitive, and Perceptual Computing – three intertwined strands of ... - Amit Sheth
Keynote at Web Intelligence 2017: http://webintelligence2017.com/program/keynotes/
Video: https://youtu.be/EIbhcqakgvA Paper: http://knoesis.org/node/2698
Abstract: While Bill Gates, Stephen Hawking, Elon Musk, Peter Thiel, and others engage in OpenAI discussions of whether or not AI, robots, and machines will replace humans, proponents of human-centric computing continue to extend work in which humans and machines partner in contextualized and personalized processing of multimodal data to derive actionable information.
In this talk, we discuss how maturing towards the emerging paradigms of semantic computing (SC), cognitive computing (CC), and perceptual computing (PC) provides a continuum through which to exploit the ever-increasing volume and diversity of data that could enhance people’s daily lives. SC and CC sift through raw data to personalize it according to context and individual users, creating abstractions that move the data closer to what humans can readily understand and apply in decision-making. PC, which interacts with the surrounding environment to collect data that is relevant and useful in understanding the outside world, is characterized by interpretative and exploratory activities that are supported by the use of prior/background knowledge. Using the examples of personalized digital health and a smart city, we will demonstrate how the trio of these computing paradigms form complementary capabilities that will enable the development of the next generation of intelligent systems. For background: http://bit.ly/PCSComputing
Kalpa Gunaratna's Ph.D. dissertation defense: April 19 2017
The processing of structured and semi-structured content on the Web has been gaining attention with the rapid progress in the Linking Open Data project and the development of commercial knowledge graphs. Knowledge graphs capture domain-specific or encyclopedic knowledge in the form of a data layer and add rich and explicit semantics on top of the data layer to infer additional knowledge. The data layer of a knowledge graph represents entities and their descriptions. The semantic layer on top of the data layer is called the schema (ontology), where relationships of the entity descriptions, their classes, and the hierarchy of the relationships and classes are defined. Today, there exist large knowledge graphs in the research community (e.g., encyclopedic datasets like DBpedia and Yago) and corporate world (e.g., Google knowledge graph) that encapsulate a large amount of knowledge for human and machine consumption. Typically, they consist of millions of entities and billions of facts describing these entities. While it is good to have this much knowledge available on the Web for consumption, it leads to information overload, and hence proper summarization (and presentation) techniques need to be explored.
In this dissertation, we focus on creating both comprehensive and concise entity summaries at: (i) the single entity level and (ii) the multiple entity level. To summarize a single entity, we propose a novel approach called FACeted Entity Summarization (FACES) that considers importance, which is computed by combining popularity and uniqueness, and diversity of facts getting selected for the summary. We first conceptually group facts using semantic expansion and hierarchical incremental clustering techniques and form facets (i.e., groupings) that go beyond syntactic similarity. Then we rank both the facts and facets using Information Retrieval (IR) ranking techniques to pick the highest ranked facts from these facets for the summary. The important and unique contribution of this approach is that because of its generation of facets, it adds diversity into entity summaries, making them comprehensive. For creating multiple entity summaries, we simultaneously process facts belonging to the given entities using combinatorial optimization techniques. In this process, we maximize diversity and importance of facts within each entity summary and relatedness of facts between the entity summaries. The proposed approach uniquely combines semantic expansion, graph-based relatedness, and combinatorial optimization techniques to generate relatedness-based multi-entity summaries.
Complementing the entity summarization approaches, we introduce a novel approach using light Natural Language Processing (NLP) techniques to enrich knowledge graphs by adding type semantics to literals.
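As a rough, hypothetical sketch of the facet-based selection idea behind FACES (not the actual FACES implementation, which uses semantic expansion, hierarchical clustering, and IR-style ranking), the following Python groups an entity's facts into facets, ranks facts within each facet by a caller-supplied importance score, and picks facts round-robin across facets so the summary stays diverse. All function names and the scoring are illustrative assumptions:

```python
from collections import defaultdict

def faces_style_summary(facts, facet_of, importance, k):
    """Pick up to k facts, cycling across facets (groupings) so the
    summary is diverse, taking the most important fact of each facet
    first.  `facts` is a list of (property, value) pairs, `facet_of`
    maps a fact to a facet label, `importance` scores a fact."""
    facets = defaultdict(list)
    for fact in facts:
        facets[facet_of(fact)].append(fact)
    # Rank facts inside each facet by importance, best first.
    for group in facets.values():
        group.sort(key=importance, reverse=True)
    # Round-robin over facets until k facts are chosen.
    summary = []
    while len(summary) < k and any(facets.values()):
        for label in list(facets):
            if facets[label]:
                summary.append(facets[label].pop(0))
                if len(summary) == k:
                    break
    return summary
```

Selecting the top fact of each facet before taking a second fact from any facet is what injects diversity; a plain top-k ranking would instead let one dominant facet fill the whole summary.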
Physical-Cyber-Social Computing involves the integration of observations from physical sensors, knowledge and experiences from cyber systems, and social interactions from people. This will allow machines to understand contexts, correlate multi-domain data, and provide personalized solutions by leveraging background knowledge spanning physical, cyber, and social domains. Semantic computing plays a key role in bridging differences between domains to derive insights. The vision is for systems that can proactively initiate information needs with minimal human involvement.
Citizen Sensor Data Mining, Social Media Analytics and Applications - Amit Sheth
Opening talk at Singapore Symposium on Sentiment Analysis (S3A), February 6, 2015, Singapore. http://s3a.sentic.net/#s3a2015
Abstract
With the rapid rise in the popularity of social media and near-ubiquitous mobile access, the sharing of observations and opinions has become commonplace. This has given us unprecedented access to the pulse of a populace and the ability to perform analytics on social data to support a variety of socially intelligent applications -- be it for brand tracking and management, crisis coordination, organizing revolutions, or promoting social development in underdeveloped and developing countries.
I will review: 1) understanding and analysis of informal text, esp. microblogs (e.g., issues of cultural entity extraction and role of semantic/background knowledge enhanced techniques), and 2) how we built Twitris, a comprehensive social media analytics (social intelligence) platform.
I will describe the analysis capabilities along three dimensions: spatio-temporal-thematic, people-content-network, and sentiment-emotion-intent. I will couple technical insights with identification of computational techniques and real-world examples using live demos of Twitris (http://twitris2.knoesis.org).
AI is the science and engineering of creating intelligent machines and software. It draws from fields like computer science, biology, psychology and linguistics. The goal is to develop systems that can perform tasks normally requiring human intelligence, like visual perception, decision making and language translation. Some key applications of AI include machine learning, expert systems, natural language processing and computer vision. As AI systems continue advancing, they are becoming better than humans at certain tasks like playing strategic games.
This document provides an introduction to big data, including:
- Big data is characterized by its volume, velocity, and variety, which makes it difficult to process using traditional databases and requires new technologies.
- Technologies like Hadoop, MongoDB, and cloud platforms from Google and Amazon can provide scalable storage and processing of big data.
- Examples of how big data is used include analyzing social media and search data to gain insights, enabling personalized experiences and targeted advertising.
- As data volumes continue growing exponentially from sources like sensors, simulations, and digital media, new tools and approaches are needed to effectively analyze and make sense of "big data".
The human face of AI: how collective and augmented intelligence can help sol... - Elena Simperl
This document summarizes a talk on how collective and augmented intelligence can help solve societal problems. It discusses how AI depends on human input, how collective intelligence benefits AI, and provides examples of using human computation and crowdsourcing to support disaster relief and conduct urban auditing. It also describes challenges in making crowdsourcing sustainable and assessing data quality, and emphasizes the need for iterative design of human-AI systems to bring together human, collective, and computational intelligence.
This document summarizes a lecture on network science given by Madhav Marathe at Lawrence Livermore National Laboratory in December 2010. It provides an overview of network science, including definitions of networks and their unique properties. It also discusses mathematical and computational approaches to modeling complex networks and applications to infrastructure planning, energy systems, and national security. The lecture acknowledges prior work that contributed to its material from various researchers and textbooks.
This document contains the resume of Yu-Li Liang. It lists his education including a PhD in Computer Science from University of Colorado Boulder and MS in Biomedical Engineering from Drexel University. His experience includes positions as a Data Scientist at Rubicon Project Inc. and a Postdoctoral Scholar at California Institute of Technology. His skills include Python, C++, machine learning, computer vision, and image processing. Selected projects involve tasks such as detecting obscene content in online videos, evaluating skin conditions from photos, and geographical feature detection in satellite images.
The document discusses integrating sensor and social data to understand city events. It describes collecting data from multiple sources, including sensors and social media. Statistical models are used to analyze the sensor data and identify anomalies, which are then correlated with events extracted from social media using spatial and temporal proximity. The approach is evaluated on traffic data from San Francisco, integrating data from traffic sensors and Twitter to extract and corroborate traffic events.
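A minimal sketch of the correlation step described above, pairing sensor anomalies with social-media events by spatial and temporal proximity. The distance/time thresholds and field names are illustrative assumptions, not taken from the paper:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in km."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def correlate(anomalies, events, max_km=2.0, max_minutes=60):
    """Pair each sensor anomaly with the social-media events that are
    close in both space and time.  Each anomaly/event is a dict with
    'lat', 'lon', and 'minute' (minutes since some epoch)."""
    pairs = []
    for a in anomalies:
        for e in events:
            near = haversine_km(a['lat'], a['lon'], e['lat'], e['lon']) <= max_km
            soon = abs(a['minute'] - e['minute']) <= max_minutes
            if near and soon:
                pairs.append((a, e))
    return pairs
```

In practice the quadratic scan over anomaly/event pairs would be replaced by a spatial index and time bucketing, which is where the distributed-processing structure mentioned in the head comes in.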
I summarize requirements for an "Open Analytics Environment" (aka "the Cauldron"), and some work being performed at the University of Chicago and Argonne National Laboratory towards its realization.
Multiple regression, Covid mobility, and Covid-19 policy recommendation - Kan Yuenyong
Multiple regression analysis of Covid-19 policy is the contemporary agenda. The work demonstrates how to use Python for data wrangling and R for statistical analysis, in a form suitable for publication in a standard academic journal. The model examines whether lockdown policy is relevant to controlling the Covid-19 outbreak.
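Since the entry centers on multiple regression, here is a small self-contained sketch of ordinary least squares via the normal equations, the computation R's `lm` performs (more robustly) under the hood. This is illustrative only, not the author's code:

```python
def ols(X, y):
    """Ordinary least squares: solve (X'X) b = X'y by Gaussian
    elimination with partial pivoting.  X is a list of rows (include
    a leading 1.0 in each row for the intercept); returns b."""
    n = len(X[0])
    # Build the normal equations A b = c with A = X'X, c = X'y.
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    c = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(n)]
    # Forward elimination.
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        c[i], c[p] = c[p], c[i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for j in range(i, n):
                A[r][j] -= f * A[i][j]
            c[r] -= f * c[i]
    # Back-substitution.
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * b[j] for j in range(i + 1, n))
        b[i] = (c[i] - s) / A[i][i]
    return b
```

For example, fitting the points (0, 1), (1, 3), (2, 5), (3, 7) recovers intercept 1 and slope 2.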
Presented at the Panel on Sensor, Data, Analytics and Integration in Advanced Manufacturing, at the Connected Manufacturing track of the Bosch-USA organized "Leveraging Public-Private Partnerships for Regional Growth Summit". Panel statement: Sensors, data and analytics are the core of any smart manufacturing system. What are the main challenges to create actionable outputs, replicate systems and scale efficiency gains across industries?
Moderator: Thomas Stiedl, Bosch
Panelists:
1. Amit Sheth, Wright State University
2. Howie Choset, Carnegie Mellon University
3. Nagi Gebraeel, Georgia Institute of Technology
4. Brian Anthony, Massachusetts Institute of Technology
5. Yarom Polosky, Oak Ridge National Laboratory
For in-depth look:
Smart IoT: IoT as a human agent, human extension, and human complement
http://amitsheth.blogspot.com/2015/03/smart-iot-iot-as-human-agent-human.html
Semantic Gateway: http://knoesis.org/library/resource.php?id=2154
SSN Ontology: http://knoesis.org/library/resource.php?id=1659
Applications of Multimodal Physical (IoT), Cyber and Social Data for Reliable and Actionable Insights: http://knoesis.org/library/resource.php?id=2018
Smart Data: Transforming Big Data into Smart Data...: http://wiki.knoesis.org/index.php/Smart_Data
Historic use of the term Smart Data (2004): http://www.scribd.com/doc/186588820
NG2S: A Study of Pro-Environmental Tipping Point via ABMs - Kan Yuenyong
A study of tipping points: much less is known about the most efficient ways to reach such transitions or how self-reinforcing systemic transformations might be instigated through policy. We employ an agent-based model to study the emergence of social tipping points through various feedback loops that have previously been identified as constituting an ecological approach to human behavior. Our model suggests that even a linear introduction of pro-environmental affordances (action opportunities) to a social system can have non-linear positive effects on the emergence of collective pro-environmental behavior patterns.
Smart Data - How you and I will exploit Big Data for personalized digital hea... - Amit Sheth
Amit Sheth's keynote at IEEE BigData 2014, Oct 29, 2014.
Abstract from:
http://cci.drexel.edu/bigdata/bigdata2014/keynotespeech.htm
Event Processing Using Semantic Web Technologies - Mikko Rinne
The presentation given at the public defence of my doctoral thesis at the Department of Computer Science of Aalto University, Espoo, Finland, on 1 September 2017.
Using topological analysis to support event guided exploration in urban data - ivaderivader
1) The paper proposes a technique to efficiently analyze large spatio-temporal urban data sets and automatically detect meaningful events using topological analysis.
2) It designs an indexing scheme to group similar event patterns across time slices and a visual interface to guide exploratory analysis.
3) Case studies on NYC taxi and subway data demonstrate how the technique can identify irregular traffic events like parades as well as regular hot spots and delay patterns to support analysis of these urban systems.
The document discusses stream reasoning, which combines reasoning techniques with data streams. It presents a conceptual architecture for stream reasoning that processes streamed input through selection, abstraction, reasoning, and decision steps. Key aspects are RDF streams as a new data format, and Continuous SPARQL (C-SPARQL) for continuously querying RDF data streams.
DataStax and Esri: Geotemporal IoT Search and Analytics - DataStax Academy
Internet of Things (IoT) data frequently has a location and time component. Getting value out of this "geotemporal" data can be tricky. We'll explore when and how to leverage Cassandra, DSE Search and DSE Analytics to surface meaningful information from your geotemporal data.
Realtime Big Data Analytics for Event Detection in Highways - York University
This document introduces a real-time big data analytics platform for event detection and classification in highways. The platform consists of data, analytics, and management components. It leverages cloud computing for reliability, scalability, and adaptability. The platform can perform both real-time and retrospective analytics. It is demonstrated for detecting events on major highways in the Greater Toronto Area. The platform uses a cluster-based architecture and Spark for streaming analytics. Algorithms are developed to model event signatures and detect events from sensor data in real-time.
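The platform's detection step can be illustrated, in a much-simplified form, by flagging sensor readings that deviate sharply from a trailing window of recent history; this is a stand-in for matching learned event signatures, and the window size and z-threshold are assumptions:

```python
from statistics import mean, stdev

def detect_events(readings, window=12, z=3.0):
    """Flag indices where a reading deviates by more than z standard
    deviations from the trailing window's mean -- a minimal stand-in
    for signature-based event detection on a sensor stream."""
    events = []
    for i in range(window, len(readings)):
        hist = readings[i - window:i]
        mu, sd = mean(hist), stdev(hist)
        if sd > 0 and abs(readings[i] - mu) > z * sd:
            events.append(i)
    return events
```

A sudden speed drop on a highway loop detector, say from a stable ~60 km/h down to 20 km/h, would be flagged the moment it falls outside the trailing window's normal band; in a streaming setting the same logic runs per sensor over a sliding window rather than over a stored list.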
Oplægget blev holdt ved InfinIT-arrangementet Big Data og data-intensive systemer i Danmark, der blev af holdt en 15. januar 2014. Læs mere om arrangementet her: http://infinit.dk/dk/arrangementer/tidligere_arrangementer/big_data_i_danmark.htm
Project on nypd accident analysis using hadoop environmentSiddharth Chaudhary
For this project NYC motor-vehicle-collisions dataset is processed in Hadoop ecosystem using map reduce, Pig script and Hive query for analysis and visualization.
This document discusses geospatial analytics and spatial capabilities on big data systems. It covers analyzing movement data through techniques like trajectory analysis and discretization. It discusses operational requirements for analyzing telematics data at large scales. It proposes using Apache Spark and geospatial libraries on Hadoop for distributed processing and storage. Key analytical challenges discussed include snap-to-road matching, trajectory clustering, and traffic event detection. Machine learning techniques like kernel methods and sequence analysis are proposed for solving these challenges.
Streaming Analytics and Internet of Things - Geesara PrathapWithTheBest
This document discusses streaming analytics and the Internet of Things. It describes some challenges of IoT such as data volume, processing speed requirements, and data storage needs. It then discusses WSO2's IoT and analytics platforms, which can perform real-time, batch, and predictive analytics on streaming IoT and other data. The analytics platform uses Siddhi to analyze event streams using techniques like filtering, windowing, pattern detection, joins, and more. Real-time analytics examples include alerts, counting, correlation, and learning models.
This document proposes an 8-step framework for migrating Singapore government data to a linked data format. The framework includes steps for specification identification, data set analysis, object modeling, ontology modeling, URI naming, RDF creation, external linking, and dataset publication. It evaluates existing linked data implementations and tools to recommend approaches for each step. The goal is to standardize terms across agencies and make government data more conveniently reusable and joinable through linked data principles.
The data streaming processing paradigm and its use in modern fog architecturesVincenzo Gulisano
Invited lecture at the University of Trieste.
The lecture covers (briefly) the data streaming processing paradigm, research challenges related to distributed, parallel and deterministic streaming analysis and the research of the DCS (Distributed Computing and Systems) groups at Chalmers University of Technology.
Toward Semantic Data Stream - Technologies and ApplicationsRaja Chiky
Massive data stream processing is a scientific challenge and an industrial concern. But with the current volumes of data streams , their number and variety, current techniques are not able to meet the requirements of applications. The Semantic Web tools , through the RDF for example, allow to address the problem of heterogeneous data. Thus, the data stream are converted to semantic data stream by using RDF triples extended with a timestamp. To be able to query , filter, or reason semantic data streams, the query language SPARQL must be extended to include concepts such as windowing , based on what has been done in Data Stream Management Systems. In this talk, I will present recent work on the semantic data stream management , particularly extensions made on SPARQL language and associated benchmarks.
The document proposes an approach for efficient semantically enriched complex event processing and pattern matching. It introduces a temporally annotated RDF named graph data model to represent event streams. A query model is presented that decomposes queries into subqueries over event patterns. These subqueries are rewritten and processed in parallel and distributed manner. The proposed approach aims to integrate stream reasoning and complex event processing by supporting new operators for RDF graph patterns and allowing the use of techniques like NFA and EDG for pattern matching.
Connecting the Next Billion Devices to the Internet - Standards and ProtocolsSteve Ray
This document discusses standards and protocols that enable interoperability for connecting devices to the internet of things. It notes that while there are many choices for communication protocols, interoperability requires agreement on information exchange standards. The talk will cover existing information exchange standards and technologies that can be used with the standards. It emphasizes that every API or protocol standard should have an accompanying information model that defines relevant concepts to provide necessary context and metadata. Semantic web technologies like ontologies, RDF, and SPARQL can help integrate data from different sources by mapping terms and relationships between standards. This approach supports interoperability across distributed, loosely coupled smart city systems.
Design and Implementation of A Data Stream Management SystemErdi Olmezogullari
This presentation is related to my Master's Thesis at Ozyegin University. We focused on data mining on the real streaming (not binary) data. The most popular data mining algorithm, Association Rule Mining (ARM), was performed during this study from scratch. At the end of the thesis, we published four national/international papers in the different conferences such as Cloud Computing and Big Data.
This document discusses EventShop, a system for real-time macro situation recognition from heterogeneous data streams. EventShop ingests data from sources like sensors, social media, satellites and uses it to detect situations. It represents spatial data as "emages" on a grid and uses operators to detect situations. The document outlines EventShop's architecture, collaboration with NICT on tools like EventWarehouse and Sticker, work on multi-granularity emages and predictive analytics, and applications for social life networks.
Introduction to Data streaming - 05/12/2014Raja Chiky
Raja Chiky is an associate professor whose research interests include data stream mining, distributed architectures, and recommender systems. The document outlines data streaming concepts including what a data stream is, data stream management systems, and basic approximate algorithms used for processing massive, high-velocity data streams. It also discusses challenges in distributed systems and using semantic technologies for data streaming.
Similar to Extracting City Traffic Events from Social Streams (20)
Comparative analysis between traditional aquaponics and reconstructed aquapon...bijceesjournal
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
Null Bangalore | Pentesters Approach to AWS IAMDivyanshu
#Abstract:
- Learn more about the real-world methods for auditing AWS IAM (Identity and Access Management) as a pentester. So let us proceed with a brief discussion of IAM as well as some typical misconfigurations and their potential exploits in order to reinforce the understanding of IAM security best practices.
- Gain actionable insights into AWS IAM policies and roles, using hands on approach.
#Prerequisites:
- Basic understanding of AWS services and architecture
- Familiarity with cloud security concepts
- Experience using the AWS Management Console or AWS CLI.
- For hands on lab create account on [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
# Scenario Covered:
- Basics of IAM in AWS
- Implementing IAM Policies with Least Privilege to Manage S3 Bucket
- Objective: Create an S3 bucket with least privilege IAM policy and validate access.
- Steps:
- Create S3 bucket.
- Attach least privilege policy to IAM user.
- Validate access.
- Exploiting IAM PassRole Misconfiguration
-Allows a user to pass a specific IAM role to an AWS service (ec2), typically used for service access delegation. Then exploit PassRole Misconfiguration granting unauthorized access to sensitive resources.
- Objective: Demonstrate how a PassRole misconfiguration can grant unauthorized access.
- Steps:
- Allow user to pass IAM role to EC2.
- Exploit misconfiguration for unauthorized access.
- Access sensitive resources.
- Exploiting IAM AssumeRole Misconfiguration with Overly Permissive Role
- An overly permissive IAM role configuration can lead to privilege escalation by creating a role with administrative privileges and allow a user to assume this role.
- Objective: Show how overly permissive IAM roles can lead to privilege escalation.
- Steps:
- Create role with administrative privileges.
- Allow user to assume the role.
- Perform administrative actions.
- Differentiation between PassRole vs AssumeRole
Try at [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
The CBC machine is a common diagnostic tool used by doctors to measure a patient's red blood cell count, white blood cell count and platelet count. The machine uses a small sample of the patient's blood, which is then placed into special tubes and analyzed. The results of the analysis are then displayed on a screen for the doctor to review. The CBC machine is an important tool for diagnosing various conditions, such as anemia, infection and leukemia. It can also help to monitor a patient's response to treatment.
Introduction to AI Safety (public presentation).pptx
Extracting City Traffic Events from Social Streams
1. Extracting City Events from Social Streams
Pramod Anantharam
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA
http://www.ict-citypulse.eu/page/
Collaborator: Dr. Payam Barnaghi
Advisors: Prof. Amit Sheth, Prof. Krishnaprasad Thirunarayan
Pramod Anantharam, Payam Barnaghi, Krishnaprasad Thirunarayan, and Amit Sheth. 2015. Extracting City Traffic Events
from Social Streams. ACM Trans. Intell. Syst. Technol. 6, 4, Article 43 (July 2015), 27 pages. DOI=10.1145/2717317
http://doi.acm.org/10.1145/2717317
http://wiki.knoesis.org/index.php/Citypulse
2. A Historical Perspective on Cities and Their Inhabitants
“kings, emperors and other rulers benefited from being on the front lines with their
people when it came to making decisions.”1
1http://gicoaches.com/what-we-can-learn-from-kings-of-the-past-who-disguised-themselves-as-ordinary-men/
http://en.wikipedia.org/wiki/Qianlong_Emperor
Qianlong Emperor (reigned 8 October 1735 – 9 February 1796)
Qing Dynasty (1644–1912)
Disguised as a commoner, Qianlong visited
cities to understand the common man’s life.
This practice has been popularly known as “Management by
Walking Around” since the 1980s.
3. A Modern Perspective on Cities and Their Inhabitants
City authorities, governments, and other humanitarian agencies benefit from
being on the front lines with their people when it comes to making decisions.
We want to be connected to citizens to
understand and prioritize decisions
6. What Are People Saying About City Infrastructure on Twitter?
7. • What are people saying about city
infrastructure on Twitter?
• How do we extract city infrastructure-related
events from Twitter?
• How can we leverage event and location
knowledge bases for event extraction?
• How well can we extract city events?
Research Questions
8. Some Challenges in Extracting Events from Tweets
• No well-accepted definition of ‘events related to a
city’
• Tweets are short (140 characters), and their informal
nature makes them hard to analyze
– Entity, location, time, and type of an event
• Multiple reports of the same event and sparse
reporting of other events (biased sample)
– Numbers don’t necessarily indicate intensity
• Validation of the solution is hard due to the
open-domain nature of the problem
9. Related Work on Event Extraction
[Quadrant chart positioning prior work along two axes (formal vs. informal text, closed vs. open domain): [Kumaran and Allan 2004], [Roitman et al. 2012], [Lampos and Cristianini 2012], [Becker et al. 2011], [Wang et al. 2012], [Ritter et al. 2012].]
10. • N-grams + Regression
– Text analysis to extract uni- and bi-grams (event markers)
– Feature selection to select best possible event markers
– Apply regression to estimate conditional probability P(Y|X) to enable
prediction where Y is the target (e.g., rainfall) and X is the input (e.g., traffic
jam event).
• Clustering
– Create event clusters incrementally over time
– Identify clusters of interest based on their relevance (manual inspection)
– Granularity remains at the tweet/cluster level (tweets are assigned to clusters
of interest)
• Sequence Labeling (CRFs)
– Text analysis to extract features such as named entities, POS1 tagging
– Each event indicator is modeled as a mixture of event types that are latent
variables
– Each type corresponds to a distribution over named
entities (labels assigned to event types by manual inspection) and other features
Event Extraction -- Techniques
1 Part Of Speech
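As a minimal sketch of the n-grams step, assuming whitespace tokenization and document frequency as a crude feature-selection criterion (the actual pipeline would use a proper tokenizer and fit a regression model on the selected markers; the tweets and function names here are illustrative):

```python
from collections import Counter
from itertools import chain

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def select_event_markers(tweets, k=3):
    """Rank uni- and bi-grams by document frequency and keep the top k
    as candidate event markers (a stand-in for feature selection)."""
    df = Counter()
    for text in tweets:
        tokens = text.lower().split()
        df.update(set(chain(ngrams(tokens, 1), ngrams(tokens, 2))))
    return [gram for gram, _ in df.most_common(k)]

tweets = [
    "traffic jam on hwy 101",
    "huge traffic jam near downtown",
    "accident caused a traffic jam",
]
print(select_event_markers(tweets))
```

In the full technique, the selected markers become the input features X of a regression model estimating P(Y|X) for a target such as rainfall or a traffic-jam event.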
11. • Event extraction should be open domain (no a
priori event types) preferably with event
metadata (e.g., event duration, impact).
• Incorporate background knowledge related to
city events e.g., 511.org hierarchy, SCRIBE
ontology, city location names.
• Assess the intensity of an event using content and
network cues.
• Be robust w.r.t. noise, informal nature, and
variability of data.
City Event Extraction -- Desiderata
12. • N-grams + Regression
– Open domain: works best when there is a reference
corpus to extract n-grams
– Event metadata: cannot distinguish between entities
and hence hard to extract event metadata
– Background knowledge: incorporating domain
vocabulary (e.g., subsumption) is not natural
– Event intensity: regression maps the event indicators
to some quantified values
– Robustness: quite robust if there is a reference corpus
Techniques -- Desiderata
13. • Clustering
– Open domain: works well for domains with no a priori
knowledge of events (may need human inspection)
– Event metadata: too coarse grained (document level)
and event metadata extraction is not natural
– Background knowledge: incorporating domain
vocabulary is not natural
– Event intensity: not captured
– Robustness: quite robust for twitter data with enough
data for each cluster
Techniques -- Desiderata
14. • Sequence Labeling (CRFs)
– Open domain: works well for domains with no a priori
knowledge of events (may need human inspection)
– Event metadata: event metadata extraction is
captured naturally with the named entities
– Background knowledge: incorporating domain
vocabulary is quite natural
– Event intensity: part-of-speech tag may indirectly
capture intensity
– Robustness: with a deeper model for capturing
context, quite robust for twitter data
Techniques -- Desiderata
15. City Event Extraction Solution Architecture
[Architecture diagram: tweets from a city, together with city infrastructure data, feed a City Event Annotation stage (POS tagging, hybrid NER + event term extraction) followed by a City Event Extraction stage (geohashing, temporal estimation, impact assessment, event aggregation), drawing on background knowledge from OSM locations, the SCRIBE ontology, and the 511.org hierarchy.]
17. City Event Annotation – CRF Annotation Examples
Last O night O in O CA... O (@ O Half B-LOCATION Moon I-LOCATION Bay B-LOCATION
Brewing I-LOCATION Company O w/ O 8 O others) O http://t.co/w0eGEJjApY O
#Manteca O accident. B-EVENT two O lanes O blocked B-EVENT on O Hwy O 99 O NB
O at O Austin O Rd O #traffic O http://t.co/YehsHpD7aC O
#Fontana O accident. B-EVENT three O lanes O blocked B-EVENT on O I-10 O WB O
between O Cherry B-LOCATION Ave I-LOCATION and O Etiwanda O Ave O in O
#Ontario O #LAtraffic O http://t.co/e2e6MW3d78 O
Tags used in our approach: B-LOCATION, I-LOCATION, B-EVENT, I-EVENT, O
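Annotations in this BIO scheme can be consumed programmatically. A minimal sketch, assuming the annotated tweet arrives as (token, tag) pairs, that collects typed spans from the tags (the function name is illustrative):

```python
def bio_spans(tagged):
    """Collect (entity_text, entity_type) spans from (token, BIO-tag) pairs."""
    spans, current, ctype = [], [], None
    for token, tag in tagged:
        if tag.startswith("B-"):
            if current:                      # close any open span
                spans.append((" ".join(current), ctype))
            current, ctype = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == ctype:
            current.append(token)            # continue the open span
        else:
            if current:
                spans.append((" ".join(current), ctype))
            current, ctype = [], None
    if current:
        spans.append((" ".join(current), ctype))
    return spans

tagged = [("#Manteca", "O"), ("accident.", "B-EVENT"), ("two", "O"),
          ("lanes", "O"), ("blocked", "B-EVENT"), ("on", "O"),
          ("Hwy", "B-LOCATION"), ("99", "I-LOCATION")]
print(bio_spans(tagged))
# → [('accident.', 'EVENT'), ('blocked', 'EVENT'), ('Hwy 99', 'LOCATION')]
```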
18. a) Space: events reported within a grid (gi ∈ G,
where G is the set of all grids in a city) at a certain
time are most likely reporting the same event
b) Time: events reported within a time ∆t in a grid
gi are most likely to be reporting the same event
c) Theme: events with similar entities within a grid
gi and time ∆t are most likely reporting the same
event
City Event Extraction -- Key Insights
We will utilize these principles in the event aggregation algorithm
19. City Event Extraction – Geohashing
[Figure: hierarchical spatial structure of geohash for representing locations with variable precision; here the location string is 5H34. The illustrated cell is about 0.6 × 0.38 miles, bounded by the corners (37.7545166015625, -122.420654296875), (37.7545166015625, -122.40966796875), (37.7490234375, -122.420654296875), and (37.7490234375, -122.40966796875), and contains the tweet location 37.74933, -122.4106711.]
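A minimal sketch of standard geohash encoding (alternating bisection of the longitude and latitude intervals, five bits per character). Note that the slide's illustrative string 5H34 uses a toy alphabet; the standard encoding uses the 32-character alphabet below:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=6):
    """Encode a lat/long pair into a standard geohash string by
    alternately bisecting the longitude and latitude intervals."""
    lat_iv, lon_iv = [-90.0, 90.0], [-180.0, 180.0]
    even, nbits, ch, chars = True, 0, 0, []
    while len(chars) < precision:
        iv, x = (lon_iv, lon) if even else (lat_iv, lat)
        mid = (iv[0] + iv[1]) / 2
        ch <<= 1
        if x >= mid:
            ch |= 1
            iv[0] = mid          # keep the upper half
        else:
            iv[1] = mid          # keep the lower half
        even = not even
        nbits += 1
        if nbits == 5:           # five bits make one base-32 character
            chars.append(BASE32[ch])
            nbits, ch = 0, 0
    return "".join(chars)

print(geohash_encode(37.74933, -122.4106711))  # → 9q8yy2
```

The precision-6 cell for the tweet location, "9q8yy2", has exactly the bounding-box corners shown on the slide (lat 37.7490234375–37.7545166015625, long -122.420654296875 to -122.40966796875).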
21. City Event Extraction
– Event Aggregation Algorithm
Event metadata inference
Spatial filtering
Grouping Events by types
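A simplified sketch of the aggregation idea, not the paper's exact algorithm: annotated reports with the same event type, in the same geohash cell, and within a time window of the group's last report are merged into one event with a count and a time span (the window, cell id, and timestamps below are illustrative):

```python
from datetime import datetime, timedelta

def aggregate(reports, window=timedelta(hours=1)):
    """Merge (event_type, geohash_cell, timestamp) reports into events
    using the space, time, and theme principles from the slides."""
    events = []
    for etype, cell, ts in sorted(reports, key=lambda r: r[2]):
        for ev in events:
            # same theme, same grid, close enough in time: same event
            if ev["type"] == etype and ev["cell"] == cell and ts - ev["end"] <= window:
                ev["end"], ev["count"] = ts, ev["count"] + 1
                break
        else:
            events.append({"type": etype, "cell": cell,
                           "start": ts, "end": ts, "count": 1})
    return events

t = datetime(2013, 10, 22, 19, 0)
reports = [
    ("accident", "5889", t),
    ("accident", "5889", t + timedelta(minutes=20)),
    ("accident", "5889", t + timedelta(hours=3)),  # gap too large: a new event
    ("parade", "5889", t + timedelta(minutes=5)),
]
print([(e["type"], e["count"]) for e in aggregate(reports)])
# → [('accident', 2), ('parade', 1), ('accident', 1)]
```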
22. • <traffic, 5889, 2013-10-22 19:24:39, 2013-10-23 18:54:08, 19>
• <concert, 5889, 2013-10-20 19:46:06, 2013-10-21 19:06:29, 35>
• <accident, 32400, 2013-10-20 19:51:10, 2013-10-21 15:53:08, 11>
• <parade, 8672, 2013-08-10 12:57:17, 2013-08-10 18:57:21, 11>
City Event Extraction – A Sample of Extracted Events
Location refers to the geohash grid number, which is mapped to a lat-long
23. • City Event Annotation
– Automated creation of training data
– Annotation task (our CRF model vs. baseline CRF model)
• City Event Extraction
– Use aggregation algorithm for event extraction
– Extracted events AND ground truth
• Dataset (Aug – Nov 2013) ~ 8 GB of data on disk
– Over 8 million tweets (extract events using Alg. 1 and 2)
– Over 162 million sensor data points (find delays by looking
at change in travel time for links serving as ground truth)
– 311 active events and 170 scheduled events (readily
available events as ground truth)
Evaluation
24. Evaluation – Automated Creation of Training Data (to train CRF model)
Evaluation over 500 randomly chosen tweets from around 8,000 annotated tweets
Aho-Corasick [Commentz-Walter 1979] string matching algorithm implemented by LingPipe [Alias-i 2008]
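A hedged sketch of the automated training-data creation idea: tag tokens by matching them against domain vocabularies of locations and event terms. A greedy longest-match dictionary tagger stands in here for LingPipe's string-matching machinery, and the vocabulary and tweet are illustrative:

```python
def auto_annotate(tokens, vocab):
    """Tag tokens in BIO format by longest match against a dictionary
    mapping known phrases (as lowercase token tuples) to a label such
    as LOCATION or EVENT."""
    tags, i = [], 0
    max_len = max(len(p) for p in vocab)
    while i < len(tokens):
        # try the longest phrase first, shrinking to single tokens
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(t.lower() for t in tokens[i:i + n])
            if phrase in vocab:
                label = vocab[phrase]
                tags.append("B-" + label)
                tags.extend("I-" + label for _ in range(n - 1))
                i += n
                break
        else:
            tags.append("O")
            i += 1
    return tags

vocab = {("half", "moon", "bay"): "LOCATION", ("hwy", "99"): "LOCATION",
         ("accident",): "EVENT", ("blocked",): "EVENT"}
tokens = "accident two lanes blocked on Hwy 99".split()
print(list(zip(tokens, auto_annotate(tokens, vocab))))
```

Tweets annotated this way can then serve directly as training sequences for the CRF model.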
25. Evaluation – Annotation Task (our CRF model vs. baseline CRF model)
Baseline CRF Model
Our CRF Model
Baseline CRF model (trained on a huge manually
created data) works well on generic tasks.
Our CRF model trained on automatically generated
training data performs on par with the baseline.
Our CRF model does better on the event extraction
task due to the availability of event related
knowledge
26. Ground Truth Data (only incident reports) -- City Event Extraction
We have around 162 million data records from sensors monitoring over 3,700 links in the San Francisco Bay Area
<link_id, link_speed, link_volume, link_travel_time, time_stamp> (one data record)
GREEN – Active Events
YELLOW – Scheduled Events
311 active events and 170 scheduled events
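One simple way to derive delay ground truth from such records, a sketch rather than the paper's method: flag a link when its current travel time exceeds the historical mean by more than k standard deviations (the threshold k and the sample values are assumptions):

```python
from statistics import mean, stdev

def delayed(history, current, k=2.0):
    """Flag a link as delayed when the current travel time exceeds the
    historical mean by more than k standard deviations."""
    m, s = mean(history), stdev(history)
    return current > m + k * s

history = [60, 62, 58, 61, 59, 60, 63, 57]  # link travel times in seconds, hypothetical
print(delayed(history, 95))  # → True (well above normal)
print(delayed(history, 61))  # → False
```

Applied per link around an extracted event's location and time, such flags provide the "change in travel time" signal used for verification.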
28. Evaluation – Extracted Events AND Ground Truth Verification
Evaluation Metric For Comparing Events with Ground Truth:
• Complementary Events
• Additional information
• e.g., slow traffic from sensor data and accident from textual data
• Corroborative Events
• Additional confidence
• e.g., accident event supporting an accident report from ground truth
• Timeliness
• Additional insight
• e.g., knowing poor visibility before formal report from ground truth
29. Evaluation – Extracted Events AND Ground Truth Verification
Complementary Events
Complementary Events
30. Evaluation – Extracted Events AND Ground Truth Verification
Corroborative Events
Corroborative Events
34. • People in a city indeed talk about various
infrastructure-related issues, traffic in particular.
• City traffic related events can be extracted from
tweets using sequence labeling techniques and
spatial aggregation algorithms.
• Domain knowledge of events and locations can
be utilized to create large training datasets to
train sequence labeling algorithms.
• Traffic events extracted from Twitter can be
complementary, corroborative, and timely
compared to formal reports of traffic events.
Conclusion
35. [Kumaran and Allan 2004] Giridhar Kumaran and James Allan. 2004. Text classification and named entities for new
event detection. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development
in information retrieval. ACM, 297–304.
[Lampos and Cristianini 2012] Vasileios Lampos and Nello Cristianini. 2012. Nowcasting events from the social web with
statistical learning. ACM Transactions on Intelligent Systems and Technology (TIST) 3, 4 (2012), 72.
[Roitman et al. 2012] Haggai Roitman, Jonathan Mamou, Sameep Mehta, Aharon Satt, and LV Subramaniam. 2012.
Harnessing the Crowds for smart city sensing. In Proceedings of the 1st international workshop on Multimodal crowd
sensing. ACM, 17–18.
[Ritter et al. 2012] Alan Ritter, Oren Etzioni, Sam Clark, and others. 2012. Open domain event extraction from twitter. In
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1104–
1112.
[Wang et al. 2012] Xiaofeng Wang, Matthew S Gerber, and Donald E Brown. 2012. Automatic crime prediction using
events extracted from twitter posts. In Social Computing, Behavioral-Cultural Modeling and Prediction. Springer, 231–
238.
[Becker et al. 2011] Hila Becker, Mor Naaman, and Luis Gravano. 2011. Beyond Trending Topics: Real-World Event
Identification on Twitter.. In ICWSM.
[Alias-i 2008] Alias-i. 2008. LingPipe 4.1.0. (2008). http://alias-i.com/lingpipe
[Commentz-Walter 1979] Beate Commentz-Walter. 1979. A string matching algorithm fast on the average. Springer.
References
Editor's Notes
Note Citizen Centered Smart City
Understanding Events in a City from Twitter Streams – hope this will be my direction
Citizens are central to a city, country or in the past kingdom
Since the beginning of civilizations (or settlements) around 8000 BC, people have moved toward living in ‘cities’
Kings and emperors realized the importance of understanding citizen moods, sentiments, and opinions in making decisions
Qianlong emperor is one such example
Concept applies to even today!
- Connection to people is the key! We always want to hear citizens talk about both good and bad in a city
- Good will help us know what works and bad will help us prioritize and work toward making it better
Social media such as Twitter, Facebook, MySpace, and many others give direct access to what citizens think about a city
Best way to tap into the problems and challenges citizens face in a city
- Citizens need following services to help lead a healthy and prosperous life in a city
Data gathered in a city by various departments
Citizens reporting their observations of city infrastructure
You may ask: do citizens really talk about city infrastructure?
Twitter as a source of real-time information
There are over 200 million users generating 500 million tweets / day
Twitter as a source of events in a city
Citizens use twitter to express their concerns of city infrastructure that impacts their life
There are some knowledge bases from IBM Smart Planet initiative that can help us for city events
[Kumaran and Allan 2004] - event detection from news articles using a combination of classification and NER techniques
[Lampos and Cristianini 2012] – they estimated rainfall in London using tweets & compare it with actual rainfall, influenza like illness and compare it with Health Protection Agency
[Roitman et al. 2012] – interesting piece of work on harnessing the crowd for city sensing, sensor fusion
[Ritter et al. 2012] – open domain event extraction from twitter (general events such as product release, TV shows, movie release)
[Wang et al. 2012] – predicts crime using a prediction model built by associating topics in tweets to ground truth data
[Becker et al. 2011] – distinguishes between real-world event tweets and non-event tweets using clustering
Classification is another method – but it is not feasible for the open domain problem we are interested in.
CRFs – Conditional Random Fields
Sequence Labeling
Consider deeper features such as named entities, POS tags
Captures context of mention of entities within a tweet
Allows natural incorporation of domain knowledge
Intensity vs. density => tweets, page-rank analogous to importance of event, importance of events
Importance of event based on location + time
Contrasting density
Noise
Flood - people affected vs. people conversing
N-grams + Regression
No direct way to extract event metadata extraction
May need reference corpus for creating n-grams
Needs good quality tweets if no reference corpus
Clustering
Does not capture event metadata
Too coarse grained (tweet level)
May not be able to identify location, time etc.
CRF assigns a tag to each token
Global normalization is the argmax term
RHS is just a regression based implementation of linear chain (potentials defined only over adjacent tags) CRF
LingPipe implementation of CRF is used in our experiments
Localized event detection strategy: a city is a composition of smaller geographical units
We call these geographical units grids
Geohash provides us a way of compartmentalizing a city into uniquely addressable grids
Distance computed using the formula:
dlon = lon2 - lon1
dlat = lat2 - lat1
a = (sin(dlat/2))^2 + cos(lat1) * cos(lat2) * (sin(dlon/2))^2
c = 2 * atan2( sqrt(a), sqrt(1-a) )
d = R * c (where R is the radius of the Earth)
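The formula above as runnable Python (R = 3958.8 miles for Earth's radius is an assumption; the notes leave R unspecified):

```python
from math import radians, sin, cos, atan2, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula from the notes."""
    R = 3958.8  # Earth's mean radius in miles (assumed)
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return R * c

# width of the geohash cell on the slide (top edge, ~0.6 miles)
print(round(haversine_miles(37.7545166015625, -122.420654296875,
                            37.7545166015625, -122.40966796875), 2))  # → 0.6
```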
Found the box for the tweet!
37.7545166015625, -122.420654296875
37.7545166015625, -122.40966796875
37.7490234375, -122.40966796875
37.7490234375, -122.420654296875
Infer delays – we have sensor data including speed of vehicles and travel time. We will use this data for verifying the events we have extracted. We look for change (reduction/increase) in travel time for all the links in the vicinity of the event (we will vary the radius from 0.5 miles to 2 miles)
[Alias-i 2008] Alias-i. 2008. LingPipe 4.1.0. (2008). http://alias-i.com/lingpipe
[Commentz-Walter 1979] Beate Commentz-Walter. 1979. A string matching algorithm fast on the average. Springer.
- The only place this annotation suffers is the I-EVENT tag, since this is a stateless model
The 511.org record may have its own timestamp, which may predate the tweets