Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

See instead more recent version (ICDE2014 keynote): http://j.mp/ICDE-key
A video of a version of this talk: http://youtu.be/8RhpFlfpJ-A


Amit Sheth, "Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web," keynote at the 21st Italian Symposium on Advanced Database Systems, June 30 - July 03 2013, Roccella Jonica, Italy. Also given as invited talks at universities in Spain and Italy in June 2013.

Highlight: How to derive actionable Smart Data from voluminous Big Data with Velocity and Variety, using semantics and the Semantic Web at the core to bring Human-Centric Computing into practice.

Abstract from: http://www.sebd2013.unirc.it/invitedSpeakers.html

Big Data has captured much interest in research and industry, with anticipation of better decisions, efficient organizations, and many new jobs. Much of the emphasis is on technology that handles volume, including storage and computational techniques to support analysis (Hadoop, NoSQL, MapReduce, etc.), and on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity. However, the most important feature of data, the raison d'etre, is neither volume, variety, velocity, nor veracity, but value. In this talk, I will emphasize the significance of Smart Data, and discuss how it can be realized by extracting value from Big Data. Accomplishing this task requires organized ways to harness and overcome the original four V-challenges; and while the technologies currently touted may provide some necessary infrastructure, they are far from sufficient. In particular, we will need to utilize metadata, employ semantics and intelligent processing, and leverage some of the extensive work that predates Big Data. For Volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration, and discuss how this cannot simply be wished away using NoSQL. Lastly, for Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to use dynamically created models of new objects, concepts, and relationships to better understand new cues in the data that capture rapidly evolving events and situations.

Additional background at: http://knoesis.org/vision > SmartData and "Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications," http://www.knoesis.org/library/resource.php?id=1889 .

  • http://www.knowledgeinfusion.com/blog/2011/11/get-your-head-out-of-the-clouds-and-into-big-data/
  • http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode http://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume
  • Types of data. Formats of data. Also talk about the increase in the platforms that help generate this data.
  • Example high-velocity Big Data applications at work: financial services, stock brokerage, weather tracking, movies/entertainment, and online retail. Fast data (the rate at which data is coming in, especially from mobile, social, and sensor sources); rapid changes in the data content; stream analysis to cope with the incoming data for real-time online analytics.
  • Source: http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies
  • http://radhakrishna.typepad.com/rks_musings/2013/04/big-data-review.html Google predicted the spread of flu in real time after analyzing two datasets: (a) the 50 million most common terms that Americans type, and (b) data on the spread of seasonal flu from the public health agency. It tested a mammoth 450 million different mathematical models on the search terms, comparing their predictions against the actual flu cases. The model was tested when the H1N1 crisis struck in 2009 and gave more meaningful and valuable real-time information than any official public health system (Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013).
  • "Better Algorithms Beat More Data - And Here's Why": http://allthingsd.com/20121128/better-algorithms-beat-more-data-and-heres-why/ "Big Data Cannot Replace Human Judgment": http://www.matchcite.com/blog/blog/2012/july/big-data-cannot-replace-human-judgment.aspx **Comments about the articles
  • Smart data makes sense out of big data – it provides value from harnessing the challenges posed by volume, velocity, variety and veracity of big data, to provide actionable information and improve decision making.
  • - HUMAN CENTRIC!!
  • Information is CREATED by humans with the machinery available: Wikipedia tools, sensors, and social networks. Information is STORED in a man- and machine-readable format, LOD. Information is PROCESSED using the LOD and human-assisted, knowledge-based processing. Higher-level abstraction on information is now consumed in many mechanistic ways (including GIS) to provide EXPERIENCE for humans.
  • All the data related to human activity, existence, and experiences. More on PCS Computing: http://wiki.knoesis.org/index.php/PCS
  • Example of human-guided modeling and improved performance: http://research.microsoft.com/en-us/um/people/akapoor/papers/IJCAI%202011a.pdf
  • Also, we have a weather application which performs abstraction on weather sensor observations to identify blizzard conditions (food for actions!!): 20,000 weather stations (with ~5 sensors per station); Real-Time Feature Streams. Live demo: http://knoesis1.wright.edu/EventStreams/ Video demo: https://skydrive.live.com/?cid=77950e284187e848&sc=photos&id=77950E284187E848%21276
  • Let's find it...
  • Starting slide. Various Big Data problems: traditional examples vs. examples of what we are doing. Variety and Velocity more than Volume. kHealth problem. People will be interested in Smart Data. Traditional ML techniques, high-performance computing, statistics. Human level of abstraction is Smart Data.
  • http://www.huffingtonpost.com/2012/10/30/hurricane-sandy-power-outage-map-infographic_n_2044411.html I would like to start with a motivational example here.
  • http://www.guardian.co.uk/news/datablog/2012/oct/31/twitter-sandy-flooding http://www.huffingtonpost.com/2012/11/02/twitter-hurricane-sandy_n_2066281.html http://mashable.com/2012/10/31/hurricane-sandy-facebook/ We have quite a bit of social data research going on in our lab, so I would like to focus on the use of social networks during these disasters/crises. Twitter and Facebook are massively used during disasters. During Hurricane Sandy there were … Beyond this, a major burst of tweets occurred during the Japan earthquake, exceeding 2,000 tweets/sec. So why do people tend to use social networks to this extent during disasters?
  • Fraustino, Julia Daisy, Brooke Liu and Yan Jin. “Social Media Use during Disasters: A Review of the Knowledge Base and Gaps,” Final Report to Human Factors/Behavioral Sciences Division, Science and Technology Directorate, U.S. Department of Homeland Security. College Park, MD: START, 2012. Disaster communication deals with disaster information disseminated to the public by governments, emergency management organizations, and disaster responders as well as disaster information created and shared by journalists and the public. Disaster communication increasingly occurs via social media in addition to more conventional communication modes such as traditional media (e.g., newspaper, TV, radio) and word-of-mouth (e.g., phone call, face-to-face, group). Timely, interactive communication and user-generated content are hallmarks of social media, which include a diverse array of web- and mobile-based tools. Disaster communication deals with (1) disaster information disseminated to the public by governments, emergency management organizations, and disaster responders often via traditional and social media; as well as (2) disaster information created and shared by journalists and affected members of the public often through word-of-mouth communication and social media. For information seeking. Disasters often breed high levels of uncertainty among the public (Mitroff, 2004), which prompts them to engage in heightened information seeking (Boyle, Schmierbach, Armstrong, & McLeod, 2004; Procopio & Procopio, 2007). As expected, information seeking is a primary driver of social media use during routine times and during disasters (Liu et al., in press; PEW Internet, 2011). For timely information. Social media provide real-time disaster information, which no other media can provide (Kavanaugh et al., 2011; Kodrich & Laituri, 2011). 
Social media can become the primary source of time-sensitive disaster information, especially when official sources provide information too slowly or are unavailable (Spiro et al., 2012). For example, during the 2007 California wildfires, the public turned to social media because they thought journalists and public officials were too slow to provide relevant information about their communities (Sutton, Palen, & Shklovski, 2008). Time-sensitive information provided by social media during disasters is also useful for officials. For example, in an analysis of more than 500 million tweets, Culotta (2010) found Twitter data forecasted future influenza rates with high accuracy during the 2009 pandemic, obtaining a 95% correlation with national health statistics. Notably, the national statistics came from hospital survey reports, which typically had a lag time of one to two weeks for influenza reporting. For unique information. One of the primary reasons the public uses social media during disaster is to obtain unique information (Caplan, Perse, & Gennaria, 2007). Applied to a disaster setting, which is inherently unpredictable and evolving, it follows that individuals turn to whatever source will provide the newest details. Oftentimes, individuals experiencing the event first-hand are on the scene of the disaster and can provide updates more quickly than traditional news sources and disaster response organization. For instance, in the Mumbai terrorist attacks that included multiple coordinated shootings and bombings across two days, laypersons were first to break the news on Twitter (Merrifield & Palenchar, 2012). Research participants report using social media to satisfy their need to have the latest information available during disasters and for information gathering and sharing during disasters (Palen, Starbird, Vieweg, & Hughes, 2010; Vieweg, Hughes, Starbird, & Palen, 2010). For unfiltered information. 
To obtain crisis information, individuals often communicate with one another via social media rather than seeking a traditional news source or organizational website (Stephens & Malone, 2009). The public check in with social media not only to obtain up-to-date, timely information unavailable elsewhere, but also because they appreciate that information may be unfiltered by traditional media, organizations, or politicians (Liu et al., in press).  To determine disaster magnitude. The public uses social media to stay apprised of the extent of a disaster (Liu et al., in press). They may turn to governmental or organizational sources for this information, but research has shown that if the public do not receive the information they desire when they desire it, they, along with others, will fill in the blanks (Stephens & Malone, 2009), which can create rumors and misinformation. On the flipside, when the public believed that officials were not disseminating enough information regarding the size and trajectory of the 2007 California wildfires, they took matters into their own hands, using social media to track fire locations in real-time and notify residents who were potentially in danger (Sutton, Palen, & Shklovski, 2008).  To check in with family and friends. While Americans predominately use social media to connect with family and friends (PEW Internet, 2011), during disasters those connections may shift. For those with family or friends directly involved with the disaster, social media can provide a way to ensure safety, offer support, and receive timely status updates (Procopio & Procopio, 2007; Stephens & Malone, 2009). In a survey of 1,058 Americans, the American Red Cross (2010) found that nearly half of their respondents would use social media to let loved ones know they are safe during disasters. 
After the 2011 earthquake and tsunami in Japan, the public turned to Twitter, Facebook, Skype, and local Japanese social networks to keep in touch with loved ones while mobile networks were down (Gao, Barbier, & Goolsby, 2011). Researchers also note that disasters may enhance feelings of affection toward family members, and indeed survey participants reported expressing more positive emotions toward their loved ones than usual as a result of the September 11 terrorist attacks, even if they were not directly impacted by the disaster (Fredrickson et al., 2003). Finally, disasters can motivate the public to reconnect with family and friends via social media (Procopio & Procopio, 2009; Semaan & Mark, 2012).  To self-mobilize. During disasters, the public may use social media to organize emergency relief and ongoing assistance efforts from both near and afar. In fact, one research group dubbed those who surge to the forefront of digital and in-person disaster relief efforts as “voluntweeters” (Starbird & Palen, 2011). Other research documents the role of Facebook and Twitter in disaster relief fundraising (Horrigan & Morris, 2005; PEJ, 2010). Research also reveals how social media can help identify and respond to urgent needs after disasters. For example, just two hours after the 2010 Haitian earthquake Tufts University volunteers created Ushahidi-Haiti, a crisis map where disaster survivors and volunteers could send incident reports via text messages and tweets. In less than two weeks, 2,500 incident reports were sent to the map (Gao, Barbier, & Goolsby, 2011).  To maintain a sense of community. During disasters the media in general and social media in particular may provide a unique gratification: sense of community. That is, as the public logs in online to share their feelings and thoughts, they assist each other in creating a sense of security and community, even when scattered across a vast geographical area (Lev-On, 2011; Procopio & Procopio, 2007). 
As Reynolds and Seeger (2012) observed, social media create communities during disasters that may be temporary or may continue well into the future.  To seek emotional support and healing. Finally, disasters are often inherently tragic, prompting individuals to seek not only information but also human contact, conversation, and emotional care (Sutton et al., 2008). Social media are positioned to facilitate emotional support, allowing individuals to foster virtual communities and relationships, share information and feelings, and even demand resolution (Choi & Lin, 2009; Stephens & Malone, 2009). Indeed, social media in general and blogs in particular are instrumental for providing emotional support during and after disasters (Macias, Hilyard, & Freimuth, 2009; PEJ New Media Index, 2011). Additionally, social media in general and Twitter in particular can aid healing, as research finds during both natural disasters, such as Hurricane Katrina (Procopio & Procopio, 2007), and man-made disasters, such as the July 2011 attacks in Oslo, Norway (Perng et al., 2012).
  • http://www.buzzfeed.com/annanorth/how-social-media-is-aiding-the-hurricane-sandy-rec -- Facebook help during Hurricane Sandy. http://blog.twitter.com/2012/10/hurricane-sandy-resources-on-twitter.html -- Twitter page for Hurricane Sandy. http://www.treehugger.com/culture/12-ways-help-hurricane-sandy-relief-efforts.html Categorization of severity based on weather conditions. Actionable information is contextually dependent.
  • http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US http://www.internews.org/sites/default/files/resources/InternewsEurope_Report_Japan_Connecting%20the%20last%20mile%20Japan_2013.pdf Let me consider one small example of how social data (in turn, data) can help people during disasters. Data becomes smart data if it takes the recipient into account and changes context accordingly. Sensor data for emergency responders: who in the population needs immediate attention? (1) Location (2) Severity (3) Health condition. Need for abstraction: Semantic Perception needs abstraction. 90 + heart problem -> don't run out; 23 -> run out.
  • http://www.buzzfeed.com/jackstuef/the-man-behind-comfortablysmug-hurricane-sandys During the storm last night, user @comfortablysmug was the source of a load of frightening but false information about conditions in New York City that spread wildly on Twitter and onto news broadcasts before Con Ed, the MTA, and Wall Street sources had to take time out of the crisis situation to refute them.
  • Although we face challenges like these with data every time, the most important thing is what you aim to do with the data, that is, what value you intend to provide from it.
  • http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/
  • -- Contextual Questioning – Potential Information needed from Humans
  • Larry Smarr is a professor at the University of California, San Diego, and he was diagnosed with Crohn's disease. What's interesting about this case is that Larry diagnosed himself. He is a pioneer in the area of Quantified Self, which uses sensors to monitor physiological symptoms. Through this process he discovered inflammation, which led him to the discovery of Crohn's disease. This type of self-tracking is becoming more and more common.
  • Massive amounts of data will be collected by sensors and mobile devices, yet patients and doctors care about “actionable” information. This data has all four Vs of big data, and we used knowledge-enabled techniques to transform it into value. In the context of PD, we analyzed massive amounts of sensor data collected by sensors on smartphones to understand detection and characterization of PD severity.
  • Main idea: Prior knowledge of PD was used to facilitate its detection from massive sensor data by reducing the search space. Details: Declarative knowledge of PD includes PD severity levels and their symptoms, as shown in the logical rule above. Each PD severity level is a conjunction of a set of PD symptoms. Each symptom was mapped to its manifestation in sensor observations. The availability of declarative knowledge significantly improved the analytics by aiding the feature selection process. The graphs above contrast the physical movements and voice of two control group members and two PD patients.
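
The rule structure described in this note (each severity level is a conjunction of symptoms, and each symptom maps to a manifestation in sensor observations) can be sketched roughly as follows. This is only an illustration under stated assumptions: the symptom names, severity labels, feature names, and thresholds are hypothetical placeholders and are not the rules shown on the slide.

```python
from typing import Callable, Dict, List, Set

Observation = Dict[str, float]

# Hypothetical symptom detectors: each maps sensor-derived features to True/False.
# Feature names and thresholds are placeholders, not values from the talk.
SYMPTOM_DETECTORS: Dict[str, Callable[[Observation], bool]] = {
    "tremor": lambda obs: obs["accel_variability"] > 0.5,
    "reduced_movement": lambda obs: obs["mean_movement"] < 0.2,
    "weak_voice": lambda obs: obs["voice_volume"] < 0.3,
}

# Hypothetical severity levels. Each level is a conjunction of symptoms
# (all listed symptoms must be present). Listed from most to least specific.
SEVERITY_RULES: Dict[str, List[str]] = {
    "advanced": ["tremor", "reduced_movement", "weak_voice"],
    "mild": ["tremor"],
}


def detect_symptoms(obs: Observation) -> Set[str]:
    """Return the symptoms whose sensor manifestation is present in obs."""
    return {name for name, detect in SYMPTOM_DETECTORS.items() if detect(obs)}


def severity(obs: Observation) -> str:
    """Return the first severity level whose symptoms are all present."""
    present = detect_symptoms(obs)
    for level, required in SEVERITY_RULES.items():
        if set(required) <= present:
            return level
    return "none"
```

Because the rules name the symptoms in advance, only the features those symptoms depend on need to be computed, which is one way to read the note's point about aiding feature selection.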
  • kHealth: http://www.youtube.com/watch?v=btnRi64hJp4 EventShop: http://www.slideshare.net/jain49/eventshop-120721, http://dl.acm.org/citation.cfm?id=2488175
  • - what if we could automate this sense making ability?- and what if we could do this at scale?
  • sense making based on human cognitive models
  • The perception cycle contains two primary phases. Explanation: translating low-level signals into high-level abstractions; inference to the best explanation. Discrimination: focusing attention on those properties that will help distinguish between multiple possible explanations; used to intelligently task sensors and collect additional observations (rather than the brute-force approach of blindly collecting all observations).
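
The two phases of the perception cycle can be sketched with plain set operations. This is a minimal illustration, assuming a toy knowledge base that links each feature to the properties it can explain; the feature and property names are hypothetical, and the code is not the implementation used in the talk.

```python
from typing import Dict, Set

# Hypothetical knowledge base: feature -> properties it can account for.
KB: Dict[str, Set[str]] = {
    "blizzard": {"high_wind", "snowfall", "low_visibility"},
    "rain_storm": {"high_wind", "rainfall"},
    "flurry": {"snowfall"},
}


def explain(observed: Set[str]) -> Set[str]:
    """Explanation: features that account for every observed property."""
    return {feature for feature, props in KB.items() if observed <= props}


def discriminate(observed: Set[str], candidates: Set[str]) -> Set[str]:
    """Discrimination: unobserved properties that would separate the candidates."""
    if not candidates:
        return set()
    property_sets = [KB[feature] for feature in candidates]
    expected_by_some = set.union(*property_sets)
    expected_by_all = set.intersection(*property_sets)
    # Properties expected by every candidate cannot tell the candidates apart.
    return (expected_by_some - expected_by_all) - observed


# One turn of the cycle: explain, then decide what to observe next.
observed = {"high_wind"}
candidates = explain(observed)                    # blizzard or rain_storm
next_to_check = discriminate(observed, candidates)

# After tasking a sensor and observing snowfall, one explanation remains.
observed = observed | {"snowfall"}
remaining = explain(observed)
```

The discrimination step is what avoids blindly collecting everything: only properties that can still change the set of explanations are requested.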
  • A single-feature (disease) assumption means that all the observed properties (symptoms) must be explained by a single feature; i.e., this framework is not expressive enough to model comorbidity, where more than one feature (disease) may co-exist. For example, if two diseases cause disjoint sets of symptoms, and all the symptoms of both diseases are observed, then this framework will not be able to find a cover and returns no diseases.
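
The limitation described in this note can be reproduced in a few lines. This is a sketch under stated assumptions: the two diseases and their symptoms are hypothetical, chosen only so that their symptom sets are disjoint.

```python
from typing import Dict, Set

# Hypothetical diseases with disjoint symptom sets.
KB: Dict[str, Set[str]] = {
    "disease_a": {"fever", "cough"},
    "disease_b": {"rash", "joint_pain"},
}


def explain_single_feature(observed: Set[str]) -> Set[str]:
    """Return diseases that, on their own, explain every observed symptom."""
    return {disease for disease, symptoms in KB.items() if observed <= symptoms}


# Symptoms of one disease only: a single explanation is found.
one_disease = explain_single_feature({"fever", "cough"})

# Symptoms of both diseases together: no single disease covers them all,
# so the single-feature framework returns nothing.
both_diseases = explain_single_feature({"fever", "cough", "rash", "joint_pain"})
```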
  • With this ability, many problems could be solved. For example, we could help solve health problems (before they become serious health problems) through monitoring symptoms and real-time sense making, acting as an early warning system to detect problematic health conditions.
  • Intelligence distributed at the edge of the network. Requires resource-constrained devices (mobile phones, gateway nodes, etc.) to be able to utilize SW technologies. Henson et al., 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices,' ISWC 2012.
  • Compute high-complexity machine perception inferences (i.e., explanation and discrimination) on resource-constrained devices in milliseconds. Difference between the other systems and what this system provides.
  • Intelligence at the edge. Shipping computation and domain models to the edge (distributed).
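
The general idea of replacing set-based reasoning with bitwise operations can be sketched as below. This does not reproduce the algorithm of the cited Henson et al. paper; it only assumes, for illustration, that each property is assigned one bit and each feature is stored as an integer mask. All names are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical property vocabulary: one bit position per property.
PROPERTIES: List[str] = ["high_wind", "snowfall", "low_visibility", "rainfall"]
BIT: Dict[str, int] = {name: 1 << i for i, name in enumerate(PROPERTIES)}


def encode(props: Iterable[str]) -> int:
    """Pack a collection of property names into a single integer bit mask."""
    mask = 0
    for prop in props:
        mask |= BIT[prop]
    return mask


# Hypothetical features, each stored as the mask of properties it explains.
FEATURES: Dict[str, int] = {
    "blizzard": encode(["high_wind", "snowfall", "low_visibility"]),
    "rain_storm": encode(["high_wind", "rainfall"]),
    "flurry": encode(["snowfall"]),
}


def explain(observed_mask: int) -> List[str]:
    """Features whose mask contains every observed bit (one AND per feature)."""
    return [
        feature
        for feature, feature_mask in FEATURES.items()
        if (observed_mask & feature_mask) == observed_mask
    ]
```

A single AND and comparison per feature replaces a subset test over strings, which is the kind of saving that matters on a phone or gateway device.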
  • “to help software reusability in order to allow new applications to be built faster and to share innovations (software components, novel approaches) amongst software developers” “to standardize and commoditize back-end data stores so client software may access any Open mHealth-compliant data store in a uniform way (interoperability)” “to produce examples and documentation of these concepts meaningfully and simply”
  • Observe data from different sensors at the same time.
  • System Architecture. Fig. shows an overview of the SemHealth architecture. Sensors: all are Bluetooth sensors already utilized by the current k-Health application to measure weight, heart rate, and blood pressure. Android application: reads sensor observations through Bluetooth; performs annotation on observations and generates percepts from those observations; uploads annotated observations and percepts to the server-side data store; retrieves data using the DSU API and feeds data to the DPU and/or DVU APIs; visualizes data through the DVU API (considered a “nice to have”, as existing visualization may be used as-is; will utilize an existing graphing library for Android with an Open mHealth-style API that may be translated to the browser at a later time). Server side: Open mHealth-compliant DSU and DPU APIs; triple data storage replaces the existing SQLite database in the k-Health application; the existing k-Health reasoner is now the brains behind the DPU.
  • http://www.flickr.com/photos/twitteroffice/5897088517/sizes/o/in/photostream/ http://bayarea.sbnation.com/49ers/2013/2/3/3947738/super-bowl-prop-bets-2013-twitter http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats/
  • Much of the early work in Big data is being done with focusing on uni-directional among XYZ.
  • http://semanticweb.com/picking-the-president-twindex-twitris-track-social-media-electorate_b31249 http://semanticweb.com/election-2012-the-semantic-recap_b33278
  • http://knoesis.wright.edu/library/resource.php?id=1787
  • Categorization of severity based on weather conditions. Actionable information is contextually dependent.
  • 1 (+half) minute. Alright, so let's motivate with this situation during an emergency. Various actors: resource seekers, responder teams, resource providers at remote sites. Each of these actor groups has questions (needs, providers, responders: wondering!). Here we have social networks to connect these actors and bridge the gap as a communication platform. But their potential use is yet to be realized for effective help, because... (next slide)
  • Talk about what kind of smart data we provide that helps the actions of crisis response coordination.
  • Source: Purohit et al. 2013 (https://docs.google.com/a/knoesis.org/document/d/1aBJ2egHICUwaWxR8jOoTIUfEYj1QAnUt0q7haIKoYGY/edit# , http://www.knoesis.org/library/resource.php?id=1865)
  • http://twitris.knoesis.org/oklahomatornado (This is a real-time widget for monitoring needs, so it will not be active after the event has passed.)
  • Highly rich interface for response team
  • Definition of the event. US Elections and some changes/subevents: primaries; debates; people/places/organizations involved in the event. Arab Spring: subevents during those, e.g., Egypt protests.
  • Explain about continuous semantics
  • Pucher, J., Korattyswaroopam, N., & Ittyerah, N. (2004). The crisis of public transport in India: Overwhelming needs but limited resources. Journal of Public Transportation, 7(4), 1-30.
  • Point of this slide: correlations
  • Point of this slide: heterogeneity and uncertainty
  • A single observation of slow moving traffic may have multiple explanations.
  • Internal observations are limited to whatever the on-road sensors can observe. In the 511.org data we have analyzed, the internal observations are mentioned above. External events are obtained from sources beyond the on-road sensors, e.g., some agency like 511.org which reports traffic incidents. Note that internal observations are mostly machine sensor observations, while external events are mostly textual observations. The analogy in healthcare will be: internal observations are on-body sensor readings such as heart rate and temperature; external events are jogging, walking, taking stairs.
  • E.g., the equation for projectile motion may not precisely compute the actual trajectory; air resistance may have been ignored.
  • Use of open data for parameters is promising and can be explored as future research.
  • Some facts about the domain of traffic obtained from ConceptNet5. The types of events are obtained by using the comprehensive subsumption relationship from 511.org. We propose to use such knowledge to complement the PGM structure learning algorithms. Facts: CapableOf(traffic jam, occur twice each day); CapableOf(traffic jam, slow traffic); RelatedTo(accident, car crash); Causes(road ice, accident); CapableOf(bad weather, slow traffic); HasSubevent(go to concert, car crash); Causes(go to baseball game, traffic jam); Causes(traffic accident, traffic jam). Types: BadWeather(road ice); BadWeather(bad weather); ScheduledEvent(go to concert); ScheduledEvent(go to baseball game); ActiveEvent(traffic accident); Delay(slow traffic); Delay(traffic jam); TimeOfDay(occur twice each day).
  • Declarative knowledge + statistical correlation. This slide illustrates the three operations used to enrich the correlation structure extracted using statistical methods. These operations utilize declarative knowledge from ConceptNet5, as shown in each step.
  • The statistical correlation structure is shown above; the enriched structure is shown below. The enrichment of the graphical model will potentially allow us to capture the domain precisely and also improve our prediction, as the model would get closer to the underlying probability distribution in the real world. The log-likelihood score is one way of quantifying how good a structure is based on the observed data. There may be many candidate structures extracted from data which result in the same log-likelihood score. Declarative knowledge will help us ground statistical models in reality, which will allow us to pick one structure over the other. Pramod Anantharam, Krishnaprasad Thirunarayan and Amit Sheth, 'Traffic Analytics using Probabilistic Graphical Models Enhanced with Knowledge Bases,' 2nd International Workshop on Analytics for Cyber-Physical Systems (ACS-2013) at SIAM International Conference on Data Mining (SDM13), pp. 13--20, Texas, USA, May 2-4, 2013. We stopped at structure extraction for our workshop paper (SIAM ACS workshop) since the declarative knowledge we used (ConceptNet5) and the statistical model (nodes and edges) are at the same level of abstraction.
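
The ConceptNet5 facts listed in this note can be held as simple tuples and queried for candidate edges. This is a rough sketch: the facts are copied from the note, but the step that lifts a Causes fact to an edge between event types is an assumption made here for illustration, not a description of the authors' procedure.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Binary facts from the note, as (relation, subject, object).
BINARY_FACTS: List[Tuple[str, str, str]] = [
    ("CapableOf", "traffic jam", "occur twice each day"),
    ("CapableOf", "traffic jam", "slow traffic"),
    ("RelatedTo", "accident", "car crash"),
    ("Causes", "road ice", "accident"),
    ("CapableOf", "bad weather", "slow traffic"),
    ("HasSubevent", "go to concert", "car crash"),
    ("Causes", "go to baseball game", "traffic jam"),
    ("Causes", "traffic accident", "traffic jam"),
]

# Type assertions from the note, as concept -> type.
TYPES: Dict[str, str] = {
    "road ice": "BadWeather",
    "bad weather": "BadWeather",
    "go to concert": "ScheduledEvent",
    "go to baseball game": "ScheduledEvent",
    "traffic accident": "ActiveEvent",
    "slow traffic": "Delay",
    "traffic jam": "Delay",
    "occur twice each day": "TimeOfDay",
}


def candidate_edges(relations: Iterable[str] = ("Causes",)) -> Set[Tuple[str, str]]:
    """Lift facts to (type, type) edges; skip facts with an untyped concept.

    Assumption for illustration: an edge between two types is proposed
    whenever some typed instance of one is related to an instance of the other.
    """
    wanted = set(relations)
    edges: Set[Tuple[str, str]] = set()
    for relation, subject, obj in BINARY_FACTS:
        if relation in wanted and subject in TYPES and obj in TYPES:
            edges.add((TYPES[subject], TYPES[obj]))
    return edges
```

With the default argument, only the two Causes facts whose concepts both have types contribute edges; Causes(road ice, accident) is skipped because "accident" has no type assertion in the list.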
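
The point that several candidate structures can receive the same log-likelihood score can be checked on a two-variable example. This is a minimal sketch with made-up binary observations; only the fact Causes(traffic accident, traffic jam) comes from the notes, and the tie-breaking function is a hypothetical illustration of using declarative knowledge to pick a direction.

```python
import math
from collections import Counter
from typing import List, Tuple

# Made-up binary observations: (traffic accident, traffic jam) per time slot.
DATA: List[Tuple[int, int]] = [
    (1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0), (1, 1),
]


def log_likelihood(data: List[Tuple[int, int]], parent: int, child: int) -> float:
    """Log-likelihood of the structure parent -> child with MLE parameters."""
    n = len(data)
    parent_counts = Counter(row[parent] for row in data)
    joint_counts = Counter((row[parent], row[child]) for row in data)
    total = 0.0
    for row in data:
        p, c = row[parent], row[child]
        total += math.log(parent_counts[p] / n)                     # P(parent)
        total += math.log(joint_counts[(p, c)] / parent_counts[p])  # P(child | parent)
    return total


# The two directions factor the same joint distribution, so the scores tie.
ll_accident_to_jam = log_likelihood(DATA, parent=0, child=1)
ll_jam_to_accident = log_likelihood(DATA, parent=1, child=0)

# Declarative knowledge from the notes breaks the tie.
KNOWN_CAUSES = {("traffic accident", "traffic jam")}


def preferred_direction(a: str, b: str) -> Tuple[str, str]:
    """Pick the edge direction supported by a Causes fact, if there is one."""
    return (b, a) if (b, a) in KNOWN_CAUSES else (a, b)
```

Since the data alone cannot separate the two directions here, the Causes fact is what selects accident -> jam over jam -> accident.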
  • More at: http://wiki.knoesis.org/index.php/PCS and http://knoesis.org/projects/ssw/

Transcript

  • 1. 1
  • 2. How much data? 2011; 48 (2013); 500 (2013). http://www.knowledgeinfusion.com/blog/2011/11/get-your-head-out-of-the-clouds-and-into-big-data/
  • 3. 1% of the data is used for analysis. http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode http://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume
  • 4. Variety: Semi structured
  • 5. Velocity: Fast Data; Rapid Changes; Real-Time/Stream Analysis. Current application examples: financial services, stock brokerage, weather tracking, movies/entertainment and online retail
  • 6. • Focus on verticals: advertising‚ social media‚ retail‚financial services‚ telecom‚ and healthcare– Aggregate data, focused on transactions, limitedintegration (limited complexity), analytics to find(simple) patterns– Emphasis on technologies to handlevolume/scale, and to lesser extent velocity:Hadoop, NoSQL,MPP warehouse ….– Full faith in the power of data (nohypothesis), bottom up analysis6Current Focus on Big Data
  • 7. Questions typically asked on Big Data • What if your data volume gets so large and varied you don't know how to deal with it? • Do you store all your data? • Do you analyze it all? • How can you find out which data points are really important? • How can you use it to your best advantage? Source: http://www.sas.com/big-data/
  • 8. Variety of Data Analytics Enablers. Source: http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies/
  • 9. Illustrative Big Data Applications • Prediction of the spread of flu in real time during H1N1 2009 – Google tested a mammoth 450 million different mathematical models to test the search terms, comparing their predictions against the actual flu cases; 45 important parameters were found – The model was tested when the H1N1 crisis struck in 2009 and gave more meaningful and valuable real-time information than any official public health system [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013] • FareCast: predict the direction of air fares over different routes [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013] • NY city manholes problem [ICML Discussion, 2012]
  • 10. What is missing? • The current focus is mainly on serving business intelligence and targeted analytics needs, not on serving complex individual and collective human needs (e.g., empowering humans in health, fitness and well-being; better disaster coordination) that are highly personalized/individualized/contextualized – Incorporate real-world complexity: the multi-modal and multi-sensory nature of the real world and human perception – Need a deeper understanding of data and its role in information (e.g., skew, coverage) • Human involvement and guidance: leading to actionable information, understanding and insight right in the context of human activities – Bottom-up & top-down processing: infusion of models and background knowledge (data + knowledge + reasoning)
  • 11. Makes sense. Actionable, or helps decision support/making.
  • 12. Smart Data: Smart data makes sense out of Big Data. It provides value by harnessing the challenges posed by the volume, velocity, variety and veracity of big data, in turn providing actionable information and improving decision making.
  • 13. Another perspective on Smart Data: “OF human, BY human and FOR human”. Smart data is focused on the actionable value achieved by human involvement in the data creation, processing and consumption phases for improving the human experience.
  • 14. Human Centric Computing. Improved analytics: descriptive, exploratory, inferential, predictive, causal. Creation; processing; experience & decision making.
  • 15. Another perspective on Smart Data: “OF human, BY human and FOR human”
  • 16. ‘OF human’: Relevant Real-time Data Streams for Human Experience. Petabytes of Physical (sensory)-Cyber-Social data every day! More on PCS Computing: http://wiki.knoesis.org/index.php/PCS
  • 17. Another perspective on Smart Data: “OF human, BY human and FOR human”
  • 18. ‘BY human’: Involving Crowd Intelligence in data processing workflows. Use of prior human-created knowledge models. Crowdsourcing and domain-expert guided machine learning modeling.
  • 19. Another perspective on Smart Data: “OF human, BY human and FOR human”
  • 20. ‘FOR human’: Improving Human Experience. Detection of events, such as wheezing sound, indoor temperature, humidity, dust, and CO2 level. Weather application; asthma healthcare application. Levels: population level, personal, public health. Action in the physical world: close the window at home during the day to avoid CO2 ingush, to avoid asthma attacks at night.
  • 21. Why do we care about Smart Data rather than Big Data?
  • 22. Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web. Keynote at SEBD 2013, July 1, 2013 and invited talk in universities in Spain, June 2013. The Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State, USA. Pavan Kapanipathi, Pramod Anantharam, Amit Sheth, Cory Henson, Dr. T.K. Prasad, Maryam Panahiazar. Contributions by many, but Special Thanks to: Hemant Purohit
  • 23. Big Data to Smart Data: Disaster Management example. Second-costliest hurricane in United States history; estimated damage $75 billion; 90-115 mph winds; State of Emergency in New York; 285 people killed on the track of Sandy; 750,000 without power (NY). Immense devastation and human suffering. Source: http://www.huffingtonpost.com/2012/10/30/hurricane-sandy-power-outage-map-infographic_n_2044411.html
  • 24. Social (Big) Data during Hurricane Sandy. 20 million tweets with “sandy, hurricane” keywords between Oct 27th and Nov 1st; 2nd most popular topic on Facebook during 2012. • http://www.guardian.co.uk/news/datablog/2012/oct/31/twitter-sandy-flooding • http://www.huffingtonpost.com/2012/11/02/twitter-hurricane-sandy_n_2066281.html • http://mashable.com/2012/10/31/hurricane-sandy-facebook/
  • 25. BIG DATA TO SMART DATA: WHY? and FOR WHOM? Why: for information seeking; for timely information; for unique information; for unfiltered information; to determine disaster magnitude; to check in with family and friends; to self-mobilize; to maintain a sense of community; to seek emotional support and healing. For whom: governments, emergency management organizations, journalists, disaster responders, public. Fraustino et al. Social Media Use during Disasters: A Review of the Knowledge Base and Gaps. US Dept. of Homeland Security, START 2012.
  • 26. Can SNSs make Disaster Management easier? Giving Actionable Information (Smart Data). Improving situational awareness: timely delivery of necessary information to the right people. Improving coordination between resource seekers and suppliers. Detecting the magnitude of a disaster from people's sentiments. Many more challenges… Sources: http://www.buzzfeed.com/annanorth/how-social-media-is-aiding-the-hurricane-sandy-rec http://blog.twitter.com/2012/10/hurricane-sandy-resources-on-twitter.html http://www.treehugger.com/culture/12-ways-help-hurricane-sandy-relief-efforts.html
  • 27. Volume: Twitter hits half a billion tweets a day! Challenge: delivering the necessary/actionable information to the right people. Sources: http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US
  • 28. Velocity, Volume. The @ConEdison Twitter handle, which the company had only set up in June, gained an extra 16,000 followers over the storm. Did the information reach everyone? Rate of data arrival: approximately 7000 TPS; 10 images per second on Instagram. Challenge: delivering the necessary/actionable information to the right people. Sources: http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US http://www.internews.org/sites/default/files/resources/InternewsEurope_Report_Japan_Connecting%20the%20last%20mile%20Japan_2013.pdf
  • 29. Velocity, Variety, Volume. Variety: semi-structured, structured, unstructured; sensors, Linked Open Data, Wikipedia. Challenge: delivering the necessary/actionable information to the right people. Sources: http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US
  • 30. Velocity, Variety, Veracity, Volume. Challenge: delivering the necessary/actionable information to the right people. Source: http://www.buzzfeed.com/jackstuef/the-man-behind-comfortablysmug-hurricane-sandys
  • 31. Velocity, Variety, Veracity, Volume
  • 32. Smart Data focuses on the value. Diagram labels: Data; Value (makes sense, actionable information, decision support/making). Source: http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/
  • 33. Diagram labels: Data; Value (makes sense, actionable information, decision support/making). Disaster management, victims: timely and contextual information about • electricity, food, water, shelter and donation offers related to the disaster. Source: http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/
  • 34. Revisiting.. Human Centric Computing. Improved analytics: descriptive, exploratory, inferential, predictive, causal. Creation; processing; experience.
  • 35. Applications of Smart Data Analytics • Healthcare – kHealth – SemHeath • Social event coordination – Twitris • Traffic monitoring – kTraffic
  • 36. The Patient of the Future, MIT Technology Review, 2012. http://www.technologyreview.com/featuredstory/426968/the-patient-of-the-future/
  • 37. Smart Data in Healthcare: to gain new insight in patient care & early indications of disease
  • 38. Sensing is a key enabler of the Internet of Things. 50 Billion Things by 2020 (Cisco). BUT, how do we make sense of the resulting avalanche of sensor data?
  • 39. WHY Big Data to Smart Data: Healthcare example. Parkinson’s disease (PD) data from The Michael J. Fox Foundation for Parkinson’s Research [1]. 8 weeks of data from 5 sensors on a smartphone, collected for 16 patients, resulting in ~12 GB (with a lot of missing data). Variety, Volume, Veracity, Velocity; Value: Can we detect the onset of Parkinson’s disease? Can we characterize the disease progression? Can we provide actionable information to the patient? Semantics: representing prior knowledge of PD led to a focused exploration of this massive dataset. [1] https://www.kaggle.com/c/predicting-parkinson-s-disease-progression-with-smartphone-data
  • 40. Big Data to Smart Data Using a Knowledge Based Approach. Declarative knowledge of Parkinson’s disease is used to focus our attention on symptom manifestations in sensor observations. ParkinsonMild(person) = Tremor(person) ∧ PoorBalance(person). ParkinsonModerate(person) = MoveSlow(person) ∧ PoorSleep(person) ∧ MonotoneSpeech(person). ParkinsonAdvanced(person) = Fall(person). Control group: movements of an active person have a good distribution over the X, Y, and Z axes; audio is well modulated, with good variations in the energy of the voice. PD patients: restricted movements by a PD patient can be seen in the acceleration readings; audio is not well modulated, representing monotone speech.
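The three declarative rules on this slide can be sketched as simple set checks. This is a minimal illustration, assuming the symptoms (Tremor, PoorBalance, and so on) have already been detected from the sensor streams; the names `RULES` and `parkinson_stages` are hypothetical, and each rule is read as "these symptoms together indicate this stage".

```python
# Symptoms required by each rule, as written on the slide.
RULES = {
    "ParkinsonAdvanced": {"Fall"},
    "ParkinsonModerate": {"MoveSlow", "PoorSleep", "MonotoneSpeech"},
    "ParkinsonMild": {"Tremor", "PoorBalance"},
}

def parkinson_stages(symptoms):
    """Return every stage whose required symptoms are all present."""
    observed = set(symptoms)
    return [stage for stage, required in RULES.items() if required <= observed]

mild_only = parkinson_stages({"Tremor", "PoorBalance"})
incomplete = parkinson_stages({"MoveSlow", "PoorSleep"})
```

The point of the sketch is the direction of the search: the rules name a handful of symptoms to look for, so only the sensor features that bear on those symptoms need to be extracted from the ~12 GB of raw data.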
  • 41. Asthma: Severity of the problem • 25 million people in the U.S. are diagnosed with asthma (7 million are children) [1]. • 300 million people suffer from asthma worldwide [2]. • Asthma-related healthcare costs alone are around $50 billion a year [2]. • 155,000 hospital admissions and 593,000 emergency department visits in 2006 [3]. [1] http://www.nhlbi.nih.gov/health/health-topics/topics/asthma/ [2] http://www.lung.org/lung-disease/asthma/resources/facts-and-figures/asthma-in-adults.html [3] Akinbami et al. (2009). Status of childhood asthma in the United States, 1980–2007. Pediatrics, 123(Supplement 3), S131-S145.
  • 42. WHY Big Data to Smart Data: Healthcare example. Asthma is a multifactorial disease with health signals spanning personal, public health, and population levels. Real-time health signals from the personal level (e.g., Wheezometer, NO in breath, accelerometer, microphone), public health (e.g., CDC, hospital EMR), and population level (e.g., pollen level, CO2) arrive continuously in fine-grained samples, potentially with missing information and uneven sampling frequencies. Variety, Volume, Veracity, Velocity; Value: Can we detect the asthma severity level? Can we characterize the asthma control level? What risk factors influence asthma control? What is the contribution of each risk factor? Semantics: understanding relationships between health signals and asthma attacks for providing actionable information.
  • 43. Asthma: Demonstration of Value. Levels: population level, personal, public health. Variety: health signals span heterogeneous sources. Volume: health signals are fine grained. Velocity: real-time change in situations. Veracity: reliability of health signals may be compromised. Value: Can I reduce my asthma attacks at night? Decision support to doctors by providing them with deeper insights into patient asthma care.
  • 44. Asthma: Actionable Information for Asthma Patients. Sensordrone: for monitoring environmental air quality. Wheezometer: for monitoring wheezing sounds. Can I reduce my asthma attacks at night? What are the triggers? What is the wheezing level? What is the propensity toward asthma? What is the exposure level over a day? What is the air quality indoors? Commute to work. Levels: personal, public health, population level. Actionable information: closing the window at home in the morning and taking an alternate route to the office may lead to reduced asthma attacks.
  • 45. Asthma Control and Actionable Information. Personal, public health, and population level signals for monitoring asthma. Sensors and their observations for understanding asthma. Asthma control => daily medication:
      Severity level of asthma | Choices for starting therapy (recommended action) | Not well controlled (recommended action) | Poor controlled (recommended action)
      Intermittent asthma | SABA prn | - | -
      Mild persistent asthma | Low dose ICS | Medium ICS | Medium ICS
      Moderate persistent asthma | Medium dose ICS alone or with LABA/montelukast | Medium ICS + LABA/montelukast or high dose ICS | Medium ICS + LABA/montelukast or high dose ICS*
      Severe persistent asthma | High dose ICS with LABA/montelukast | Needs specialist care | Needs specialist care
      ICS = inhaled corticosteroid, LABA = inhaled long-acting beta2-agonist, SABA = inhaled short-acting beta2-agonist; * consider referral to specialist
  • 46. Asthma Early Warning Model (AEWM). Personal level sensors (kHealth**) yield personal level signals; societal level sensors (EventShop*) yield societal level signals; the societal level signals relevant to the personal level form the personalized societal level signal. Storage. Qualify, quantify, action recommendation. What are the features influencing my asthma? What is the contribution of each of these features? How controlled is my asthma? (risk score) What will be my action plan to manage asthma? Query AEWM; verify & augment domain knowledge; recommended action; action justification. * http://www.slideshare.net/jain49/eventshop-120721, ** http://www.youtube.com/watch?v=btnRi64hJp4
  • 47. Health Signal Extraction to Understanding. Physical-Cyber-Social system: personal, public health, and population level. Observations: sensor and personal observations (e.g., Wheeze: Yes; Do you have tightness of chest?: Yes); acceleration readings from on-phone sensors; tweets reporting pollution level and asthma attacks; outdoor pollen and pollution; signals from personal, personal spaces, and community spaces. Health signal extraction: <Wheezing=Yes, time, location>, <ChestTightness=Yes, time, location>, <PollenLevel=Medium, time, location>, <Pollution=Yes, time, location>, <Activity=High, time, location>. Health signal understanding: variables Wheezing, ChestTightness, PollenLevel, Pollution, Activity, and RiskCategory; records of the form <PollenLevel, ChestTightness, Pollution, Activity, Wheezing, RiskCategory>, e.g., <2, 1, 1, 3, 1, RiskCategory>; the risk category is assigned by doctors. Steps labeled: qualify, quantify, enrich (with expert knowledge and background knowledge). Risk categories and actions: Well Controlled: continue; Not Well Controlled: contact nurse; Poor Controlled: contact doctor.
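The step from extracted health signals to the numeric record shown on this slide can be sketched as a small encoding function. The slide only shows the encoded example <2, 1, 1, 3, 1>, so the full value mappings below (Low = 1, No = 0) are assumptions, as are the names `encode`, `ENCODERS` and `ACTIONS`.

```python
# Assumed encodings: the slide shows Medium -> 2, High -> 3 and Yes -> 1;
# Low -> 1 and No -> 0 are guesses made for this sketch.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}
BINARY = {"No": 0, "Yes": 1}

# Field order follows the record layout on the slide (RiskCategory is the label).
FIELDS = ["PollenLevel", "ChestTightness", "Pollution", "Activity", "Wheezing"]
ENCODERS = {
    "PollenLevel": LEVELS,
    "ChestTightness": BINARY,
    "Pollution": BINARY,
    "Activity": LEVELS,
    "Wheezing": BINARY,
}

# Action attached to each doctor-assigned risk category on the slide.
ACTIONS = {
    "Well Controlled": "continue",
    "Not Well Controlled": "contact nurse",
    "Poor Controlled": "contact doctor",
}

def encode(observation):
    """Turn one set of extracted health signals into a numeric feature tuple."""
    return tuple(ENCODERS[field][observation[field]] for field in FIELDS)

record = encode({
    "Wheezing": "Yes",
    "ChestTightness": "Yes",
    "PollenLevel": "Medium",
    "Pollution": "Yes",
    "Activity": "High",
})
```

Records of this shape, paired with the risk category a doctor assigned, are what a model would be trained on before it can suggest one of the three actions.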
  • 48. What if we could automate this sense making ability? … and do it efficiently and at scale
  • 49. People are good at making sense of sensory input. What can we learn from cognitive models of perception? • The key ingredient is prior knowledge
  • 50. Perception Cycle* (* based on Neisser’s cognitive model of perception). Observe property; perceive feature. 1. Explanation: translating low-level signals into high-level knowledge. 2. Discrimination: focusing attention on those aspects of the environment that provide useful information. Prior knowledge.
  • 51. To enable machine perception, Semantic Web technology is used to integrate sensor data with prior knowledge on the Web
  • 52. Prior knowledge on the Web: W3C Semantic Sensor Network (SSN) Ontology; bi-partite graph
  • 53. Prior knowledge on the Web: W3C Semantic Sensor Network (SSN) Ontology; bi-partite graph
  • 54. Explanation (step 1 of the perception cycle: observe property, perceive feature): translating low-level signals into high-level knowledge. Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building.
  • 55. Explanation: inference to the best explanation • In general, explanation is an abductive problem, and hard to compute. Finding the sweet spot between abduction and OWL • The single-feature assumption* enables use of an OWL-DL deductive reasoner. * An explanation must be a single feature which accounts for all observed properties. Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building.
  • 56. Explanation. Explanatory Feature: a feature that explains the set of observed properties. ExplanatoryFeature ≡ ∃ssn:isPropertyOf⁻.{p1} ⊓ … ⊓ ∃ssn:isPropertyOf⁻.{pn}. Example graph: properties (elevated blood pressure, clammy skin, palpitations) and features (Hypertension, Hyperthyroidism, Pulmonary Edema); columns labeled Observed Property and Explanatory Feature.
  • 57. Discrimination (step 2 of the perception cycle: observe property, perceive feature): focusing attention on those aspects of the environment that provide useful information. Discrimination is the act of finding those properties that, if observed, would help distinguish between multiple explanatory features.
  • 58. Discrimination. Expected Property: would be explained by every explanatory feature. ExpectedProperty ≡ ∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ∃ssn:isPropertyOf.{fn}. Example graph: properties (elevated blood pressure, clammy skin, palpitations) and features (Hypertension, Hyperthyroidism, Pulmonary Edema); columns labeled Expected Property and Explanatory Feature.
  • 59. Discrimination. Not Applicable Property: would not be explained by any explanatory feature. NotApplicableProperty ≡ ¬∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ¬∃ssn:isPropertyOf.{fn}. Example graph: properties (elevated blood pressure, clammy skin, palpitations) and features (Hypertension, Hyperthyroidism, Pulmonary Edema); columns labeled Not Applicable Property and Explanatory Feature.
  • 60. Discrimination. Discriminating Property: is neither expected nor not-applicable. DiscriminatingProperty ≡ ¬ExpectedProperty ⊓ ¬NotApplicableProperty. Example graph: properties (elevated blood pressure, clammy skin, palpitations) and features (Hypertension, Hyperthyroidism, Pulmonary Edema); columns labeled Discriminating Property and Explanatory Feature.
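The explanation and discrimination definitions on the preceding slides can be sketched with plain Python sets instead of an OWL reasoner. The edges in `PROPERTIES_OF` are an assumption: the transcript lists the three properties and three features but not which feature has which property, so the links below are illustrative. The function names are hypothetical as well.

```python
# Assumed bipartite graph: the properties each feature (disorder) can produce.
PROPERTIES_OF = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}

def explanatory_features(observed):
    """Features that account for every observed property (single-feature assumption)."""
    observed = set(observed)
    return sorted(f for f, props in PROPERTIES_OF.items() if observed <= props)

def classify_properties(explanatory):
    """Split all known properties into expected, not-applicable and discriminating.

    Assumes `explanatory` is non-empty.
    """
    all_props = set().union(*PROPERTIES_OF.values())
    expected = {p for p in all_props
                if all(p in PROPERTIES_OF[f] for f in explanatory)}
    not_applicable = {p for p in all_props
                      if not any(p in PROPERTIES_OF[f] for f in explanatory)}
    discriminating = all_props - expected - not_applicable
    return expected, not_applicable, discriminating

candidates = explanatory_features({"elevated blood pressure", "palpitations"})
expected, not_applicable, discriminating = classify_properties(candidates)
```

Under these assumed edges, two candidate explanations remain after the first two observations, and the discriminating set names the one property worth observing next to tell them apart.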
  • 61. kHealth: knowledge-enabled healthcare. Our motivation: canary in a coal mine. Through physical monitoring and analysis, our cellphones could act as an early warning system to detect serious health conditions, and provide actionable information.
  • 62. Risk Score: from Data to Abstraction and Actionable Information. kHealth inputs: machine sensors, personal input, EMR/PHR. Qualities: high BP, increased weight. Entities: hypertension, hypothyroidism. Model creation: historical observations of each patient; comorbidity risk score, e.g., Charlson Index; longitudinal studies of cardiovascular risks; find correlations; validate correlations (validation, domain knowledge, domain expert); parameterize the model. Risk assessment model: current observations (physical, physiological, history); risk score (actionable information).
  • 63. How do we implement machine perception efficiently on a resource-constrained device? Use of an OWL reasoner is resource intensive (especially on resource-constrained devices), in terms of both memory and time • Runs out of resources with prior knowledge >> 15 nodes • Asymptotic complexity: O(n^3)
  • 64. Intelligence at the edge. Approach 1: send all sensor observations to the cloud for processing. Approach 2: downscale semantic processing so that each device is capable of machine perception. Henson et al. An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices, ISWC 2012.
  • 65. Efficient execution of machine perception: use bit vector encodings and their operations to encode prior knowledge and execute semantic reasoning. [Figure: rows of bit vectors]
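The idea of encoding prior knowledge as bit vectors can be sketched as follows. This is a simplified illustration and not the algorithm from the cited ISWC 2012 paper: each property gets one integer whose bits mark the features that can produce it, and explanation becomes a bitwise AND over the observed properties. The feature/property links are assumed for the example, and all names are hypothetical.

```python
FEATURES = ["Hypertension", "Hyperthyroidism", "Pulmonary Edema"]

# Assumed prior knowledge: the properties each feature can produce.
PROPERTIES_OF = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}

# One integer per property; bit i is set when FEATURES[i] can produce that property.
PROPERTY_BITS = {}
for i, feature in enumerate(FEATURES):
    for prop in PROPERTIES_OF[feature]:
        PROPERTY_BITS[prop] = PROPERTY_BITS.get(prop, 0) | (1 << i)

def explain(observed):
    """AND together the bit vectors of all observed properties."""
    mask = (1 << len(FEATURES)) - 1          # start with every feature possible
    for prop in observed:
        mask &= PROPERTY_BITS[prop]          # keep features that explain this property
    return [f for i, f in enumerate(FEATURES) if (mask >> i) & 1]

result = explain(["elevated blood pressure", "palpitations"])
```

Each observed property costs one AND on a machine word (or a few words for larger knowledge bases), which gives a feel for why this style of encoding suits a phone better than a general-purpose reasoner.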
  • 66. Evaluation on a mobile device. Efficiency improvement • Problem size increased from 10’s to 1000’s of nodes • Time reduced from minutes to milliseconds • Complexity growth reduced from polynomial to linear: from O(n^3) < x < O(n^4) to O(n)
  • 67. Semantic Perception for smarter analytics: 3 ideas to take away. 1. Translate low-level data to high-level knowledge: machine perception can be used to convert low-level sensory signals into high-level knowledge useful for decision making. 2. Prior knowledge is the key to perception: using SW technologies, machine perception can be formalized and integrated with prior knowledge on the Web. 3. Intelligence at the edge: by downscaling semantic inference, machine perception can execute efficiently on resource-constrained devices.
  • 68. Demos • Real Time Feature Streams: http://www.youtube.com/watch?v=_ews4w_eCpg • kHealth: http://www.youtube.com/watch?v=btnRi64hJp4
  • 69. Smart Data in Social Media Analytics: to understand the human social dynamics in real-world events
  • 70. Twitter During Real-world Events of Interest. 0.5B tweets per day; 0.5B users; 60% on mobile; 5530 tweets per second related to the Japan earthquake and tsunami; 17000 tweets per second. Sources: http://www.flickr.com/photos/twitteroffice/5897088517/sizes/o/in/photostream/ http://bayarea.sbnation.com/49ers/2013/2/3/3947738/super-bowl-prop-bets-2013-twitter http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats/
  • 71. http://usatoday30.usatoday.com/news/politics/twitter-election-meter http://twitris.knoesis.org/
  • 72. State of the Art: Uni/Bi-Dimensional Analysis During Elections. Topics; sentiments.
  • 73. Twitris’ Dimensions of Integrated Semantic Analysis. Sheth et al. Twitris- a System for Collective Social Intelligence, ESNAM-2013
  • 74. http://semanticweb.com/picking-the-president-twindex-twitris-track-social-media-electorate_b31249 http://semanticweb.com/election-2012-the-semantic-recap_b33278
  • 75. [The screenshots of Twitris+ were taken on Nov. 6th 6 PM EST]
  • 76. Twitris: Sentiment Analysis- Smart Answers with reasoning! How was Obama doing in the first debate?
  • 77. Twitris: Sentiment Analysis- Smart Answers with reasoning! How was Obama doing in the second debate? Red color: negative topics; green color: positive topics. SMART DATA IS ABOUT ANALYSIS FOR REASONING (what caused the positive sentiment for Democrats) BEHIND THE REAL-WORLD ACTIONS (Democrats’ win). http://knoesis.wright.edu/library/resource.php?id=1787
  • 78. Twitris: Network Analysis. Top 100 influential users that talk about Barack Obama; positive or negative influence. SMART DATA TELLS YOU HOW A SYSTEM CAN BE TWEAKED FOR THE DESIRED ACTIONS! Could we engage with (targeted) users with an extreme polarity leaning for Obama to spark an agenda in the whole network of voters (ACTION)?
  • 79. Twitris: Community Evolution. SMART DATA FOCUSES ON THE CAUSALITY OF CHANGES IN REAL-WORLD ACTIONS! Evolution of influencer interaction networks for Romney vs. Obama topical communities, during U.S. Presidential Election 2012 debates: before 1st debate, after 1st debate, after Hurricane Sandy, after 3rd debate.
  • 80. Twitris: Impact of Background Knowledge. The dead people mentioned in the event OWC
  • 81. Twitris: Analysis by Location. How people from different parts of the world talked about the US election. Images and videos related to the US election.
  • 82. What is Smart Data in the context of Disaster Management? ACTIONABLE: timely delivery of the right resources and information to the right people at the right location! Because everyone wants to help, but DON’T KNOW HOW!
  • 83. Coordination of needs and offers using social media. Join us for the Social Good! http://twitris.knoesis.org Citizen sensors; response teams (including humanitarian org. and ‘pseudo’ responders); victim site. Tweets: “RT @OpOKRelief: Southgate Baptist Church on 4th Street in Moore has food, water, clothes, diapers, toys, and more. If you cant go, call 794”; “Text "FOOD" to 32333, REDCROSS to 90999, or STORM to 80888 to donate $10 in storm relief. #moore #oklahoma #disasterrelief #donate”; “Want to help animals in #Oklahoma? @ASPCA tells how you can help: http://t.co/mt8l9PwzmO”; “If you would like to volunteer today, help is desperately needed in Shawnee. Call 273-5331 for more info”; “Does anyone know where to send a check to donate to the tornado victims?”; “Where do I go to help out for volunteer work around Moore? Anyone know?”; “Anyone know where to donate to help the animals from the Oklahoma disaster? #oklahoma #dogs”. Matched; matched; matched: serving the need! http://www.slideshare.net/hemant_knoesis/cscw-2012-hemantpurohit-11531612 Purohit et al. Framework to Analyze Coordination in Crisis Response, 2012. Int’l Collaboration in-progress:
  • 84. Smart Data from the Twitris system for Disaster Response Coordination. Which are the primary locations with the most negative sentiments/emotions? Who are all the people to engage with for better information diffusion? Which are the most important organizations acting at my location? Who are the resource seekers and suppliers? How can one donate? Smart data provides actionable information and improves decision making through semantic analysis of Big Data.
  • 85. Disaster Response Coordination Framework. Source: Purohit et al. 2013, Information Filtering and Management Model for Disaster Response Coordination
  • 86. Disaster Response Coordination: Twitris Summary for Actionable Nuggets. Important tags to summarize the Big Data flow related to the Oklahoma tornado. Images and videos related to the Oklahoma tornado.
  • 87. Disaster Response Coordination: Twitris real-time information for needs. Incoming tweets with need types to give a quick idea of what is needed and where currently #OKC. Legends for different needs #OKC. (It is a real-time widget for monitoring of needs, so it will not be active after the event has passed.) http://twitris.knoesis.org/oklahomatornado
  • 88. Disaster Response Coordination: Influencers to engage with for specific needs. Influential users for the respective needs, and their interaction network on the right.
  • 89. Disaster Response Coordination: Finding Actionable Nuggets for Responders to act. Really sparse signal to noise: • 2M tweets during the first week after #Oklahoma-tornado-2013 - 1.3% as the highly precise donation requests to help - 0.02% as the highly precise donation offers to help • Anyone know how to get involved to help the tornado victims in Oklahoma?? #tornado #oklahomacity (OFFER) • I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER) • Text REDCROSS to 909-99 to donate to those impacted by the Moore tornado! http://t.co/oQMljkicPs (REQUEST) • Please donate to Oklahoma disaster relief efforts.: http://t.co/crRvLAaHtk (REQUEST) For responders, the most important information is the scarcity and availability of resources; can we mine it via social media?
  • 90. Disaster Response Coordination: Human Knowledge to drive information extraction • Features driven by the experience of domain experts at the responder organizations • Examples: – ‘I want to <donate/ help/ bring>’ for extraction of offering intention – ‘tent house’ OR ‘cots’ for shelter need types
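The two expert-provided patterns on this slide can be sketched as regular expressions. The patterns themselves come from the slide; writing them as Python regexes, and the names `OFFER_INTENT`, `SHELTER_NEED` and `extract_features`, are assumptions made for this illustration.

```python
import re

# 'I want to <donate/ help/ bring>' signals an offering intention.
OFFER_INTENT = re.compile(r"\bI want to (donate|help|bring)\b", re.IGNORECASE)
# 'tent house' OR 'cots' signals a shelter need type.
SHELTER_NEED = re.compile(r"\b(tent house|cots)\b", re.IGNORECASE)

def extract_features(tweet):
    """Flag which expert-defined patterns occur in a tweet."""
    return {
        "offer_intent": bool(OFFER_INTENT.search(tweet)),
        "shelter_need": bool(SHELTER_NEED.search(tweet)),
    }

features = extract_features(
    "I want to donate to the Oklahoma cause shoes clothes even food if I can"
)
```

A handful of such patterns is cheap to write with a responder in the room, and the resulting flags can serve as features for a classifier rather than as the final decision.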
  • 91. Disaster Response Coordination: Automatic Matching of needs and offers • A knowledge-driven approach – A rich inventory of metadata for tweets – Semantic matching for needs (query) vs. offers (documents) • Example: – @bladesofmilford please help get the word out, we are accepting kid clothes to send to the lil angels in Oklahoma. Drop off @MilfordGreenPiz (REQUEST) – I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER) Matching the competitive intentions (needs and offers) can relieve humans of the task of resource matchmaking for coordination.
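The needs-versus-offers matching described on this slide can be illustrated with a deliberately simple stand-in. Plain keyword overlap is used here in place of the semantic matching and tweet metadata the slide refers to; the vocabulary `RESOURCE_TERMS` and the function names are hypothetical.

```python
import re

# Hypothetical resource vocabulary; a real system would draw on a richer model.
RESOURCE_TERMS = {"clothes", "shoes", "food", "water", "shelter"}

def resources(text):
    """Resource terms mentioned in a tweet."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words & RESOURCE_TERMS

def match(requests, offers):
    """Pair each request with every offer that mentions a shared resource."""
    pairs = []
    for request in requests:
        for offer in offers:
            shared = resources(request) & resources(offer)
            if shared:
                pairs.append((request, offer, sorted(shared)))
    return pairs

requests = [
    "@bladesofmilford please help get the word out, we are accepting kid "
    "clothes to send to the lil angels in Oklahoma. Drop off @MilfordGreenPiz"
]
offers = ["I want to donate to the Oklahoma cause shoes clothes even food if I can"]

pairs = match(requests, offers)
```

With the two tweets from the slide, the request and the offer are paired on the shared resource type, which is the kind of matchmaking a human coordinator would otherwise do by hand.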
  • 92. Disaster Response Coordination: Engagement Interface for responders. What-Where-How-Who-Why coordination. Influential users to engage with, and resources for seekers/suppliers at a location, at a timestamp. Contextual information for chosen topical tags.
  • 93. Disaster Response Coordination: Anecdote for the value of Smart Data • Illustrative scenario: #Oklahoma-tornado 2013. FEMA asked us to quickly filter out gas-leak related data. Mining the data for smart nuggets to inform FEMA (timely needs), e.g., “All gas leaks in #moore were capped and stopped by 11:30 last night” (at 5/22/2013 1:41:37). Engaged with the author of this information to confirm (veracity). Lots of tweets on ‘how to/where to’ assist (‘pseudo’ responders), e.g., “I want to go to Oklahoma this weekend & do what i can to help those people with food,cloths & supplies,im in the feel of wanting to help ! :)”
  • 94. Smart Data analytics to capture rapidly evolving social data events. Social media is the pulse of the populace, a true reflection of events all over the globe! An event is a dynamic topic that evolves and might later fork into several distinct events.
  • 95. Continuous Semantics
  • 96. Continuous Semantics: Dynamic Model Creation
  • 97. Dynamic Model Creation: example of how background knowledge helps in understanding the situation described in the tweets, while also updating the knowledge model
  • 98. How is Continuous Semantics a form of Smart Data Analytics? Keeping the background knowledge abreast of the changes in the event. Smartly learning and adapting data acquisition (temporally apt Big Data, i.e., Fast Data). In turn providing temporally relevant Smart Data through analysis.
  • 99. Smart Data Analytics in Traffic Management: to improve everyday life, entangled by our most common problem of being stuck in traffic
  • 100. Severity of the Traffic Problem. By 2001 over 285 million Indians lived in cities, more than in all North American cities combined (Office of the Registrar General of India 2001) [1]. Modes of transportation in Indian cities. Texas Transportation Institute (TTI) congestion report in U.S. [1] The Crisis of Public Transport in India [2] IBM Smarter Traffic
  • 101. Big Data to Smart Data: Traffic Management example. Vehicular traffic data from the San Francisco Bay Area aggregated from on-road sensors (numerical) and incident reports (textual); http://511.org/ Every-minute updates of speed, volume, travel time, and occupancy, resulting in 178 million link status observations, 738 active events, and 146 scheduled events, with many unevenly sampled observations collected over 3 months. Variety, Volume, Veracity, Velocity; Value: Can we detect the onset of traffic congestion? Can we characterize traffic congestion based on events? Can we provide actionable information to decision makers? Semantics: representing prior knowledge of traffic led to a focused exploration of this massive dataset.
  • 102. Heterogeneity in a Physical-Cyber-Social System. Slow moving traffic; link description; scheduled event (511.org); schedule information (511.org); traffic monitoring (511.org)
  • 103. Heterogeneity in a Physical-Cyber-Social System
  • 104. Uncertainty in a Physical-Cyber-Social System • Observation: slow moving traffic • Multiple causes (uncertain about the cause): – Scheduled events: music events, fair, theatre events, concerts, road work, repairs, etc. – Active events: accidents, disabled vehicles, breakdown of roads/bridges, fire, bad weather, etc. – Peak hour: e.g., 7 am – 9 am OR 4 pm – 6 pm • Each of these events may have a varying impact on traffic. • A delay prediction algorithm should process multimodal and multi-sensory observations.
  • 105. Modeling Traffic Events • Internal observations – Speed, volume, and travel time observations – Correlations may exist between these variables across different parts of the network • External events – Accident, music event, sporting event, and planned events – External events and internal observations may exhibit correlations
  • 106. Modeling Traffic Events. External events <ActiveEvents, ScheduledEvents>: accident, music event, sporting event, road work, theatre event. Internal observations <speed, volume, travelTime>. Weather; time of day.
  • 107. Complementing Probabilistic Models with Declarative Knowledge. Correlations to causations using declarative knowledge on the Semantic Web. Domain knowledge (domain experts, Linked Open Data): declarative domain knowledge and causal knowledge over Cold, PoorVisibility, SlowTraffic, IcyRoad. Structure and parameters. Domain observations:
      Cold (YES/NO) | IcyRoad (ON/OFF) | PoorVisibility (YES/NO) | SlowTraffic (YES/NO)
      1 | 0 | 1 | 1
      1 | 1 | 1 | 0
      1 | 1 | 1 | 1
      1 | 0 | 1 | 0
  • 108. Domain Knowledge • Declarative knowledge about various domains is increasingly being published on the web [1, 2]. • Declarative knowledge describes concepts and relationships in a domain (structure). • Linked Open Data may be used to derive prior probabilities of events (parameters). • Explored the use of declarative knowledge for structure using ConceptNet 5. [1] http://conceptnet5.media.mit.edu/ [2] http://linkeddata.org/
  • 109. ConceptNet 5 (http://conceptnet5.media.mit.edu/web/c/en/traffic_jam) • Graph of concepts (delay, go to baseball game, go to concert, traffic jam, traffic accident, accident, car crash, bad weather, road ice, slow traffic, occur twice each day) connected by the relations Causes, CapableOf, HasSubevent, RelatedTo, and is_a, shown alongside the model variables ActiveEvent, ScheduledEvent, TimeOfDay, and BadWeather
  • 110. Three Operations: Complementing graphical model structure extraction • Traffic data from sensors deployed on the road network in the San Francisco Bay Area • Knowledge from ConceptNet 5: go to baseball game Causes traffic jam; traffic jam CapableOf occur twice each day; traffic jam CapableOf slow traffic; bad weather CapableOf slow traffic • Operation 1: add missing random variables (time of day, bad weather) • Operation 2: add missing links • Operation 3: add link direction • Nodes shown: traffic jam, baseball game, time of day, slow traffic, bad weather
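The three operations on this slide can be sketched as a single pass over knowledge triples. This is a simplified illustration, not the method from the talk: it assumes every triple points from subject to object, it shortens the concept "go to baseball game" to the node "baseball game", and the starting links in `INITIAL_LINKS` are hypothetical.

```python
def enrich_structure(nodes, undirected_links, triples):
    """Apply the three operations using (subject, relation, object) triples.

    Returns (nodes, directed_links, remaining_undirected_links).
    Assumption: each triple is read as a directed link subject -> object.
    """
    nodes = set(nodes)
    undirected = {frozenset(link) for link in undirected_links}
    directed = set()
    for subject, _relation, obj in triples:
        nodes.update((subject, obj))                # 1. add missing random variables
        undirected.add(frozenset((subject, obj)))   # 2. add missing links
        directed.add((subject, obj))                # 3. add link direction
    # Links that no triple supports keep their undirected status.
    oriented = {frozenset(edge) for edge in directed}
    return nodes, directed, undirected - oriented


# Node names follow the slide; the initial links are hypothetical.
INITIAL_NODES = {"traffic jam", "baseball game", "time of day", "slow traffic"}
INITIAL_LINKS = [("baseball game", "traffic jam"), ("time of day", "traffic jam")]
TRIPLES = [
    ("baseball game", "Causes", "traffic jam"),
    ("traffic jam", "CapableOf", "slow traffic"),
    ("bad weather", "CapableOf", "slow traffic"),
]

NODES, DIRECTED, UNDIRECTED = enrich_structure(INITIAL_NODES, INITIAL_LINKS, TRIPLES)
```

Here "bad weather" enters as a new variable, two new links appear, and three links gain a direction, while the link between "time of day" and "traffic jam" stays undirected because no triple in this toy input covers it.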
  • 111. Enriched Probabilistic Models using ConceptNet 5 • Structure extracted from traffic observations (sensors + textual) using statistical techniques: Scheduled Event, Active Event, Day of week, Time of day, delay, Travel time, speed, volume • Enriched structure, which has link directions and new nodes such as “Bad Weather”, potentially leading to better delay predictions: Scheduled Event, Active Event, Day of week, Time of day, delay, Travel time, speed, volume, Bad Weather
  • 112. Take Away • It is all about the human, not computing, not device – Computing for human experience • Whatever we do in Smart Data, focus on human-in-the-loop (empowering machine computing!): – Of Human, By Human, For Human – But in serving human needs, there is a lot more than what current big data analytics handle: variety, contextual, personalized, subjective, spanning data and knowledge across P-C-S dimensions
  • 113. Acknowledgements • Kno.e.sis team • Funds: NSF, NIH, AFRL, Industry… • Note: • For images and sources, if not on slides, please see slide notes • Some images were taken from Web search results; all such images belong to their respective owners, and we are grateful to the owners for the usefulness of these images in our context.
  • 114. References and Further Readings • OpenSource: http://knoesis.org/opensource • Showcase: http://knoesis.org/showcase • Vision: http://knoesis.org/node/266 • Publications: http://knoesis.org/library
  • 115. Thanks …
  • 116. Physical Cyber Social Computing • Amit Sheth, Kno.e.sis, Wright State
  • 117. Amit Sheth’s PhD students: Ashutosh Jadhav, Hemant Purohit, Vinh Nguyen, Lu Chen, Pavan Kapanipathi, Pramod Anantharam, Sujan Perera, Alan Smith, Pramod Koneru, Maryam Panahiazar, Sarasi Lalithsena, Cory Henson, Kalpa Gunaratna, Delroy Cameron, Sanjaya Wijeratne, Wenbo Wang • Kno.e.sis in 2012 = ~100 researchers (15 faculty, ~50 PhD students)
  • 118. Thank you, and please visit us at http://knoesis.org • Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing • Wright State University, Dayton, Ohio, USA • Smart Data