
Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web


See instead more recent version (ICDE2014 keynote): http://j.mp/ICDE-key
A video of a version of this talk: http://youtu.be/8RhpFlfpJ-A


Amit Sheth, "Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web," keynote at the 21st Italian Symposium on Advanced Database Systems,
June 30 - July 03 2013, Roccella Jonica, Italy. Also invited talks given in Universities in Spain and Italy in June 2013.

Highlight: How to harness actionable Smart Data from voluminous Big Data with Velocity and Variety, using Semantics and the Semantic Web at the core to bring Human-Centric Computing into practice.

Abstract from: http://www.sebd2013.unirc.it/invitedSpeakers.html

Big Data has captured much interest in research and industry, with anticipation of better decisions, efficient organizations, and many new jobs. Much of the emphasis is on technology that handles volume, including storage and computational techniques to support analysis (Hadoop, NoSQL, MapReduce, etc.), and the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity. However, the most important feature of data, the raison d'etre, is neither volume, variety, velocity, nor veracity -- but value. In this talk, I will emphasize the significance of Smart Data, and discuss how it can be realized by extracting value from Big Data. Accomplishing this task requires organized ways to harness and overcome the original four V-challenges; and while the technologies currently touted may provide some necessary infrastructure -- they are far from sufficient. In particular, we will need to utilize metadata, employ semantics and intelligent processing, and leverage some of the extensive work that predates Big Data. For Volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration, and discuss how this cannot simply be wished away using NoSQL. Lastly, for Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to use dynamically created models of new objects, concepts, and relationships to better understand new cues in the data that capture rapidly evolving events and situations.

Additional background at: http://knoesis.org/vision > SmartData and "Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications," http://www.knoesis.org/library/resource.php?id=1889 .

Published in: Technology, Education

Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

  1. 1. 1
  2. 2. How much data? [Figure labels: 2011; 48 (2013); 500 (2013)] http://www.knowledgeinfusion.com/blog/2011/11/get-your-head-out-of-the-clouds-and-into-big-data/
  3. 3. 1% of the data is used for analysis. http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode http://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume
  4. 4. Variety: Semi structured
  5. 5. Velocity: Fast Data; Rapid Changes; Real-Time/Stream Analysis. Current application examples: financial services, stock brokerage, weather tracking, movies/entertainment and online retail
  6. 6. Current Focus on Big Data • Focus on verticals: advertising, social media, retail, financial services, telecom, and healthcare – Aggregate data, focused on transactions, limited integration (limited complexity), analytics to find (simple) patterns – Emphasis on technologies to handle volume/scale, and to lesser extent velocity: Hadoop, NoSQL, MPP warehouse … – Full faith in the power of data (no hypothesis), bottom-up analysis
  7. 7. Questions typically asked on Big Data • What if your data volume gets so large and varied you don't know how to deal with it? • Do you store all your data? • Do you analyze it all? • How can you find out which data points are really important? • How can you use it to your best advantage? http://www.sas.com/big-data/
  8. 8. Variety of Data Analytics Enablers. http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies/
  9. 9. Illustrative Big Data Applications • Prediction of the spread of flu in real time during H1N1 2009 – Google tested a mammoth 450 million different mathematical models to test the search terms, comparing their predictions against the actual flu cases; 45 important parameters were found – Model was tested when H1N1 crisis struck in 2009 and gave more meaningful and valuable real-time information than any public health official system [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013] • FareCast: predict the direction of air fares over different routes [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013] • NY city manholes problem [ICML Discussion, 2012]
  10. 10. What is missing? • Current focus is mainly to serve business intelligence and targeted analytics needs, not to serve complex individual and collective human needs (e.g., empower humans in health, fitness and well-being; better disaster coordination) that are highly personalized/individualized/contextualized – Incorporate real-world complexity: multi-modal and multi-sensory nature of real-world and human perception – Need deeper understanding of data and its role to information (e.g., skew, coverage) • Human involvement and guidance: leading to actionable information, understanding and insight right in the context of human activities – Bottom-up & top-down processing: infusion of models and background knowledge (data + knowledge + reasoning)
  11. 11. Makes Sense. Actionable or helps decision support/making.
  12. 12. Smart Data: Smart data makes sense out of Big Data. It provides value from harnessing the challenges posed by volume, velocity, variety and veracity of big data, in turn providing actionable information and improving decision making.
  13. 13. Another perspective on Smart Data: “OF human, BY human and FOR human”. Smart data is focused on the actionable value achieved by human involvement in data creation, processing and consumption phases for improving the human experience.
  14. 14. Human Centric Computing. Improved Analytics: Descriptive, Exploratory, Inferential, Predictive, Causal. Creation; Processing; Experience & Decision Making.
  15. 15. Another perspective on Smart Data: “OF human, BY human and FOR human”
  16. 16. ‘OF human’: Relevant Real-time Data Streams for Human Experience. Petabytes of Physical(sensory)-Cyber-Social Data every day! More on PCS Computing: http://wiki.knoesis.org/index.php/PCS
  17. 17. Another perspective on Smart Data: “OF human, BY human and FOR human”
  18. 18. ‘BY human’: Involving Crowd Intelligence in data processing workflows. Use of Prior Human-created Knowledge Models. Crowdsourcing and Domain-expert guided Machine Learning Modeling.
  19. 19. Another perspective on Smart Data: “OF human, BY human and FOR human”
  20. 20. ‘FOR human’: Improving Human Experience. Detection of events, such as wheezing sound, indoor temperature, humidity, dust, and CO2 level. Weather Application; Asthma Healthcare Application. Population Level; Personal; Public Health. Action in the Physical World: Close the window at home during the day to avoid CO2 ingush, to avoid asthma attacks at night.
  21. 21. Why do we care about Smart Data rather than Big Data?
  22. 22. Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web. Keynote at SEBD 2013, July 1, 2013 and invited talk in universities in Spain, June 2013. The Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State, USA. Pavan Kapanipathi, Pramod Anantharam, Amit Sheth, Cory Henson, Dr. T.K. Prasad, Maryam Panahiazar. Contributions by many, but Special Thanks to: Hemant Purohit
  23. 23. Big Data to Smart Data: Disaster Management example. Second-costliest hurricane in United States history, estimated damage $75 billion; 90-115 mph winds; State of Emergency in New York; 285 people killed on the track of Sandy; 750,000 without power (NY); immense devastation and human suffering. http://www.huffingtonpost.com/2012/10/30/hurricane-sandy-power-outage-map-infographic_n_2044411.html
  24. 24. Social (Big) Data during Hurricane Sandy: 20 million tweets with “sandy, hurricane” keywords between Oct 27th and Nov 1st; 2nd most popular topic on Facebook during 2012. • http://www.guardian.co.uk/news/datablog/2012/oct/31/twitter-sandy-flooding • http://www.huffingtonpost.com/2012/11/02/twitter-hurricane-sandy_n_2066281.html • http://mashable.com/2012/10/31/hurricane-sandy-facebook/
  25. 25. BIG DATA TO SMART DATA: WHY? and FOR WHOM? For information seeking; For timely information; For unique information; For unfiltered information; To determine disaster magnitude; To check in with family and friends; To self-mobilize; To maintain a sense of community; To seek emotional support and healing. Governments; Emergency management organizations; Journalists; Disaster responders; Public. Fraustino et al. Social Media Use during Disasters: A Review of the Knowledge Base and Gaps. US Dept. of Homeland Security, START 2012.
  26. 26. Can SNS’s make Disaster Management easier – Giving Actionable Information (Smart Data). Improving situational awareness - timely delivery of necessary information to the right people; Improving coordination between resource seekers and suppliers; Detecting the magnitude of disaster by people sentiments; Many more challenges… http://www.buzzfeed.com/annanorth/how-social-media-is-aiding-the-hurricane-sandy-rec http://blog.twitter.com/2012/10/hurricane-sandy-resources-on-twitter.html http://www.treehugger.com/culture/12-ways-help-hurricane-sandy-relief-efforts.html
  27. 27. Volume: Twitter hits half a billion tweets a day! Challenges: Delivering the necessary/actionable information to the right people. http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US
  28. 28. Velocity; Volume. @ConEdison Twitter handle that the company had only set up in June gained an extra 16,000 followers over the storm. – Did the information reach everyone? Rate of Data Arrival: approximately 7000 TPS; 10 images per second on Instagram. Challenges: Delivering the necessary/actionable information to the right people. http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US http://www.internews.org/sites/default/files/resources/InternewsEurope_Report_Japan_Connecting%20the%20last%20mile%20Japan_2013.pdf
  29. 29. Velocity; Variety; Volume. Semi Structured; Structured; Unstructured; Sensors; Linked Open Data; Wikipedia. Challenges: Delivering the necessary/actionable information to the right people. http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US
  30. 30. Velocity; Variety; Veracity; Volume. Challenges: Delivering the necessary/actionable information to the right people. http://www.buzzfeed.com/jackstuef/the-man-behind-comfortablysmug-hurricane-sandys
  31. 31. Velocity; Variety; Veracity; Volume
  32. 32. Smart Data focuses on the value. Data. Value: - Makes Sense - Actionable Information - Decision support/making. http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/
  33. 33. Data. Value: - Makes Sense - Actionable Information - Decision support/making. Disaster Management: Victims. Timely and Contextual Information about • Electricity, Food, Water, Shelter and donation offers related to the disaster. http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/
  34. 34. Revisiting.. Human Centric Computing. Improved Analytics: Descriptive, Exploratory, Inferential, Predictive, Causal. Creation; Processing; Experience.
  35. 35. Applications of Smart Data Analytics • Healthcare – kHealth – SemHealth • Social event coordination – Twitris • Traffic monitoring – kTraffic
  36. 36. The Patient of the Future. MIT Technology Review, 2012. http://www.technologyreview.com/featuredstory/426968/the-patient-of-the-future/
  37. 37. Smart Data in Healthcare: To gain new insight in patient care & early indications of disease.
  38. 38. Sensing is a key enabler of the Internet of Things. BUT, how do we make sense of the resulting avalanche of sensor data? 50 Billion Things by 2020 (Cisco).
  39. 39. WHY Big Data to Smart Data: Healthcare example. Parkinson’s disease (PD) data from The Michael J. Fox Foundation for Parkinson’s Research [1]. 8 weeks of data from 5 sensors on a smart phone, collected for 16 patients, resulting in ~12 GB (with a lot of missing data). Variety; Volume; Veracity; Velocity. Value: Can we detect the onset of Parkinson’s disease? Can we characterize the disease progression? Can we provide actionable information to the patient? Semantics: Representing prior knowledge of PD led to a focused exploration of this massive dataset. [1] https://www.kaggle.com/c/predicting-parkinson-s-disease-progression-with-smartphone-data
  40. 40. Big Data to Smart Data Using a Knowledge Based Approach. ParkinsonMild(person) = Tremor(person) ∧ PoorBalance(person). ParkinsonModerate(person) = MoveSlow(person) ∧ PoorSleep(person) ∧ MonotoneSpeech(person). ParkinsonAdvanced(person) = Fall(person). Control Group: Movements of an active person have a good distribution over the X, Y, and Z axes; audio is well modulated with good variations in the energy of the voice. PD Patients: Restricted movements by a PD patient can be seen in the acceleration readings; audio is not well modulated, representing monotone speech. Declarative Knowledge of Parkinson’s Disease is used to focus our attention on symptom manifestations in sensor observations.
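The three declarative rules on slide 40 can be read as plain boolean conjunctions over symptom flags. The sketch below is only an illustration of that reading, not the authors' implementation: the `Symptoms` class, the field names, and the `matched_rules` helper are hypothetical, and how each symptom flag would be derived from raw sensor readings is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Symptoms:
    """Hypothetical symptom flags, assumed to be already extracted from sensor data."""
    tremor: bool = False
    poor_balance: bool = False
    move_slow: bool = False
    poor_sleep: bool = False
    monotone_speech: bool = False
    fall: bool = False


def matched_rules(s: Symptoms) -> list[str]:
    """Return the name of every slide-40 rule whose conjunction holds for s."""
    rules = {
        # ParkinsonMild = Tremor AND PoorBalance
        "ParkinsonMild": s.tremor and s.poor_balance,
        # ParkinsonModerate = MoveSlow AND PoorSleep AND MonotoneSpeech
        "ParkinsonModerate": s.move_slow and s.poor_sleep and s.monotone_speech,
        # ParkinsonAdvanced = Fall
        "ParkinsonAdvanced": s.fall,
    }
    return [name for name, holds in rules.items() if holds]


mild = matched_rules(Symptoms(tremor=True, poor_balance=True))
partial = matched_rules(Symptoms(move_slow=True, poor_sleep=True))
```

Returning every matching rule, rather than a single stage, avoids assuming a precedence between the rules that the slide does not state.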
  41. 41. Asthma: Severity of the problem • 25 million people in the U.S. are diagnosed with asthma (7 million are children) [1]. • 300 million people suffering from asthma worldwide [2]. • Asthma related healthcare costs alone are around $50 billion a year [2]. • 155,000 hospital admissions and 593,000 emergency department visits in 2006 [3]. [1] http://www.nhlbi.nih.gov/health/health-topics/topics/asthma/ [2] http://www.lung.org/lung-disease/asthma/resources/facts-and-figures/asthma-in-adults.html [3] Akinbami et al. (2009). Status of childhood asthma in the United States, 1980–2007. Pediatrics, 123(Supplement 3), S131-S145.
  42. 42. WHY Big Data to Smart Data: Healthcare example. Asthma is a multifactorial disease with health signals spanning personal, public health, and population levels. Real-time health signals from personal level (e.g., Wheezometer, NO in breath, accelerometer, microphone), public health (e.g., CDC, Hospital EMR), and population level (e.g., pollen level, CO2) arriving continuously in fine-grained samples, potentially with missing information and uneven sampling frequencies. Variety; Volume; Veracity; Velocity. Value: Can we detect the asthma severity level? Can we characterize asthma control level? What risk factors influence asthma control? What is the contribution of each risk factor? Semantics: Understanding relationships between health signals and asthma attacks for providing actionable information.
  43. 43. Asthma: Demonstration of Value. Population Level; Personal; Public Health. Variety: Health signals span heterogeneous sources. Volume: Health signals are fine grained. Velocity: Real-time change in situations. Veracity: Reliability of health signals may be compromised. Value: Can I reduce my asthma attacks at night? Decision support to doctors by providing them with deeper insights into patient asthma care.
  44. 44. Asthma: Actionable Information for Asthma Patients. Sensordrone – for monitoring environmental air quality. Wheezometer – for monitoring wheezing sounds. Can I reduce my asthma attacks at night? What are the triggers? What is the wheezing level? What is the propensity toward asthma? What is the exposure level over a day? What is the air quality indoors? Commute to Work. Personal; Public Health; Population Level. Actionable Information: Closing the window at home in the morning and taking an alternate route to office may lead to reduced asthma attacks.
  45. 45. Asthma Control and Actionable Information. Personal, Public Health, and Population Level Signals for Monitoring Asthma. Sensors and their observations for understanding asthma. Asthma Control => Daily Medication:
      Severity Level of Asthma | Choices for starting therapy (Recommended Action) | Not Well Controlled (Recommended Action) | Poor Controlled (Recommended Action)
      Intermittent Asthma | SABA prn | - | -
      Mild Persistent Asthma | Low dose ICS | Medium ICS | Medium ICS
      Moderate Persistent Asthma | Medium dose ICS alone or with LABA/montelukast | Medium ICS + LABA/Montelukast or High dose ICS | Medium ICS + LABA/Montelukast or High dose ICS*
      Severe Persistent Asthma | High dose ICS with LABA/montelukast | Needs specialist care | Needs specialist care
      ICS = inhaled corticosteroid, LABA = inhaled long-acting beta2-agonist, SABA = inhaled short-acting beta2-agonist; * consider referral to specialist
  46. 46. Asthma Early Warning Model. Personal Level Signals; Societal Level Signals; (Personal Level Signals); (Personalized Societal Level Signal); (Societal Level Signals); Societal Level Signals Relevant to the Personal Level; Personal Level Sensors (kHealth**); (EventShop*); Qualify; Quantify; Action Recommendation. What are the features influencing my asthma? What is the contribution of each of these features? How controlled is my asthma? (risk score) What will be my action plan to manage asthma? Storage; Societal Level Sensors; Asthma Early Warning Model (AEWM); Query AEWM; Verify & augment domain knowledge; Recommended Action; Action Justification. * http://www.slideshare.net/jain49/eventshop-120721, ** http://www.youtube.com/watch?v=btnRi64hJp4
  47. 47. Health Signal Extraction to Understanding. Population Level; Personal; Public Health. Observations: Wheeze – Yes; Do you have tightness of chest? – Yes. Physical-Cyber-Social System; Health Signal Extraction; Health Signal Understanding. <Wheezing=Yes, time, location> <ChestTightness=Yes, time, location> <PollenLevel=Medium, time, location> <Pollution=Yes, time, location> <Activity=High, time, location>. Wheezing; ChestTightness; PollenLevel; Pollution; Activity; RiskCategory. <PollenLevel, ChestTightness, Pollution, Activity, Wheezing, RiskCategory>: <2, 1, 1, 3, 1, RiskCategory> <2, 1, 1, 3, 1, RiskCategory> <2, 1, 1, 3, 1, RiskCategory> <2, 1, 1, 3, 1, RiskCategory> ... Expert Knowledge; Background Knowledge. Tweet reporting pollution level and asthma attacks; Acceleration readings from on-phone sensors; Sensor and personal observations; Signals from personal, personal spaces, and community spaces; Risk Category assigned by doctors. Qualify; Quantify; Enrich. Outdoor pollen and pollution. Well Controlled - continue; Not Well Controlled – contact nurse; Poor Controlled – contact doctor.
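The "Quantify" step on slide 47 turns qualitative observations such as PollenLevel=Medium into a numeric tuple such as <2, 1, 1, 3, 1>. A minimal sketch of that encoding follows. The slide only shows that Medium maps to 2, Yes to 1, and High to 3; the values chosen here for "No" and "Low", the `LEVELS` table, and the `quantify` function are assumptions for illustration.

```python
# Feature order as shown on the slide (RiskCategory is assigned separately by doctors).
FEATURE_ORDER = ("PollenLevel", "ChestTightness", "Pollution", "Activity", "Wheezing")

# Yes=1, Medium=2, High=3 match the slide's example tuple; No=0 and Low=1 are assumed.
LEVELS = {"No": 0, "Yes": 1, "Low": 1, "Medium": 2, "High": 3}


def quantify(observations: dict[str, str]) -> tuple[int, ...]:
    """Encode qualitative observations as a numeric tuple in FEATURE_ORDER."""
    return tuple(LEVELS[observations[name]] for name in FEATURE_ORDER)


# The observations listed on the slide (time and location omitted for brevity).
obs = {
    "Wheezing": "Yes",
    "ChestTightness": "Yes",
    "PollenLevel": "Medium",
    "Pollution": "Yes",
    "Activity": "High",
}
vector = quantify(obs)  # (2, 1, 1, 3, 1), the tuple shown on the slide
```

A missing or unrecognized observation raises a `KeyError` here; a real pipeline would need an explicit policy for the missing data the slides mention.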
  48. 48. What if we could automate this sense making ability? … and do it efficiently and at scale
  49. 49. People are good at making sense of sensory input. What can we learn from cognitive models of perception? • The key ingredient is prior knowledge
  50. 50. Perception Cycle* (Observe Property, Perceive Feature; Prior Knowledge). 1 Explanation: translating low-level signals into high-level knowledge. 2 Discrimination: focusing attention on those aspects of the environment that provide useful information. * based on Neisser’s cognitive model of perception
  51. 51. To enable machine perception, Semantic Web technology is used to integrate sensor data with prior knowledge on the Web
  52. 52. Prior knowledge on the Web: W3C Semantic Sensor Network (SSN) Ontology; Bi-partite Graph
  53. 53. Prior knowledge on the Web: W3C Semantic Sensor Network (SSN) Ontology; Bi-partite Graph
  54. 54. Explanation (step 1 of the perception cycle: Observe Property, Perceive Feature): translating low-level signals into high-level knowledge. Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building
  55. 55. Explanation: inference to the best explanation. • In general, explanation is an abductive problem, and hard to compute. Finding the sweet spot between abduction and OWL: • Single-feature assumption* enables use of OWL-DL deductive reasoner. * An explanation must be a single feature which accounts for all observed properties. Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building
  56. 56. Explanation. Explanatory Feature: a feature that explains the set of observed properties. ExplanatoryFeature ≡ ∃ssn:isPropertyOf⁻.{p1} ⊓ … ⊓ ∃ssn:isPropertyOf⁻.{pn}. [Figure: bipartite graph of properties (elevated blood pressure, clammy skin, palpitations) and features (Hypertension, Hyperthyroidism, Pulmonary Edema), with Observed Property and Explanatory Feature nodes highlighted]
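The explanatory-feature definition above can be read as a set computation: a feature qualifies if it accounts for every observed property. The following is a minimal Python sketch under that reading, not the OWL-DL formulation the slides use. The edges in `PROPERTIES_OF` are assumptions made for illustration only (the transcript does not preserve the graph's edges) and are not clinical facts; the function name is likewise hypothetical.

```python
# Illustrative prior knowledge: feature -> properties it can account for.
# ASSUMPTION: these edges are invented for the example, not taken from the slides.
PROPERTIES_OF = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}


def explanatory_features(observed):
    """Return the features that account for all observed properties
    (the single-feature assumption: one feature must explain everything)."""
    observed = set(observed)
    return {f for f, props in PROPERTIES_OF.items() if observed <= props}


if __name__ == "__main__":
    print(sorted(explanatory_features(["elevated blood pressure", "palpitations"])))
```

With these assumed edges, observing only elevated blood pressure leaves all three features as candidates, which is the situation the discrimination step is meant to resolve.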
  57. 57. Discrimination (step 2 of the perception cycle: Observe Property, Perceive Feature): focusing attention on those aspects of the environment that provide useful information. Discrimination is the act of finding those properties that, if observed, would help distinguish between multiple explanatory features
  58. 58. Discrimination. Expected Property: would be explained by every explanatory feature. ExpectedProperty ≡ ∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ∃ssn:isPropertyOf.{fn}. [Figure: bipartite graph of properties (elevated blood pressure, clammy skin, palpitations) and features (Hypertension, Hyperthyroidism, Pulmonary Edema), with Expected Property and Explanatory Feature nodes highlighted]
  59. 59. Discrimination. Not Applicable Property: would not be explained by any explanatory feature. NotApplicableProperty ≡ ¬∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ¬∃ssn:isPropertyOf.{fn}. [Figure: same bipartite graph, with Not Applicable Property and Explanatory Feature nodes highlighted]
  60. 60. Discrimination. Discriminating Property: is neither expected nor not-applicable. DiscriminatingProperty ≡ ¬ExpectedProperty ⊓ ¬NotApplicableProperty. [Figure: same bipartite graph, with Discriminating Property and Explanatory Feature nodes highlighted]
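The three property classes defined on the discrimination slides partition the known properties relative to a set of explanatory features. A small Python sketch of that partition follows; it uses plain sets rather than an OWL reasoner, and the edges in `PROPERTIES_OF` are assumed for illustration (they are not recoverable from the transcript and are not clinical facts).

```python
# ASSUMPTION: illustrative edges only; feature -> properties it can account for.
PROPERTIES_OF = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}
ALL_PROPERTIES = set().union(*PROPERTIES_OF.values())


def classify_properties(explanatory):
    """Split all known properties into (expected, not_applicable, discriminating)
    with respect to the given explanatory features."""
    expected = {p for p in ALL_PROPERTIES
                if all(p in PROPERTIES_OF[f] for f in explanatory)}
    not_applicable = {p for p in ALL_PROPERTIES
                      if not any(p in PROPERTIES_OF[f] for f in explanatory)}
    # Discriminating: neither expected nor not-applicable.
    discriminating = ALL_PROPERTIES - expected - not_applicable
    return expected, not_applicable, discriminating


if __name__ == "__main__":
    exp, na, disc = classify_properties(PROPERTIES_OF.keys())
    print(sorted(exp), sorted(na), sorted(disc))
```

Only the discriminating properties are worth asking about next, since observing an expected property cannot narrow the candidate set and a not-applicable one cannot be explained by any candidate.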
  61. 61. kHealth: knowledge-enabled healthcare. Our Motivation: through physical monitoring and analysis, our cellphones could act as an early warning system to detect serious health conditions, and provide actionable information (canary in a coal mine)
  62. 62. Risk Score: from Data to Abstraction and Actionable Information. [Diagram labels: kHealth inputs: Machine Sensors, Personal Input, EMR/PHR; Qualities: High BP, Increased Weight; Entities: Hypertension, Hypothyroidism; Comorbidity risk score, e.g., Charlson Index; Longitudinal studies of cardiovascular risks; Model Creation: historical observations of each patient, find correlations, validate correlations (validation, domain knowledge, domain expert), parameterize the model; Risk Assessment Model: current observations (physical, physiological, history) to Risk Score (Actionable Information)]
  63. 63. How do we implement machine perception efficiently on a resource-constrained device? Use of an OWL reasoner is resource intensive (especially on resource-constrained devices), in terms of both memory and time. • Runs out of resources with prior knowledge >> 15 nodes • Asymptotic complexity: O(n^3)
  64. 64. Intelligence at the edge. Approach 1: send all sensor observations to the cloud for processing. Approach 2: downscale semantic processing so that each device is capable of machine perception. Henson et al. An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices, ISWC 2012.
  65. 65. Efficient execution of machine perception: use bit vector encodings and their operations to encode prior knowledge and execute semantic reasoning. [Figure: rows of bit vectors]
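As a rough illustration of why bit vectors help, the sketch below encodes, for each property, a bitmask over features and computes explanatory features with bitwise AND. This is a simplified stand-in and not necessarily the encoding used in the cited ISWC 2012 paper; the knowledge edges and all names are assumptions for the example.

```python
# ASSUMPTION: illustrative knowledge; one bit position per feature.
FEATURES = ["Hypertension", "Hyperthyroidism", "Pulmonary Edema"]
PROPERTIES_OF = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}


def build_property_masks():
    """For each property, set bit i if FEATURES[i] can account for it."""
    masks = {}
    for bit, feature in enumerate(FEATURES):
        for prop in PROPERTIES_OF[feature]:
            masks[prop] = masks.get(prop, 0) | (1 << bit)
    return masks


PROPERTY_MASKS = build_property_masks()


def explain(observed):
    """AND the masks of all observed properties; surviving bits are the
    features that account for every observation."""
    result = (1 << len(FEATURES)) - 1  # start with every feature as a candidate
    for prop in observed:
        result &= PROPERTY_MASKS.get(prop, 0)  # unknown property: no explanation
    return result


def decode(mask):
    """Translate a feature bitmask back into feature names."""
    return [f for bit, f in enumerate(FEATURES) if (mask >> bit) & 1]


if __name__ == "__main__":
    print(decode(explain(["elevated blood pressure", "palpitations"])))
```

Each observation costs one AND over a fixed-width word (or a few words for larger knowledge bases), which fits the slides' claim that the bit vector approach scales far better than general-purpose reasoning on a phone.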
  66. 66. Efficiency Improvement (evaluation on a mobile device): O(n^3) < x < O(n^4) → O(n). • Problem size increased from 10’s to 1000’s of nodes • Time reduced from minutes to milliseconds • Complexity growth reduced from polynomial to linear
  67. 67. Semantic Perception for smarter analytics: 3 ideas to take away. 1 Translate low-level data to high-level knowledge: machine perception can be used to convert low-level sensory signals into high-level knowledge useful for decision making. 2 Prior knowledge is the key to perception: using SW technologies, machine perception can be formalized and integrated with prior knowledge on the Web. 3 Intelligence at the edge: by downscaling semantic inference, machine perception can execute efficiently on resource-constrained devices
  68. 68. Demos • Real Time Feature Streams: http://www.youtube.com/watch?v=_ews4w_eCpg • kHealth: http://www.youtube.com/watch?v=btnRi64hJp4
  69. 69. Smart Data in Social Media Analytics: to understand the human social dynamics in real world events
  70. 70. Twitter During Real-world Events of Interest: 0.5B Tweets per day; 0.5B Users; 60% on Mobile; 5530 Tweets per second related to the Japan earthquake and tsunami; 17000 Tweets per second. http://www.flickr.com/photos/twitteroffice/5897088517/sizes/o/in/photostream/ http://bayarea.sbnation.com/49ers/2013/2/3/3947738/super-bowl-prop-bets-2013-twitter http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats/
  71. 71. http://usatoday30.usatoday.com/news/politics/twitter-election-meter http://twitris.knoesis.org/
  72. 72. State of the Art – Uni/Bi Dimensional Analysis During Elections: Topics, Sentiments
  73. 73. Twitris’ Dimensions of Integrated Semantic Analysis. Sheth et al. Twitris- a System for Collective Social Intelligence, ESNAM-2013
  74. 74. http://semanticweb.com/picking-the-president-twindex-twitris-track-social-media-electorate_b31249 http://semanticweb.com/election-2012-the-semantic-recap_b33278
  75. 75. [The screenshots of Twitris+ were taken on Nov. 6th 6 PM EST]
  76. 76. Twitris: Sentiment Analysis - Smart Answers with reasoning! How was Obama doing in the first debate?
  77. 77. Twitris: Sentiment Analysis - Smart Answers with reasoning! How was Obama doing in the second debate? Red Color: Negative Topics; Green Color: Positive Topics. SMART DATA IS ABOUT ANALYSIS FOR REASONING (what caused the positive sentiment for Democrats) BEHIND THE REAL-WORLD ACTIONS (Democrats’ win). http://knoesis.wright.edu/library/resource.php?id=1787
  78. 78. Twitris: Network Analysis. Top 100 influential users that talk about Barack Obama; positive or negative influence. SMART DATA TELLS YOU HOW A SYSTEM CAN BE TWEAKED FOR THE DESIRED ACTIONS! Could we engage with (targeted) users with extreme polarity leaning for Obama to spark an agenda in the whole network of voters (ACTION)?
  79. 79. Twitris: Community Evolution. Evolution of influencer interaction networks for Romney vs. Obama topical communities, during U.S. Presidential Election 2012 debates: before 1st debate, after 1st debate, after Hurricane Sandy, after 3rd debate. SMART DATA FOCUSES ON THE CAUSALITY OF CHANGES IN REAL-WORLD ACTIONS!
  80. 80. Twitris: Impact of Background Knowledge. The Dead People mentioned in the event OWC
  81. 81. Twitris: Analysis by Location. How people from different parts of the world talked about the US Election; images and videos related to the US Election
  82. 82. What is Smart Data in the context of Disaster Management? ACTIONABLE: timely delivery of the right resources and information to the right people at the right location! Because everyone wants to help, but DON’T KNOW HOW!
  83. 83. Coordination of needs and offers using Social Media: CITIZEN SENSORS; RESPONSE TEAMS (including humanitarian org. and ‘pseudo’ responders); VICTIM SITE. Tweets: “RT @OpOKRelief: Southgate Baptist Church on 4th Street in Moore has food, water, clothes, diapers, toys, and more. If you cant go, call 794”; “Text "FOOD" to 32333, REDCROSS to 90999, or STORM to 80888 to donate $10 in storm relief. #moore #oklahoma #disasterrelief #donate”; “Want to help animals in #Oklahoma? @ASPCA tells how you can help: http://t.co/mt8l9PwzmO”; “Does anyone know where to send a check to donate to the tornado victims?”; “Where do I go to help out for volunteer work around Moore? Anyone know?”; “Anyone know where to donate to help the animals from the Oklahoma disaster? #oklahoma #dogs”; “If you would like to volunteer today, help is desperately needed in Shawnee. Call 273-5331 for more info”. Matched, Matched, Matched: serving the need! Join us for the Social Good! http://twitris.knoesis.org http://www.slideshare.net/hemant_knoesis/cscw-2012-hemantpurohit-11531612 Purohit et al. Framework to Analyze Coordination in Crisis Response, 2012. Int’l Collaboration in-progress:
  84. 84. Smart Data from Twitris system for Disaster Response Coordination. Which are the primary locations with most negative sentiments/emotions? Who are all the people to engage with for better information diffusion? Which are the most important organizations acting at my location? Who are the resource seekers and suppliers? How can one donate? Smart data provides actionable information and improves decision making through semantic analysis of Big Data.
  85. 85. Disaster Response Coordination Framework. Source: Purohit et al. 2013, Information Filtering and Management Model for Disaster Response Coordination
  86. 86. Disaster Response Coordination: Twitris Summary for Actionable Nuggets. Important tags to summarize Big Data flow related to Oklahoma tornado; images and videos related to Oklahoma tornado
  87. 87. Disaster Response Coordination: Twitris Real-time information for needs. Incoming Tweets with need types to give a quick idea of what is needed and where currently #OKC; legends for different needs #OKC. (It is a real-time widget for monitoring of needs, so will not be active after the event has passed) http://twitris.knoesis.org/oklahomatornado
  88. 88. Disaster Response Coordination: Influencers to engage with for specific needs. Influential users for respective needs and their interaction network on the right.
  89. 89. Disaster Response Coordination: Finding Actionable Nuggets for Responders to act. Really sparse Signal to Noise: • 2M tweets during the first week after #Oklahoma-tornado-2013 - 1.3% as the highly precise donation requests to help - 0.02% as the highly precise donation offers to help • Anyone know how to get involved to help the tornado victims in Oklahoma?? #tornado #oklahomacity (OFFER) • I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER) • Text REDCROSS to 909-99 to donate to those impacted by the Moore tornado! http://t.co/oQMljkicPs (REQUEST) • Please donate to Oklahoma disaster relief efforts.: http://t.co/crRvLAaHtk (REQUEST) For responders, the most important information is the scarcity and availability of resources; can we mine it via Social Media?
  90. 90. Disaster Response Coordination: Human Knowledge to drive information extraction • Features driven by the experience of domain experts at the responder organizations • Examples: – ‘I want to <donate/ help/ bring>’ for extraction of offering intention – ‘tent house’ OR ‘cots’ for shelter need types
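The two expert-derived patterns on this slide can be approximated with regular expressions. The sketch below is an assumption about how such features might be encoded, not the Twitris implementation; the pattern names and the `extract_features` helper are hypothetical.

```python
import re

# Pattern for offering intention: 'I want to <donate/help/bring>' (from the slide).
OFFER_INTENT = re.compile(r"\bi want to (donate|help|bring)\b", re.IGNORECASE)
# Pattern for shelter need types: 'tent house' OR 'cots' (from the slide).
SHELTER_NEED = re.compile(r"\b(tent house|cots)\b", re.IGNORECASE)


def extract_features(tweet):
    """Return boolean features for a tweet based on the expert patterns."""
    return {
        "offer_intent": bool(OFFER_INTENT.search(tweet)),
        "shelter_need": bool(SHELTER_NEED.search(tweet)),
    }


if __name__ == "__main__":
    print(extract_features(
        "I want to donate to the Oklahoma cause shoes clothes even food if I can"))
```

Features like these would typically feed a classifier rather than act as the final decision, since a single phrase pattern misses paraphrases.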
  91. 91. Disaster Response Coordination: Automatic Matching of needs and offers • A knowledge-driven approach – A rich inventory of metadata for tweets – Semantic matching for needs (query) vs. offers (documents) • Example: – @bladesofmilford please help get the word out, we are accepting kid clothes to send to the lil angels in Oklahoma. Drop off @MilfordGreenPiz (REQUEST) – I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER) Matching the competitive intentions (Needs and Offers) can offload the task of resource matchmaking for coordination from humans.
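To make the needs-as-query, offers-as-documents idea concrete, here is a deliberately simple Python sketch that pairs a request with an offer when they mention the same resource type. It substitutes plain keyword overlap for the semantic matching the slide describes, and the `RESOURCE_TERMS` vocabulary is an assumption introduced for the example.

```python
# ASSUMPTION: a tiny hand-picked vocabulary of resource types.
RESOURCE_TERMS = {"clothes", "shoes", "food", "water", "diapers", "toys"}


def resources(text):
    """Resource types mentioned in a tweet (lowercased, punctuation stripped)."""
    tokens = {token.strip(".,!?:;").lower() for token in text.split()}
    return tokens & RESOURCE_TERMS


def match(requests, offers):
    """Pair each request with every offer that shares a resource type."""
    pairs = []
    for request in requests:
        for offer in offers:
            shared = resources(request) & resources(offer)
            if shared:
                pairs.append((request, offer, sorted(shared)))
    return pairs


if __name__ == "__main__":
    request = ("@bladesofmilford please help get the word out, we are accepting "
               "kid clothes to send to the lil angels in Oklahoma. "
               "Drop off @MilfordGreenPiz")
    offer = "I want to donate to the Oklahoma cause shoes clothes even food if I can"
    for _, _, shared in match([request], [offer]):
        print(shared)
```

A knowledge-driven matcher would also use tweet metadata such as location and time, which this sketch ignores.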
  92. 92. Disaster Response Coordination: Engagement Interface for responders. What-Where-How-Who-Why Coordination: influential users to engage with and resources for seekers/suppliers at a location, at a timestamp; contextual information for chosen topical tags
  93. 93. Disaster Response Coordination: Anecdote for the value of Smart Data • Illustrative scenario: #Oklahoma-tornado 2013. FEMA asked us to quickly filter out gas-leak related data. Mining the data for smart nuggets to inform FEMA (Timely needs), e.g., All gas leaks in #moore were capped and stopped by 11:30 last night (at 5/22/2013 1:41:37). Engaged with the author of this information to confirm (Veracity). Lot of tweets for ‘how to/where to’ assist (‘pseudo’ responders), e.g., I want to go to Oklahoma this weekend & do what i can to help those people with food,cloths & supplies,im in the feel of wanting to help ! :)
  94. 94. Smart Data analytics to capture rapidly evolving social data events. Social Media is the pulse of the populace, a true reflection of events all over the globe! An event is a dynamic topic that evolves and might later fork into several distinct events.
  95. 95. Continuous Semantics
  96. 96. Dynamic Model Creation: Continuous Semantics
  97. 97. Dynamic Model Creation: example of how background knowledge helps understand the situation described in the tweets, while also updating the knowledge model
  98. 98. How is Continuous Semantics a form of Smart Data Analytics? Keeping the Background Knowledge abreast with the changes of the event. Smartly learning and adapting data acquisition (temporally apt Big Data, i.e. Fast Data). In turn providing temporally relevant Smart Data through analysis
  99. 99. Smart Data Analytics in Traffic Management: to improve everyday life, entangled by our most common problem of being stuck in traffic
  100. 100. Severity of the Traffic Problem. By 2001 over 285 million Indians lived in cities, more than in all North American cities combined (Office of the Registrar General of India 2001)¹. Modes of transportation in Indian Cities. Texas Transportation Institute (TTI) Congestion report in U.S. ¹ The Crisis of Public Transport in India ² IBM Smarter Traffic
  101. 101. Big Data to Smart Data: Traffic Management example. Vehicular traffic data from San Francisco Bay Area aggregated from on-road sensors (numerical) and incident reports (textual), http://511.org/. Every minute update of speed, volume, travel time, and occupancy resulting in 178 million link status observations, 738 active events, and 146 scheduled events with many unevenly sampled observations collected over 3 months. Variety, Volume, Veracity, Velocity, Value (semantics): Can we detect the onset of traffic congestion? Can we characterize traffic congestion based on events? Can we provide actionable information to decision makers? Representing prior knowledge of traffic leads to a focused exploration of this massive dataset
  102. 102. Heterogeneity in a Physical-Cyber-Social System. [Figure labels: Slow moving traffic; Link Description; Scheduled Event; 511.org Schedule Information; 511.org Traffic Monitoring]
  103. 103. Heterogeneity in a Physical-Cyber-Social System
  104. 104. Uncertainty in a Physical-Cyber-Social System • Observation: Slow Moving Traffic • Multiple Causes (Uncertain about the cause): – Scheduled Events: music events, fair, theatre events, concerts, road work, repairs, etc. – Active Events: accidents, disabled vehicles, breakdown of roads/bridges, fire, bad weather, etc. – Peak hour: e.g. 7 am – 9 am OR 4 pm – 6 pm • Each of these events may have a varying impact on traffic. • A delay prediction algorithm should process multimodal and multi-sensory observations.
  105. 105. Modeling Traffic Events • Internal observations – Speed, volume, and travel time observations – Correlations may exist between these variables across different parts of the network • External events – Accident, music event, sporting event, and planned events – External events and internal observations may exhibit correlations
  106. 106. Modeling Traffic Events. External events <ActiveEvents, ScheduledEvents>: Accident, Music event, Sporting event, Road Work, Theatre event. Internal observations <speed, volume, travelTime>. Weather; Time of Day
  107. 107. Complementing Probabilistic Models with Declarative Knowledge. Correlations to causations using declarative knowledge on the Semantic Web. Domain Knowledge (structure and parameters): Domain Experts; declarative domain knowledge; causal knowledge; Linked Open Data; graph nodes Cold, PoorVisibility, SlowTraffic, IcyRoad. Domain Observations, as rows of <Cold (YES/NO), IcyRoad (ON/OFF), PoorVisibility (YES/NO), SlowTraffic (YES/NO)>: <1, 0, 1, 1>, <1, 1, 1, 0>, <1, 1, 1, 1>, <1, 0, 1, 0>
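The four observation rows on this slide are enough to show what data alone can and cannot provide. The Python sketch below estimates conditional frequencies from those rows (the parameters); it says nothing about which variable causes which, which is where the slide brings in declarative knowledge. The helper name `conditional` is hypothetical.

```python
from fractions import Fraction

COLUMNS = ("Cold", "IcyRoad", "PoorVisibility", "SlowTraffic")
# The four domain observations listed on the slide.
ROWS = [
    (1, 0, 1, 1),
    (1, 1, 1, 0),
    (1, 1, 1, 1),
    (1, 0, 1, 0),
]


def conditional(target, given):
    """Estimate P(target = 1 | given = 1) as a relative frequency.
    Returns None when no row has given = 1."""
    t, g = COLUMNS.index(target), COLUMNS.index(given)
    matching = [row for row in ROWS if row[g] == 1]
    if not matching:
        return None
    return Fraction(sum(row[t] for row in matching), len(matching))


if __name__ == "__main__":
    print(conditional("SlowTraffic", "IcyRoad"))
    print(conditional("IcyRoad", "SlowTraffic"))
```

The estimate is symmetric in spirit: the same table supports conditioning in either direction, so the direction of an edge such as IcyRoad to SlowTraffic has to come from outside the data.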
  108. 108. Domain Knowledge • Declarative knowledge about various domains is increasingly being published on the web¹,². • Declarative knowledge describes concepts and relationships in a domain (structure). • Linked Open Data may be used to derive prior probabilities of events (parameters). • Explored the use of declarative knowledge for structure using ConceptNet 5. ¹ http://conceptnet5.media.mit.edu/ ² http://linkeddata.org/
  109. 109. ConceptNet 5: http://conceptnet5.media.mit.edu/web/c/en/traffic_jam [Figure: ConceptNet 5 relations (Causes, CapableOf, HasSubevent, RelatedTo, is_a) linking concepts such as go to baseball game, go to concert, traffic accident, accident, car crash, road ice, bad weather, traffic jam, slow traffic, and occur twice each day to the model variables Delay, ActiveEvent, ScheduledEvent, TimeOfDay, and BadWeather]
  110. 110. Three Operations: Complementing graphical model structure extraction. Traffic data from sensors deployed on road network in San Francisco Bay Area (Traffic jam, Link Description, Scheduled Event). Knowledge from ConceptNet5: go to baseball game Causes traffic jam; traffic jam CapableOf occur twice each day; traffic jam CapableOf slow traffic; bad weather CapableOf slow traffic. Operations: add missing random variables; add missing links; add link direction. [Figure: graph over traffic jam, baseball game, time of day, slow traffic, and bad weather, shown after each operation]
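The three operations can be sketched as edits to a small graph. In the Python sketch below, the starting structure, the mapping from ConceptNet phrases to node names (for example "go to baseball game" to "baseball game"), and the choice to orient every triple from subject to object are all assumptions made for illustration; the slides do not spell these out. The time-of-day triple is left out because its direction in the model is not recoverable from the transcript.

```python
# ASSUMPTION: structure learned from data, as undirected links between variables.
NODES = {"traffic jam", "baseball game", "time of day", "slow traffic"}
UNDIRECTED = {
    frozenset(("baseball game", "traffic jam")),
    frozenset(("traffic jam", "slow traffic")),
}

# Triples from the slide, with concept phrases mapped to node names (assumed mapping).
TRIPLES = [
    ("baseball game", "Causes", "traffic jam"),
    ("traffic jam", "CapableOf", "slow traffic"),
    ("bad weather", "CapableOf", "slow traffic"),
]


def enrich(nodes, undirected, triples):
    """Apply the three operations and return (nodes, directed, still_undirected)."""
    nodes = set(nodes)
    directed = set()
    for subject, _relation, obj in triples:
        nodes.update((subject, obj))   # 1. add missing random variables
        directed.add((subject, obj))   # 2. add missing links, 3. add link direction
    oriented = {frozenset(edge) for edge in directed}
    still_undirected = {edge for edge in undirected if edge not in oriented}
    return nodes, directed, still_undirected


if __name__ == "__main__":
    nodes, directed, rest = enrich(NODES, UNDIRECTED, TRIPLES)
    print(sorted(nodes))
    print(sorted(directed))
    print(len(rest))
```

In this toy run, "bad weather" enters as a new variable with a new link to "slow traffic", and both data-derived links receive a direction.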
  111. 111. Enriched Probabilistic Models using ConceptNet 5. Structure extracted from traffic observations (sensors + textual) using statistical techniques: Scheduled Event, Active Event, Day of week, Time of day, delay, Travel time, speed, volume. Enriched structure, which has link directions and new nodes such as “Bad Weather”, potentially leading to better delay predictions: Scheduled Event, Active Event, Day of week, Time of day, delay, Travel time, speed, volume, Bad Weather
  112. 112. Take Away • It is all about the human – not computing, not device – Computing for human experience • Whatever we do in Smart Data, focus on human-in-the-loop (empowering machine computing!): – Of Human, By Human, For Human – But in serving human needs, there is a lot more than what current big data analytics handle – variety, contextual, personalized, subjective, spanning data and knowledge across P-C-S dimensions
  113. 113. Acknowledgements • Kno.e.sis team • Funds: NSF, NIH, AFRL, Industry… • Note: • For images and sources, if not on slides, please see slide notes • Some images were taken from Web search results and all such images belong to their respective owners; we are grateful to the owners for the usefulness of these images in our context.
  114. 114. References and Further Readings • OpenSource: http://knoesis.org/opensource • Showcase: http://knoesis.org/showcase • Vision: http://knoesis.org/node/266 • Publications: http://knoesis.org/library
  115. 115. Thanks …
  116. 116. Physical Cyber Social Computing. Amit Sheth, Kno.e.sis, Wright State
  117. 117. Amit Sheth’s PhD students: Ashutosh Jadhav, Hemant Purohit, Vinh Nguyen, Lu Chen, Pavan Kapanipathi, Pramod Anantharam, Sujan Perera, Alan Smith, Pramod Koneru, Maryam Panahiazar, Sarasi Lalithsena, Cory Henson, Kalpa Gunaratna, Delroy Cameron, Sanjaya Wijeratne, Wenbo Wang. Kno.e.sis in 2012 = ~100 researchers (15 faculty, ~50 PhD students)
  118. 118. Smart Data. Thank you, and please visit us at http://knoesis.org. Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, Ohio, USA
