1
2011
How much data?
48
(2013)
500
(2013)
2
http://www.knowledgeinfusion.com/blog/2011/11/get-your-head-out-of-the-clouds-and-into-big-data/
1% of the data is
used for analysis.
3
http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode
http://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume
Variety
Semi structured
4
Velocity
Fast Data
Rapid Changes
Real-Time/Stream Analysis
Current application examples: financial services, stock brokerage, weather tracking, movies/entertainment and online retail 5
• Focus on verticals: advertising, social media, retail,
financial services, telecom, and healthcare
– Aggregate data, focused on transactions, limited
integration (limited complexity), analytics to find
(simple) patterns
– Emphasis on technologies to handle
volume/scale, and to a lesser extent velocity:
Hadoop, NoSQL, MPP warehouses, …
– Full faith in the power of data (no
hypothesis), bottom up analysis
6
Current Focus on Big Data
• What if your data volume gets so large and
varied you don't know how to deal with it?
• Do you store all your data?
• Do you analyze it all?
• How can you find out which data points are
really important?
• How can you use it to your best advantage?
7
Questions typically asked on Big Data
http://www.sas.com/big-data/
http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies/
Variety of Data Analytics Enablers
8
• Prediction of the spread of flu in real time during H1N1 2009
– Google tested some 450 million different mathematical
models of search terms, comparing their predictions against
the actual flu cases; 45 important parameters were found
– Model was tested when H1N1 crisis struck in 2009 and gave more
meaningful and valuable real time information than any public health
official system [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013]
• FareCast: predict the direction of air fares over different
routes [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013]
• NY city manholes problem [ICML Discussion, 2012]
9
Illustrative Big Data Applications
• Current focus is mainly on serving business intelligence and targeted
analytics needs, not on serving complex individual and collective
human needs (e.g., empowering people in health, fitness and well-being;
better disaster coordination) that are highly
personalized/individualized/contextualized
– Incorporate real-world complexity: multi-modal and multi-sensory nature
of real-world and human perception
– Need deeper understanding of data and its role in producing information (e.g., skew,
coverage)
• Human involvement and guidance: Leading to actionable
information, understanding and insight right in the context of
human activities
– Bottom-up & Top-down processing: Infusion of models and background
knowledge (data + knowledge + reasoning)
10
What is missing?
Makes Sense
Actionable or help decision support/making
11
Smart Data
Smart data makes sense out of Big Data.
It provides value by harnessing the
challenges posed by the
volume, velocity, variety and veracity of big
data, in turn providing actionable
information and improving decision
making.
12
“OF human, BY human and FOR human”
Smart data is focused on the actionable
value achieved by human involvement in
data creation, processing and consumption
phases for improving
the human experience.
Another perspective on Smart Data
13
Descriptive
Exploratory
Inferential
Predictive
Causal
Improved
Analytics CREATION
PROCESSING
EXPERIENCE
& DECISION
MAKING
14
Human Centric Computing
“OF human, BY human and FOR human”
Another perspective on Smart Data
15
Petabytes of Physical(sensory)-Cyber-Social Data every day!
More on PCS Computing: http://wiki.knoesis.org/index.php/PCS 16
‘OF human’ : Relevant Real-time Data
Streams for Human Experience
“OF human, BY human and FOR human”
17
Another perspective on Smart Data
Use of Prior Human-created Knowledge Models
18
‘BY human’: Involving
Crowd Intelligence in data processing workflows
Crowdsourcing and Domain-expert guided
Machine Learning Modeling
“OF human, BY human and FOR human”
Another perspective on Smart Data
19
Detection of events, such as wheezing
sound, indoor
temperature, humidity, dust, and CO2
level
Weather Application
Asthma Healthcare
Application
Close the window at home
during the day to keep CO2 from
rushing in, and so avoid asthma attacks
at night
20
‘FOR human’ :
Improving Human Experience
Population Level
Personal
Public Health
Action in the Physical World
21
Why do we care about Smart Data
rather than Big Data?
Transforming Big Data into Smart Data:
Deriving Value via harnessing Volume, Variety and Velocity
using semantics and Semantic Web
Keynote at SEBD 2013, July 1, 2013 and invited talk in universities in Spain, June 2013.
The Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State, USA
Pavan
Kapanipathi
Pramod
Anantharam
Amit Sheth
Cory
Henson
Dr. T.K.
Prasad
Maryam
Panahiazar
Contributions by many, but Special Thanks to:
Hemant
Purohit
Second-costliest hurricane in United States
history; estimated damage: $75 billion
90-115 mph winds
State of Emergency in New York
285 people killed along the track of Sandy
750,000 without power (NY)
Immense devastation and Human suffering
23
Big Data to Smart Data: Disaster Management example
http://www.huffingtonpost.com/2012/10/30/hurricane-sandy-power-outage-map-infographic_n_2044411.html
20 million tweets with “sandy, hurricane”
keywords between Oct 27th and Nov 1st
2nd most popular topic on Facebook during 2012
Social (Big) Data during Hurricane Sandy
24
• http://www.guardian.co.uk/news/datablog/2
012/oct/31/twitter-sandy-flooding
• http://www.huffingtonpost.com/2012/11/02
/twitter-hurricane-sandy_n_2066281.html
• http://mashable.com/2012/10/31/hurricane-
sandy-facebook/
For information seeking
For timely information
For unique information
For unfiltered information
To determine disaster magnitude
To check in with family and friends
To self-mobilize
To maintain a sense of community
To seek emotional support and healing
Governments
Emergency management
organizations
Journalists
Disaster responders
Public
BIG DATA TO SMART DATA: WHY? and FOR WHOM?
25
Fraustino et al. Social Media Use
during Disasters: A Review of the
Knowledge Base and Gaps. US Dept.
of Homeland Security, START 2012.
Improving situational awareness
- Timely delivery of necessary
information to the right people
Improving coordination between
resource seekers and suppliers
Detecting the magnitude of a
disaster from people's sentiments.
Many more challenges…
Can SNSs make Disaster Management easier by
giving Actionable Information (Smart Data)?
26
http://www.buzzfeed.com/annanorth/how-social-media-is-aiding-the-hurricane-sandy-rec
http://blog.twitter.com/2012/10/hurricane-sandy-resources-on-twitter.html
http://www.treehugger.com/culture/12-ways-help-hurricane-sandy-relief-efforts.html
Volume
Twitter hits half a billion tweets a day!
Challenges
Delivering the necessary/actionable
information to the right people
27
http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/
http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US
Velocity
Volume
@ConEdison Twitter handle that the company had only
set up in June gained an extra 16,000 followers over the
storm. – Did the information reach everyone?
Challenges
Delivering the necessary/actionable
information to the right people
Rate of Data Arrival
Approximately 7000 TPS
10 images per second on Instagram
28
http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/
http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US
http://www.internews.org/sites/default/files/resources/InternewsEurope_Report_Japan_Connecting%20the%20last%20mile%20Japan_2013.pdf
Velocity
Variety
Volume
Semi Structured
Structured
Unstructured
Sensors
Linked Open Data
Wikipedia
Challenges
Delivering the necessary/actionable
information to the right people
29
Velocity
Variety
Veracity
Volume
Challenges
Delivering the necessary/actionable
information to the right people
30
http://www.buzzfeed.com/jackstuef/the-man-behind-comfortablysmug-hurricane-sandys
Velocity
Variety
Veracity
Volume
31
Value
-Makes Sense
-Actionable Information
-Decision support/making
Data http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/ 32
Smart Data
focuses on the
value
Value
-Makes Sense
-Actionable Information
-Decision support/making
Disaster Management
Victims
Timely and Contextual Information about
• Electricity, Food, Water, Shelter and
donation offers related to the disaster.
Data http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/ 33
Descriptive
Exploratory
Inferential
Predictive
Causal
Human Centric Computing
Improved
Analytics Creation
Processing
Experience
34
Revisiting..
• Healthcare
– kHealth
– SemHealth
• Social event coordination
– Twitris
• Traffic monitoring
– kTraffic
35
Applications of Smart Data Analytics
The Patient of the Future
MIT Technology Review, 2012
http://www.technologyreview.com/featuredstory/426968/the-patient-of-the-future/ 36
To gain new insight in
patient care &
early indications of
disease
37
Smart Data in Healthcare
Sensing is a key enabler of the Internet of Things
BUT, how do we make sense of the resulting avalanche
of sensor data?
50 Billion Things by 2020 (Cisco)
38
Parkinson’s disease (PD) data from The Michael J. Fox Foundation
for Parkinson’s Research.
39
1https://www.kaggle.com/c/predicting-parkinson-s-disease-progression-with-smartphone-data
8 weeks of data from 5 sensors on a smart phone, collected for 16 patients
resulting in ~12 GB (with a lot of missing data).
Variety Volume
Veracity Velocity
Value
Can we detect the onset of Parkinson’s disease?
Can we characterize the disease progression?
Can we provide actionable information to the patient?
semantics
Representing prior knowledge of PD
led to a focused exploration of this
massive dataset
WHY Big Data to Smart Data: Healthcare example
40
Big Data to Smart Data Using a Knowledge Based Approach
ParkinsonMild(person) = Tremor(person) ∧ PoorBalance(person)
ParkinsonModerate(person) = MoveSlow(person) ∧ PoorSleep(person) ∧ MonotoneSpeech(person)
ParkinsonAdvanced(person) = Fall(person)
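The three declarative rules above can be read directly as boolean checks over detected symptoms. The following is a minimal Python sketch under that reading, not the authors' implementation: the `Symptoms` record and the function name `parkinson_stages` are hypothetical, and it assumes that each symptom has already been detected from the sensor data and reduced to a yes/no flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Symptoms:
    """Yes/no symptom flags; how they are detected from sensors is out of scope here."""
    tremor: bool = False
    poor_balance: bool = False
    move_slow: bool = False
    poor_sleep: bool = False
    monotone_speech: bool = False
    fall: bool = False


def parkinson_stages(s: Symptoms) -> List[str]:
    """Return every stage whose rule body is satisfied by the symptom flags."""
    stages = []
    # ParkinsonMild(person) = Tremor(person) AND PoorBalance(person)
    if s.tremor and s.poor_balance:
        stages.append("ParkinsonMild")
    # ParkinsonModerate(person) = MoveSlow AND PoorSleep AND MonotoneSpeech
    if s.move_slow and s.poor_sleep and s.monotone_speech:
        stages.append("ParkinsonModerate")
    # ParkinsonAdvanced(person) = Fall(person)
    if s.fall:
        stages.append("ParkinsonAdvanced")
    return stages


print(parkinson_stages(Symptoms(tremor=True, poor_balance=True)))
```

The rules are kept independent of each other, so a person can satisfy more than one of them; choosing a single stage from several matches would need a policy that the slide does not state.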
Control Group PD Patients
Movements of an active
person have a good
distribution over the X, Y, and
Z axes
Restricted movements by
a PD patient can be seen
in the acceleration
readings
Audio is well modulated
with good variations in
the energy of the voice
Audio is not well
modulated, representing
monotone speech
Declarative Knowledge of
Parkinson’s Disease used to focus
our attention on symptom
manifestations in sensor
observations
• 25 million people in the U.S. are diagnosed with
asthma (7 million are children)1.
• 300 million people suffering from asthma
worldwide2.
• Asthma related healthcare costs alone are around
$50 billion a year2.
• 155,000 hospital admissions and 593,000 emergency
department visits in 20063.
41
1http://www.nhlbi.nih.gov/health/health-topics/topics/asthma/
2http://www.lung.org/lung-disease/asthma/resources/facts-and-figures/asthma-in-adults.html
3Akinbami et al. (2009). Status of childhood asthma in the United States, 1980–2007. Pediatrics,123(Supplement 3), S131-S145.
Asthma: Severity of the problem
Asthma is a multifactorial disease with health signals spanning
personal, public health, and population levels.
42
Real-time health signals from personal level (e.g., Wheezometer, NO in
breath, accelerometer, microphone), public health (e.g., CDC, Hospital EMR), and
population level (e.g., pollen level, CO2) arriving continuously in fine grained
samples potentially with missing information and uneven sampling frequencies.
Variety Volume
Veracity Velocity
Value
Can we detect the asthma severity level?
Can we characterize asthma control level?
What risk factors influence asthma control?
What is the contribution of each risk factor?
semantics
Understanding relationships between
health signals and asthma attacks
for providing actionable information
WHY Big Data to Smart Data: Healthcare example
43
Population Level
Personal
Public Health
Variety: Health signals span heterogeneous sources
Volume: Health signals are fine grained
Velocity: Real-time change in situations
Veracity: Reliability of health signals may be compromised
Value: Can I reduce my asthma attacks at night?
Decision support to doctors
by providing them with
deeper insights into patient
asthma care
Asthma: Demonstration of Value
44
Sensordrone – for monitoring
environmental air quality
Wheezometer – for monitoring
wheezing sounds
Can I reduce my asthma attacks at night?
What are the triggers?
What is the wheezing level?
What is the propensity toward asthma?
What is the exposure level over a day?
What is the air quality indoors?
Commute to Work
Personal
Public Health
Population Level
Closing the window at home
in the morning and taking an
alternate route to office may
lead to reduced asthma attacks
Actionable
Information
Asthma: Actionable Information for Asthma Patients
Personal, Public Health, and Population Level Signals for Monitoring Asthma
Asthma Control => Daily Medication
Severity Level of Asthma | Choices for starting therapy (Recommended Action) | Not Well Controlled (Recommended Action) | Poor Controlled (Recommended Action)
Intermittent Asthma | SABA prn | - | -
Mild Persistent Asthma | Low dose ICS | Medium ICS | Medium ICS
Moderate Persistent Asthma | Medium dose ICS alone or with LABA/montelukast | Medium ICS + LABA/montelukast or High dose ICS | Medium ICS + LABA/montelukast or High dose ICS*
Severe Persistent Asthma | High dose ICS with LABA/montelukast | Needs specialist care | Needs specialist care
ICS = inhaled corticosteroid, LABA = inhaled long-acting beta2-agonist, SABA = inhaled short-acting beta2-agonist
*consider referral to specialist
Asthma Control
and Actionable Information
Sensors and their observations
for understanding asthma
45
46
Personal
Level Signals
Societal Level
Signals
(Personal Level Signals)
(Personalized
Societal Level Signal)
(Societal Level Signals)
Societal Level Signals
Relevant to the
Personal Level
Personal Level Sensors
(kHealth**) (EventShop*)
Qualify Quantify
Action
Recommendation
What are the features influencing my asthma?
What is the contribution of each of these features?
How controlled is my asthma? (risk score)
What will be my action plan to manage asthma?
Storage
Societal Level Sensors
Asthma Early Warning Model (AEWM)
Query AEWM
Verify & augment
domain knowledge
Recommended
Action
Action
Justification
Asthma Early Warning Model
*http://www.slideshare.net/jain49/eventshop-120721, ** http://www.youtube.com/watch?v=btnRi64hJp4
47
Population Level
Personal
Wheeze – Yes
Do you have tightness of chest? –Yes
Observations
Physical-Cyber-Social System
Health Signal Extraction
Health Signal Understanding
<Wheezing=Yes, time, location>
<ChestTightness=Yes, time, location>
<PollenLevel=Medium, time, location>
<Pollution=Yes, time, location>
<Activity=High, time, location>
Wheezing
ChestTightness
PollenLevel
Pollution
Activity
Wheezing
ChestTightness
PollenLevel
Pollution
Activity
RiskCategory
<PollenLevel, ChestTightness, Pollution,
Activity, Wheezing, RiskCategory>
<2, 1, 1, 3, 1, RiskCategory>
<2, 1, 1, 3, 1, RiskCategory>
<2, 1, 1, 3, 1, RiskCategory>
<2, 1, 1, 3, 1, RiskCategory>
.
.
.
Expert
Knowledge
Background
Knowledge
tweet reporting pollution level
and asthma attacks
Acceleration readings from
on-phone sensors
Sensor and personal
observations
Signals from personal, personal
spaces, and community spaces
Risk Category assigned by
doctors
Qualify
Quantify
Enrich
Outdoor pollen and pollution
Public Health
Health Signal Extraction to Understanding
Well Controlled - continue
Not Well Controlled – contact nurse
Poor Controlled – contact doctor
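The extraction step on this slide turns symbolic observations such as PollenLevel=Medium into a numeric feature vector, and the doctor-assigned risk category then maps to an action. A minimal Python sketch of both steps follows. The ordinal coding (No=0, Yes=1, Low=1, Medium=2, High=3) is an assumption: it reproduces the example vector <2, 1, 1, 3, 1> on the slide, but the slide does not state it. The function names are hypothetical, and the category labels are spelled as on the slide.

```python
from typing import Dict, List

# Assumed ordinal coding; consistent with the slide's <2, 1, 1, 3, 1> example.
LEVELS = {"No": 0, "Yes": 1, "Low": 1, "Medium": 2, "High": 3}

# Feature order taken from the slide's vector template.
FEATURE_ORDER = ["PollenLevel", "ChestTightness", "Pollution", "Activity", "Wheezing"]

# Control level -> action, as listed on the slide.
ACTIONS = {
    "Well Controlled": "continue",
    "Not Well Controlled": "contact nurse",
    "Poor Controlled": "contact doctor",
}


def encode(observations: Dict[str, str]) -> List[int]:
    """Turn symbolic health signals into a numeric feature vector."""
    return [LEVELS[observations[name]] for name in FEATURE_ORDER]


def recommended_action(risk_category: str) -> str:
    """Look up the action for a doctor-assigned risk category."""
    return ACTIONS[risk_category]


signals = {
    "Wheezing": "Yes",
    "ChestTightness": "Yes",
    "PollenLevel": "Medium",
    "Pollution": "Yes",
    "Activity": "High",
}
print(encode(signals))  # [2, 1, 1, 3, 1]
print(recommended_action("Not Well Controlled"))  # contact nurse
```

In a learning setup, vectors like this one, paired with the risk categories assigned by doctors, would form the training examples; the sketch stops short of fitting any model.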
… and do it efficiently and at scale
What if we could automate this
sense making ability?
48
People are good at making sense of sensory input
What can we learn from cognitive models of perception?
• The key ingredient is prior knowledge
49
* based on Neisser’s cognitive model of perception
Observe
Property
Perceive
Feature
Explanation
Discrimination
1
2
Perception Cycle*
Translating low-level signals
into high-level knowledge
Focusing attention on those
aspects of the environment that
provide useful information
Prior Knowledge
50
To enable machine perception,
Semantic Web technology is used to integrate
sensor data with prior knowledge on the Web
51
Prior knowledge on the Web
W3C Semantic Sensor
Network (SSN) Ontology Bi-partite Graph
52
53
Observe
Property
Perceive
Feature
Explanation
1
Translating low-level signals
into high-level knowledge
Explanation
Explanation is the act of choosing the objects or events that best account for a
set of observations; often referred to as hypothesis building
54
Explanation
Inference to the best explanation
• In general, explanation is an abductive problem; and
hard to compute
Finding the sweet spot between abduction and OWL
• Single-feature assumption* enables use of OWL-DL
deductive reasoner
* An explanation must be a single feature which accounts for
all observed properties
Explanation is the act of choosing the objects or events that best account for a set of
observations; often referred to as hypothesis building
55
Explanation
Explanatory Feature: a feature that explains the set of observed properties
ExplanatoryFeature ≡ ∃ssn:isPropertyOf⁻.{p1} ⊓ … ⊓ ∃ssn:isPropertyOf⁻.{pn}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Observed Property Explanatory Feature
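Under the single-feature assumption, the ExplanatoryFeature definition amounts to a subset test: a feature is explanatory if every observed property is one of its properties. A minimal Python sketch of that reading follows; it uses plain sets rather than an OWL-DL reasoner. The links between features and properties in `KB` are an assumed toy knowledge base, because the slide's figure names the nodes but its edges did not survive extraction. The function name is hypothetical.

```python
from typing import Dict, Set

# Assumed toy knowledge base: feature -> properties it can account for.
KB: Dict[str, Set[str]] = {
    "Hypertension": {"elevated blood pressure", "palpitations"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "clammy skin"},
}


def explanatory_features(kb: Dict[str, Set[str]], observed: Set[str]) -> Set[str]:
    """Features that account for all observed properties (single-feature assumption)."""
    return {feature for feature, props in kb.items() if observed <= props}


observed = {"elevated blood pressure", "palpitations"}
print(sorted(explanatory_features(KB, observed)))  # ['Hypertension', 'Hyperthyroidism']
```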
56
Discrimination is the act of finding those properties that, if observed, would help distinguish
between multiple explanatory features
Observe
Property
Perceive
Feature
Explanation
Discrimination
2
Focusing attention on those
aspects of the environment that
provide useful information
Discrimination
57
Discrimination
Expected Property: would be explained by every explanatory feature
ExpectedProperty ≡ ∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ∃ssn:isPropertyOf.{fn}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Expected Property Explanatory Feature
58
Discrimination
Not Applicable Property: would not be explained by any explanatory feature
NotApplicableProperty ≡ ¬∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ¬∃ssn:isPropertyOf.{fn}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Not Applicable Property Explanatory Feature
59
Discrimination
Discriminating Property: is neither expected nor not-applicable
DiscriminatingProperty ≡ ¬ExpectedProperty ⊓ ¬NotApplicableProperty
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Discriminating Property Explanatory Feature
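The three property classes defined on the discrimination slides partition the known properties relative to the current explanatory features. The Python sketch below follows the definitions as written on the slides: expected means a property of every explanatory feature, not applicable means a property of none, and discriminating means neither. The `KB` links and the function name are assumptions made for illustration; the slides do not give the edges of the figure.

```python
from typing import Dict, Set, Tuple

# Assumed toy knowledge base: feature -> properties it can account for.
KB: Dict[str, Set[str]] = {
    "Hypertension": {"elevated blood pressure", "palpitations"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "clammy skin"},
}


def classify_properties(
    kb: Dict[str, Set[str]], features: Set[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split all known properties into (expected, not_applicable, discriminating)."""
    if not features:
        raise ValueError("need at least one explanatory feature")
    all_props = set().union(*kb.values())
    # Expected: explained by every explanatory feature.
    expected = set.intersection(*(kb[f] for f in features))
    # Not applicable: explained by no explanatory feature.
    not_applicable = all_props - set().union(*(kb[f] for f in features))
    # Discriminating: neither expected nor not applicable.
    discriminating = all_props - expected - not_applicable
    return expected, not_applicable, discriminating


expected, not_applicable, discriminating = classify_properties(
    KB, {"Hypertension", "Hyperthyroidism"}
)
print(sorted(discriminating))  # ['clammy skin']
```

With these assumed links, observing clammy skin next would separate Hyperthyroidism from Hypertension, which is the sense in which discrimination focuses attention on the observations that are still informative.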
60
Through physical monitoring and
analysis, our cellphones could act as
an early warning system to detect
serious health conditions, and
provide actionable information
canary in a coal mine
Our Motivation
kHealth: knowledge-enabled healthcare
61
Qualities
-High BP
-Increased Weight
Entities
-Hypertension
-Hypothyroidism
kHealth
Machine Sensors
Personal Input
EMR/PHR
Comorbidity risk score
e.g., Charlson Index
Longitudinal studies of
cardiovascular risks
- Find correlations
- Validation
- domain knowledge
- domain expert
Parameterize the
model
Risk Assessment Model
Current Observations
-Physical
-Physiological
-History
Risk Score
(Actionable Information)
Model CreationValidate correlations
Historical observations
of each patient
Risk Score: from Data to Abstraction and Actionable Information
62
How do we implement machine perception efficiently on a
resource-constrained device?
Use of OWL reasoner is resource intensive
(especially on resource-constrained devices),
in terms of both memory and time
• Runs out of resources with prior knowledge >> 15 nodes
• Asymptotic complexity: O(n³)
63
intelligence at the edge
Approach 1: Send all sensor observations
to the cloud for processing
Approach 2: downscale semantic
processing so that each device is capable
of machine perception
64
Henson et al. 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained
Devices', ISWC 2012.
Efficient execution of machine perception
Use bit vector encodings and their operations to encode prior knowledge and
execute semantic reasoning
010110001101
0011110010101
1000110110110
101100011010
0111100101011
000110101100
0110100111
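To illustrate the idea of encoding prior knowledge as bit vectors, the Python sketch below stores, for each property, one integer whose bits mark the features that have that property, and computes the explanatory features with a single bitwise AND per observed property. It is written in the spirit of the cited approach and is not the algorithm from the paper. The feature and property links are an assumed toy knowledge base, and all names are hypothetical.

```python
from typing import Dict, Iterable, List

FEATURES = ["Hypertension", "Hyperthyroidism", "Pulmonary Edema"]

# Assumed toy links: property -> features that have this property.
PROPERTY_OF: Dict[str, List[str]] = {
    "elevated blood pressure": ["Hypertension", "Hyperthyroidism", "Pulmonary Edema"],
    "clammy skin": ["Hyperthyroidism", "Pulmonary Edema"],
    "palpitations": ["Hypertension", "Hyperthyroidism"],
}


def build_property_vectors(
    features: List[str], property_of: Dict[str, List[str]]
) -> Dict[str, int]:
    """Encode each property as an int; bit i is set if features[i] has the property."""
    index = {feature: i for i, feature in enumerate(features)}
    vectors = {}
    for prop, feats in property_of.items():
        bits = 0
        for feature in feats:
            bits |= 1 << index[feature]
        vectors[prop] = bits
    return vectors


def explain(features: List[str], vectors: Dict[str, int], observed: Iterable[str]) -> List[str]:
    """AND the vectors of all observed properties; the surviving bits are the explanations."""
    result = (1 << len(features)) - 1  # start with every feature as a candidate
    for prop in observed:
        result &= vectors[prop]
    return [f for i, f in enumerate(features) if (result >> i) & 1]


VECTORS = build_property_vectors(FEATURES, PROPERTY_OF)
print(explain(FEATURES, VECTORS, ["elevated blood pressure", "palpitations"]))
# ['Hypertension', 'Hyperthyroidism']
```

The work per query grows with the number of observed properties, and each step is one machine-level AND over the packed feature set, which suggests why such an encoding suits resource-constrained devices.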
65
O(n³) < x < O(n⁴) reduced to O(n)
Efficiency Improvement
• Problem size increased from 10’s to 1000’s of nodes
• Time reduced from minutes to milliseconds
• Complexity growth reduced from polynomial to linear
Evaluation on a mobile device
66
1 Translate low-level data to high-level knowledge
Machine perception can be used to convert low-level sensory
signals into high-level knowledge useful for decision making
2 Prior knowledge is the key to perception
Using SW technologies, machine perception can be formalized and
integrated with prior knowledge on the Web
3 Intelligence at the edge
By downscaling semantic inference, machine perception can
execute efficiently on resource-constrained devices
Semantic Perception for smarter analytics: 3 ideas to take away
67
• Real Time Feature Streams:
http://www.youtube.com/watch?v=_ews4w_eCpg
• kHealth: http://www.youtube.com/watch?v=btnRi64hJp4
68
Demos
73
Smart Data in Social Media Analytics
To Understand the
human social
dynamics in real
world events
0.5B Tweets per day
0.5B Users
60% on Mobile
5530 Tweets per second
related to the Japan earthquake and tsunami
17000 Tweets
per second
74
Twitter During Real-world Events of Interest
http://www.flickr.com/photos/twitteroffice/5897088517/sizes/o/in/photostream/
http://bayarea.sbnation.com/49ers/2013/2/3/3947738/super-bowl-prop-bets-2013-twitter
http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats/
75
http://usatoday30.usatoday.com/news/politics/twitter-election-meter
http://twitris.knoesis.org/
State of the Art – Uni/Bi Dimensional Analysis During Elections
Topics
Sentiments
76
Twitris’ Dimensions of Integrated Semantic Analysis
77Sheth et al. Twitris- a System for Collective Social Intelligence, ESNAM-2013
78
http://semanticweb.com/picking-the-president-twindex-twitris-track-social-media-electorate_b31249
http://semanticweb.com/election-2012-the-semantic-recap_b33278
79
[The screenshots of Twitris+ were taken on Nov. 6th 6 PM EST]
80
Twitris: Sentiment Analysis- Smart Answers with reasoning!
How was Obama doing in the first debate?
81
Red Color: Negative Topics
Green Color: Positive Topics
Twitris: Sentiment Analysis- Smart Answers with reasoning!
How was Obama doing in the second debate?
SMART DATA IS ABOUT ANALYSIS FOR REASONING
(what caused the positive sentiment for Democrats)
BEHIND THE REAL-WORLD ACTIONS (Democrats’ win)
http://knoesis.wright.edu/library/resource.php?id=1787
Top 100 influential users who
talk about Barack Obama
Positive or Negative
Influence
Twitris: Network Analysis
SMART DATA TELLS YOU HOW A SYSTEM CAN BE
TWEAKED FOR THE DESIRED ACTIONS!
Could we engage with users (targeted) with extreme
polarity leaning for Obama to spark an agenda in the whole
network of voters (ACTION)? 82
Twitris: Community Evolution
SMART DATA FOCUSES ON THE CAUSALITY
OF CHANGES IN REAL-WORLD ACTIONS!
Romney
Obama
Evolution of influencer interaction networks for Romney vs. Obama
topical communities, during U.S. Presidential Election 2012 debates
Before 1st
debate
After 1st
debate
After
Hurricane Sandy
After 3rd
debate
83
The Dead People mentioned
in the event OWC
Twitris: Impact of Background Knowledge
84
How People from Different
parts of the world talked
about US Election
Images and Videos
Related to US Election
Twitris: Analysis by Location
85
What is Smart Data in the context of
Disaster Management
ACTIONABLE: Timely delivery of
right resources and information to
the right people at right location!
86
Because everyone wants to Help, but DOESN’T KNOW HOW!
Join us for the Social
Good!
http://twitris.knoesis.org
RT @OpOKRelief:
Southgate Baptist Church
on 4th Street in Moore
has food, water, clothes,
diapers, toys, and more. If
you can't go,call 794
Text "FOOD" to
32333, REDCROSS to
90999, or STORM to
80888 to donate $10
in storm relief.
#moore #oklahoma
#disasterrelief
#donate
Want to help animals in
#Oklahoma? @ASPCA tells
how you can help:
http://t.co/mt8l9PwzmO
CITIZEN SENSORS
RESPONSE TEAMS
(including humanitarian
org. and ‘pseudo’ responders)
VICTIM SITE
Coordination of
needs and offers
Using Social Media
Does anyone
know where to
send a check to
donate to the
tornado
victims?
Where do I go
to help out for
volunteer work
around Moore?
Anyone know?
Anyone know
where to donate
to help the
animals from the
Oklahoma
disaster? #oklahoma
#dogs
Matched
Matched
Matched
Serving the need!
If you would like to volunteer
today, help is desperately
needed in Shawnee. Call
273-5331 for more info
http://www.slideshare.net/hemant_knoesis/cscw-2012-hemantpurohit-11531612
87
Purohit et al. Framework to Analyze Coordination in Crisis Response, 2012. Int’l Collaboration in-progress:
Smart Data from Twitris system for
Disaster Response Coordination
Which are the primary locations with
most negative sentiments/emotions?
Who are all the people to engage
with for better information
diffusion?
Which are the most important
organizations acting at my
location?
Smart data provides actionable information and improves decision making through
semantic analysis of Big Data.
Who are the resource seekers and
suppliers? How can one donate?
88
Source: Purohit et al. 2013, Information Filtering and Management Model for Disaster Response Coordination 89
Disaster Response Coordination Framework
Disaster Response Coordination:
Twitris Summary for Actionable Nuggets
90
Important tags to
summarize Big Data flow
Related to Oklahoma
tornado
Images and Videos Related
to Oklahoma tornado
91
Disaster Response Coordination:
Twitris Real-time information for needs
Incoming Tweets with need
types to give quick idea of
what is needed and where
currently #OKC
Legends for Different
needs #OKC
(It is real-time widget for monitoring of needs, so will not be active after the event has passed)
http://twitris.knoesis.org/oklahomatornado
92
Disaster Response Coordination:
Influencers to engage with for specific needs
Influential users for the respective
needs, with their interaction
network on the right.
Really sparse Signal to Noise:
• 2M tweets during the first week after #Oklahoma-tornado-2013
- 1.3% as the highly precise donation requests to help
- 0.02% as the highly precise donation offers to help
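To make the sparsity concrete, the percentages above can be converted into tweet counts. The short Python sketch below does that arithmetic; it assumes that "2M" means exactly 2,000,000 tweets and works in basis points (hundredths of a percent) so that the result stays in integers. The helper name is hypothetical.

```python
TOTAL_TWEETS = 2_000_000  # "2M tweets", taken as an exact figure for illustration


def share_to_count(total: int, basis_points: int) -> int:
    """Convert a share given in basis points (1 bp = 0.01%) into a count."""
    return total * basis_points // 10_000


requests = share_to_count(TOTAL_TWEETS, 130)  # 1.3% donation requests
offers = share_to_count(TOTAL_TWEETS, 2)      # 0.02% donation offers
print(requests, offers)  # 26000 400
```

On these figures, requests outnumber offers by roughly 65 to 1.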
93
• Anyone know how to get involved to
help the tornado victims in
Oklahoma??#tornado #oklahomacity
(OFFER)
• I want to donate to the Oklahoma cause
shoes clothes even food if I can (OFFER)
Disaster Response Coordination:
Finding Actionable Nuggets for Responders to act
• Text REDCROSS to 909-99 to donate to
those impacted by the Moore tornado!
http://t.co/oQMljkicPs (REQUEST)
• Please donate to Oklahoma disaster
relief efforts.: http://t.co/crRvLAaHtk
(REQUEST)
For responders, most important information is the scarcity and
availability of resources, can we mine it via Social Media?
• Features driven by the experience of domain experts at the
responder organizations
• Examples,
– ‘I want to <donate/ help/ bring>’ for extraction of offering
intention
– ‘tent house’ OR ‘cots’ for shelter need types
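The two example features above can be expressed as regular expressions. The following Python sketch is one way to do it, not the patterns used in the Twitris system: the pattern names and the `extract_features` helper are hypothetical, and a practical feature set would need many more expert-supplied patterns.

```python
import re
from typing import Dict

# 'I want to <donate/help/bring>' signals an intention to offer.
OFFER_PATTERN = re.compile(r"\bI want to (donate|help|bring)\b", re.IGNORECASE)

# 'tent house' OR 'cots' signals a shelter-type need.
SHELTER_PATTERN = re.compile(r"\b(tent house|cots)\b", re.IGNORECASE)


def extract_features(tweet: str) -> Dict[str, bool]:
    """Flag whether a tweet matches each expert-driven pattern."""
    return {
        "offer_intent": bool(OFFER_PATTERN.search(tweet)),
        "shelter_need": bool(SHELTER_PATTERN.search(tweet)),
    }


tweet = "I want to donate to the Oklahoma cause shoes clothes even food if I can"
print(extract_features(tweet))  # {'offer_intent': True, 'shelter_need': False}
```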
94
Disaster Response Coordination:
Human Knowledge to drive information extraction
• A knowledge-driven approach
– A rich inventory of metadata for tweets
– Semantic matching for
needs (query) vs. offers (documents)
• Example,
– @bladesofmilford please help get the word out,we are accepting kid clothes to send
to the lil angels in Oklahoma.Drop off @MilfordGreenPiz (REQUEST)
– I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER)
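Treating a need as the query and the offers as the documents, a very simple matcher can score each offer by the resource terms it shares with the need. The Python sketch below uses plain keyword overlap as a stand-in for the semantic matching named on the slide; the `RESOURCES` vocabulary and the function names are assumptions made for illustration.

```python
import re
from typing import List, Set, Tuple

# Assumed resource vocabulary; a real system would draw on richer tweet metadata.
RESOURCES = {"clothes", "shoes", "food", "water", "shelter", "diapers"}


def resources_in(text: str) -> Set[str]:
    """Resource terms mentioned in a tweet (lower-cased word match)."""
    return set(re.findall(r"[a-z]+", text.lower())) & RESOURCES


def match_offers(need: str, offers: List[str]) -> List[Tuple[str, Set[str]]]:
    """Rank the offers that share at least one resource term with the need."""
    wanted = resources_in(need)
    scored = []
    for offer in offers:
        shared = wanted & resources_in(offer)
        if shared:
            scored.append((offer, shared))
    scored.sort(key=lambda pair: len(pair[1]), reverse=True)
    return scored


need = ("@bladesofmilford please help get the word out,we are accepting kid clothes "
        "to send to the lil angels in Oklahoma.Drop off @MilfordGreenPiz")
offers = ["I want to donate to the Oklahoma cause shoes clothes even food if I can"]
print(match_offers(need, offers))
```

Here the request and the offer are paired because both mention clothes. Keyword overlap misses paraphrases such as "garments", which is the gap that semantic matching is meant to close.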
95
Disaster Response Coordination:
Automatic Matching of needs and offers
Matching the
competitive intentions
(Needs and Offers) can
relieve humans of the
task of resource
matchmaking for
coordination.
96
Disaster Response Coordination:
Engagement Interface for responders
What-Where-How-Who-Why
Coordination
Influential users to engage
with and resources for
seekers/supplies at a
location, at a timestamp
Contextual
Information for a
chosen topical tags
• Illustrative scenario: #Oklahoma-tornado 2013
97
Disaster Response Coordination:
Anecdote for the value of Smart Data
FEMA asked us to quickly filter
out gas-leak related data
Mining the data for smart nuggets
to inform FEMA (Timely needs)
Engaged with the author of this
information to confirm (Veracity)
e.g., All gas leaks in #moore were capped and stopped by
11:30 last night (at 5/22/2013 1:41:37)
Many tweets ask ‘how to/where to’ assist (‘pseudo’ responders)
e.g., I want to go to Oklahoma this weekend & do what i can to help those people with
food,cloths & supplies,im in the feel of wanting to help ! :)
An event is a dynamic topic that evolves and
might later fork into several distinct events.
Smart Data analytics to capture rapidly evolving social data events
98
Social Media is the pulse of the
populace, a true reflection of
events all over the globe!
Continuous Semantics
99
Dynamic Model Creation
Continuous Semantics 100
Dynamic Model Creation:
101
Example of how background knowledge helps
understand the situation described in the tweets, while
also updating the knowledge model
How is Continuous Semantics a form of
Smart Data Analytics?
Keeping the Background Knowledge
abreast of the changes in the event
Smartly learning and adapting data acquisition
(Temporally apt Big Data, i.e. Fast Data)
In turn providing temporally relevant
Smart Data through analysis
102
103
Smart Data Analytics in Traffic Management
To improve
everyday life,
which is disrupted
by our most
common problem:
being stuck in
traffic
By 2001 over 285 million Indians lived in cities, more than in all
North American cities combined (Office of the Registrar General of India 2001)1
1The Crisis of Public Transport in India
2IBM Smarter Traffic
Modes of transportation in Indian Cities
Texas Transportation Institute (TTI)
Congestion report in U.S.
104
Severity of the Traffic Problem
Vehicular traffic data from San Francisco Bay Area aggregated from on-road
sensors (numerical) and incident reports (textual)
105
http://511.org/
Every minute update of speed, volume, travel time, and occupancy resulting in
178 million link status observations, 738 active events, and 146 scheduled
events with many unevenly sampled observations collected over 3 months.
Variety Volume
Veracity Velocity
Value
Can we detect the onset of traffic congestion?
Can we characterize traffic congestion based on events?
Can we provide actionable information to decision makers?
semantics
Representing prior knowledge of
traffic led to a focused exploration
of this massive dataset
Big Data to Smart Data: Traffic Management example
Slow moving
traffic
Link
Description
Scheduled
Event
Scheduled
Event
511.org
511.org
Schedule Information
511.org
Traffic Monitoring
106
Heterogeneity in a Physical-Cyber-Social System
107
Heterogeneity in a Physical-Cyber-Social System
• Observation: Slow Moving Traffic
• Multiple Causes (Uncertain about the cause):
– Scheduled Events: music events, fair, theatre events, concerts, road
work, repairs, etc.
– Active Events: accidents, disabled vehicles, break down of
roads/bridges, fire, bad weather, etc.
– Peak hour: e.g. 7 am – 9 am OR 4 pm – 6 pm
• Each of these events may have a varying impact on traffic.
• A delay prediction algorithm should process multimodal and
multi-sensory observations.
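One way to picture the "multiple causes" problem is to line up a numerical sensor observation with the textual events that overlap it in time and place. The sketch below is only an illustration: the record types, field names, link identifiers, and example values are hypothetical and are not taken from the 511.org feed. It lists candidate causes and does not predict delay.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LinkObservation:
    """Numerical sensor reading for one road link (hypothetical schema)."""
    link_id: str
    time: datetime
    speed_mph: float


@dataclass(frozen=True)
class TrafficEvent:
    """Textual event report for one road link (hypothetical schema)."""
    link_id: str
    kind: str          # "scheduled" or "active"
    description: str
    start: datetime
    end: datetime


def is_peak_hour(t: datetime) -> bool:
    # Peak hours as stated on the slide: 7 am to 9 am or 4 pm to 6 pm.
    return 7 <= t.hour < 9 or 16 <= t.hour < 18


def candidate_causes(obs: LinkObservation, events: list[TrafficEvent]) -> list[str]:
    """List every cause that could explain the observation; the true cause stays uncertain."""
    causes = [
        f"{e.kind} event: {e.description}"
        for e in events
        if e.link_id == obs.link_id and e.start <= obs.time <= e.end
    ]
    if is_peak_hour(obs.time):
        causes.append("peak hour")
    return causes


# Made-up example: slow traffic on link-1 during a game and the evening peak.
obs = LinkObservation("link-1", datetime(2013, 6, 1, 17, 30), speed_mph=12.0)
events = [
    TrafficEvent("link-1", "scheduled", "baseball game",
                 datetime(2013, 6, 1, 17, 0), datetime(2013, 6, 1, 21, 0)),
    TrafficEvent("link-2", "active", "accident",
                 datetime(2013, 6, 1, 17, 0), datetime(2013, 6, 1, 18, 0)),
]
print(candidate_causes(obs, events))
```

The function returns more than one cause for the same observation, which is the uncertainty the slide describes.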
Uncertainty in a Physical-Cyber-Social System
108
• Internal observations
– Speed, volume, and travel time observations
– Correlations may exist between these variables
across different parts of the network
• External events
– Accident, music event, sporting event, and
planned events
– External events and internal observations may
exhibit correlations
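A minimal sketch of how such correlations could be measured, assuming the observations have already been aligned to common time steps. Pearson correlation is used here purely for illustration; the slides do not say which statistic was applied, and the numbers below are made up.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (norm_x * norm_y)


# Internal observation (speed on one link) vs. an external event indicator
# (1 while an accident is active). Values are illustrative only.
speed = [60.0, 55.0, 30.0, 20.0, 25.0]
accident_active = [0.0, 0.0, 1.0, 1.0, 1.0]
print(pearson(speed, accident_active))  # negative: speed drops while the event is active
```

The same function applies between internal variables (for example speed and travel time) on different links of the network.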
Modeling Traffic Events
109
External events <ActiveEvents, ScheduledEvents>: Accident, Music event, Sporting event, Road Work, Theatre event
Internal observations <speed, volume, travelTime>
Weather
Time of Day
Modeling Traffic Events
110
Domain Experts
cold
PoorVisibility
SlowTraffic
IcyRoad
Declarative domain knowledge
Causal
knowledge
Linked Open Data
Cold (YES/NO)   IcyRoad (ON/OFF)   PoorVisibility (YES/NO)   SlowTraffic (YES/NO)
1               0                  1                         1
1               1                  1                         0
1               1                  1                         1
1               0                  1                         0
Domain Observations
Domain Knowledge
Structure and parameters
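To make the split between structure and parameters concrete, the sketch below assumes that domain knowledge supplies a link from IcyRoad to SlowTraffic (an assumption made for illustration) and estimates the parameter P(SlowTraffic = 1 | IcyRoad) by counting over the four domain observations listed above. The function name and data layout are hypothetical.

```python
from collections import Counter

COLUMNS = ("Cold", "IcyRoad", "PoorVisibility", "SlowTraffic")

# The four domain observations from the slide, one tuple per row.
OBSERVATIONS = [
    (1, 0, 1, 1),
    (1, 1, 1, 0),
    (1, 1, 1, 1),
    (1, 0, 1, 0),
]


def estimate_cpt(rows, child: str, parent: str) -> dict[int, float]:
    """Maximum-likelihood estimate of P(child = 1 | parent = v) for each observed v."""
    c = COLUMNS.index(child)
    p = COLUMNS.index(parent)
    totals = Counter()
    hits = Counter()
    for row in rows:
        totals[row[p]] += 1
        hits[row[p]] += row[c]
    return {value: hits[value] / totals[value] for value in totals}


print(estimate_cpt(OBSERVATIONS, "SlowTraffic", "IcyRoad"))
# Cold is 1 in every row, so nothing can be estimated for Cold = 0 from this data.
print(estimate_cpt(OBSERVATIONS, "SlowTraffic", "Cold"))
```

With so few rows, some parent values are never observed; this is one place where priors from external knowledge could complement the counts.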
Complementing Probabilistic Models with Declarative Knowledge
112
Correlations to causation using declarative knowledge on the Semantic Web
• Declarative knowledge about various domains
is increasingly being published on the web [1, 2].
• Declarative knowledge describes concepts and
relationships in a domain (structure).
• Linked Open Data may be used to derive
prior probabilities of events (parameters).
• We explored the use of declarative knowledge for
structure using ConceptNet 5.
[1] http://conceptnet5.media.mit.edu/
[2] http://linkeddata.org/
Domain Knowledge
113
http://conceptnet5.media.mit.edu/web/c/en/traffic_jam
[Figure: ConceptNet 5 subgraph around "traffic jam". Concepts: go to baseball game, go to concert, traffic accident, car crash, accident, road ice, bad weather, traffic jam, slow traffic, occur twice each day. Relations: Causes, CapableOf, HasSubevent, RelatedTo, is_a. Categories: ActiveEvent, ScheduledEvent, TimeOfDay, BadWeather, Delay.]
ConceptNet 5
114
Three Operations: Complementing graphical model structure extraction
Traffic data from sensors deployed on road network in San Francisco Bay Area (Traffic jam, Link Description, Scheduled Event)
Knowledge from ConceptNet5:
– go to baseball game Causes traffic jam
– traffic jam CapableOf occur twice each day
– traffic jam CapableOf slow traffic
– bad weather CapableOf slow traffic
Operations:
– Add missing random variables (time of day, bad weather)
– Add missing links
– Add link direction
[Figure: graph over baseball game, traffic jam, time of day, slow traffic, and bad weather after each operation]
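The three operations can be sketched on a toy graph. Everything below is an illustrative assumption rather than the authors' implementation: the starting undirected structure, the mapping of ConceptNet phrases to variable names (e.g. "go to baseball game" to "baseball game"), and the choice to read both Causes and CapableOf as a subject-to-object link. The "occur twice each day" triple is left out because mapping it to "time of day" and choosing its direction is a modeling decision the slides do not spell out.

```python
# Relations treated as directed subject -> object links (an assumption of this sketch).
DIRECTED_RELATIONS = {"Causes", "CapableOf"}

# Triples after mapping ConceptNet phrases to variable names (hypothetical mapping).
TRIPLES = [
    ("baseball game", "Causes", "traffic jam"),
    ("traffic jam", "CapableOf", "slow traffic"),
    ("bad weather", "CapableOf", "slow traffic"),
]


def enrich_structure(nodes, undirected_edges, triples):
    """Apply the three operations to a statistically extracted structure.

    Returns (nodes, directed_edges, remaining_undirected_edges).
    """
    nodes = set(nodes)
    undirected = {frozenset(edge) for edge in undirected_edges}
    directed = set()
    for subject, relation, obj in triples:
        if relation not in DIRECTED_RELATIONS:
            continue
        # 1. Add missing random variables.
        nodes.update((subject, obj))
        # 2. Add a missing link, or 3. give direction to an existing undirected link.
        undirected.discard(frozenset((subject, obj)))
        directed.add((subject, obj))
    return nodes, directed, undirected


# Toy structure "learned from data": three variables, one undirected link.
nodes, directed, undirected = enrich_structure(
    {"baseball game", "traffic jam", "slow traffic"},
    [("baseball game", "traffic jam")],
    TRIPLES,
)
print(sorted(nodes))
print(sorted(directed))
```

In this toy run, "bad weather" is a new variable, the link from "traffic jam" to "slow traffic" is newly added, and the existing link between "baseball game" and "traffic jam" receives a direction.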
115
116
Structure extracted from traffic observations (sensors + textual) using statistical techniques:
Scheduled Event, Active Event, Day of week, Time of day, delay, Travel time, speed, volume
Enriched structure, which has link directions and new nodes such as “Bad Weather”, potentially leading to better delay predictions:
Scheduled Event, Active Event, Day of week, Time of day, delay, Travel time, speed, volume, Bad Weather
Enriched Probabilistic Models using ConceptNet 5
Take Away
• It is all about the human, not the computing and not the device
– Computing for human experience
• Whatever we do in Smart Data, focus on the human in the loop (empowering machine computing!):
– Of Human, By Human, For Human
– But in serving human needs, there is much more than what current big data analytics handles: variety, context, personalization, subjectivity, and data and knowledge spanning P-C-S dimensions
118
Acknowledgements
• Kno.e.sis team
• Funds: NSF, NIH, AFRL, Industry…
• Note:
• For images and sources, if not on slides, please see slide notes
• Some images were taken from Web search results, and all such images belong
to their respective owners. We are grateful to the owners for the usefulness of these
images in our context.
119
• OpenSource: http://knoesis.org/opensource
• Showcase: http://knoesis.org/showcase
• Vision: http://knoesis.org/node/266
• Publications: http://knoesis.org/library
120
References and Further Readings
Thanks …
121
122
Physical Cyber Social Computing
Amit Sheth, Kno.e.sis, Wright State
Amit Sheth’s PhD students: Ashutosh Jadhav, Hemant Purohit, Vinh Nguyen, Lu Chen, Pavan Kapanipathi, Pramod Anantharam, Sujan Perera, Alan Smith, Pramod Koneru, Maryam Panahiazar, Sarasi Lalithsena, Cory Henson, Kalpa Gunaratna, Delroy Cameron, Sanjaya Wijeratne, Wenbo Wang
Kno.e.sis in 2012 = ~100 researchers (15 faculty, ~50 PhD students)
124
thank you, and please visit us at
http://knoesis.org
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA
Smart Data

Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

  • 1. 1
  • 3. 1% of the data is used for analysis. 3 http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode http://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume
  • 5. Velocity Fast Data Rapid Changes Real-Time/Stream Analysis Current application examples: financial services, stock brokerage, weather tracking, movies/entertainment and online retail 5
  • 6. • Focus on verticals: advertising‚ social media‚ retail‚ financial services‚ telecom‚ and healthcare – Aggregate data, focused on transactions, limited integration (limited complexity), analytics to find (simple) patterns – Emphasis on technologies to handle volume/scale, and to lesser extent velocity: Hadoop, NoSQL,MPP warehouse …. – Full faith in the power of data (no hypothesis), bottom up analysis 6 Current Focus on Big Data
  • 7. • What if your data volume gets so large and varied you don't know how to deal with it? • Do you store all your data? • Do you analyze it all? • How can you find out which data points are really important? • How can you use it to your best advantage? 7 Questions typically asked on Big Data http://www.sas.com/big-data/
  • 9. • Prediction of the spread of flu in real time during H1N1 2009 – Google tested a mammoth of 450 million different mathematical models to test the search terms, comparing their predictions against the actual flu cases; 45 important parameters were founds – Model was tested when H1N1 crisis struck in 2009 and gave more meaningful and valuable real time information than any public health official system [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013] • FareCast: predict the direction of air fares over different routes [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013] • NY city manholes problem [ICML Discussion, 2012] 9 Illustrative Big Data Applications
  • 10. • Current focus mainly to serve business intelligence and targeted analytics needs, not to serve complex individual and collective human needs (e.g., empower human in health, fitness and well- being; better disaster coordination) that is highly personalized/individualized/contextualized – Incorporate real-world complexity: multi-modal and multi-sensory nature of real-world and human perception – Need deeper understanding of data and its role to information (e.g., skew, coverage) • Human involvement and guidance: Leading to actionable information, understanding and insight right in the context of human activities – Bottom-up & Top-down processing: Infusion of models and background knowledge (data + knowledge + reasoning) 10 What is missing?
  • 11. Makes Sense Actionable or help decision support/making 11
  • 12. Smart Data Smart data makes sense out of Big data It provides value from harnessing the challenges posed by volume, velocity, variety and veracity of big data, in-turn providing actionable information and improve decision making. 12
  • 13. “OF human, BY human and FOR human” Smart data is focused on the actionable value achieved by human involvement in data creation, processing and consumption phases for improving the human experience. Another perspective on Smart Data 13
  • 15. “OF human, BY human and FOR human” Another perspective on Smart Data 15
  • 16. Petabytes of Physical(sensory)-Cyber-Social Data everyday! More on PCS Computing: http://wiki.knoesis.org/index.php/PCS 16 ‘OF human’ : Relevant Real-time Data Streams for Human Experience
  • 17. “OF human, BY human and FOR human” 17 Another perspective on Smart Data
  • 18. Use of Prior Human-created Knowledge Models 18 ‘BY human’: Involving Crowd Intelligence in data processing workflows Crowdsourcing and Domain-expert guided Machine Learning Modeling
  • 19. “OF human, BY human and FOR human” Another perspective on Smart Data 19
  • 20. Detection of events, such as wheezing sound, indoor temperature, humidity, dust, and CO2 level Weather Application Asthma Healthcare Application Close the window at home during day to avoid CO2 in gush, to avoid asthma attacks at night 20 ‘FOR human’ : Improving Human Experience Population Level Personal Public Health Action in the Physical World
  • 21. 21 Why do we care about Smart Data rather than Big Data?
  • 22. Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web Put Knoesis Banner Keynote at SEBD 2013, July 1, 2013 and invited talk in universities in Spain, June 2013. The Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State, USA Pavan Kapanipathi Pramod Anantharam Amit Sheth Cory Henson Dr. T.K. Prasad Maryam Panahiazar Contributions by many, but Special Thanks to: Hemant Purohit
  • 23. Second-costliest hurricane in United States history estimated damage $75 billion 90-115 mph winds State of Emergency in New York 285 people killed on the track of Sandy 750,000 without power (NY) Immense devastation and Human suffering 23 Big Data to Smart Data: Disaster Management example http://www.huffingtonpost.com/2012/10/30/hurricane-sandy-power-outage-map-infographic_n_2044411.html
  • 24. 20 million tweets with “sandy, hurricane” keywords between Oct 27th and Nov 1st 2nd most popular topic on Facebook during 2012 Social (Big) Data during Hurricane Sandy 24 • http://www.guardian.co.uk/news/datablog/2 012/oct/31/twitter-sandy-flooding • http://www.huffingtonpost.com/2012/11/02 /twitter-hurricane-sandy_n_2066281.html • http://mashable.com/2012/10/31/hurricane- sandy-facebook/
  • 25. For information seeking For timely information For unique information For unfiltered information To determine disaster magnitude To check in with family and friends To self-mobilize To maintain a sense of community To seek emotional support and healing Governments Emergency management organizations Journalists Disaster responders Public BIG DATA TO SMART DATA: WHY? and FOR WHOM? 25 Fraustino et al. Social Media Use during Disasters: A Review of the Knowledge Base and Gaps. US Dept. of Homeland Security, START 2012.
  • 26. Improving situational awareness - Timely delivery of necessary information to the right people Improving coordination between resource seekers and suppliers Detecting the magnitude of disaster by people sentiments. Many more challenges… Can SNS’s make Disaster Management easier – Giving Actionable Information (Smart Data) 26 http://www.buzzfeed.com/annanorth/how-social-media-is-aiding-the-hurricane-sandy-rec http://blog.twitter.com/2012/10/hurricane-sandy-resources-on-twitter.html http://www.treehugger.com/culture/12-ways-help-hurricane-sandy-relief-efforts.html
  • 27. Volume Twitter hits half a billion tweets a day! Challenges Delivering the necessary actionable/information to the right people 27 http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US
  • 28. Velocity Volume @ConEdison Twitter handle that the company had only set up in June gained an extra 16,000 followers over the storm. – Did the information reach everyone? Challenges Delivering the necessary/actionable information to the right people Rate of Data Arrival Approximately 7000 TPS 10 images per second on instagram 28 http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US http://www.internews.org/sites/default/files/resources/InternewsEurope_Report_Japan_Connecting%20the%20last%20mile%20Japan_2013.pdf
  • 30. Velocity Variety Veracity Volume Challenges Delivering the necessary/actionable information to the right people 30http://www.buzzfeed.com/jackstuef/the-man-behind-comfortablysmug-hurricane-sandys
  • 32. Value -Makes Sense -Actionable Information -Decision support/making Data http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/ 32 Smart Data focuses on the value
  • 33. Value -Makes Sense -Actionable Information -Decision support/making Disaster Management Victims Timely and Contextual Information about • Electricity, Food, Water, Shelter and donation offers related to the disaster. Data http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/ 33
  • 35. • Healthcare – kHealth – SemHeath • Social event coordination – Twitris • Traffic monitoring – kTraffic 35 Applications of Smart Data Analytics
  • 36. The Patient of the Future MIT Technology Review, 2012 http://www.technologyreview.com/featuredstory/426968/the-patient-of-the-future/ 36
  • 37. To gain new insight in patient care & early indications of disease 37 Smart Data in Healthcare
  • 38. Sensing is a key enabler of the Internet of Things BUT, how do we make sense of the resulting avalanche of sensor data? 50 Billion Things by 2020 (Cisco) 38
  • 39. Parkinson’s disease (PD) data from The Michael J. Fox Foundation for Parkinson’s Research. 39 1https://www.kaggle.com/c/predicting-parkinson-s-disease-progression-with-smartphone-data 8 weeks of data from 5 sensors on a smart phone, collected for 16 patients resulting in ~12 GB (with lot of missing data). Variety Volume VeracityVelocity Value Can we detect the onset of Parkinson’s disease? Can we characterize the disease progression? Can we provide actionable information to the patient? semantics Representing prior knowledge of PD led to a focused exploration of this massive dataset WHY Big Data to Smart Data: Healthcare example
  • 40. 40 Big Data to Smart Data Using a Knowledge Based Approach ParkinsonMild(person) = Tremor(person) ∧ PoorBalance(person) ParkinsonModerate(person) = MoveSlow(person) ∧ PoorSleep(person) ∧ MonotoneSpeech(person) ParkinsonAdvanced(person) = Fall(person) Control Group PD Patients Movements of an active person has a good distribution over X, Y, and Z axis Restricted movements by a PD patient can be seen in the acceleration readings Audio is well modulated with good variations in the energy of the voice Audio is not well modulated represented a monotone speech Declarative Knowledge of Parkinson’s Disease used to focus our attention on symptom manifestations in sensor observations
  • 41. • 25 million people in the U.S. are diagnosed with asthma (7 million are children)1. • 300 million people suffering from asthma worldwide2. • Asthma related healthcare costs alone are around $50 billion a year2. • 155,000 hospital admissions and 593,000 emergency department visits in 20063. 41 1http://www.nhlbi.nih.gov/health/health-topics/topics/asthma/ 2http://www.lung.org/lung-disease/asthma/resources/facts-and-figures/asthma-in-adults.html 3Akinbami et al. (2009). Status of childhood asthma in the United States, 1980–2007. Pediatrics,123(Supplement 3), S131-S145. Asthma: Severity of the problem
  • 42. Asthma is a multifactorial disease with health signals spanning personal, public health, and population levels. 42 Real-time health signals from personal level (e.g., Wheezometer, NO in breath, accelerometer, microphone), public health (e.g., CDC, Hospital EMR), and population level (e.g., pollen level, CO2) arriving continuously in fine grained samples potentially with missing information and uneven sampling frequencies. Variety Volume VeracityVelocity Value Can we detect the asthma severity level? Can we characterize asthma control level? What risk factors influence asthma control? What is the contribution of each risk factor?semantics Understanding relationships between health signals and asthma attacks for providing actionable information WHY Big Data to Smart Data: Healthcare example
  • 43. 43 Population Level Personal Public Health Variety: Health signals span heterogeneous sources Volume: Health signals are fine grained Velocity: Real-time change in situations Veracity: Reliability of health signals may be compromised Value: Can I reduce my asthma attacks at night? Decision support to doctors by providing them with deeper insights into patient asthma care Asthma: Demonstration of Value
  • 44. 44 Sensordrone – for monitoring environmental air quality Wheezometer – for monitoring wheezing sounds Can I reduce my asthma attacks at night? What are the triggers? What is the wheezing level? What is the propensity toward asthma? What is the exposure level over a day? What is the air quality indoors? Commute to Work Personal Public Health Population Level Closing the window at home in the morning and taking an alternate route to office may lead to reduced asthma attacks Actionable Information Asthma: Actionable Information for Asthma Patients
  • 45. Personal, Public Health, and Population Level Signals for Monitoring Asthma Asthma Control => Daily Medication Choices for starting therapy Not Well Controlled Poor Controlled Severity Level of Asthma (Recommended Action) (Recommended Action) (Recommended Action) Intermittent Asthma SABA prn - - Mild Persistent Asthma Low dose ICS Medium ICS Medium ICS Moderate Persistent Asthma Medium dose ICS alone Or with LABA/montelukast Medium ICS + LABA/Montelukast Or High dose ICS Medium ICS + LABA/Montelukast Or High dose ICS* Severe Persistent Asthma High dose ICS with LABA/montelukast Needs specialist care Needs specialist care ICS= inhaled corticosteroid, LABA = inhaled long-acting beta2-agonist, SABA= inhaled short-acting beta2-agonist ; *consider referral to specialist Asthma Control and Actionable Information Sensors and their observations for understanding asthma 45
  • 46. 46 Personal Level Signals Societal Level Signals (Personal Level Signals) (Personalized Societal Level Signal) (Societal Level Signals) Societal Level Signals Relevant to the Personal Level Personal Level Sensors (kHealth**) (EventShop*) Qualify Quantify Action Recommendation What are the features influencing my asthma? What is the contribution of each of these features? How controlled is my asthma? (risk score) What will be my action plan to manage asthma? Storage Societal Level Sensors Asthma Early Warning Model (AEWM) Query AEWM Verify & augment domain knowledge Recommended Action Action Justification Asthma Early Warning Model *http://www.slideshare.net/jain49/eventshop-120721, ** http://www.youtube.com/watch?v=btnRi64hJp4
  • 47. 47 Population Level Personal Wheeze – Yes Do you have tightness of chest? –Yes ObservationsPhysical-Cyber-Social System Health Signal Extraction Health Signal Understanding <Wheezing=Yes, time, location> <ChectTightness=Yes, time, location> <PollenLevel=Medium, time, location> <Pollution=Yes, time, location> <Activity=High, time, location> Wheezing ChectTightness PollenLevel Pollution Activity Wheezing ChectTightness PollenLevel Pollution Activity RiskCategory <PollenLevel, ChectTightness, Pollution, Activity, Wheezing, RiskCategory> <2, 1, 1,3, 1, RiskCategory> <2, 1, 1,3, 1, RiskCategory> <2, 1, 1,3, 1, RiskCategory> <2, 1, 1,3, 1, RiskCategory> . . . Expert Knowledge Background Knowledge tweet reporting pollution level and asthma attacks Acceleration readings from on-phone sensors Sensor and personal observations Signals from personal, personal spaces, and community spaces Risk Category assigned by doctors Qualify Quantify Enrich Outdoor pollen and pollution Public Health Health Signal Extraction to Understanding Well Controlled - continue Not Well Controlled – contact nurse Poor Controlled – contact doctor
  • 48. … and do it efficiently and at scale What if we could automate this sense making ability? 48
  • 49. People are good at making sense of sensory input What can we learn from cognitive models of perception? • The key ingredient is prior knowledge 49
  • 50. * based on Neisser’s cognitive model of perception Observe Property Perceive Feature Explanation Discrimination 1 2 Perception Cycle* Translating low-level signals into high-level knowledge Focusing attention on those aspects of the environment that provide useful information Prior Knowledge 50
  • 51. To enable machine perception, Semantic Web technology is used to integrate sensor data with prior knowledge on the Web 51
  • 52. Prior knowledge on the Web W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph 52
  • 53. Prior knowledge on the Web W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph 53
  • 54. Explanation (step 1 of the perception cycle, between Observe Property and Perceive Feature): translating low-level signals into high-level knowledge. Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building
  • 55. Explanation: inference to the best explanation. • In general, explanation is an abductive problem, and hard to compute. Finding the sweet spot between abduction and OWL: • The single-feature assumption* enables use of an OWL-DL deductive reasoner. * An explanation must be a single feature which accounts for all observed properties
  • 56. Explanation. Explanatory Feature: a feature that explains the set of observed properties. ExplanatoryFeature ≡ ∃ssn:isPropertyOf⁻.{p1} ⊓ … ⊓ ∃ssn:isPropertyOf⁻.{pn}. Example graph: properties (elevated blood pressure, clammy skin, palpitations) and features (Hypertension, Hyperthyroidism, Pulmonary Edema), with the observed properties and explanatory features highlighted
  • 57. Discrimination (step 2 of the perception cycle, between Perceive Feature and Observe Property): focusing attention on those aspects of the environment that provide useful information. Discrimination is the act of finding those properties that, if observed, would help distinguish between multiple explanatory features
  • 58. Discrimination. Expected Property: would be explained by every explanatory feature. ExpectedProperty ≡ ∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ∃ssn:isPropertyOf.{fn}. Example graph: properties (elevated blood pressure, clammy skin, palpitations) and features (Hypertension, Hyperthyroidism, Pulmonary Edema), with the expected properties and explanatory features highlighted
  • 59. Discrimination. Not Applicable Property: would not be explained by any explanatory feature. NotApplicableProperty ≡ ¬∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ¬∃ssn:isPropertyOf.{fn}. Example graph: properties (elevated blood pressure, clammy skin, palpitations) and features (Hypertension, Hyperthyroidism, Pulmonary Edema), with the not-applicable properties and explanatory features highlighted
  • 60. Discrimination. Discriminating Property: is neither expected nor not-applicable. DiscriminatingProperty ≡ ¬ExpectedProperty ⊓ ¬NotApplicableProperty. Example graph: properties (elevated blood pressure, clammy skin, palpitations) and features (Hypertension, Hyperthyroidism, Pulmonary Edema), with the discriminating properties and explanatory features highlighted
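The explanation and discrimination definitions on slides 56 to 60 can be read as plain set operations over a bipartite graph of features and properties. The sketch below is one such reading, not the OWL-DL formulation used in the talk. The slides name the properties and features but not the edges between them, so the `PROPERTIES_OF` table is a hypothetical example, as are all function names.

```python
# Hypothetical prior knowledge: which properties each feature (disorder) has.
# These edges are assumptions for illustration, not taken from the slides.
PROPERTIES_OF = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}


def explanatory_features(observed):
    """Features that account for every observed property (single-feature assumption)."""
    return {f for f, props in PROPERTIES_OF.items() if observed <= props}


def classify_properties(observed):
    """Label each known property as expected, not applicable, or discriminating."""
    features = explanatory_features(observed)
    all_properties = set().union(*PROPERTIES_OF.values())
    result = {"expected": set(), "not_applicable": set(), "discriminating": set()}
    for prop in all_properties:
        explaining = {f for f in features if prop in PROPERTIES_OF[f]}
        if features and explaining == features:
            result["expected"].add(prop)        # a property of every explanatory feature
        elif not explaining:
            result["not_applicable"].add(prop)  # a property of no explanatory feature
        else:
            result["discriminating"].add(prop)  # a property of some, but not all
    return result


observed = {"elevated blood pressure"}
print(explanatory_features(observed))
print(classify_properties(observed))
```

With only elevated blood pressure observed, all three features remain candidate explanations, so clammy skin and palpitations are the properties worth observing next.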
  • 61. Our Motivation. kHealth: knowledge-enabled healthcare. Through physical monitoring and analysis, our cellphones could act as an early warning system (a canary in a coal mine) to detect serious health conditions and provide actionable information
  • 62. Risk Score: from Data to Abstraction and Actionable Information. kHealth inputs: Machine Sensors, Personal Input, EMR/PHR. Qualities: High BP, Increased Weight. Entities: Hypertension, Hypothyroidism. Model Creation: historical observations of each patient, comorbidity risk scores (e.g., Charlson Index) and longitudinal studies of cardiovascular risks; find correlations; validate correlations (domain knowledge, domain expert); parameterize the model. Risk Assessment Model: Current Observations (Physical, Physiological, History) → Risk Score (Actionable Information)
  • 63. How do we implement machine perception efficiently on a resource-constrained device? Use of an OWL reasoner is resource intensive (especially on resource-constrained devices), in terms of both memory and time • Runs out of resources with prior knowledge >> 15 nodes • Asymptotic complexity: O(n^3)
  • 64. Intelligence at the edge. Approach 1: send all sensor observations to the cloud for processing. Approach 2: downscale semantic processing so that each device is capable of machine perception. Henson et al., 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices', ISWC 2012.
  • 65. Efficient execution of machine perception: use bit vector encodings and their operations to encode prior knowledge and execute semantic reasoning
  • 66. Evaluation on a mobile device. Efficiency Improvement: • Problem size increased from 10’s to 1000’s of nodes • Time reduced from minutes to milliseconds • Complexity growth reduced from polynomial (between O(n^3) and O(n^4)) to linear (O(n))
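To give a feel for why bit vectors help, here is a minimal sketch of explanation and discrimination using Python integers as bit vectors. It illustrates the general idea only and is not the algorithm published by Henson et al.; the feature-property edges and every name in the code are assumptions made for illustration.

```python
# One bit per feature: bit i of a property's mask is set when FEATURES[i]
# has that property. The edges encoded here are hypothetical.
FEATURES = ["Hypertension", "Hyperthyroidism", "Pulmonary Edema"]
PROPERTY_MASK = {
    "elevated blood pressure": 0b111,  # all three features
    "clammy skin": 0b010,              # Hyperthyroidism only
    "palpitations": 0b110,             # Hyperthyroidism and Pulmonary Edema
}
ALL_FEATURES = (1 << len(FEATURES)) - 1


def explain(observed):
    """AND the masks of the observed properties; set bits are explanatory features."""
    mask = ALL_FEATURES
    for prop in observed:
        mask &= PROPERTY_MASK[prop]
    return mask


def discriminating(observed):
    """Properties held by some, but not all, of the explanatory features."""
    explanatory = explain(observed)
    found = []
    for prop, mask in PROPERTY_MASK.items():
        overlap = mask & explanatory
        if overlap != 0 and overlap != explanatory:
            found.append(prop)
    return found


def decode(mask):
    """Translate a feature bit vector back into feature names."""
    return [name for i, name in enumerate(FEATURES) if (mask >> i) & 1]


print(decode(explain(["elevated blood pressure", "palpitations"])))
print(discriminating(["elevated blood pressure", "palpitations"]))
```

Each observed property costs one bitwise AND, which is the kind of cheap operation that suits a phone better than a general-purpose reasoner.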
  • 67. Semantic Perception for smarter analytics: 3 ideas to take away. 1. Translate low-level data to high-level knowledge: machine perception can be used to convert low-level sensory signals into high-level knowledge useful for decision making. 2. Prior knowledge is the key to perception: using SW technologies, machine perception can be formalized and integrated with prior knowledge on the Web. 3. Intelligence at the edge: by downscaling semantic inference, machine perception can execute efficiently on resource-constrained devices
  • 68. Demos • Real Time Feature Streams: http://www.youtube.com/watch?v=_ews4w_eCpg • kHealth: http://www.youtube.com/watch?v=btnRi64hJp4
  • 69. Smart Data in Social Media Analytics: to understand the human social dynamics in real-world events
  • 70. Twitter During Real-world Events of Interest: 0.5B Tweets per day; 0.5B Users; 60% on Mobile; 5530 Tweets per second related to the Japan earthquake and tsunami; 17000 Tweets per second. Sources: http://www.flickr.com/photos/twitteroffice/5897088517/sizes/o/in/photostream/ http://bayarea.sbnation.com/49ers/2013/2/3/3947738/super-bowl-prop-bets-2013-twitter http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats/
  • 72. State of the Art – Uni/Bi Dimensional Analysis During Elections: Topics, Sentiments
  • 73. Twitris’ Dimensions of Integrated Semantic Analysis. Sheth et al., Twitris - a System for Collective Social Intelligence, ESNAM-2013
  • 75. [The screenshots of Twitris+ were taken on Nov. 6th 6 PM EST]
  • 76. Twitris: Sentiment Analysis - Smart Answers with reasoning! How was Obama doing in the first debate?
  • 77. Twitris: Sentiment Analysis - Smart Answers with reasoning! How was Obama doing in the second debate? Red Color: Negative Topics; Green Color: Positive Topics. SMART DATA IS ABOUT ANALYSIS FOR REASONING (what caused the positive sentiment for Democrats) BEHIND THE REAL-WORLD ACTIONS (Democrats’ win) http://knoesis.wright.edu/library/resource.php?id=1787
  • 78. Twitris: Network Analysis. Top 100 influential users that talk about Barack Obama; positive or negative influence. SMART DATA TELLS YOU HOW A SYSTEM CAN BE TWEAKED FOR THE DESIRED ACTIONS! Could we engage with (targeted) users with extreme polarity leaning for Obama to spark an agenda in the whole network of voters (ACTION)?
  • 79. Twitris: Community Evolution. SMART DATA FOCUSES ON THE CAUSALITY OF CHANGES IN REAL-WORLD ACTIONS! Evolution of influencer interaction networks for Romney vs. Obama topical communities, during U.S. Presidential Election 2012 debates: before 1st debate, after 1st debate, after Hurricane Sandy, after 3rd debate
  • 80. Twitris: Impact of Background Knowledge. The Dead People mentioned in the event OWC
  • 81. Twitris: Analysis by Location. How people from different parts of the world talked about the US Election. Images and videos related to the US Election
  • 82. What is Smart Data in the context of Disaster Management? ACTIONABLE: timely delivery of the right resources and information to the right people at the right location! Because everyone wants to help, but they DON’T KNOW HOW!
  • 83. Coordination of needs and offers Using Social Media. Join us for the Social Good! http://twitris.knoesis.org Actors: CITIZEN SENSORS; RESPONSE TEAMS (including humanitarian org. and ‘pseudo’ responders); VICTIM SITE. Offers and resources: “RT @OpOKRelief: Southgate Baptist Church on 4th Street in Moore has food, water, clothes, diapers, toys, and more. If you can't go,call 794”; “Text "FOOD" to 32333, REDCROSS to 90999, or STORM to 80888 to donate $10 in storm relief. #moore #oklahoma #disasterrelief #donate”; “Want to help animals in #Oklahoma? @ASPCA tells how you can help: http://t.co/mt8l9PwzmO”; “If you would like to volunteer today, help is desperately needed in Shawnee. Call 273-5331 for more info”. Needs, each marked Matched (Serving the need!): “Does anyone know where to send a check to donate to the tornado victims?”; “Where do I go to help out for volunteer work around Moore? Anyone know?”; “Anyone know where to donate to help the animals from the Oklahoma disaster? #oklahoma #dogs”. Purohit et al. Framework to Analyze Coordination in Crisis Response, 2012. http://www.slideshare.net/hemant_knoesis/cscw-2012-hemantpurohit-11531612 Int’l Collaboration in-progress:
  • 84. Smart Data from Twitris system for Disaster Response Coordination. Which are the primary locations with most negative sentiments/emotions? Who are all the people to engage with for better information diffusion? Which are the most important organizations acting at my location? Who are the resource seekers and suppliers? How can one donate? Smart data provides actionable information and improves decision making through semantic analysis of Big Data.
  • 85. Disaster Response Coordination Framework. Source: Purohit et al. 2013, Information Filtering and Management Model for Disaster Response Coordination
  • 86. Disaster Response Coordination: Twitris Summary for Actionable Nuggets. Important tags to summarize Big Data flow related to Oklahoma tornado. Images and videos related to Oklahoma tornado
  • 87. Disaster Response Coordination: Twitris real-time information for needs. Incoming tweets with need types give a quick idea of what is needed and where currently (#OKC), with legends for the different needs. (It is a real-time widget for monitoring of needs, so it will not be active after the event has passed.) http://twitris.knoesis.org/oklahomatornado
  • 88. Disaster Response Coordination: Influencers to engage with for specific needs. Influential users for the respective needs, with their interaction network on the right.
  • 89. Disaster Response Coordination: Finding Actionable Nuggets for Responders to act. Really sparse signal to noise: • 2M tweets during the first week after #Oklahoma-tornado-2013 - 1.3% as the highly precise donation requests to help - 0.02% as the highly precise donation offers to help. Examples: • Anyone know how to get involved to help the tornado victims in Oklahoma??#tornado #oklahomacity (OFFER) • I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER) • Text REDCROSS to 909-99 to donate to those impacted by the Moore tornado! http://t.co/oQMljkicPs (REQUEST) • Please donate to Oklahoma disaster relief efforts.: http://t.co/crRvLAaHtk (REQUEST). For responders, the most important information is the scarcity and availability of resources; can we mine it via social media?
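To make the sparsity on slide 89 concrete, the percentages can be turned into approximate counts. This is a back-of-the-envelope sketch that treats "2M tweets" as exactly 2,000,000, which is an assumption; the variable names are illustrative.

```python
# Integer arithmetic avoids floating-point rounding in the percentages.
total_tweets = 2_000_000                 # "2M tweets", taken as an exact count here
requests = total_tweets * 13 // 1000     # 1.3% classified as donation requests
offers = total_tweets * 2 // 10_000      # 0.02% classified as donation offers

print(requests, offers, requests // offers)
```

Under that assumption there are tens of thousands of requests but only a few hundred offers among two million tweets, which is why precise extraction matters.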
  • 90. Disaster Response Coordination: Human Knowledge to drive information extraction • Features driven by the experience of domain experts at the responder organizations • Examples: – ‘I want to <donate/ help/ bring>’ for extraction of offering intention – ‘tent house’ OR ‘cots’ for shelter need types
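The two expert-provided patterns on slide 90 could be expressed as regular expressions. The sketch below is a minimal illustration under that assumption; the slides do not say how the features were implemented, and the pattern names, the function name and the second sample sentence are hypothetical.

```python
import re

# Patterns transcribed from the slide's two examples.
OFFER_PATTERN = re.compile(r"\bI want to (donate|help|bring)\b", re.IGNORECASE)
SHELTER_PATTERN = re.compile(r"\b(tent house|cots)\b", re.IGNORECASE)


def extract_features(tweet):
    """Return binary features signalling offering intent and shelter needs."""
    return {
        "offer_intent": bool(OFFER_PATTERN.search(tweet)),
        "shelter_need": bool(SHELTER_PATTERN.search(tweet)),
    }


# First tweet is from the slides; the second is a made-up example.
print(extract_features("I want to donate to the Oklahoma cause shoes clothes even food if I can"))
print(extract_features("The shelter still needs cots and blankets"))
```

Features like these would typically feed a classifier rather than act as the final decision on their own.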
  • 91. Disaster Response Coordination: Automatic Matching of needs and offers • A knowledge-driven approach – A rich inventory of metadata for tweets – Semantic matching for needs (query) vs. offers (documents) • Example: – @bladesofmilford please help get the word out,we are accepting kid clothes to send to the lil angels in Oklahoma.Drop off @MilfordGreenPiz (REQUEST) – I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER). Matching the competitive intentions (Needs and Offers) can relieve humans of the task of resource matchmaking for coordination.
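Treating a need as a query and offers as documents, as slide 91 describes, can be illustrated with a deliberately simple keyword-overlap ranking. This is a stand-in for the semantic matching in the talk, not a reproduction of it; the `RESOURCE_TERMS` lexicon and the function names are assumptions.

```python
import re

# Hypothetical lexicon of resource types; a real system would use richer metadata.
RESOURCE_TERMS = {"clothes", "shoes", "food", "water", "toys"}


def resources(text):
    """Resource terms mentioned in a tweet."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return tokens & RESOURCE_TERMS


def match(request, offers):
    """Rank offers by how many requested resource types they share."""
    needed = resources(request)
    scored = [(len(needed & resources(offer)), offer) for offer in offers]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [offer for score, offer in ranked if score > 0]


request = ("@bladesofmilford please help get the word out,we are accepting kid clothes "
           "to send to the lil angels in Oklahoma.Drop off @MilfordGreenPiz")
offers = [
    "Anyone know how to get involved to help the tornado victims in Oklahoma??#tornado #oklahomacity",
    "I want to donate to the Oklahoma cause shoes clothes even food if I can",
]
print(match(request, offers))
```

Here the request for kid clothes is paired with the offer that mentions clothes, while the generic offer of help is left unmatched.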
  • 92. Disaster Response Coordination: Engagement Interface for responders. What-Where-How-Who-Why Coordination. Influential users to engage with, and resources for seekers/suppliers at a location, at a timestamp. Contextual information for chosen topical tags
  • 93. Disaster Response Coordination: Anecdote for the value of Smart Data • Illustrative scenario: #Oklahoma-tornado 2013. FEMA asked us to quickly filter out gas-leak related data. Mining the data for smart nuggets to inform FEMA (Timely needs). Engaged with the author of this information to confirm (Veracity), e.g., All gas leaks in #moore were capped and stopped by 11:30 last night (at 5/22/2013 1:41:37). Lot of tweets for ‘how to/where to’ assist (‘pseudo’ responders), e.g., I want to go to Oklahoma this weekend & do what i can to help those people with food,cloths & supplies,im in the feel of wanting to help ! :)
  • 94. Smart Data analytics to capture rapidly evolving social data events. An event is a dynamic topic that evolves and might later fork into several distinct events. Social Media is the pulse of the populace, a true reflection of events all over the globe!
  • 97. Dynamic Model Creation: example of how background knowledge helps understand the situation described in the tweets, while also updating the knowledge model
  • 98. How is Continuous Semantics a form of Smart Data Analytics? Keeping the Background Knowledge abreast with the changes of the event. Smartly learning and adapting data acquisition (temporally apt Big Data, i.e. Fast Data). In turn providing temporally relevant Smart Data through analysis
  • 99. Smart Data Analytics in Traffic Management: to improve everyday life, entangled by our most common problem of getting stuck in traffic
  • 100. Severity of the Traffic Problem. By 2001 over 285 million Indians lived in cities, more than in all North American cities combined (Office of the Registrar General of India 2001) [1]. Modes of transportation in Indian cities. Texas Transportation Institute (TTI) congestion report in U.S. Sources: [1] The Crisis of Public Transport in India; [2] IBM Smarter Traffic
  • 101. Big Data to Smart Data: Traffic Management example. Vehicular traffic data from San Francisco Bay Area aggregated from on-road sensors (numerical) and incident reports (textual), http://511.org/ Every-minute updates of speed, volume, travel time, and occupancy, resulting in 178 million link status observations, 738 active events, and 146 scheduled events with many unevenly sampled observations collected over 3 months. Variety, Volume, Veracity, Velocity, Value. Can we detect the onset of traffic congestion? Can we characterize traffic congestion based on events? Can we provide actionable information to decision makers? Semantics: representing prior knowledge of traffic leads to a focused exploration of this massive dataset
  • 103. Heterogeneity in a Physical-Cyber-Social System
  • 104. Uncertainty in a Physical-Cyber-Social System • Observation: Slow Moving Traffic • Multiple Causes (Uncertain about the cause): – Scheduled Events: music events, fair, theatre events, concerts, road work, repairs, etc. – Active Events: accidents, disabled vehicles, break down of roads/bridges, fire, bad weather, etc. – Peak hour: e.g. 7 am – 9 am OR 4 pm – 6 pm • Each of these events may have a varying impact on traffic. • A delay prediction algorithm should process multimodal and multi-sensory observations.
  • 105. Modeling Traffic Events • Internal observations – Speed, volume, and travel time observations – Correlations may exist between these variables across different parts of the network • External events – Accident, music event, sporting event, and planned events – External events and internal observations may exhibit correlations
  • 106. Modeling Traffic Events. External events <ActiveEvents, ScheduledEvents>: Accident, Music event, Sporting event, Road Work, Theatre event. Internal observations <speed, volume, travelTime>. Also shown: Weather, Time of Day
  • 107. Complementing Probabilistic Models with Declarative Knowledge: correlations to causations using declarative knowledge on the Semantic Web. Domain Knowledge (Domain Experts, Linked Open Data; declarative domain knowledge, causal knowledge) supplies structure and parameters over the variables cold, PoorVisibility, IcyRoad and SlowTraffic. Domain Observations, with columns Cold (YES/NO), IcyRoad (ON/OFF), PoorVisibility (YES/NO), SlowTraffic (YES/NO), read row by row: (1, 0, 1, 1), (1, 1, 1, 0), (1, 1, 1, 1), (1, 0, 1, 0)
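A minimal sketch of how structure and parameters come together: given a causal structure (which the slide attributes to declarative knowledge), conditional probabilities can be estimated by counting over the observation table. Two things here are assumptions, not taken from the slides: the 16 values on slide 107 are read as four rows in column order Cold, IcyRoad, PoorVisibility, SlowTraffic, and the parent sets in `PARENTS` are a plausible but hypothetical structure.

```python
COLUMNS = ("Cold", "IcyRoad", "PoorVisibility", "SlowTraffic")
# Assumed row-by-row reading of the slide's observation table.
ROWS = [(1, 0, 1, 1), (1, 1, 1, 0), (1, 1, 1, 1), (1, 0, 1, 0)]

# Hypothetical causal structure: Cold -> IcyRoad -> SlowTraffic <- PoorVisibility.
PARENTS = {
    "Cold": (),
    "PoorVisibility": (),
    "IcyRoad": ("Cold",),
    "SlowTraffic": ("IcyRoad", "PoorVisibility"),
}


def conditional(child, parent_values):
    """Estimate P(child=1 | parents) by counting; None if that case was never observed."""
    if set(parent_values) != set(PARENTS[child]):
        raise ValueError("parent_values must name exactly the parents of " + child)
    index = {name: i for i, name in enumerate(COLUMNS)}
    matching = [row for row in ROWS
                if all(row[index[p]] == v for p, v in parent_values.items())]
    if not matching:
        return None
    return sum(row[index[child]] for row in matching) / len(matching)


print(conditional("IcyRoad", {"Cold": 1}))
print(conditional("SlowTraffic", {"IcyRoad": 1, "PoorVisibility": 1}))
print(conditional("SlowTraffic", {"IcyRoad": 1, "PoorVisibility": 0}))
```

With only four rows the estimates are crude, and unseen parent combinations have no estimate at all, which is one reason prior knowledge is useful for the parameters as well as the structure.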
  • 108. Domain Knowledge • Declarative knowledge about various domains is increasingly being published on the Web [1, 2]. • Declarative knowledge describes concepts and relationships in a domain (structure). • Linked Open Data may be used to derive prior probabilities of events (parameters). • Explored the use of declarative knowledge for structure using ConceptNet 5. [1] http://conceptnet5.media.mit.edu/ [2] http://linkeddata.org/
  • 109. ConceptNet 5 (http://conceptnet5.media.mit.edu/web/c/en/traffic_jam). Figure: ConceptNet 5 concepts (go to baseball game, go to concert, traffic jam, traffic accident, accident, car crash, bad weather, road ice, slow traffic, occur twice each day) connected by Causes, CapableOf, HasSubevent and RelatedTo relations, and mapped by is_a links to the model variables Delay, ActiveEvent, ScheduledEvent, TimeOfDay and BadWeather
  • 110. Three Operations: Complementing graphical model structure extraction. (1) Add missing random variables, e.g. time of day and bad weather. (2) Add missing links. (3) Add link direction. Inputs: traffic data from sensors deployed on road network in San Francisco Bay Area (link description, scheduled event, traffic jam) and knowledge from ConceptNet 5: go to baseball game Causes traffic jam; traffic jam CapableOf occur twice each day; traffic jam CapableOf slow traffic; bad weather CapableOf slow traffic. Graph nodes shown: bad weather, traffic jam, baseball game, time of day, slow traffic
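The three operations on slide 110 can be sketched as a small graph update: knowledge-base assertions introduce missing variables, add missing links, and orient links that the statistical step left undirected. This is an illustrative sketch only. The function name is hypothetical, node names are simplified (for example "baseball game" for "go to baseball game"), and treating both Causes and CapableOf as subject-to-object directions is an assumption.

```python
# Relations assumed to imply a subject -> object direction.
DIRECTED_RELATIONS = {"Causes", "CapableOf"}


def enrich(nodes, undirected_links, assertions):
    """Apply the three operations; return (nodes, directed links, still-undirected links)."""
    nodes = set(nodes)
    undirected = {frozenset(link) for link in undirected_links}
    directed = set()
    for subject, relation, obj in assertions:
        if relation not in DIRECTED_RELATIONS:
            continue
        nodes.update((subject, obj))                   # 1. add missing random variables
        undirected.discard(frozenset((subject, obj)))  # 3. orient an existing link
        directed.add((subject, obj))                   # 2. add the link if it was missing
    return nodes, directed, undirected


# Structure as it might come from statistical extraction (hypothetical).
nodes = {"baseball game", "traffic jam", "time of day"}
links = [("baseball game", "traffic jam"), ("time of day", "traffic jam")]
# Assertions paraphrased from the ConceptNet 5 examples on the slide.
assertions = [
    ("baseball game", "Causes", "traffic jam"),
    ("traffic jam", "CapableOf", "slow traffic"),
    ("bad weather", "CapableOf", "slow traffic"),
]
new_nodes, directed, undirected = enrich(nodes, links, assertions)
print(sorted(new_nodes))
print(sorted(directed))
print(undirected)
```

Links with no matching assertion, such as time of day and traffic jam here, stay undirected and would need another source of knowledge to orient.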
  • 111. Enriched Probabilistic Models using ConceptNet 5. Structure extracted from traffic observations (sensors + textual) using statistical techniques, over the nodes Scheduled Event, Active Event, Day of week, Time of day, delay, Travel time, speed and volume. Enriched structure over the same nodes plus Bad Weather, which has link directions and new nodes such as “Bad Weather”, potentially leading to better delay predictions
  • 112. Take Away • It is all about the human – not computing, not device – Computing for human experience • Whatever we do in Smart Data, focus on human-in-the-loop (empowering machine computing!): – Of Human, By Human, For Human – But in serving human needs, there is a lot more than what current big data analytics handle – variety, contextual, personalized, subjective, spanning data and knowledge across P-C-S dimensions
  • 113. Acknowledgements • Kno.e.sis team • Funds: NSF, NIH, AFRL, Industry… • Note: • For images and sources, if not on slides, please see slide notes • Some images were taken from Web search results; all such images belong to their respective owners, and we are grateful to the owners for the usefulness of these images in our context.
  • 114. References and Further Readings • OpenSource: http://knoesis.org/opensource • Showcase: http://knoesis.org/showcase • Vision: http://knoesis.org/node/266 • Publications: http://knoesis.org/library
  • 116. Physical Cyber Social Computing. Amit Sheth, Kno.e.sis, Wright State
  • 117. Amit Sheth’s PhD students: Ashutosh Jadhav, Hemant Purohit, Vinh Nguyen, Lu Chen, Pavan Kapanipathi, Pramod Anantharam, Sujan Perera, Alan Smith, Pramod Koneru, Maryam Panahiazar, Sarasi Lalithsena, Cory Henson, Kalpa Gunaratna, Delroy Cameron, Sanjaya Wijeratne, Wenbo Wang. Kno.e.sis in 2012 = ~100 researchers (15 faculty, ~50 PhD students)
  • 118. Smart Data. Thank you, and please visit us at http://knoesis.org Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, Ohio, USA

Editor's Notes

  1. http://www.knowledgeinfusion.com/blog/2011/11/get-your-head-out-of-the-clouds-and-into-big-data/
  2. http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode http://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume
  3. Types of Data. Formats of Data. Also talk about the increase in the platforms that help generate these data.
  4. Example high velocity Big Data applications at work: financial services, stock brokerage, weather tracking, movies/entertainment and online retail. Fast data (rate at which data is coming, esp. from mobile, social and sensor sources); Rapid changes (in the data content); Stream analysis (to cope with the incoming data for real-time online analytics).
  5. Source: http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies
  6. http://radhakrishna.typepad.com/rks_musings/2013/04/big-data-review.html Google predicted the spread of flu in real time - after analyzing two datasets, a.) 50 million most common terms that Americans type, b.) data on the spread of seasonal flu from public health agency - tested a mammoth of 450 million different mathematical models to test the search terms, comparing their predictions against the actual flu cases - model was tested when H1N1 crisis struck in 2009 and gave more meaningful and valuable real time information than any public health official system (Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013)
  7. Better Algorithms Beat More Data — And Here’s Why: http://allthingsd.com/20121128/better-algorithms-beat-more-data-and-heres-why/ Big Data Cannot Replace Human Judgment: http://www.matchcite.com/blog/blog/2012/july/big-data-cannot-replace-human-judgment.aspx **Comments about the articles
  8. Smart data makes sense out of big data – it provides value from harnessing the challenges posed by volume, velocity, variety and veracity of big data, to provide actionable information and improve decision making.
  9. - HUMAN CENTRIC!!
  10. Information is CREATED by humans with the machinery available – Wikipedia tool, sensors and social networks. Information is STORED in Man+Machine readable format, LOD. Information is PROCESSED using the LOD and Human assisted Knowledge-based. Higher level abstraction on info is now consumed in many mechanistic ways (including GIS) to provide EXPERIENCE for humans.
  11. All the data related to human activity, existence and experiences. More on PCS Computing: http://wiki.knoesis.org/index.php/PCS
  12. Information is CREATED by humans with the machinery available – Wikipedia tool, sensors and social networks. Information is STORED in Man+Machine readable format, LOD. Information is PROCESSED using the LOD and Human assisted Knowledge-based. Higher level abstraction on info is now consumed in many mechanistic ways (including GIS) to provide EXPERIENCE for humans. Example of human guided modeling and improved performance: http://research.microsoft.com/en-us/um/people/akapoor/papers/IJCAI%202011a.pdf
  13. Also, we have a weather application which performs abstraction on weather sensory observations to identify blizzard conditions (food for actions!!): -- 20,000 weather stations (with ~5 sensors per station) -- Real-Time Feature Streams - live demo: http://knoesis1.wright.edu/EventStreams/ - video demo: https://skydrive.live.com/?cid=77950e284187e848&sc=photos&id=77950E284187E848%21276
  14. Let’s find it.
  15. Starting slide. Various Big Data problems – traditional examples vs. examples of what we are doing. Variety and Velocity rather than Volume. kHealth problem. People will be interested in Smart Data. Traditional ML techniques, High Performance Computing, Statistics. Human level of abstraction is Smart Data.
  16. http://www.huffingtonpost.com/2012/10/30/hurricane-sandy-power-outage-map-infographic_n_2044411.html I would like to start with a motivational example here.
  17. http://www.guardian.co.uk/news/datablog/2012/oct/31/twitter-sandy-flooding http://www.huffingtonpost.com/2012/11/02/twitter-hurricane-sandy_n_2066281.html http://mashable.com/2012/10/31/hurricane-sandy-facebook/ We in our lab have quite a bit of Social Data Research going on, so I would like to focus on the use of social networks during these disasters/crises. Twitter and Facebook are massively used during disasters. During Hurricane Sandy there were … Not only this: a major outbreak of tweets occurred during the Japan earthquake, which crossed more than 2000 tweets/sec. So why do people tend to use social networks to this extent during disasters?
  18. Fraustino, Julia Daisy, Brooke Liu and Yan Jin. “Social Media Use during Disasters: A Review of the Knowledge Base and Gaps,” Final Report to Human Factors/Behavioral Sciences Division, Science and Technology Directorate, U.S. Department of Homeland Security. College Park, MD: START, 2012.
Disaster communication deals with disaster information disseminated to the public by governments, emergency management organizations, and disaster responders as well as disaster information created and shared by journalists and the public. Disaster communication increasingly occurs via social media in addition to more conventional communication modes such as traditional media (e.g., newspaper, TV, radio) and word-of-mouth (e.g., phone call, face-to-face, group). Timely, interactive communication and user-generated content are hallmarks of social media, which include a diverse array of web- and mobile-based tools.
Disaster communication deals with (1) disaster information disseminated to the public by governments, emergency management organizations, and disaster responders often via traditional and social media; as well as (2) disaster information created and shared by journalists and affected members of the public often through word-of-mouth communication and social media.
For information seeking. Disasters often breed high levels of uncertainty among the public (Mitroff, 2004), which prompts them to engage in heightened information seeking (Boyle, Schmierbach, Armstrong, & McLeod, 2004; Procopio & Procopio, 2007). As expected, information seeking is a primary driver of social media use during routine times and during disasters (Liu et al., in press; PEW Internet, 2011).
For timely information. Social media provide real-time disaster information, which no other media can provide (Kavanaugh et al., 2011; Kodrich & Laituri, 2011). Social media can become the primary source of time-sensitive disaster information, especially when official sources provide information too slowly or are unavailable (Spiro et al., 2012). For example, during the 2007 California wildfires, the public turned to social media because they thought journalists and public officials were too slow to provide relevant information about their communities (Sutton, Palen, & Shklovski, 2008). Time-sensitive information provided by social media during disasters is also useful for officials. For example, in an analysis of more than 500 million tweets, Culotta (2010) found Twitter data forecasted future influenza rates with high accuracy during the 2009 pandemic, obtaining a 95% correlation with national health statistics. Notably, the national statistics came from hospital survey reports, which typically had a lag time of one to two weeks for influenza reporting.
For unique information. One of the primary reasons the public uses social media during disaster is to obtain unique information (Caplan, Perse, & Gennaria, 2007). Applied to a disaster setting, which is inherently unpredictable and evolving, it follows that individuals turn to whatever source will provide the newest details. Oftentimes, individuals experiencing the event first-hand are on the scene of the disaster and can provide updates more quickly than traditional news sources and disaster response organization. For instance, in the Mumbai terrorist attacks that included multiple coordinated shootings and bombings across two days, laypersons were first to break the news on Twitter (Merrifield & Palenchar, 2012). Research participants report using social media to satisfy their need to have the latest information available during disasters and for information gathering and sharing during disasters (Palen, Starbird, Vieweg, & Hughes, 2010; Vieweg, Hughes, Starbird, & Palen, 2010).
For unfiltered information. To obtain crisis information, individuals often communicate with one another via social media rather than seeking a traditional news source or organizational website (Stephens & Malone, 2009). The public check in with social media not only to obtain up-to-date, timely information unavailable elsewhere, but also because they appreciate that information may be unfiltered by traditional media, organizations, or politicians (Liu et al., in press).
To determine disaster magnitude. The public uses social media to stay apprised of the extent of a disaster (Liu et al., in press). They may turn to governmental or organizational sources for this information, but research has shown that if the public do not receive the information they desire when they desire it, they, along with others, will fill in the blanks (Stephens & Malone, 2009), which can create rumors and misinformation. On the flipside, when the public believed that officials were not disseminating enough information regarding the size and trajectory of the 2007 California wildfires, they took matters into their own hands, using social media to track fire locations in real-time and notify residents who were potentially in danger (Sutton, Palen, & Shklovski, 2008).
To check in with family and friends. While Americans predominately use social media to connect with family and friends (PEW Internet, 2011), during disasters those connections may shift. For those with family or friends directly involved with the disaster, social media can provide a way to ensure safety, offer support, and receive timely status updates (Procopio & Procopio, 2007; Stephens & Malone, 2009). In a survey of 1,058 Americans, the American Red Cross (2010) found that nearly half of their respondents would use social media to let loved ones know they are safe during disasters. After the 2011 earthquake and tsunami in Japan, the public turned to Twitter, Facebook, Skype, and local Japanese social networks to keep in touch with loved ones while mobile networks were down (Gao, Barbier, & Goolsby, 2011). Researchers also note that disasters may enhance feelings of affection toward family members, and indeed survey participants reported expressing more positive emotions toward their loved ones than usual as a result of the September 11 terrorist attacks, even if they were not directly impacted by the disaster (Fredrickson et al., 2003). Finally, disasters can motivate the public to reconnect with family and friends via social media (Procopio & Procopio, 2009; Semaan & Mark, 2012).
To self-mobilize. During disasters, the public may use social media to organize emergency relief and ongoing assistance efforts from both near and afar. In fact, one research group dubbed those who surge to the forefront of digital and in-person disaster relief efforts as “voluntweeters” (Starbird & Palen, 2011). Other research documents the role of Facebook and Twitter in disaster relief fundraising (Horrigan & Morris, 2005; PEJ, 2010). Research also reveals how social media can help identify and respond to urgent needs after disasters. For example, just two hours after the 2010 Haitian earthquake Tufts University volunteers created Ushahidi-Haiti, a crisis map where disaster survivors and volunteers could send incident reports via text messages and tweets. In less than two weeks, 2,500 incident reports were sent to the map (Gao, Barbier, & Goolsby, 2011).
To maintain a sense of community. During disasters the media in general and social media in particular may provide a unique gratification: sense of community. That is, as the public logs in online to share their feelings and thoughts, they assist each other in creating a sense of security and community, even when scattered across a vast geographical area (Lev-On, 2011; Procopio & Procopio, 2007). As Reynolds and Seeger (2012) observed, social media create communities during disasters that may be temporary or may continue well into the future.
To seek emotional support and healing. Finally, disasters are often inherently tragic, prompting individuals to seek not only information but also human contact, conversation, and emotional care (Sutton et al., 2008). Social media are positioned to facilitate emotional support, allowing individuals to foster virtual communities and relationships, share information and feelings, and even demand resolution (Choi & Lin, 2009; Stephens & Malone, 2009). Indeed, social media in general and blogs in particular are instrumental for providing emotional support during and after disasters (Macias, Hilyard, & Freimuth, 2009; PEJ New Media Index, 2011). Additionally, social media in general and Twitter in particular can aid healing, as research finds during both natural disasters, such as Hurricane Katrina (Procopio & Procopio, 2007), and man-made disasters, such as the July 2011 attacks in Oslo, Norway (Perng et al., 2012).
  19. http://www.buzzfeed.com/annanorth/how-social-media-is-aiding-the-hurricane-sandy-rec -- Facebook help during Hurricane Sandy. http://blog.twitter.com/2012/10/hurricane-sandy-resources-on-twitter.html – Twitter page for Hurricane Sandy. http://www.treehugger.com/culture/12-ways-help-hurricane-sandy-relief-efforts.html Categorization of severity based on weather conditions. Actionable information is contextually dependent.
  20. http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US Let me consider one small example of how social data (and, in turn, data) can help people during disasters. Data becomes smart data if it takes the recipient's context into account. Sensor data for emergency responders: who in the population needs immediate attention? (1) Location (2) Severity (3) Health condition. Need for abstraction. – Semantic Perception needs abstraction. 90 + heart problem → Don’t run out; 23 → Run out
  21. http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US http://www.internews.org/sites/default/files/resources/InternewsEurope_Report_Japan_Connecting%20the%20last%20mile%20Japan_2013.pdf Let me consider one small example of how social data (and, in turn, data) can help people during disasters. Data becomes smart data if it takes the recipient into account and changes context accordingly. Sensor data for emergency responders: who in the population needs immediate attention? (1) Location (2) Severity (3) Health condition. Need for abstraction. – Semantic Perception needs abstraction. 90 + heart problem → Don’t run out; 23 → Run out
  22. http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US Let me consider one small example of how social data (and, in turn, data) can help people during disasters. Data becomes smart data if it takes the recipient into account and changes context accordingly. Sensor data for emergency responders: who in the population needs immediate attention? (1) Location (2) Severity (3) Health condition. Need for abstraction. – Semantic Perception needs abstraction. 90 + heart problem → Don’t run out; 23 → Run out
  23. http://www.buzzfeed.com/jackstuef/the-man-behind-comfortablysmug-hurricane-sandys During the storm last night, user @comfortablysmug was the source of a load of frightening but false information about conditions in New York City that spread wildly on Twitter and onto news broadcasts before Con Ed, the MTA, and Wall Street sources had to take time out of the crisis situation to refute them.
  24. We face challenges like these with data every time, but the most important thing is what you aim to do with the data: what value do you intend to provide from it?
  25. http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/
  26. http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/
  27. -- Contextual Questioning – Potential Information needed from Humans
  28. Larry Smarr is a professor at the University of California, San Diego, and he was diagnosed with Crohn’s disease. What’s interesting about this case is that Larry diagnosed himself. He is a pioneer in the area of Quantified Self, which uses sensors to monitor physiological symptoms. Through this process he discovered inflammation, which led him to the discovery of Crohn’s disease. This type of self-tracking is becoming more and more common.
  29. Massive amounts of data will be collected by sensors and mobile devices, yet patients and doctors care about “actionable” information. This data has all four Vs of big data, and we used knowledge-enabled techniques to transform it into value. In the context of Parkinson’s disease (PD), we analyzed a massive amount of data collected by sensors on smartphones to understand the detection and characterization of PD severity.
  30. Main idea: prior knowledge of PD was used to facilitate its detection from massive sensor data by reducing the search space. Details: Declarative knowledge of PD includes PD severity levels and their symptoms, as shown in the logical rule above. Each PD severity level is a conjunction of a set of PD symptoms. Each symptom was mapped to its manifestation in sensor observations. The availability of declarative knowledge significantly improved the analytics by aiding the feature selection process. The graphs above contrast the physical movements and voice of two control group members and two PD patients.
  31. kHealth: http://www.youtube.com/watch?v=btnRi64hJp4 EventShop: http://www.slideshare.net/jain49/eventshop-120721, http://dl.acm.org/citation.cfm?id=2488175
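The rule structure described in note 30 (each severity level is a conjunction of symptoms, and each symptom maps to a pattern in sensor observations) can be sketched roughly as follows. This is a minimal illustration, not the rule from the slide: the symptom names, feature names, and thresholds are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Set

# Hypothetical rules: each severity level is a conjunction (set) of symptoms.
SEVERITY_RULES: Dict[str, FrozenSet[str]] = {
    "mild": frozenset({"tremor"}),
    "moderate": frozenset({"tremor", "poor_balance"}),
    "severe": frozenset({"tremor", "poor_balance", "monotone_speech"}),
}

# Hypothetical mapping from each symptom to a test over sensor-derived features.
SYMPTOM_DETECTORS: Dict[str, Callable[[Dict[str, float]], bool]] = {
    "tremor": lambda f: f["accel_rms"] > 1.5,
    "poor_balance": lambda f: f["sway"] > 0.5,
    "monotone_speech": lambda f: f["pitch_std"] < 10.0,
}


def detect_symptoms(features: Dict[str, float]) -> Set[str]:
    """Return the symptoms whose sensor manifestation is present."""
    return {s for s, test in SYMPTOM_DETECTORS.items() if test(features)}


def satisfied_levels(symptoms: Set[str]) -> List[str]:
    """Return every severity level whose required symptoms are all present."""
    return [lvl for lvl, required in SEVERITY_RULES.items() if required <= symptoms]


features = {"accel_rms": 2.0, "sway": 0.9, "pitch_std": 30.0}
levels = satisfied_levels(detect_symptoms(features))
```

Because only the features named in the detectors are ever consulted, the declarative rules double as a feature-selection step, which is the search-space reduction the note refers to.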
  32. - what if we could automate this sense making ability? - and what if we could do this at scale?
  33. sense making based on human cognitive models
  34. The perception cycle contains two primary phases. Explanation: translating low-level signals into high-level abstractions; inference to the best explanation. Discrimination: focusing attention on those properties that will help distinguish between multiple possible explanations; used to intelligently task sensors and collect additional observations (rather than the brute-force approach of blindly collecting all observations).
  35. The perception cycle contains two primary phases. Explanation: translating low-level signals into high-level abstractions; inference to the best explanation. Discrimination: focusing attention on those properties that will help distinguish between multiple possible explanations; used to intelligently task sensors and collect additional observations (rather than the brute-force approach of blindly collecting all observations).
  36. A single-feature (disease) assumption means that all the observed properties (symptoms) must be explained by a single feature, i.e., this framework is not expressive enough to model comorbidity, where more than one feature (disease) may co-exist. For example, if two diseases cause disjoint symptoms and all the symptoms of both diseases are observed, then this framework will not be able to find a coverage and returns no diseases.
  37. The perception cycle contains two primary phases. Explanation: translating low-level signals into high-level abstractions; inference to the best explanation. Discrimination: focusing attention on those properties that will help distinguish between multiple possible explanations; used to intelligently task sensors and collect additional observations (rather than the brute-force approach of blindly collecting all observations).
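The two phases of the perception cycle, and the limitation of the single-feature assumption, can be illustrated with a small set-based sketch. The knowledge base below is hypothetical, and the functions are a simplified reading of the notes rather than the authors' implementation.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical background knowledge: feature (disease) -> properties (symptoms).
KNOWLEDGE: Dict[str, FrozenSet[str]] = {
    "flu": frozenset({"fever", "cough", "body_ache"}),
    "cold": frozenset({"cough", "sneezing"}),
    "allergy": frozenset({"sneezing", "itchy_eyes"}),
}


def explain(observed: Set[str]) -> Set[str]:
    """Explanation under the single-feature assumption: keep only the
    features that, on their own, account for every observed property."""
    return {f for f, props in KNOWLEDGE.items() if observed <= props}


def discriminate(observed: Set[str], candidates: Set[str]) -> Set[str]:
    """Discrimination: unobserved properties expected by some, but not all,
    candidate explanations. Observing one of them narrows the candidates."""
    expected = [set(KNOWLEDGE[f]) - observed for f in candidates]
    if not expected:
        return set()
    return set().union(*expected) - set.intersection(*expected)


candidates = explain({"cough"})                    # flu and cold both fit
to_check = discriminate({"cough"}, candidates)     # what to observe next
comorbid = explain({"fever", "itchy_eyes"})        # no single feature fits
```

The last call shows the comorbidity problem from note 36: fever and itchy eyes come from two different features here, so no single feature covers both and the result is empty.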
  38. - With this ability, many problems could be solved. - For example: we could help solve health problems (before they become serious health problems) through monitoring symptoms and real-time sense making, acting as an early warning system to detect problematic health conditions.
  39. Intelligence distributed at the edge of the network. Requires resource-constrained devices (mobile phones, gateway nodes, etc.) to be able to utilize Semantic Web (SW) technologies.
  40. Intelligence distributed at the edge of the network. Requires resource-constrained devices (mobile phones, gateway nodes, etc.) to be able to utilize Semantic Web (SW) technologies. Henson et al., 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices,' ISWC 2012.
  41. Compute machine perception inferences -- i.e., explanation and discrimination -- of high complexity on resource-constrained devices in milliseconds. Difference between the other systems and what this system provides.
  42. Intelligence at the edge. Shipping computation and domain models to the edge (distributed).
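To give a feel for why a bit vector encoding suits resource-constrained devices, the sketch below encodes property sets as integer bit masks so that the explanation check becomes a couple of bitwise operations. It is not the algorithm from the cited paper; the property and feature names are hypothetical and the encoding is the simplest one that works.

```python
from typing import Dict, Iterable, List

# Hypothetical property vocabulary; each property gets one bit position.
PROPERTIES = ["fever", "cough", "sneezing", "itchy_eyes"]
BIT: Dict[str, int] = {p: 1 << i for i, p in enumerate(PROPERTIES)}


def encode(props: Iterable[str]) -> int:
    """Pack a collection of property names into a single integer bit mask."""
    mask = 0
    for p in props:
        mask |= BIT[p]
    return mask


# Hypothetical features, each stored as the mask of properties it explains.
FEATURES: Dict[str, int] = {
    "flu": encode(["fever", "cough"]),
    "cold": encode(["cough", "sneezing"]),
    "allergy": encode(["sneezing", "itchy_eyes"]),
}


def explain(observed_mask: int) -> List[str]:
    """A feature explains the observations if no observed bit falls outside
    the feature's mask, i.e. observed AND NOT feature is zero."""
    return [f for f, m in FEATURES.items() if (observed_mask & ~m) == 0]


single = explain(encode(["cough"]))
none_fit = explain(encode(["fever", "itchy_eyes"]))
```

Each check touches one machine word per feature instead of walking a graph or running a general-purpose reasoner, which is the kind of saving that makes millisecond-scale inference on a phone plausible.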
  43. “to help software reusability in order to allow new applications to be built faster and to share innovations (software components, novel approaches) amongst software developers” “to standardize and commoditize back-end data stores so client software may access any Open mHealth-compliant data store in a uniform way (interoperability)” “to produce examples and documentation of these concepts meaningfully and simply”
  44. Observe data from different sensors at the same time.
  45. System architecture: the figure shows an overview of the SemHealth architecture. Sensors: all are Bluetooth sensors already utilized by the current k-Health application to measure weight, heart rate, and blood pressure. Android application: reads sensor observations through Bluetooth; performs annotation on observations and generates percepts from those observations; uploads annotated observations and percepts to the server-side data store; retrieves data using the DSU API and feeds data to the DPU and/or DVU APIs; visualizes data through the DVU API (considered a “nice to have,” as existing visualization may be used as-is; will utilize an existing graphing library for Android with an Open mHealth-style API that may be translated to the browser at a later time). Server side: Open mHealth-compliant DSU and DPU APIs; triple data storage replaces the existing SQLite database in the k-Health application; the existing k-Health reasoner is now the brains behind the DPU.
  46. http://www.flickr.com/photos/twitteroffice/5897088517/sizes/o/in/photostream/ http://bayarea.sbnation.com/49ers/2013/2/3/3947738/super-bowl-prop-bets-2013-twitter http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats/
  47. Much of the early work in Big data is being done with focusing on uni-directional among XYZ.
  48. http://semanticweb.com/picking-the-president-twindex-twitris-track-social-media-electorate_b31249 http://semanticweb.com/election-2012-the-semantic-recap_b33278
  49. http://knoesis.wright.edu/library/resource.php?id=1787
  50. Categorization of severity based on weather conditions. Actionable information is contextually dependent.
  51. - 1 (+half) minute. Alright, so let’s motivate with this situation during an emergency. - Various actors: resource seekers, responder teams, and resource providers at remote sites. - Each of these actor groups has questions -- needs, providers, responders: wondering! Here we have the social network to connect these actors and bridge the communication gap. But its potential is yet to be realized for effective help, because... (next slide)
  52. Talk about what kind of smart data we provide that helps the actions of crisis response coordination.
  53. Source: Purohit et al. 2013 (https://docs.google.com/a/knoesis.org/document/d/1aBJ2egHICUwaWxR8jOoTIUfEYj1QAnUt0q7haIKoYGY/edit# , http://www.knoesis.org/library/resource.php?id=1865)
  54. http://twitris.knoesis.org/oklahomatornado
  55. (It is real-time widget for monitoring of needs, so will not be active after the event has passed) http://twitris.knoesis.org/oklahomatornado
  56. Highly rich interface for response team
  57. Definition of the event: US Elections and some changes/subevents --- Primaries --- Debates -- People/Places/Organizations involved in the event. Arab Spring -- Subevents during those -- Egypt protests
  58. Explain about continuous semantics
  59. Pucher, J., Korattyswaroopam, N., & Ittyerah, N. (2004). The crisis of public transport in India: Overwhelming needs but limited resources. Journal of Public Transportation, 7(4), 1-30.
  60. Point of this slide: correlations
  61. Point of this slide: heterogeneity and uncertainty
  62. A single observation of slow moving traffic may have multiple explanations.
  63. Internal observations are limited to whatever the on-road sensors can observe. In the 511.org data we have analyzed, the internal observations are mentioned above. External events are obtained from sources beyond the on-road sensors, e.g., an agency like 511.org, which reports traffic incidents. Note that internal observations are mostly machine sensor readings, while external events are mostly textual observations. The analogy in healthcare would be: internal observations: on-body sensors such as heart rate and temperature; external events: jogging, walking, taking stairs.
  64. E.g., the equation for projectile motion may not precisely predict the actual trajectory, since air resistance may have been ignored.
  65. Use of open data for parameters is promising and can be explored as future research.
  66. Some facts about the domain of traffic obtained from ConceptNet5. The types of events are obtained by using the comprehensive subsumption relationship from 511.org. We propose to use such knowledge to complement the PGM structure learning algorithms. Facts: CapableOf(traffic jam, occur twice each day); CapableOf(traffic jam, slow traffic); RelatedTo(accident, car crash); Causes(road ice, accident); CapableOf(bad weather, slow traffic); HasSubevent(go to concert, car crash); Causes(go to baseball game, traffic jam); Causes(traffic accident, traffic jam). Types: BadWeather(road ice); BadWeather(bad weather); ScheduledEvent(go to concert); ScheduledEvent(go to baseball game); ActiveEvent(traffic accident); Delay(slow traffic); Delay(traffic jam); TimeOfDay(occur twice each day).
  67. Declarative knowledge + statistical correlation. This slide illustrates the three operations to enrich the correlation structure extracted using statistical methods. These operations utilize declarative knowledge from ConceptNet5, as shown in each step.
  68. The statistical correlation structure is shown above; the enriched structure is shown below. The enrichment of the graphical model will potentially allow us to capture the domain precisely and also improve our prediction, as the model would get closer to the underlying probability distribution in the real world. The log-likelihood score is one way of quantifying how good a structure is based on the observed data. There may be many candidate structures extracted from data that result in the same log-likelihood score. Declarative knowledge will help us ground statistical models in reality, which will allow us to pick one structure over the other. Pramod Anantharam, Krishnaprasad Thirunarayan and Amit Sheth, 'Traffic Analytics using Probabilistic Graphical Models Enhanced with Knowledge Bases,' 2nd International Workshop on Analytics for Cyber-Physical Systems (ACS-2013) at SIAM International Conference on Data Mining (SDM13), pp. 13--20, Texas, USA, May 2-4, 2013. We stopped at structure extraction for our workshop paper (SIAM ACS workshop) since the declarative knowledge we used (ConceptNet5) and the statistical model (nodes and edges) are at the same level of abstraction.
  69. More at: http://wiki.knoesis.org/index.php/PCS and http://knoesis.org/projects/ssw/
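The point in note 68 about candidate structures that the data cannot tell apart can be made concrete with a two-variable sketch. For two variables, the structures A → B and B → A always reach the same maximum log-likelihood, so a fact such as Causes(traffic accident, traffic jam) is what settles the edge direction. The observations below are invented for illustration, and the scoring is a plain maximum-likelihood estimate, not the procedure from the cited paper.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Set, Tuple

# Invented binary observations: (traffic accident, traffic jam) per time slot.
DATA: List[Tuple[int, int]] = [
    (1, 1), (1, 1), (0, 1), (0, 0), (0, 0), (0, 0), (1, 0), (0, 0),
]


def marginal_ll(values: Sequence[int]) -> float:
    """Maximum log-likelihood of a root node with no parents."""
    counts, n = Counter(values), len(values)
    return sum(c * math.log(c / n) for c in counts.values())


def conditional_ll(child: Sequence[int], parent: Sequence[int]) -> float:
    """Maximum log-likelihood of a child node given its single parent."""
    joint, parent_counts = Counter(zip(parent, child)), Counter(parent)
    return sum(c * math.log(c / parent_counts[p]) for (p, _), c in joint.items())


def score(data: List[Tuple[int, int]], parent_index: int) -> float:
    """Log-likelihood of the structure whose edge starts at column parent_index."""
    parent = [row[parent_index] for row in data]
    child = [row[1 - parent_index] for row in data]
    return marginal_ll(parent) + conditional_ll(child, parent)


# Declarative knowledge, using one of the facts listed in note 66.
CAUSES: Set[Tuple[str, str]] = {("traffic accident", "traffic jam")}


def choose_direction(a: str, b: str) -> Optional[Tuple[str, str]]:
    """Orient the edge between a and b using a Causes fact, if one exists."""
    if (a, b) in CAUSES:
        return (a, b)
    if (b, a) in CAUSES:
        return (b, a)
    return None


accident_to_jam = score(DATA, 0)
jam_to_accident = score(DATA, 1)
edge = choose_direction("traffic jam", "traffic accident")
```

Both scores reduce to the log-likelihood of the joint counts, which is why they agree; the knowledge base, not the data, picks the direction.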