(This is material presented as keynote at AMECSE 2014 on 21 Oct 2014 at Cairo, Egypt.)
State-of-the-art Artificial Intelligence (AI) and data management techniques have been demonstrated to process large volumes of noisy data to extract meaningful patterns and drive decisions in diverse applications ranging from space exploration (NASA's Curiosity) and game shows (IBM's Watson in Jeopardy™) to consumer products (Apple's SIRI™ voice-recognition). However, what stops them from helping us in more mundane things like fighting diseases, eliminating hunger, improving commuting
to work, or reducing financial frauds and corruption? Consumable data!
In this talk, Biplav will demonstrate and discuss how large volumes of data (Big), made available publicly (Open), can be productively used with semantic web and analytical techniques to drive day-to-day applications. One important source of this type of data is government open data which is from governments and free to be reused. Big Open Data is leading to early examples of "open innovations" - a confluence of open data (e.g., Data.gov, data.gov.in), accessible via API techniques (e.g., Open 311),
annotated with semantic information (e.g., W3C ontologies, Schema.org) and processed with analytical techniques (e.g., R, Weka) to drive actionable insights. The talk will illustrate how this can help bring increased benefits to citizens and discuss research issues that can accelerate its pace. It is increasingly being adopted by progressive businesses and governments to drive innovation that matters.
Big, Open, Data and Semantics for Real-World Application Near You
1. Big Open Data and Semantics for a
Real-World Application Near You
Dr. Biplav Srivastava, IBM Research – India
Keynote Talk at AMECSE 2014 on 21 October 2014
2. The Distinguished Speakers Program
is made possible by
For additional information, please visit http://dsp.acm.org/
3. About ACM
ACM, the Association for Computing Machinery is the world’s largest
educational and scientific computing society, uniting educators, researchers and
professionals to inspire dialogue, share resources and address the field’s
challenges.
ACM strengthens the computing profession’s collective voice through strong
leadership, promotion of the highest standards, and recognition of technical
excellence.
ACM supports the professional growth of its members by providing
opportunities for life-long learning, career development, and professional
networking.
With over 100,000 members from over 100 countries, ACM works to advance
computing as a science and a profession. www.acm.org
4. Real-World Applications of ICT: Ingredients
! Data – Available, Consumable with Semantics,
Visualization / Analysis
! Access - APIs, Apps (Applications), Usability - Human
Computer Interface
! Value – Providing benefits that matter to the people most
in need of them, in a timely and cost-efficient manner.
Going beyond technology to process and people
aspects.
5. Running Example – Data from Conference
! Data – Technical Program
! Access – Website
! Value – To participants, organizers and wider ecosystem
Thought: Can any real-world application immediately benefit
from data created at this event?
6. Outline
! “Big Result”
! IBM’s Watson Q-A System: Intersection of Big Data, Analytics and Human Computer Interaction
! “Small Problem” – do it repeatedly and rapidly for key city services
! Data challenge: Make data available freely; Give semantics to data
! Open: World Wide Web Consortium, Data.gov movement
! Semantic: Linked Open Data, Ontologies
! Access - APIs: standards based access, composition
! Value - application challenge: Give benefit to citizens; create business opportunities
! Emerging Examples of Societal Applications with Analytical (AI) Techniques
and Open Government Data
! Tourism: attract people to visit for new experiences and spend their money as well
! Traffic: make public transportation attractive for commuting even without physical sensors
! Corruption: predictable, uniform, public services
! Public Health (covered more later in panel): reduce disease impact
! Not covered: Environment, Water, Public Safety, Energy, …
! Call for action
! Make your data available in a usable manner
! Use more open data in your ongoing work (apps, research, monitoring, …)
! Build apps and make them available to citizens and other stakeholders
7. Big Result: Watson
Technical details: Ferrucci, D., et al. (2010), Building Watson: An Overview of the DeepQA
Project, AI Magazine 31 (3)
Slides Courtesy: IBM Watson Team
8. Want to Play Chess or Just Chat?
! Chess
! A finite, mathematically well-defined search space
! Limited number of moves and states
! All the symbols are completely grounded in the mathematical rules of the game
! Human Language
! Words by themselves have no meaning
! Only grounded in human cognition
! Words navigate, align and communicate an infinite space of intended meaning
! Computers cannot ground words to human experiences to derive meaning
9. IBM’s Watson is an emerging technology at the intersection of Big
Data, Analytics and Human / Computer Interaction trends
Wikipedia Definition
“Watson is an artificial intelligence computer system capable of answering questions posed in natural language, developed in IBM's DeepQA project” – Wikipedia (link)
IBM Definition
“Built on IBM's DeepQA technology for hypothesis generation, massive evidence gathering, analysis, and scoring” – IBM (link)
“An application of advanced Natural Language Processing, Information Retrieval, Knowledge Representation and Reasoning, and Machine Learning technologies to the field of open domain question answering” – IBM (link)
AI Magazine
“DeepQA is an effective and extensible architecture that can be used as a foundation for combining, deploying, evaluating, and advancing a wide range of algorithmic techniques to rapidly advance the field of question answering (QA)” – AI Magazine (link)
Enabling Technology Areas
• Natural Language Processing
• Semantic Analysis
• Information Retrieval
• Automated Reasoning
• Machine Learning
Video: What is Watson? http://www.youtube.com/watch?v=dQmuETLeQcg
IBM's Watson: A HorizonWatching Trend Report
10. Easy Questions?
ln((12,546,798 * π)) ^ 2 / 34,567.46 = 0.00885
Select Payment where Owner=“David Jones” and Type(Product)=“Laptop”
Owner          Serial Number
David Jones    45322190-AK
Serial Number  Type     Invoice #
45322190-AK    LapTop   INV10895
Invoice #      Vendor   Payment
INV10895       MyBuy    $104.56
David Jones = David Jones
Dave Jones ≠ David Jones
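The point of this slide, that exact arithmetic and exact symbol matching are easy for a program, can be sketched in a few lines of Python. This is only an illustration: the three tables are copied from the slide, while the function name `payments` and the case-insensitive comparison of the product type ("LapTop" vs. "Laptop") are assumptions made for the example.

```python
import math

# Tables from the slide, held as plain Python records.
owners = [{"owner": "David Jones", "serial": "45322190-AK"}]
products = [{"serial": "45322190-AK", "type": "LapTop", "invoice": "INV10895"}]
invoices = [{"invoice": "INV10895", "vendor": "MyBuy", "payment": 104.56}]


def payments(owner, product_type):
    """Join the three tables and return payments for an owner and product type."""
    results = []
    for o in owners:
        if o["owner"] != owner:  # exact symbol match only
            continue
        for p in products:
            same_serial = p["serial"] == o["serial"]
            # Assumption: product types are compared case-insensitively.
            same_type = p["type"].lower() == product_type.lower()
            if same_serial and same_type:
                results += [i["payment"] for i in invoices
                            if i["invoice"] == p["invoice"]]
    return results


# The arithmetic question from the slide.
value = math.log(12_546_798 * math.pi) ** 2 / 34_567.46

print(round(value, 5))
print(payments("David Jones", "Laptop"))
print(payments("Dave Jones", "Laptop"))  # no match: "Dave" is a different symbol
```

The last call returns an empty list, which is the slide's "Dave Jones ≠ David Jones" observation: nothing in the data tells the program that the two strings name the same person.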
11. Hard Questions?
Computer programs are natively explicit, fast and exacting in their calculation over
numbers and symbols….But Natural Language is implicit, highly contextual,
ambiguous and often imprecise.
! Where was X born?
Structured:
Person        Birth Place
A. Einstein   ULM
Unstructured: One day, from among his city views of Ulm, Otto chose a water color to send to Albert Einstein as a remembrance of Einstein´s birthplace.
! X ran this?
Structured:
Person     Organization
J. Welch   GE
Unstructured: If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE.
12. The Jeopardy! Challenge: A compelling and notable way to drive and measure the technology of automatic Question Answering along 5 Key Dimensions
! Broad/Open Domain
! Complex Language
! High Precision
! Accurate Confidence
! High Speed
$200: If you're standing, it's the direction you should look to check out the wainscoting.
$600: In cell division, mitosis splits the nucleus; cytokinesis splits this liquid cushioning the nucleus.
$1000: The first person mentioned by name in ‘The Man in the Iron Mask’ is this hero of a previous book by the same author.
$2000: Of the 4 countries in the world that the U.S. does not have diplomatic relations with, the one that’s farthest north.
13. Basic Game Play
6 Categories: Technology, Classics, The Great Outdoors, Speak of the Dickens, Mind Your Manners, Before and After
5 Levels of Difficulty:
$200 $200 $200 $200 $200 $200
$400 $400 $400 $400 $400 $400
$600 $600 $600 $600 $600 $600
$800 $800 $800 $800 $800 $800
$1000 $1000 $1000 $1000 $1000 $1000
Example clue (TECHNOLOGY): ALL POLICEMEN CAN THANK STEPHANIE KWOLEK FOR HER INVENTION OF THIS POLYMER FIBER, 5 TIMES TOUGHER THAN STEEL
! 1 of 3 Players Selects a Clue
! Host reads Clue out loud
! All Players compete to answer
! 1st to buzz-in gets to answer
! IF correct: earns $ value; selects Next Clue
! IF wrong: loses $ value; other players buzz again (rebounds)
! Two Rounds Per Game + Final Question
! ONE Daily Double in First Round, TWO in 2nd Round
14. We do NOT attempt to anticipate all questions and build specialized databases.
Broad Domain
[Figure: bar chart of answer-type frequency in Jeopardy! questions; y-axis 0.00% to 3.00%; x-axis lists answer types such as he, film, group, capital, woman, song, singer, show, composer, title, fruit, planet, person, language, holiday, color, place, ...]
In a random sample of 20,000 questions we found 2,500 distinct types*. The most frequent occurs 3% of the time. The distribution has a very long tail.
And for each of these types, 1000’s of different things may be asked.
Even going for the head of the tail will barely make a dent.
*13% are non-distinct (e.g., it, this, these or NA)
Our Focus is on reusable NLP technology for analyzing volumes of as-is text.
Structured sources (DBs and KBs) are used to help interpret the text.
15. DeepQA: The Technology Behind Watson
Massively Parallel Probabilistic Evidence-Based Architecture
Generates and scores many hypotheses using a combination of 1000’s Natural Language Processing, Information Retrieval,
Machine Learning and Reasoning Algorithms.
These gather, evaluate, weigh and balance different types of evidence to deliver the answer with the best support it can find.
[Figure: DeepQA pipeline. Stages: Question; Question and Topic Analysis; Question Decomposition; Hypothesis Generation (Primary Search, Candidate Answer Generation); Hypothesis and Evidence Scoring (Answer Scoring, Evidence Retrieval, Deep Evidence Scoring); Synthesis; Final Confidence Merging and Ranking; Answer and Confidence. Inputs: Answer Sources, Evidence Sources, Models. Annotations: Multiple Interpretations; 100’s sources; 100’s Possible Answers; 1000’s of Pieces of Evidence; 100,000’s Scores from many Deep Analysis Algorithms; Learned Models help combine and weigh the Evidence (Balance, Combine).]
16. Example Question
IN 1698, THIS COMET DISCOVERER TOOK A SHIP CALLED THE PARAMOUR PINK ON THE FIRST PURELY SCIENTIFIC SEA VOYAGE
Question Analysis
Keywords: 1698, comet, paramour, pink, …
AnswerType(comet discoverer)
Date(1698)
Took(discoverer, ship)
Called(ship, Paramour Pink)
…
Primary Search
Related Content (Structured, Unstructured)
Candidate Answer Generation
Isaac Newton, Wilhelm Tempel, HMS Paramour, Christiaan Huygens, Halley’s Comet, Edmond Halley, Pink Panther, Peter Sellers, …
Evidence Retrieval, Evidence Scoring
[0.58 0 -1.3 … 0.97]
[0.71 1 13.4 … 0.72]
[0.12 0 2.0 … 0.40]
[0.84 1 10.6 … 0.21]
[0.33 0 6.3 … 0.83]
[0.21 1 11.1 … 0.92]
[0.91 0 -8.2 … 0.61]
[0.91 0 -1.7 … 0.60]
Merging, Ranking
1) Edmond Halley (0.85)
2) Christiaan Huygens (0.20)
3) Peter Sellers (0.05)
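The step from per-candidate evidence scores to a ranked list with confidences can be illustrated with a small sketch. It is not Watson's actual model: the three features, the weights and the bias below are invented for the example, and a plain logistic function stands in for whatever learned models DeepQA uses to combine and weigh evidence.

```python
import math

# Hypothetical evidence features per candidate answer:
# (answer-type match, date match, passage support). Values are illustrative.
candidates = {
    "Edmond Halley": [1.0, 1.0, 0.9],
    "Christiaan Huygens": [1.0, 0.0, 0.2],
    "Peter Sellers": [0.0, 0.0, 0.1],
}

# Hypothetical weights and bias standing in for a learned model.
weights = [2.0, 1.5, 2.5]
bias = -4.0


def confidence(features):
    """Combine evidence scores into a confidence in (0, 1) via a logistic function."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Merge and rank: highest confidence first.
ranked = sorted(((confidence(f), name) for name, f in candidates.items()),
                reverse=True)

for score, name in ranked:
    print(f"{name}: {score:.2f}")
```

With these made-up numbers the candidate supported by the most evidence is ranked first; the confidences are not meant to reproduce the values shown on the slide.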
17. One Jeopardy! question can take 2 hours on a single 2.6 GHz core.
Optimized and scaled out on a 2,880-core Power750 using UIMA-AS, Watson answers in 2-6 seconds.
[Figure: the DeepQA pipeline (Question; Question and Topic Analysis; Question Decomposition; Hypothesis Generation; Hypothesis and Evidence Scoring; Synthesis; Final Confidence Merging and Ranking; Answer and Confidence), annotated with Multiple Interpretations, 100s sources, 100s Possible Answers, 1000’s of Pieces of Evidence, and 100,000’s scores from many simultaneous Text Analysis Algorithms.]
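A quick back-of-the-envelope check shows why the two figures on this slide are consistent. The calculation below assumes perfect linear scaling across cores, which is an idealization made only for this illustration; real systems lose some time to coordination.

```python
# Figures from the slide.
single_core_seconds = 2 * 60 * 60  # 2 hours on one core
cores = 2880

# Idealized assumption: the work divides evenly across all cores.
ideal_seconds = single_core_seconds / cores

print(ideal_seconds)  # lower bound on the parallel answer time, in seconds
```

The idealized figure falls inside the 2-6 second range quoted on the slide, which suggests the scaled-out system runs close to the limit of what the core count allows.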
18. IBM’s Watson has been recognized as one of the most important
technology achievements of 2011
Video: Gartner The Future of Watson
Link: TED: Final Jeopardy and the Future of Watson
“CIOs, business planners, enterprise architects, and
strategy teams should familiarize themselves with its
capabilities, and brainstorm ways in which human
decision processes can be supported” – Gartner (link)
“The impact of Watson…will be felt
far beyond the game show. This
technology could have significant
effect on business, government and
society.” – TED (link)
“Much of the technology that IBM built for Watson can
be deployed against other types of tasks besides
winning a Jeopardy game, to make solutions for these
tasks smarter. This technology addresses all of the
five A's of smart computing that we have identified, that
is, Awareness, Analysis, Alternatives, Actions, and
Auditability. ” – Forrester (link)
“What is thinking? What is intelligence? What is the role
that computers should and will play in our lives,
and what are the boundaries between humans and
computers? IBM's Watson demands that we reconsider
each of these questions” – IDC (link)
19. Watson – Additional Information and Resources
• AI Magazine:
Building Watson: An Overview of the DeepQA Project
• CIO Insight: IBM’s Watson: 11 Personal Apps
• eWeek: IBM’s Watson: The Future of Computing
• IDC: What is Watson: The IBM Jeopardy Challenge
• IBM’s Watson Portal: IBM Watson
• IBM: Watson press kit and Watson Facebook Page and
IBM Research: The DeepQA Project
• NY Times: What is IBM’s Watson?
• PBS Video: Smartest Machine on Earth
• Time: 10 Questions for Watson's Human
• Twitter: @IBMWatson and hashtag #ibmwatson
• YouTube: Watson playlist
• Wikipedia: Watson
“We believe this will be an
invaluable resource for our
partnering physicians and will
dramatically enhance the
quality and effectiveness of
medical care they deliver to our
members.” – Wellpoint (link)
20. Small Problem
Do it repeatedly and rapidly for core services
Data – Make data available freely; Give semantics to data
Access - APIs: standards based access, composition
Value – Give benefit to citizens; create business opportunities
21. Big Data
! Volume
! Variety
! Velocity
! Veracity
! …
Cartoon critical of big data application, by T. Gregorius.
http://upload.wikimedia.org/wikipedia/commons/thumb/b/b3/Big_data_cartoon_t_gregorius.jpg/220px-Big_data_cartoon_t_gregorius.jpg
22. Open Data
! Open data is the notion that data should not be
hidden, but made available to everyone. The
idea is not new.
! Scientific publications follow this: “standing
on the shoulders of giants”
! Science stands for repeatability of results and
hence, sharing
! The scientific community asserts that open data
leads to increased pace of discovery.
(See: Ray P. Norris, How to Make the Dream Come True: The Astronomers' Data
Manifesto, At http://www.jstage.jst.go.jp/article/dsj/6/0/6_S116/_article, Accessed 2 Apr,
2012)
! Governments are the new source for open
data
! Data.gov efforts world-wide; 300+ governmental
bodies, including 20+ national agencies,
including India, have opened data
! In India, an additional movement is the “Right to
Information Act”
26. India: Right to Information Act
! Any citizen “may request information from a public authority (a
body of Government or instrumentality of State) which is
required to reply expeditiously or within thirty days.”
! Passed by Parliament on 15 June 2005 and came fully into force on 13
October 2005. Citation: Act No. 22 of 2005
! Lauded and reviled
! Brought transparency
! Also,
! Increased bureaucracy
! Shortcomings in preventing corruption
! More information
! http://en.wikipedia.org/wiki/Right_to_Information_Act
! http://rti.gov.in
27. Does Opening Data Make It Reusable? No
[Illustration: the 5-star open data scheme, levels 1 to 5]
Source: http://5stardata.info/
28. Running Example – Temperature at
Conference Location
! Measurement System – Celsius, Fahrenheit, Kelvin, Color of
spectrum, …
! Indoor or Outdoor
! Indoor – do we need to capture events happening inside?
! Outdoor – do we need to capture predicted weather?
! Location - Latitude, Longitude, Address, Part of building
! Measuring equipment details
! Data quality - refresh rates, default values when equipment
broken
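One way to make these choices explicit is to carry them with every reading. The sketch below is a hypothetical record layout, not a published schema: the class name, field names, sensor identifier and coordinates are all assumptions chosen to mirror the bullets above (unit, indoor or outdoor, location, equipment, and an explicit missing value instead of a silent default).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemperatureReading:
    value: Optional[float]  # None when the equipment is broken (no silent default)
    unit: str               # "C", "F" or "K"
    indoor: bool            # indoor or outdoor measurement
    latitude: float
    longitude: float
    sensor_id: str          # measuring equipment details
    timestamp: str          # ISO 8601 time of measurement

    def to_celsius(self) -> Optional[float]:
        """Convert the reading to Celsius so consumers can compare values."""
        if self.value is None:
            return None
        if self.unit == "C":
            return self.value
        if self.unit == "F":
            return (self.value - 32.0) * 5.0 / 9.0
        if self.unit == "K":
            return self.value - 273.15
        raise ValueError(f"unknown unit: {self.unit}")


# Hypothetical reading; coordinates and sensor name are placeholders.
reading = TemperatureReading(32.0, "F", False, 30.04, 31.24,
                             "sensor-001", "2014-10-21T09:00:00")
print(reading.to_celsius())
```

Without the unit and the indoor flag, a bare number such as 32 is not reusable by anyone other than the person who recorded it.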
29. Data Quality in Public Data in India
! Right to Information
! Not even 1*
! Information available to requester, but no one else
! Data.gov.in
! 2-3*
! Available in CSV, etc., but not uniquely referenceable
! Open data movements are moving to linked data
form for semantics
30. Linking of Open Data for Reusability
Source: http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
Source: http://5stardata.info/
31. Illustration: W3C Organization
! Abstract: This document describes a core ontology for organizational structures, aimed at supporting linked-data publishing of organizational information across a number of domains. It is designed to allow domain-specific extensions to add classification of organizations and roles, as well as extensions to support neighbouring information such as organizational activities.
1. Introduction
2. Conformance
3. Namespaces
4. Overview of ontology
5. Design notes
6. Notes on style
7. Organizational structure
7.1 Class: Organization
7.1.1 Property: subOrganizationOf
7.1.2 Property: transitiveSubOrganizationOf
7.1.3 Property: hasSubOrganization
7.1.4 Property: purpose
7.1.5 Property: hasUnit
7.1.6 Property: unitOf
7.1.7 Property: classification
7.1.8 Property: identifier
7.1.9 Property: linkedTo
7.2 Class: FormalOrganization
7.3 Class: OrganizationalUnit
7.4 Notes on formal organizations
7.5 Notes on organizational hierarchy
7.6 Notes on organizational classification
8. Reporting relationships and roles
8.1 Class: Membership
8.1.1 Property: member
8.1.2 Property: organization
8.1.3 Property: role
8.1.4 Property: hasMembership
8.1.5 Property: memberDuring
8.1.6 Property: remuneration
8.2 Class: Role
8.2.1 Property: roleProperty
8.3 Property: hasMember
8.4 Property: reportsTo
8.5 Property: headOf
8.6 Discussion
9. Location
9.1 Class: Site
9.1.1 Property: siteAddress
9.1.2 Property: hasSite
9.1.3 Property: siteOf
9.1.4 Property: hasPrimarySite
9.1.5 Property: hasRegisteredSite
9.1.6 Property: basedAt
9.2 Property: location
10. Projects and other activities
10.1 Class: OrganizationalCollaboration
11. Historical information
11.1 Class: ChangeEvent
11.1.1 Property: originalOrganization
11.1.2 Property: changedBy
11.1.3 Property: resultedFrom
11.1.4 Property: resultingOrganization
A. Change history
B. Acknowledgments
C. References
C.1 Normative references
C.2 Informative references
http://www.w3.org/TR/vocab-org/
32. Usage of W3C’s Org Ontology – Community Directory
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dir: <http://dir.w3.org/directory/schema#> .
@prefix directory: <http://dir.w3.org/directory/orgtypes/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix gr: <http://purl.org/goodrelations/v1#> .
@prefix org: <http://www.w3.org/ns/org#> .
# The document itself (<>) has the organization as its primary topic.
<> foaf:primaryTopic <#org> .
<#org> a org:Organization, dir:Organization, gr:BusinessEntity, vcard:Organization
; rdfs:label "International Business Machines"
; gr:legalName "International Business Machines"
; vcard:organization-name "International Business Machines"
; skos:prefLabel "International Business Machines"
; dir:isOrganizationType directory:commercial
; vcard:url <http://www.ibm.com>
; vcard:logo <http://upload.wikimedia.org/wikipedia/commons/thumb/5/51/IBM_logo.svg/200px-IBM_logo.svg.png>
; rdfs:comment """International Business Machines Corporation (NYSE: IBM), or IBM, is an American multinational technology
and consulting corporation, with headquarters in Armonk, New York, United States. IBM manufactures and markets computer
hardware and software, and offers infrastructure, hosting and consulting services in areas ranging from mainframe computers to
nanotechnology."""
.
# The organization's site address, described with the vCard vocabulary.
<#org> org:siteAddress <#address-1NewOrchardRoad+Armonk+UnitedStates> .
<#address-1NewOrchardRoad+Armonk+UnitedStates> a vcard:VCard, vcard:Address
; vcard:street-address "1 New Orchard Road"
; vcard:locality "Armonk"
; vcard:country-name "United States"
; vcard:region "New York"
; vcard:postal-code "10504-1722"
.
34. Peek into the Future - Amsterdam
http://citydashboard.waag.org/
35. Small Problem
Do it repeatedly and rapidly for core services
Data – Make data available freely; Give semantics to data
Access - APIs: standards based access, composition
Value – Give benefit to citizens; create business opportunities
36. API Example
http://www.programmableweb.com/api/sabre-instaflights-search
39. Business Capabilities are being exposed via APIs and delivered as-a-service,
allowing Businesses to engage with Clients and Partners with speed at Scale
Source: Bessemer Venture Partners 2012
40. REST v/s Web Services?
REST
• Supports a limited set of integration styles, and involves
fewer decisions on architectural alternatives
• This simplifies client-side integration steps (at
the cost of lessening automation in system
evolution); more focus on do-it-yourself
Source: Pautasso et al, RESTful Web Services vs. “Big” Web Services: Making the Right
Architectural Decision, WWW 2008
41. Running Example – APIs for Temperature
at Conference Location
! API examples
! Get temperature (input: current, last, input instant)
! Get temperature interval (input: day)
! Get average temperature (input: time range)
! REST or web-service
! Semantic annotation on input and output
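The three API operations listed above can be sketched as plain functions over an in-memory store. This is a minimal illustration, not a real service: the readings, the function names and the choice of Celsius are assumptions, and a deployed version would sit behind a REST or web-service endpoint with semantic annotations on inputs and outputs.

```python
from datetime import date, datetime

# Hypothetical readings for the conference location: (timestamp, degrees Celsius).
READINGS = [
    (datetime(2014, 10, 21, 9, 0), 24.0),
    (datetime(2014, 10, 21, 12, 0), 29.0),
    (datetime(2014, 10, 21, 15, 0), 31.0),
]


def get_temperature(instant=None):
    """Return the last reading at or before `instant` (the latest one if None)."""
    eligible = [r for r in READINGS if instant is None or r[0] <= instant]
    return max(eligible)[1] if eligible else None


def get_temperature_interval(day):
    """Return (minimum, maximum) temperature recorded on the given day."""
    values = [t for ts, t in READINGS if ts.date() == day]
    return (min(values), max(values)) if values else None


def get_average_temperature(start, end):
    """Return the mean temperature over the closed time range [start, end]."""
    values = [t for ts, t in READINGS if start <= ts <= end]
    return sum(values) / len(values) if values else None


print(get_temperature())
print(get_temperature_interval(date(2014, 10, 21)))
print(get_average_temperature(datetime(2014, 10, 21, 9, 0),
                              datetime(2014, 10, 21, 12, 0)))
```

Each function returns `None` when no reading qualifies, so a caller can tell "no data" apart from a real measurement.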
42. Every citizen is a potential city event sensor
• Citizen notices 311 event worth reporting
• Reports event using mobile
• Launches mobile application
• Browses recent already-reported events
• Creates new event report
• [Is pre-enabled or gets any needed credentials to report event]
• Identifies service type for new event
• Shares location using mobile device (coordinates)
• Can add location annotations (road, district, city) and description
• Get confirmation of submission
• Get updates on service request
[Figure labels: Extreme Personalization; Location Intelligence; Social Analytics; Empowered Citizen]
ALLGOV SCENARIO: CROWDSOURCING 311* EVENT REPORTING
*311 data standard
• non-emergency events like graffiti,
garbage, downed trees, abandoned car,
…; Not human life threatening
• 60+ cities support it world-wide;
demo works on 4 (Chicago, Boston,
Tucson – USA; Bonn – Germany),
and backend test of 10s more.
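The "creates new event report" step can be sketched as building one HTTP request. The sketch below only constructs the URL and form body and sends nothing. The endpoint and API key are placeholders, and the path and parameter names (`requests.json`, `service_code`, `lat`, `long`, `description`, `api_key`) follow my understanding of the Open311 GeoReport v2 convention; they should be checked against the specification and the target city's documentation before use.

```python
from urllib.parse import parse_qs, urlencode


def build_service_request(endpoint, api_key, service_code, lat, long, description):
    """Build the URL and form-encoded body for a new 311 service request."""
    url = endpoint.rstrip("/") + "/requests.json"
    body = urlencode({
        "api_key": api_key,
        "service_code": service_code,
        "lat": lat,
        "long": long,
        "description": description,
    })
    return url, body


# Hypothetical endpoint, key, service code and location.
url, body = build_service_request(
    "https://city.example/open311/v2/",
    "DEMO-KEY",
    "graffiti",
    41.8781,
    -87.6298,
    "Graffiti on the bus shelter",
)
print(url)
print(parse_qs(body))
```

Keeping request construction separate from sending makes the same code usable against a test system and a production system, which is the distinction the AllGov patterns draw later in the deck.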
43. Browsing Services in One’s City:
Mary M. can look at the 311 services her city provides
On selecting the icon,
• She sees a small set of categories
(health, building, traffic, city image, others) around which all the
city’s services are grouped.
• She can look at a list of services and check out the agencies involved
• If there has been a change in agency responsible or new
services added for an agency, she can note that directly
Browsing Services in Other Cities:
Her colleagues from another city are visiting. She may want to bring a window
(instantiate an app with browse city pattern) to look at what that city offers to
their citizens
[Alternatively, if she is travelling to another city, she may be interested to know
how that city does compared to hers, by which agency, etc.]
On selecting the icon,
• She sees a small set of familiar categories (health, building, traffic,
city image, others) regardless of what the city calls its services
• She can look at a list of services and check out the agencies involved
If her city does something different, she can show that to her colleagues in her
or other cities.
45. Application Pattern
! What is it?: A pattern is any application using APIs, with some information generalized (i.e., removed and parameterized)
! Business Value: A pattern
! standardizes the usage experience by promoting similar behavior (for users)
! simplifies application development by templatizing API interactions (for developers)
! serves as the organization’s memory of the best-practices in developing a class-of-applications even when the specific APIs may not be relevant (for business)
! Key Technical Issue
! What patterns should one build? Theoretically, there exists a trivial method to blindly generate a pattern from any application. Any pattern development process has to do better than this baseline.
! How should the patterns be used in practice?
! Building a tool-enabled process around Pattern-based programming
46. Application Pattern
! Approach followed in AllGov
! Common steps taken by a role player is a candidate pattern
! Common steps that can be executed in the same infrastructure is a candidate pattern
! Pattern 1: Browse city services pattern [User Role: Govt. Dept Admin; Environment: PRODUCTION system]
! find a city's services
! find a service's definition
! find services of a particular high-level category (example: building, graffiti, ...)
! Pattern 2: Create service request pattern [User Role: Developer; Environment: TEST system]
! Browse city services
! Browse raised city service requests
! Create a new service request
! Pattern 3: Create service request pattern [User Role: General citizen of a particular City; Environment: PRODUCTION system]
! Browse city services
! Browse raised city service requests
! Create a new service request
47. AllGov Scenario Deconstruction (flows)
[Figure: message flows between Customer Mobile, AllGov and City Services (External, IBM, Client), in three numbered steps. Flow labels: browse events; get recent events; get service types; create request; post location coordinates; post details on event, location; request confirmation; notify service completed. Patterns involved: P1, P1+, P2, P3.]
48. Emerging Examples of
Societal Applications with
AI Techniques and Open Government Data
50. Why Tourism Matters
! Pros
! Promotes services jobs
! Helps upgrade infrastructure
! Gives alternative revenue source to government beyond
traditional agriculture and manufacturing
! Helps take local culture world-wide
! Promotes country image
! Cons
! Can lead to environmental impact if not planned well
! Can dilute local traditions and culture if unplanned
51. World Tourism in Numbers
Key Points
• In 2013, 1 billion people stayed overnight in another city and spent
1 trillion USD
• Of the world's oldest civilizations (5K years), China, Egypt
and India, only China is in the top 5 for both receiving and sending
tourists, by numbers and by money spent.
• Tourists go beyond language and history to spend their money for
novel experiences
Key Points for Africa and Middle East
• In 2013, there were over 55.7 million international tourist arrivals to
Africa, an increase of 5.4% over 2012.
• In 2013, there were over 51.5 million international tourist arrivals to
the Middle East, a decrease of 0.2% over 2012.
• Top countries individually receive more tourists than Africa or the
Middle East as a whole (70-80M range v/s 50M-55M)
Tables Courtesy: http://en.wikipedia.org/wiki/World_Tourism_rankings (Accessed 20 Oct, 2014)
52. Top Cities
Tourists Visit
(by money spent)
Top cities already earn from
tourists what countries in the
Middle East / Africa are
planning to earn by 2020
Figure Courtesy: MasterCard 2014 Global Destination Cities Index, At http://newsroom.mastercard.com/digital-press-kits/mastercard-global-destination-cities-index-2014/
53. Top Cities in MEA
There is tremendous
scope to grow if things
are done differently
Figure Courtesy: MasterCard 2014 Global Destination Cities Index, At http://newsroom.mastercard.com/digital-press-kits/mastercard-global-destination-cities-index-2014/
54. Possible Strategy to Promote Tourism
! Increase quality of experience for USPs using better
information availability. Examples:
! Service quality – Information on what is happening and what
to expect, when, at what cost; make it easy to consume
offerings
! Remove barriers to travel and spending - Remove perception
of lack-of-safety, increase transparency about supporting
services like roads, hospitals, taxis
! Promote domestic tourism in addition to international
tourism
! Helps natives inculcate service-industry culture, build capacity
55. City Concierge (CC): Serving People by Design
! Target users
! Citizens wanting to know more about their city
! Travellers planning to visit new cities with memorable experiences
! People (e.g., business, government) wanting to compare cities
! Group information along a small set of easy-to-follow categories
! We selected - Traffic, health, building, city image, others
! Easy to change to any set of categories
! Languages supported – English, Portuguese, Spanish, German
! Easy to extend to any
2nd place winner in Europe’s CitySDK App Hackathon in June 2014
Details: http://www.slideshare.net/biplavsrivastava/city-concierge-presentation10june2014
56. Serving People by Design
! Target users: Citizens, Travellers, People
Citizens, Travellers
Most events – Helsinki
Most open service requests - Lisbon
57. Check Services of Your
Favorite City – Chicago, in
example
Lisbon (in Portuguese)
Bonn (in German)
People, Travellers
Most city services – Lisbon; Traffic most common category in cities
58. CC Design Principles
! Focus on features that promote usage of city data
! Overcoming language barriers
! Overcoming API and data diversity barriers
! Highlight commonalities, promote comparison
! Follow standards
! CitySDK for tourism events upcoming
! Open 311 for city’s non-emergency services and service requests
! Programming level approach
! Overcome (City API) errors to stay useful
! Be resource efficient to promote mobile apps
! Standardize on output formats
60. Tourism Capacity Building
with Smarter Transportation
Details:
• Making Public Transportation Schedule Information Consumable for Improved Decision Making, Raj
Gupta, Biplav Srivastava, Srikanth Tamilselvam, In 15th International IEEE Annual Conference on
Intelligent Transportation Systems (ITSC 2012), Anchorage, USA, Sep 16-19, 2012.
• City Notifications as a Data Source for Traffic Management, Pramod Anantharam, Biplav
Srivastava, in 20th ITS World Congress 2013, Tokyo
61. Promoting Public Transportation: Before and After We Seek
Many cities around the world, especially in India and other emerging regions, are getting
their transportation infrastructure in shape.
– They have multiple, fragmented, transportation agencies in a region (e.g., city)
– They do not have instrumentation on their vehicles, like GPS, to know about their
operations in real-time
– Schedule of public transportation is widely available in semi-structured form. They
are also beginning to invest in new, novel, sensing technologies
– Cities give SMS-based alerts about events on the road.
Our approach seeks to accelerate time-to-value for such cities.
Kind of Information | Today Available to Bus User | With IRL-Transit+ | Benefit
Bus Schedule (static) | Available online and pamphlets | Available from IT-enabled devices (low-cost phones, smart phones, web) | Increase accessibility
Bus Schedule Changes (dynamic) | No information | Infer from city updates | Increase information
Analytics (Bus Selection Decision Support) | No information | Will be available (Transit) | Increase information
Standardization of information | No support | Will be supported (SCRIBE, Transit) | Increase information’s interoperability
62. Background: Public Transportation
Schedule Information
! Is widely available for public
transportation agencies around
the world
! Gives the basic, static,
information about
transportation service
! Usually in semi-structured
format with varying semantics
! Can have errors, missing data
63. Basic Solution Steps
! Use the widely available schedule information from individual operators
(agencies)
! Clean and consolidate it across agencies and modes to get a multi-modal
view for the region
! Optionally: Convert it into a standard form
! Optionally: Enhance (fuse) it with any real-time updates about
services for the region
! Perform what-if analysis on consolidated data
! Path finding using Dijkstra’s algorithm
! Analyses can be pre-determined, analyses can also be user-created
and defined
! Make analysis results available as a service
! On any device
! To any subscriber
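The path-finding step over consolidated schedule data can be sketched with Dijkstra's algorithm on a small multi-modal graph. The stops, travel times and modes below are invented for the example, and using travel minutes as the only cost is an assumption; the slides also mention preferences over modes and hops, which are not modelled here.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm over {stop: [(next_stop, minutes, mode), ...]}.

    Returns (total_minutes, legs), where legs is a list of
    (from_stop, to_stop, mode) tuples, or (None, []) if unreachable.
    """
    best = {source: 0}
    previous = {}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, minutes, mode in graph.get(node, []):
            new_cost = cost + minutes
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = (node, mode)
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best:
        return None, []
    legs, node = [], target
    while node != source:
        prev, mode = previous[node]
        legs.append((prev, node, mode))
        node = prev
    return best[target], legs[::-1]


# Hypothetical consolidated schedule: travel minutes between stops, per mode.
graph = {
    "A": [("B", 10, "bus"), ("C", 25, "metro")],
    "B": [("C", 10, "metro"), ("D", 40, "bus")],
    "C": [("D", 15, "bus")],
}

print(shortest_path(graph, "A", "D"))
```

Because each edge carries its mode, the returned legs show where the traveller changes between bus and metro, which is the multi-modal view the consolidation step is meant to provide.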
64. Multi-Mode Commuting Recommender in Delhi And Bangalore
Highlights
• Published data of multiple
authorities used; repeatable
process
• Multiple modes searched
• Preference over modes, time,
hops and number of choices
supported; more extensions, such as
fare, are possible
• Integration of results with a map
is future work; it has already been done
in other projects, viz.
SCRIBE-STAT
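One way to picture the preference support listed in the highlights is a small filter-and-rank step over candidate routes. This is only a sketch under assumptions: the `Route` class, the `recommend` function, its parameter names, and the sample routes are hypothetical and are not taken from the recommender itself.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Route:
    legs: List[Tuple[str, int]]  # (mode, minutes) for each leg

    @property
    def minutes(self) -> int:
        return sum(m for _, m in self.legs)

    @property
    def hops(self) -> int:
        # Number of changes between vehicles.
        return max(len(self.legs) - 1, 0)

    @property
    def modes(self) -> Set[str]:
        return {mode for mode, _ in self.legs}


def recommend(routes, allowed_modes=None, max_hops=None, prefer="time", limit=3):
    """Filter routes by mode and hop preferences, then rank and truncate."""
    candidates = [
        r for r in routes
        if (allowed_modes is None or r.modes <= set(allowed_modes))
        and (max_hops is None or r.hops <= max_hops)
    ]
    if prefer == "hops":
        key = lambda r: (r.hops, r.minutes)
    else:
        key = lambda r: (r.minutes, r.hops)
    return sorted(candidates, key=key)[:limit]


# Made-up candidate routes between the same origin and destination.
DIRECT_BUS = Route([("bus", 40)])
BUS_METRO = Route([("bus", 10), ("metro", 15)])
AUTO_METRO_BUS = Route([("auto", 5), ("metro", 12), ("bus", 5)])
ROUTES = [DIRECT_BUS, BUS_METRO, AUTO_METRO_BUS]
```

With these sample routes, `recommend(ROUTES, prefer="hops", limit=1)` picks the direct bus, while the default time preference ranks the three-leg route first. A fare preference could be added as another field on each leg and another sort key.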
65. Further Work*
! Invariant Inputs:
! The person
! has a vehicle (e.g., car), and
! can also walk short distances
! The city has taxis, buses, metros, autos, rickshaws
! Buses and metros have published routes, frequency and stops
! Autos and rickshaws can be available at stands, or opportunistically, on the road
! Taxis can be ordered over the phone
! Input:
! A person wants to travel from place A to B
! [Optional] City provides updates on ongoing events, some may
affect traffic
! Output
! Suggest to the person which mode or combination of modes to
select
! Observation: Use preferences over the factors that matter to users
to keep commuting convenient, while making the best use of
available public and para-transit commute methods
* City Notifications as a Data Source for Traffic Management, Pramod Anantharam, Biplav Srivastava, in 20th ITS
World Congress 2013, Tokyo
66. Number of SMS messages for bus stops in
Delhi for 2 years (Aug 2010 – Aug 2012)*
• 344 stops
with updates
• 3931 total stops
* using Exact Matching
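The counts above mean that roughly 8.75% of stops (344 of 3931) had at least one update under exact matching. A minimal sketch of what exact matching could look like follows; the `count_stop_mentions` function, the stop list, and the sample messages are assumptions for illustration, not the data or code behind the slide.

```python
import re
from collections import Counter


def count_stop_mentions(messages, stop_names):
    """Count messages that mention each stop name as an exact whole phrase."""
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in stop_names
    }
    counts = Counter()
    for text in messages:
        for name, pattern in patterns.items():
            if pattern.search(text):
                counts[name] += 1
    return counts


# Illustrative stop names and made-up messages.
STOPS = ["ITO", "Kashmere Gate", "Moolchand"]
MESSAGES = [
    "Traffic is slow at ITO due to a procession.",
    "Bus breakdown near Kashmere Gate, expect delays.",
    "Waterlogging reported at ito crossing.",
    "Road repair at Moolchnd flyover.",  # misspelt, so exact matching misses it
]

# Share of stops with at least one update, from the figures on the slide.
coverage = 344 / 3931
```

The misspelt last message shows the main limitation of exact matching: any spelling variation in a stop name goes uncounted, so the reported numbers are a lower bound on how many stops the messages actually cover.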
67. IRL – Transit in Aug 2012
Key Points
• SMS message from city
• Event and location identified
• Impact assessed
• Impact used in search
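The four key points can be read as a small pipeline. The sketch below is one possible reading, not the IRL-Transit design: the keyword table `EVENT_DELAYS`, the delay values, the stop names, and all function names are hypothetical.

```python
# Assumed delay (in minutes) attributed to each kind of event.
EVENT_DELAYS = {"breakdown": 15, "procession": 30, "waterlogging": 20}


def identify_event(sms, known_stops):
    """Return (event keyword, stop name) found in the message, or None for each."""
    text = sms.lower()
    event = next((e for e in EVENT_DELAYS if e in text), None)
    stop = next((s for s in known_stops if s.lower() in text), None)
    return event, stop


def apply_event(graph, affected_stop, delay_minutes):
    """Copy the graph, adding the delay to every edge that touches the stop."""
    return {
        stop: [
            (nbr, minutes + delay_minutes if affected_stop in (stop, nbr) else minutes, mode)
            for nbr, minutes, mode in edges
        ]
        for stop, edges in graph.items()
    }


def route_minutes(graph, stops):
    """Total travel time along a sequence of stops."""
    total = 0
    for here, there in zip(stops, stops[1:]):
        total += min(m for nbr, m, _ in graph[here] if nbr == there)
    return total


# Made-up network and message.
GRAPH = {
    "Central": [("Market", 10, "bus"), ("Lake", 20, "bus")],
    "Market": [("Depot", 10, "bus")],
    "Lake": [("Depot", 10, "metro")],
    "Depot": [],
}
SMS = "Procession near Market, expect delays."

event, stop = identify_event(SMS, GRAPH)
ADJUSTED = apply_event(GRAPH, stop, EVENT_DELAYS[event])
```

Before the message, the route through Market is the faster one; after the adjustment, the route through Lake is, so any search run on `ADJUSTED` steers commuters around the event.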
68. Increase Accessibility and Availability of Bus Information to Passengers
Kind of Information | Today Available to Bus Users | With Project in Bangalore | Mysore ITS (for reference)* | Benefit
Bus Schedule (static) | Available online and pamphlets | Available from low-cost phones (Spoken Web – Static) | Available online and pamphlets | Increase accessibility
Bus Schedule Changes (dynamic) | No information today | Will be available (Spoken Web - Human) | No information but in plan | Increase information
Bus Location | No information today | Will be available (GPS) | Will be available (GPS) | Increase information
Bus Condition | No information today | Will be available (Spoken Web - Human) | No information today | Increase information
Analytics (Bus Selection Decision Support) | No information today | Will be available (Transit) | No information but in plan | Increase information
Last-mile Connectivity to/from nearest stop | No information today | Will be available (Spoken Web - Human) | No information today | Increase information
Standardization of information | No support | Will be supported (SCRIBE, Transit) | Some support due to GPS | Increase information’s interoperability
* Opinion based only on public information
69. Our End Vision: Information to Commuters to Reach Destination in All Eventuality
A Flexible Journey Plan
Pilots running in Dublin, Ireland
70. Resources
! Tutorial on AI-Driven Analytics In Traffic Management, in conjunction with International
Joint Conference on Artificial Intelligence (IJCAI-13), Biplav Srivastava, Akshat Kumar, at
Beijing, China, Aug 3-5, 2013 (tutorial-slides).
! Tutorial on Traffic Management and AI, in conjunction with 26th Conference of Association
for Advancement of Artificial Intelligence (AAAI-12), Biplav Srivastava, Anand Ranganathan,
at Toronto, Canada, July 22-26, 2012 (tutorial-slides).
! Making Public Transportation Schedule Information Consumable for Improved Decision
Making, Raj Gupta, Biplav Srivastava, Srikanth Tamilselvam, In 15th International IEEE
Annual Conference on Intelligent Transportation Systems (ITSC 2012), Anchorage, USA, Sep
16-19, 2012.
! Mythologies, Metros & Future Urban Transport, by Prof. Dinesh Mohan, TRIPP, 2008
! A new look at the traffic management problem and where to start, by Biplav Srivastava, In 18th
ITS Congress, Orlando, USA, Oct 16-20, 2011.
! Arnott, Richard and K.A. Small, 1994, “The Economics of Traffic Congestion,” American
Scientist, Vol. 82, No. 5, pp. 446-455.
! Chengri Ding and Shunfeng Song, Paradoxes of Traffic Flow and Congestion Pricing,
71. Tourism Capacity Building
with Corruption Prevention
Details:
• A Computational Model for Corruption Assessment, Nidhi Rajshree, Nirmit V. Desai and Biplav
Srivastava, IJCAI 2013 Workshop on Semantic Cities, Beijing, 2013 [Corruption-FormalModels]
• Open Government Data for Tackling Corruption – A Perspective, Nidhi Rajshree, Biplav Srivastava,
in AAAI 2012 Workshop on Semantic Cities, Toronto, July 2012. [Area: Open data-Corruption]
72. Corruption
“the misuse of public office for personal gains”
* Source: http://cpi.transparency.org/cpi2012/results/
Corruption afflicts both public and corporate services worldwide. It is
known to have a significant negative impact on the growth of
economies and hence is universally considered undesirable.
Corruption: “Monopoly + Discretion – Accountability” (Klitgaard, Robert E. Controlling
corruption. Berkeley: U. of California Press, 1988)
73. A Nation’s Competitiveness
and Corruption Perception
Don’t Go Hand-in-Hand
For Promoting Tourism,
Corruption Perception has to
be Removed
75. Some Key Questions Related to Corruption
• Exchange of money: can a service for which the customer does
not pay a fee (free service) be termed corrupt? Or conversely, can
a corrupt practice only happen if the customer pays for a service?
• Human agents: can a service be corrupt if the agent delivering
the service is not a human but an automated agent?
• Contention for resources: can corruption happen if delivering
a service requires no contention for resources? Alternatively, if resources
are scarce, will an objective way of allocating them help remove
corruption?
76. Metamodel – Expressing Key Concepts for Corruption
[Figure: metamodel diagram relating Provider, Organization, Process, Activity, Task, Decision, Escalation, Inputs, Outputs, Requestor, Person, Process Instance, Activity Instance, Execution Time and Execution Cost, with multiplicities (0..1, 1, *, +) on the associations]
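The concepts named in the metamodel can be sketched as plain data classes. The associations in the diagram are not recoverable from the slide text, so the fields below (including the `objective` flag, the single optional escalation path, and the instance-level time and cost) are assumptions, and the sample times are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Activity:
    name: str
    performer: str                       # role that carries out the step
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    objective: bool = True               # assumed flag: False if the step relies on discretion


class Task(Activity):
    """An activity that performs work."""


class Decision(Activity):
    """An activity that chooses between branches."""


@dataclass
class Process:
    name: str
    provider: str
    activities: List[Activity]
    escalation: Optional[str] = None     # assumed 0..1 escalation path

    def discretionary(self) -> List[str]:
        return [a.name for a in self.activities if not a.objective]


@dataclass
class ActivityInstance:
    activity: Activity
    execution_time_days: float
    execution_cost: float


@dataclass
class ProcessInstance:
    process: Process
    requestor: str
    steps: List[ActivityInstance] = field(default_factory=list)

    def total_time(self) -> float:
        return sum(s.execution_time_days for s in self.steps)

    def total_cost(self) -> float:
        return sum(s.execution_cost for s in self.steps)


# Illustrative use with two step names from the registration example.
validate = Task("Validate docs", "Registration Officer", inputs=["Proof of birth"])
vetting = Decision("Vetting", "Vetting Committee", objective=False)
registration = Process("National Registration", "Registration Office", [validate, vetting])
request = ProcessInstance(
    registration,
    "citizen-001",
    [ActivityInstance(validate, 1, 0), ActivityInstance(vetting, 14, 0)],  # made-up values
)
```

Separating the process definition from its instances lets the same model answer two kinds of question: structural ones (which steps are discretionary, whether an escalation path exists) and measured ones (how long and how costly a particular request was).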
77. Framework Evaluation, by Example
National Registration - Kenya
[Figure: process flowchart with swimlanes for Citizen, Registration Officer, Vetting Committee, Ch. Asst. Officer and NRB Officer]
Steps shown in the diagram:
1. Submit supporting documents
2. Validate docs
3. Vetting
4. Handover serialized App Form
5. Fill and submit application form
6. Take finger prints
7. Click photograph for ID card
8. Handover the waiting card
9. Receive waiting card and wait for processing
10. Submit documents to Chief
11. App signed and stamped by Chief Asst. Officer
12. Submit documents to NRB
13. Verify identity of the applicant
14. Process ID Card
15. Send ID card to the Registration Office
16. Receive ID Card from NRB
17. Collect ID Card
Documents named in the diagram:
- Proof of birth
- Proof of citizenship
- Proof of residence
- Additional proof of residence
- Form 101
- Form 136 A
- Form 136 C
Branch labels: Insufficient documents / Sufficient documents; Ancestral home town is a border district or age 18; Satisfied / Not satisfied
78. National Registration
Kenya
• The decision node, 3 - vetting, and the activity, 13 - verify identity, are discretionary with no clear mechanism on how to accomplish them.
• In contrast, the checks for documents having been submitted are objective.
• There is no Service Level Agreement (SLA) for the process.
• The ID process is monopolistic since only a single authority (registration office) can process it.
• The process has little reviewability and low visibility since there is no escalation mechanism.
India (Aadhaar)
• 18 Proofs of Identity (PoI) and 33 Proofs of Address (PoA) documents are permitted for making the request.
• The process also allows discretion by accepting attested documents from high-level officials.
• The cost and time limits for the service are prescribed.
• The process, however, can only be handled by a single agency, creating a monopoly.
USA (Social Security)
• In SS, a clear list of documents proving US citizenship (or legal residence), age and identity is given.
• There is little room for discretion because no category accepts a signed attestation by a high-level official.
• The cost and time limits for the service are prescribed.
• The process, however, can only be handled by a single agency, creating a monopoly.
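The comparison above follows Klitgaard's factors (monopoly, discretion, accountability) and can be sketched as a simple checklist. This is not the computational model from the cited papers: the `ServiceProfile` fields, the `klitgaard_flags` function, and the way each country's bullets are encoded are assumptions, and an escalation mechanism the slide does not mention is recorded as unknown (`None`).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceProfile:
    name: str
    providers: int                   # agencies able to deliver the service
    discretionary_steps: int         # steps without an objective criterion
    has_sla: bool                    # cost and time limits are prescribed
    has_escalation: Optional[bool]   # None when the slide does not say


def klitgaard_flags(profile: ServiceProfile) -> List[str]:
    """List the corruption risk factors that apply to a service."""
    flags = []
    if profile.providers <= 1:
        flags.append("monopoly")
    if profile.discretionary_steps > 0:
        flags.append("discretion")
    if not profile.has_sla:
        flags.append("no SLA")
    if profile.has_escalation is False:
        flags.append("no escalation")
    return flags


# One possible encoding of the three columns (step counts are a reading of the bullets).
KENYA = ServiceProfile("Kenya national ID", 1, 2, has_sla=False, has_escalation=False)
INDIA = ServiceProfile("India Aadhaar", 1, 1, has_sla=True, has_escalation=None)
USA = ServiceProfile("USA Social Security", 1, 0, has_sla=True, has_escalation=None)
```

Under this encoding all three services are flagged as monopolies, Kenya's process additionally shows discretion and missing accountability mechanisms, and the US process shows neither, which mirrors the qualitative comparison in the bullets.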
79. Framework Evaluation, by Example
International Driving Permit (IDP)
[Figure: process flowchart with swimlanes for Citizen, Front Desk Officer, Inspector and Regional Transport Officer]
Steps shown in the diagram (the label 5 appears twice in the source):
1. Submit supporting documents
2. Validate docs
3. Validate address
4. DL Address change process
5. Handover Appl Form
5. Fill and submit application form
6. Verify DL issuance date
7. Send applicant for DL Test
8. Verify applicant’s driving skills
9. Send application to Regional Transport Officer
10. Stamp and sign the IDP
11. Send IDP to front desk officer
12. Receive IDP from Regional Transport Officer
13. Collect IDP
Documents named in the diagram:
- Driver’s license
- Passport
- Air tickets
- VISA
- Form CMV1
Branch labels: Insufficient documents; DL address under RTO jurisdiction / DL address not under RTO jurisdiction; Address has changed / Address has not changed; DL issued within 3 months / DL issued within more than 3 months; Satisfied / Not satisfied
80. International Driving License
India (IDP)
• Service execution cost is specified (Rs 500), but service execution time is not.
• There is no escalation mechanism.
• The check whether all documents have been submitted is objective.
• The IDP is monopolistic since only a single authority (RTO) can process it.
• The process has little reviewability and low visibility since there is no escalation mechanism.
USA (AAA)
The procedure involves filling in a form online, visiting the office of an authorized agency
with a valid state-issued driver’s license, photos and fees, and getting the permit.
Here, there are multiple agencies to process the request, and the prerequisite
driver’s license can be verified objectively (e.g., with social security databases).
• No monopoly
• Objective criteria
81. Tackling Corruption
Tackling corruption pro-actively:
! Open Gov. Data
! Increases transparency hence increasing the risk of being caught in the
act of corruption
! Makes measurements by SLAs possible
! Process Redesign
! Ensures a robust process design reducing corruption hotspots
! Reduce monopoly, discretion
! Automation
! Automation needs outcomes to be formally defined
! Reduces discretion, requires data (input, output, outcome) to be
adequately captured
Corruption: “Monopoly + Discretion – Accountability” (Klitgaard,
Robert E. Controlling corruption. Berkeley: U. of California Press, 1988)
82. Running Example – Potential Applications of
Temperature at Conference Location (Over Time)
! External temperature
! Environment models, weather forecasting, pollution
spread models, disease spread rates, …
! Internal temperature
! Energy management, security management, building
management, traffic management, …
! Temperature is unrelated to the technical program. Imagine
what could be enabled if the conference’s technical
content were made machine-consumable through APIs and
used in real applications.
83. Call for Action
! Main message
! Use more open data in your research
! Build apps and make them available
! Specifics
! Governments should
! Come out with data sharing/disclosure policies, and
! Example: USA - US Executive Order 13556, Controlled Unclassified Information, at
http://www.whitehouse.gov/the-pressoffice/2010/11/04/executive-order-controlled-unclassifiedinformation
! Example: India - National Data Sharing and Accessibility Policy (NDSAP) at http://dst.gov.in/NDSAP.pdf
! Come out with specific application licensing guidelines
! Implement them!
! Academia must
! Lead research in this area
! Make their own data available in linked open form (LOD)
! Industry and standardization bodies should help
! by documenting best practices
! building necessary tools
! using open standards, and
! reporting case studies.
84. Dr. Biplav Srivastava, sbiplav@in.ibm.com
http://www.research.ibm.com/people/b/biplav/
Thank You (English), Teşekkür ederim (Turkish), Merci (French), Grazie (Italian), Gracias (Spanish), Obrigado (Portuguese), Danke (German), Multumesc (Romanian)