Helping Haiti - a semantic web approach to crisis information management
A different translational informatics project
Slides and notes from a recent presentation I gave outlining some ideas on how we could use some of the tools and approaches being developed in bio/clinical informatics to assist data analysis and integration in crises such as the Haiti earthquake. This is a 'straw man': I can see reasons for and against this approach, so I thought I'd throw it out for comment in the hope that others can help me refine it to the point where it could be useful.

Published in: Technology
  • We build databases that help researchers access and use rat data. Here’s a selection of problems that many rat researchers face when trying to answer questions from masses of data that are too voluminous to read, hard to get to, inconsistently organized and hard to integrate.
  • NCBI’s Gene Expression Omnibus (GEO) has a lot of relevant data, either as text or raw data. Can we start to capture some of this information in an informatically tractable fashion, using ontologies and the OBA tools at the National Center for Biomedical Ontology (NCBO) in an annotation pipeline? The red boxes highlight some concepts of interest - the rat strains and tissues being used in this experiment. A human can read these and know what’s going on, but what about a computer?
  • We built a pipeline to take snippets of text from GEO records, fire them off into a queue and have them annotated against various ontologies at NCBO. The results are returned to another queue and loaded into the database. We then do a manual review of the automated annotations (not shown here) using a customized curation interface.
  • Initial results focusing on GEO rat datasets have provided a lot of great information and allowed us to create some handy navigational interfaces to the data, enabling queries that were not possible on any other site. Want to find expression data for the SS rat kidney? Click the terms and the datasets appear.
  • Here’s a different area of need - the Haiti earthquake from mid January.
  • This is the type of information that started flowing across Twitter very soon after the earthquake hit.
  • This has valuable information but, again, there is a lot of it and it’s unstructured, so it is hard for a computer to pull out actionable data.
  • Here are the questions facing organizations and individuals in Haiti and around the world who are providing support.
  • Lots of people need the information, but pulling it out of plain-text tweets is hard.
  • We already have somewhat structured information in biological databases, etc. There is still a lot of free text, but at least we know what’s being talked about, which makes interpretation somewhat easier. Nothing like this existed in the twitterverse until...
  • The University of Colorado at Boulder team came up with TweakTheTweet (TtT) as a way to structure tweets so that more information can be extracted from them.
  • Based on the tags, we can pull out information - but it is still relatively unstructured. What is a ‘need’? Does something tagged as ‘Need’ on this site mean the same as ‘Need’ on another site? Is the Loc a lat/long, a house address, etc.? Can we use ontologies, as we did for biological data in GMiner, to add structure and facilitate interpretation? Do these ontologies exist? No.
  • Here are two of the ontologies we are using in GMiner - they list concepts of interest related to inbred Rat strains and mouse anatomy. These form the controlled vocabularies of relevant facts that we use to go looking in the plain text of a GEO record. Ontologies provide a more structured format for the concepts of interest and go beyond keyword lists as a way to organize and analyze annotated data.
  • OWL ontologies have two main types of thing - Classes (things you care about) and Properties (attributes of the things you care about)
  • The data is expressed in triples - a subject (the thing we are talking about), a predicate (what type of info we are talking about, the property it possesses) and an object (the information we know about the subject). Here are some examples relating to me...
  • Created a TweakTheTweet ontology using Protégé and RDF/OWL.
  • tag_terms are used to store the potential text matches in tweets - English and French (and more to come?)
  • The ontology describes logical structures for the data. If Tweet 123 has a ‘has_trapped’ property associated with it, the ontology can be used to infer that Tweet 123 is also part of the ‘trapped’ class of tweets. We don’t have to specify this in our dataset; the ontology enables this to happen.
  • I put together a simple Ruby on Rails site at http://tweetneed.org that has been grabbing tweets since around the 19th of January using the main TtT hashtags. It’s certainly not a complete set, but I’ve been using this as a platform for exploring the data and developing approaches to filter the tweets and extract useful information.
  • Parse the tweet into useful information, trying to pull out as much useful data as possible - now have lat and long as specific fields, etc and each set of data is expressed as a triple - subject (the tweet), predicate (the property of interest) and the value. These triples can be dumped out as RDF (N-triples) and placed into a triple store.
  • RDF data is a graph of nodes and edges - nodes are the subjects and objects, the edges correspond to the predicates in the RDF.
  • This other tweet can be parsed to extract its relevant data - this one also contains Insulin as a need.
  • The RDF generated for this tweet also corresponds to a graph of nodes and edges.
  • Graphs can self-assemble based on shared properties, and inference and reasoners can be used to infer new class membership and organize the data in other ways as needed.
  • A feature of Twitter is that people can retweet an existing tweet. This is good in that the retweet will probably reach a different audience than the original; however, in a crisis this results in a lot of duplicated data that has to be filtered through. Here’s just one tweet from the tweetneed.org database that I have in there 31x over a period of 8 hours, and I’m sure that is missing some, as the original tweet is itself an RT.
  • The MD5 algorithm takes a text string and generates a fixed-length, 32-character hexadecimal string from it - change any character and the MD5 value changes. This is a helpful way to boil a variable-length string of characters down to a fixed-length identifier that is unique for practical purposes (collisions are possible but very unlikely to occur by accident). It is often used as a checksum for digital files - change one bit in a file and the MD5 checksum changes, so you can detect if it’s the slightest bit different from the official value.
  • The MD5 hash of the full text shows there are 3 other tweets in the database that are identical copies - this gets rid of obvious duplicates but it’s very conservative. Using just the hashtags isn’t much good - it matches 13 other tweets, many of which are different from this one, so it is too promiscuous. Using the keywords from the tweet (remove hashtags, @names, stop words and other short strings and take what is left) does a better job, identifying 4 duplicates.
  • You can explore how this is working for a particular tweet on tweetneed.org. The tweets identified by the keyword hash include the original tweet.
  • A variety of organizations and groups are following the Tweet stream and extracting useful facts that are stored in their local databases. Here are just a few, with Ushahidi and Sahana being two of the more central locations. Ushahidi is developing Swift River, a specific app to filter the stream of information from Twitter and other sources. This is still in development.

    One benefit of multiple organizations tracking the same source of data is that each may add unique and useful information to the original source - one site may verify the info, another may find the lat/long of the location, and another may have other info that increases the urgency of a particular report. They may also serve and reach different communities with different needs. One downside, however, is that each organization may do the same work multiple times, and the new info added by one organization may not be available to the others.

    Bringing this all back together to avoid duplication of effort and share data is not a trivial task.
  • One potential solution is to export data in RDF using shared ontologies to describe common attributes: place it into a triple store (or federate multiple triple stores), integrate around common identifiers, and use ontologies and reasoners to infer additional information not otherwise present. This could be a central location that people could query (RSS feed, REST, etc.) to access additional data added by their colleagues and to access novel inferred information that was not apparent until the different data sources were merged.
  • For some perspective, here’s an animated timeline of incident reports flowing into the haiti.ushahidi.com site since January 12th. These reports come from a wide variety of sources, SMS messages, individuals entering data on the ushahidi website and also Twitter.
  • TtT is one of a variety of Crisis Commons projects where developers from around the world are volunteering and getting engaged building software, some related to Haiti but also for use in the next crisis that comes along.
  • Helping Haiti - a semantic web approach to crisis information management

    1. Helping Haiti - a semantic web approach to crisis information management. A different translational informatics project. Simon Twigger, Ph.D.
    2. Questions of interest: Has anyone done any expression studies using congenic rats? What tissue is this gene expressed in? What expression data is known for SD (aka SD/NHsd, Harlan Sprague Dawley, Sprague Dawley) rats? Are any of these genes associated with my phenotype? Has this gene been seen in the brain? What rat expression studies have been done on Mammary Cancer (aka breast neoplasms/breast cancer/cancer of the breast, breast carcinoma...)?
    3. Data hidden in plain sight
    4. GEO + GMiner + OBA
    5. GEO + GMiner + OBA [pipeline diagram]: GEO Records → Create Annotation Jobs & Queue Up → Q-Out → 1..n Annot. Workers → Index text at OBA → Parse Results → Put results in to queue for save → Q-In → Results saved to GMiner database (queues: RabbitMQ)
    6. Browse/Review Results
    7. Browse/Review Results
    8. Browse/Review Results
    9. http://chicagofree.info/2010/01/public-service-haiti-pictures-via-twitter/
    10. http://www.nytimes.com/interactive/2010/01/18/world/americas/0118-haiti-assess-maps.html
    11. URGENT Christopher Frecynet is still alive under his house. 64 Rue Nord Alexis.(RUELLE NAZON, AVENUE POUPELARD
    12. Mirna Nazaire lives in P-A-P at Bizoton 6#12. Entire neighborhood without food. People are dying.
    13. French hospital is now open and ready to receive the wounded at the french lycee in rue marcadieux bourdon
    14. Questions of interest: Which hospitals are open? Who is in trouble? Does anyone have any tents? Where are the open roads? Any information on Person ABC? What help is needed?
    15. Who needs this info? Aid Agencies, Non-Governmental Organizations (Red Cross, UN, etc.); Military & other relief suppliers; Individuals in Haiti; Donors - matching needs to offers; etc.
    16. Structured data in biology
    17. http://epic.cs.colorado.edu/helping_haiti_tweak_the_twe.html
    18. Main Hashtags
    19. Data Tags
    20. Keywords
    21. Mirna Nazaire lives in P-A-P at Bizoton 6#12. Entire neighborhood without food. People are dying.
        #haiti #need food #name Mirna Nazaire lives in #loc PAP at Bizoton 6 #12 #info neighborhood w/o food. People dying
    22. French hospital is now open and ready to receive the wounded at the french lycee in rue marcadieux bourdon
        #haiti #offering hospital rooms #loc french lycee in rue marcadieux bourdon #num 30+ #info French hospital is open and ready 2 receive
    23. GMiner Ontologies
    24. OWL Ontologies: Classes - categories of things you care about. Properties - attributes of the things you care about.
    25. ‘Triples’ of data: c1 p2 “Value” (subject, predicate, object)
    26. ‘Triples’ of data: c1 p2 “Value” (subject, predicate, object). simont hasHairColor brown; simont inOfficeNumber H8808; simont hasPhoneNumber 456-1234
    27. TtT Classes
    28. TtT Properties
    29. TtT Properties
    30. TtT Properties
    31. Inference [diagram: has_trapped, domain, trapped]
    32. Inference [diagram: has_trapped, domain, trapped]. The ontology asserts that any thing that has a ‘has_trapped’ property is a member of the ‘Trapped’ class.
    33. Inference [diagram: has_trapped, domain, trapped]. The ontology asserts that any thing that has a ‘has_trapped’ property is a member of the ‘Trapped’ class. Tweet:123 #haiti #trapped 5 people #loc Pap
    34. Inference [diagram: has_trapped, domain, trapped]. The ontology asserts that any thing that has a ‘has_trapped’ property is a member of the ‘Trapped’ class. Tweet 123 has_trapped “5 people”. Tweet:123 #haiti #trapped 5 people #loc Pap
    35. Inference [diagram: has_trapped, domain, trapped]. The ontology asserts that any thing that has a ‘has_trapped’ property is a member of the ‘Trapped’ class. Tweet 123 has_trapped “5 people”. Tweet:123 #haiti #trapped 5 people #loc Pap
    36. tweetneed.org
    37. RDF Graph: tweet:8550350793 - twitter id: 8550350793; has need: Insulin; has contact: maison des anges; has location: 18.588724,-72.275065; has latitude: 18.588724; has longitude: -72.275065
    38. RDF Graph: tweet:7953197721 - twitter id: 7953197721; has location: Delmas; has need: Insulin; has need: medication
    39. [Both graphs shown side by side] tweet:7953197721 - twitter id: 7953197721; has location: Delmas; has need: Insulin; has need: medication. tweet:8550350793 - twitter id: 8550350793; has need: Insulin; has contact: maison des anges; has location: 18.588724,-72.275065; has latitude: 18.588724; has longitude: -72.275065
    40. [Both graphs joined at the shared Insulin node] tweet:7953197721 - twitter id: 7953197721; has location: Delmas; has need: Insulin; has need: medication. tweet:8550350793 - twitter id: 8550350793; has need: Insulin; has contact: maison des anges; has location: 18.588724,-72.275065; has latitude: 18.588724; has longitude: -72.275065
    41. Duplication: 1.21pm
    42. Duplication: 1.21pm, 31x, 9.29pm
    43. Using MD5 Hashes: simon twigger → f6f12de7192d1a5d903c016ecb5b3a0c
    44. Using MD5 Hashes: simon twigger → f6f12de7192d1a5d903c016ecb5b3a0c; haiti loc info → 26e7c844f0c80a8860d6835591117639
    45. Using MD5 Hashes: simon twigger → f6f12de7192d1a5d903c016ecb5b3a0c; haiti loc info → 26e7c844f0c80a8860d6835591117639
    46. Using MD5 Hashes
        Full text: rt @baybe_doll: #haiti #need help #name mr. bernard jean louis #loc lumiere evangelical chapel at rue midway 22 in carrefour #contact phone # 3778-2506 → 3d609759195d03a059baca1e063be4eb [3]
        Hashtags: contact haiti loc name need → b767eeb9c16e74bfb22ee6ec0998a670 [13]
        Keywords: help bernard jean louis lumiere evangelical chapel rue midway carrefour ... → d4dc5272669dee93721b4c005307cfc7 [4]
    47. Who is using tweet data? http://haiti.ushahidi.com http://haiti.sahanafoundation.org/ http://swift.ushahidi.com/ http://haiti.managingnews.com/
    48. How to integrate?
    49. How to integrate? [diagram: many nodes, each labelled ‘Topic’]
    50. How to integrate? [diagram: many nodes, each labelled ‘Topic’]
    51. How to integrate? [diagram: many nodes, each labelled ‘Topic’]
    52. Timeline of incident reports at haiti.ushahidi.com, January 12th - February 4th 2010
    53. Timeline of incident reports at haiti.ushahidi.com, January 12th - February 4th 2010
    54. Crisis Commons
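
The notes on triples and on inference (slides 25-35) can be illustrated with a small sketch. This is not the TweakTheTweet ontology itself, which was built in Protege with RDF/OWL; it is a plain-Python stand-in that assumes a single rule of the kind described in the notes: any subject carrying a 'has_trapped' property belongs to the 'Trapped' class. The `Triple` type, the `PROPERTY_DOMAINS` table and the `has_loc` property name are hypothetical names introduced only for this illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Triples based on the slide examples; the "has_loc" property name is assumed.
DATA = [
    Triple("simont", "hasHairColor", "brown"),
    Triple("simont", "inOfficeNumber", "H8808"),
    Triple("tweet:123", "has_trapped", "5 people"),
    Triple("tweet:123", "has_loc", "Pap"),
]

# Toy stand-in for the ontology: each property maps to the class
# that the ontology names as its domain.
PROPERTY_DOMAINS = {"has_trapped": "Trapped"}


def infer_class_membership(triples, property_domains):
    """Return the (subject, rdf:type, class) triples implied by property domains."""
    inferred = set()
    for triple in triples:
        domain_class = property_domains.get(triple.predicate)
        if domain_class is not None:
            inferred.add(Triple(triple.subject, "rdf:type", domain_class))
    return sorted(inferred)


if __name__ == "__main__":
    for triple in infer_class_membership(DATA, PROPERTY_DOMAINS):
        print(triple.subject, triple.predicate, triple.obj)
```

The class membership is never stored in `DATA`; it is derived from the domain table, which is the point made in the inference notes. A real reasoner working over an OWL ontology would handle many more kinds of rule than this single lookup.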

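The notes for slides 36-40 describe parsing a tweet into triples, dumping them as N-Triples, and letting graphs join up around shared values such as Insulin. A minimal sketch of those three steps follows. It is not the tweetneed.org code (that site was written in Ruby on Rails). The tag list is an assumption drawn only from the example tweets in the slides, and the `has_<tag>` property names and the `http://example.org/` base URI are placeholders.

```python
import re
from collections import defaultdict

# Assumed tag vocabulary, taken only from the example tweets in the slides.
KNOWN_TAGS = ("haiti", "need", "offering", "trapped", "name",
              "loc", "num", "info", "contact")
TAG_RE = re.compile(r"#(" + "|".join(KNOWN_TAGS) + r")\b", re.IGNORECASE)


def parse_ttt(tweet_id, text):
    """Split a tagged tweet into (subject, predicate, value) triples."""
    subject = f"tweet:{tweet_id}"
    matches = list(TAG_RE.finditer(text))
    triples = []
    for i, match in enumerate(matches):
        # A tag's value runs up to the next recognised tag (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        value = text[match.end():end].strip()
        if value:  # bare tags such as "#haiti" carry no value
            triples.append((subject, "has_" + match.group(1).lower(), value))
    return triples


def to_ntriples(triples, base="http://example.org/"):
    """Serialise triples as N-Triples lines with plain string literals."""
    lines = []
    for subject, predicate, value in triples:
        literal = (value.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n"))
        subject_uri = base + subject.replace(":", "/")
        lines.append(f'<{subject_uri}> <{base}ttt#{predicate}> "{literal}" .')
    return "\n".join(lines)


def subjects_by_value(triples, predicate):
    """Index subjects by a shared object value, e.g. tweets with the same need."""
    index = defaultdict(set)
    for subject, pred, value in triples:
        if pred == predicate:
            index[value.lower()].add(subject)
    return index


if __name__ == "__main__":
    parsed = parse_ttt("123", "#haiti #trapped 5 people #loc Pap")
    print(to_ntriples(parsed))
```

Matching only a fixed list of tags means that strings such as "#12" in an address are kept as part of the location value rather than treated as a new tag. The `subjects_by_value` index is a simple way to see two tweets meet at a shared node; a triple store would give the same join through a query.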
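The duplicate-detection notes (slides 41-46) compare three MD5 keys: one over the full text, one over the hashtags only, and one over the remaining keywords. The sketch below shows one way such keys could be computed with Python's standard `hashlib`. The stop-word list, the minimum word length and the normalisation steps are assumptions, so the hashes it produces are not expected to match the values shown on the slides.

```python
import hashlib
import re
from collections import defaultdict

# Assumed stop-word list and minimum length; the originals are not given.
STOP_WORDS = {"the", "and", "for", "are", "was", "has", "with", "from"}
MIN_WORD_LENGTH = 3


def md5_hex(text):
    """Return the 32-character hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def full_text_key(tweet):
    """Strictest key: only identical copies (ignoring case) collide."""
    return md5_hex(tweet.strip().lower())


def hashtag_key(tweet):
    """Loosest key: the sorted set of hashtags, ignoring everything else."""
    tags = sorted(set(re.findall(r"#([a-z]\w*)", tweet.lower())))
    return md5_hex(" ".join(tags))


def keyword_key(tweet):
    """Middle ground: drop hashtags, @names, stop words and short strings."""
    text = re.sub(r"[@#]\w+", " ", tweet.lower())
    words = re.findall(r"[a-z]+", text)
    kept = [w for w in words
            if len(w) >= MIN_WORD_LENGTH and w not in STOP_WORDS]
    return md5_hex(" ".join(kept))


def group_duplicates(tweets, key_function):
    """Group tweets that share a key; groups of size one are dropped."""
    groups = defaultdict(list)
    for tweet in tweets:
        groups[key_function(tweet)].append(tweet)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    original = ("#haiti #need help #name mr. bernard jean louis "
                "#loc lumiere evangelical chapel at rue midway 22 "
                "in carrefour #contact phone # 3778-2506")
    retweet = "rt @baybe_doll: " + original
    for key_function in (full_text_key, hashtag_key, keyword_key):
        print(key_function.__name__,
              len(group_duplicates([original, retweet], key_function)))
```

A retweet prefix changes the full-text key but leaves the keyword key untouched, which is why the keyword hash catches retweets that the full-text hash misses. The hashtag key collides for any two tweets using the same set of tags, in line with the note that it is too promiscuous.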