Presentation by Prof. Maria Esther Vidal. DataBootCampVE / October 31, 2013


The impact of open data in the world and in Venezuela. Professor Maria Esther Vidal, Universidad Simón Bolívar. Presentation given during the boot camp on data journalism in Venezuela.




  • 1. Impact of Open Data and Linked Open Data Venezuela. Maria-Esther Vidal, Universidad Simón Bolívar. http:// Twitter: @Maria11576561. Skype: mevs2006
  • 2. Lights around London's 2012 Olympic stadium describe Sir Tim Berners-Lee's invention, the World Wide Web. The Open Data Institute, which he co-founded, declares a mandate of 'Knowledge for Everyone'.
  • 3. “The ODI announced 13 new nodes: US, Canada, France, Dubai, Italy, Russia, Sweden and Argentina.” Oct 29, 2013. Sir Tim Berners-Lee (right) and Sir Nigel Shadbolt (left)
  • 4. Agenda: Open Data; Linked Open Data (Linked Open Data in Journalism); Linked Open Data Applications (Linked Open Data at USB); Conclusions and Future Directions
  • 5. OPEN DATA
  • 6. Open Data Definition: “A piece of data or content is open if anyone is free to use, reuse, and redistribute it, subject only, at most, to the requirement to attribute and/or share-alike.” Three aspects: Availability and Access; Reuse and Distribution; Universal Participation.
  • 7. Open Data. Availability and Access: Data should be available as a whole, preferably by download over the Internet, and in a convenient format. It should be free or cost no more than the cost of reproduction.
  • 8. Open Data. Reuse and Distribution: Data should be offered in a way that allows it to be reused, redistributed, and interrelated with other datasets.
  • 9. Open Data. Universal Participation: Any person should be able to use, reuse, and redistribute the data. NO discrimination: commercial vs. non-commercial; educational vs. non-educational; for-profit vs. non-profit.
  • 10. Types of Open Data
  • 11. Why Open Data? Interoperability. Transparency.
  • 12. Why Open Data? Avoid Corruption. Wealth: in Europe alone, over 140 billion euros per year. http://-making-official-data-publi could-spur-lots-innovation-new-goldmine
  • 13. Why Open Data? Research and Development. Quality of Life.
  • 14. Why Open Data? Improve Public Administration. Data Quality.
  • 15. Why Open Data? Citizens can express themselves and unite so that their voices can be heard. http://
  • 16. Open Licenses, Open Data, Open Participation, Open Source, Open Standards
  • 17. What is and what is not Open Data. Open Data: “A piece of content or data is open if you are free to use, reuse, and redistribute it, subject only, at most, to the requirement to attribute and share-alike.” The difference between open data and data that is merely publicly available lies in the use of formats that can be read, used, and redistributed by any citizen. Examples of public data that is not open data: data in spreadsheets, PDF, etc. Open data is usually published as CSV. http://-“open-data”-means-–-and-what-it-doesn’t
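To make the point about machine-readable formats concrete, here is a minimal sketch that parses CSV text with Python's standard library and aggregates one column. The column names and rows are invented for illustration only; they do not come from the presentation or from any real dataset.

```python
import csv
import io

# Hypothetical CSV content: region names and counts are made up.
RAW_CSV = """region,year,schools
North,2012,10
South,2012,7
North,2013,12
"""


def total_by_region(csv_text):
    """Sum the 'schools' column per region from CSV text."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["region"]] = totals.get(row["region"], 0) + int(row["schools"])
    return totals


if __name__ == "__main__":
    print(total_by_region(RAW_CSV))
```

The same aggregation over a PDF would first require extracting the table from the page layout, which is the practical difference the slide draws between public data and open data.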
  • 18. Opening Up Data. Rules: keep it simple; engage early and engage often; address common fears and misunderstandings. Four Steps: (1) choose your dataset(s); (2) apply an open license; (3) make the data available; (4) make it discoverable.
  • 19. Open Data Conditions. Data Providers' Requirements: Attribution: data providers may require credit. Integrity: data providers may require that users indicate whether the data has been changed. Share-alike: data providers may require that any dataset created using their data is also open. Distributing Open Data: data is machine-readable; data is available in bulk, not only through an API. http://
  • 22. Some Open Data Applications. At least 77 countries meet level > 2. http://
  • 23. Why Open Data? Citizens may unite so that their voices can be heard
  • 24. Why Open Data? Monrovia, Africa. http://
  • 25. Why Open Data? Tanzania. http://
  • 26. Why Open Data? Tanzania. http://
  • 27. Why Open Data? Tanzania. http://
  • 28. http://-deprivation
  • 30. Vaccines and Immunisation in Australia. http://-vaccination-australia-map
  • 31. http://
  • 32. Applications of Open Data. http://-data-applications/ Kenya
  • 33. OPEN DATA AND SOCIETY
  • 34. Applications of Open Data. http://
  • 35. http://
  • 37. http:// Who Owns Who
  • 38. http:// Who Owns Who
  • 40. Why Open Data? Improve Public Services. Smart Cities. Improve Public Administration.
  • 41. http://
  • 42. http://-screen/27036
  • 43. http://-screen/27036
  • 44. Applications of Open Data. http://
  • 45. LINKED OPEN DATA. http://
  • 46. What to do with Open Data?
  • 47. What to do with Open Data? At least 77 countries meet level > 2. http://-en-el-ararteko
  • 48. What to do with Open Data? At least 11 countries meet level > 4. http://-en-el-ararteko
  • 49. What to do with Open Data? http://-top-open-data-index-how-countries-compare#!
  • 50. http://-top-open-data-index-how-countries-compare#!
  • 51. Bottom 10 by Open Data Index Score. http://-top-open-data-index-how-countries-compare
  • 52. Local governments must use Open Data to stay connected with citizens!
  • 53. Motivation: Semantic Web Evolution
      80's: Arpanet: four servers connected. Files were transferred. Tools: ftp, telnet, e-mail.
      90's: Hyperlink-based systems. Protocols: http, uri, html. Documents and data were published.
      00's: Published data are enhanced with semantics! Standards to annotate and describe data: XML, RDF, RDFS, OWL. Standards to query data: SPARQL. Ontologies representing almost any domain.
      Now: The Linked Open Data cloud, using the Web to connect related data that was not previously linked!
      IRMLs 2010, ESWC 2010
  • 54. The Linked Open Data Cloud
      Explosion in the number of resources and databases: 1170 molecular databases, 95 more than in 2008 and 110 more than the year before. Services and tools published by these databases follow a similar progression.
      In October 2007, datasets consisted of over two billion RDF triples, interlinked by over two million RDF links. By May 2009 this had grown to 4.2 billion RDF triples, interlinked by around 142 million RDF links. Today the Linked Open Data cloud has at least 295 datasets, 31,634,213,770 triples, and 503,998,829 links.
      • Linking Open Data: different quality parameters; controlled vocabularies (MeSH, GO, PO…); highly interconnected data sources.
      • Cloud of Linked Data: different sizes; many links; different in- and out-degrees, etc.
      • Biological Web: large datasets of linking data (genes, diseases, clinical drugs, proteins, and so on).
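As a rough reading aid, the October 2007 and "today" figures quoted on this slide can be compared with simple arithmetic. The sketch below treats "over two billion" and "over two million" as round lower bounds, which is an assumption, so the results are approximations only.

```python
# Figures as quoted on the slide; the 2007 values are rounded lower bounds.
SNAPSHOTS = {
    "2007-10": {"triples": 2_000_000_000, "links": 2_000_000},
    "today": {"triples": 31_634_213_770, "links": 503_998_829},
}


def links_per_thousand_triples(snapshot):
    """How many RDF links exist for every 1,000 RDF triples."""
    return 1000 * snapshot["links"] / snapshot["triples"]


def growth_factor(old, new, key):
    """Ratio of a figure between two snapshots."""
    return new[key] / old[key]


if __name__ == "__main__":
    for name, snap in SNAPSHOTS.items():
        print(name, round(links_per_thousand_triples(snap), 2))
    print(round(growth_factor(SNAPSHOTS["2007-10"], SNAPSHOTS["today"], "triples"), 1))
```

Under these assumptions the link density rises from about 1 link per 1,000 triples to roughly 16, meaning links grew considerably faster than triples.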
  • 55. Statistics
  • 57. Open Data in Journalism: It may be trendy, but it is not new. Open Data implies Open Data Journalism. Data is not necessarily curated. Bigger datasets and small things. Data journalism is 80% perspiration, 10% great ideas, 10% output. Long and short-form. Anyone can do it. Visualization is important. Data publishers do not have to be programmers. It is all about stories. http://-journalism
  • 58. Open Data in Journalism. Sources: Breaking News, Open Data, Running Events, Shared Data
  • 59. Open Data in Journalism. Sources: Breaking News, Open Data, Running Events, Shared Data. Data Integration: data cleansing; conflict resolution. Semantification: meta-data annotation; vocabularies. Publication: visualization; publishing the story.
  • 60. Meta-Data, BBC News: This will help users find news content about the stories they want to know about and ultimately help open up references to the data contained in those stories. http://-Linked-Data-Ontology
  • 61. Data Management Tools, BBC News. http://-Data-Connecting-together-the-BBCs-Online-Content
  • 62. http://-data-on-the-bbc-2638734
  • 63. More ontologies to represent meta-data
  • 65. Challenges for Linked Data Visualization
      • Enabling user interaction
        - Users must be able to navigate through the data by exploiting the connections between Linked Data resources
        - The user might edit the underlying data to enrich it by creating additional metadata, highlighting or correcting errors, and validating data
      • Supporting data reusability
        - The output (the plotted data or the visualization itself) might be encoded using standard ontologies and vocabularies
      • Scalability
        - Linked Data visualization techniques should support the display of large amounts of data in an efficient way
      EUCLID – Interaction with Linked Data
  • 66. Challenges for Linked Open Data Visualization
      • Extracting data from different repositories
        - A Linked Data set might be partitioned into several repositories
        - The region of interest (ROI) might include data from different data sets, requiring access to distributed repositories
      • Handling heterogeneous data
        - The same data (concepts) might be modeled differently, for example, using different vocabularies
        - Certain values might have different formats, for example, dates represented as DD-MM-YYYY, MM-DD-YYYY or just YYYY
      • Dealing with missing values
        - Because Linked Data is semi-structured, some instances might have missing values for certain properties
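The date-format and missing-value challenges above can be illustrated with a small normalization helper. This is a sketch under stated assumptions, not a method from the slides: the function name and the `day_first` hint are hypothetical, and the hint is needed because a string such as 03-04-2013 cannot be disambiguated from its content alone.

```python
from datetime import date


def normalize_date(value, day_first=True):
    """Return an ISO 8601 string for 'DD-MM-YYYY', 'MM-DD-YYYY' or 'YYYY' input.

    A bare year stays a bare year. Missing values (None or blank) return None.
    `day_first` is a caller-supplied hint about the source's convention.
    """
    if value is None or not value.strip():
        return None
    parts = value.strip().split("-")
    if len(parts) == 1 and len(parts[0]) == 4 and parts[0].isdigit():
        return parts[0]
    if len(parts) != 3:
        raise ValueError(f"unrecognized date format: {value!r}")
    first, second, year = (int(p) for p in parts)
    day, month = (first, second) if day_first else (second, first)
    # date() raises ValueError for impossible combinations such as month 31.
    return date(year, month, day).isoformat()


if __name__ == "__main__":
    print(normalize_date("31-10-2013"))
    print(normalize_date("10-31-2013", day_first=False))
    print(normalize_date("2013"))
    print(normalize_date(""))
```

Returning None for missing values, rather than a default date, keeps the gap visible to whatever visualization consumes the data.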
  • 67. Linked Open Data Visualization Techniques. View
  • 68. Comparison of Attributes / Values
      • Bar/column chart: allows the comparison of values of different categories.
      • Pie chart: useful for comparing percentages or proportions.
      • Line chart: allows visualizing data as a series of data points, where the measurement points (x-axis) are ordered.
      • Histogram: graphical representation of the distribution of the data.
      Image sources: http://
  • 69. Analysis of Relationships and Hierarchies
      • Graph: the data entries are represented as nodes and the links as edges.
      • Arc diagram: the nodes are displayed in one dimension, and the arcs represent the connections.
      • Adjacency matrix diagram: the nodes are displayed as rows and columns, and the links between the nodes are entries in the matrix.
      • Node-link visualizations: the data is organized in hierarchies.
      Source of images: http://
  • 70. Analysis of Relationships and Hierarchies (2): Space-filling techniques
      • Treemaps: subdivide the area into rectangles.
      • Icicles and sunburst: hierarchies are represented by adjacencies.
      • Circle-packing: containment is used to represent the hierarchies.
      • Rose diagrams: areas have equal angles and the data is represented by the extension of the area.
      Source of images: http://
  • 71. Analysis of Temporal or Geographical Events
      • Timeline: continuous data in time, or discrete data points in time.
      • Location maps: display geo-points on a map.
      • Choropleth maps: aggregate data by geographical area.
      • Dorling cartograms: aggregate data and replace each area with a circle.
      Sources: Google Map API; http://; http://-movie-box-office-chart
  • 72. Libraries. https://
  • 74. Tasks to be Solved … Traverse and consume Linked Data from the LOD cloud or locally. SPARQL endpoints have been developed to access data from the LOD cloud.
  • 76. SELECT DISTINCT * WHERE { <http://> ?p ?o }
  • 77. http:// All the information related to Venezuela. SPARQL Query
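A query like the one on slide 76 is normally sent to an endpoint as an HTTP request whose `query` parameter carries the SPARQL text. The sketch below only builds that request URL and makes no network call. The endpoint address and the resource URI for Venezuela are assumptions chosen for illustration; the slides elide the actual URLs.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumptions, not taken from the slides: the public DBpedia endpoint and
# the DBpedia resource URI for Venezuela.
ENDPOINT = "https://dbpedia.org/sparql"
RESOURCE = "http://dbpedia.org/resource/Venezuela"


def build_query(resource_uri):
    """All predicate/object pairs whose subject is the given resource."""
    return f"SELECT DISTINCT * WHERE {{ <{resource_uri}> ?p ?o }}"


def build_request_url(endpoint, query):
    """Encode the query as a GET request, passing it in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    url = build_request_url(ENDPOINT, build_query(RESOURCE))
    print(url)
    # Decoding the URL again recovers the original query text.
    print(parse_qs(urlsplit(url).query)["query"][0])
```

Fetching the resulting URL with any HTTP client would return the variable bindings for `?p` and `?o`, in whichever result format the endpoint supports.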
  • 78. http://
  • 79. SPARQL Endpoint URL. SPARQL Query
  • 80. SPARQL Query
  • 81. Data: dbpedia:The_Beatles is connected by foaf:made to three records (<http:// record/...>), each of which has a dc:title: "Help!", "Abbey Road", and "Let It Be".
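The graph on this slide can be written down as subject-predicate-object triples and queried by pattern matching, which is the basic idea behind a SPARQL triple pattern. In this sketch the record identifiers (`record:1` and so on) are placeholders, because the slide elides the real URIs, and the pairing of each record with a title is assumed.

```python
# Placeholder record identifiers; the real URIs are elided on the slide.
TRIPLES = [
    ("dbpedia:The_Beatles", "foaf:made", "record:1"),
    ("dbpedia:The_Beatles", "foaf:made", "record:2"),
    ("dbpedia:The_Beatles", "foaf:made", "record:3"),
    ("record:1", "dc:title", "Help!"),
    ("record:2", "dc:title", "Abbey Road"),
    ("record:3", "dc:title", "Let It Be"),
]


def match(triples, s=None, p=None, o=None):
    """Yield triples matching a pattern; None plays the role of a variable."""
    for triple in triples:
        if ((s is None or triple[0] == s)
                and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


def titles_made_by(triples, artist):
    """Roughly: artist foaf:made ?record . ?record dc:title ?title"""
    titles = []
    for _, _, record in match(triples, s=artist, p="foaf:made"):
        for _, _, title in match(triples, s=record, p="dc:title"):
            titles.append(title)
    return titles


if __name__ == "__main__":
    print(titles_made_by(TRIPLES, "dbpedia:The_Beatles"))
```

The nested loop is a naive join on the shared `?record` variable; a real triple store would use indexes instead of scanning the list.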
  • 82. SELECT ?x ?name ?mbox ?country ?reviewer ?product ?title
      WHERE {
        <http://www4.wiwiss.fu- Review2883011> rev:reviewer ?x .
        ?x <-rdf-syntax-ns#type> <> .
        ?x <> ?name .
        ?x <> ?mbox .
        ?x <http://www4.wiwiss.fu-> ?country .
        ?reviewer <> ?x .
        ?reviewer <http://www4.wiwiss.fu-> ?product .
        ?reviewer <> ?title
      }
  • 83. Graph Databases
  • 85. Federations of Endpoints: ANAPSID, SPARQL-DQP
  • 86. Federated Queries (ANAPSID). https://
      Life Sciences Query: “Genes and diseases that have been studied for drugs tested in clinical trials where Breast Cancer was studied”
      SELECT DISTINCT ?D1 ?TGD ?GN1 ?GN2
      WHERE {
        ?CT1 <> ?C1 .
        ?CT1 <> ?I .
        ?CT1 <> ?I .
        ?I <> "Drug" .
        ?C1 <-schema#seeAlso> ?D1 .
        ?I <-schema#seeAlso> ?I1 .
        ?C <> "Breast Cancer" .
        ?CT <> ?I .
        ?CT <> ?A4 .
        ?II <http://www4.wiwiss.fu-> ?TGD .
        ?TGD <http://www4.wiwiss.fu-> ?GN1 .
        ?D1 <http://www4.wiwiss.fu-> ?GN2 .
      }
  • 87. Federated Queries (ANAPSID). https://
      SELECT DISTINCT ?D1 ?TGD ?GN1 ?GN2
      WHERE {
        # S1
        { SERVICE <> {
            ?C1 <> "Breast Cancer" .
            ?C1 <-schema#seeAlso> ?D1 .
            ?C3 <-schema#seeAlso> ?D1 .
            ?CT3 <> ?C3 }} .
        # S2
        { SERVICE <> {
            ?C1 <> "Breast Cancer" .
            ?I <> "Drug" .
            ?CT1 <> ?C1 .
            ?CT1 <> ?I }} .
        # S3
        { SERVICE <http://www4.wiwiss.fu-> {
            ?I1 <http://www4.wiwiss.fu-> ?TGD .
            ?TGD <http://www4.wiwiss.fu-> ?GN1 }} .
        # S4
        { SERVICE <> {
            ?I <> "Drug" .
            ?I <-schema#seeAlso> ?I1 .
            ?CT3 <> ?I .
            ?CT3 <> ?C3 }} .
      }
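In a federated query, each SERVICE block is answered by a different endpoint, and the partial results have to be joined on the variables they share (for example `?I1` between the third and fourth sub-queries). The sketch below shows a plain hash join over lists of variable bindings. It is a generic illustration, not ANAPSID's actual operator, and the sample bindings are invented.

```python
def hash_join(left, right):
    """Join two lists of variable bindings (dicts) on their shared variables.

    Assumes every row within one list binds the same set of variables.
    """
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    # Build phase: index the left rows by their values for the shared variables.
    index = {}
    for row in left:
        index.setdefault(tuple(row[v] for v in shared), []).append(row)
    # Probe phase: look up each right row and merge compatible bindings.
    joined = []
    for row in right:
        for partner in index.get(tuple(row[v] for v in shared), []):
            joined.append({**partner, **row})
    return joined


if __name__ == "__main__":
    # Hypothetical partial results from two endpoints.
    drugs = [{"I": "drug:a", "I1": "db:1"}, {"I": "drug:b", "I1": "db:2"}]
    targets = [{"I1": "db:1", "TGD": "target:x"}, {"I1": "db:3", "TGD": "target:y"}]
    print(hash_join(drugs, targets))
```

Only bindings that agree on `I1` survive the join, which is why the order and selectivity of the sub-queries matter so much for federated engines.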
  • 88. Federated Queries (ANAPSID). https:// Sub-queries S1, S2, S3, S4
  • 89. ANAPSID. http://
  • 90. “Drugs that possibly target Leukemia” http://
      SELECT DISTINCT ?drug1
      WHERE {
        ?drug1 drugbank:possibleDiseaseTarget diseasome:673 .
        ?drug1 drugbank:target ?o .
        ?o drugbank:genbankIdGene ?g .
        ?o drugbank:locus ?l .
        ?o drugbank:molecularWeight ?mw .
        ?o drugbank:hprdId ?hp .
        ?o drugbank:swissprotName ?sn .
        ?o drugbank:proteinSequence ?ps .
        ?o drugbank:generalReference ?gr .
        ?drug drugbank:target ?o .
        ?drug drugbank:synonym ?o1 .
        OPTIONAL {
          ?drug owl:sameAs ?drug5 .
          ?drug5 rdf:type dbcategory:Drug .
          ?drug drugbank:keggCompoundId ?cpd .
          ?enzyme kegg:xSubstrate ?cpd .
          ?enzyme rdf:type kegg:Enzyme .
          ?reaction kegg:xEnzyme ?enzyme .
          ?reaction kegg:equation ?equation .
        }
      }
  • 91. “Drugs that possibly target Leukemia” (same query as slide 90). http://
  • 92. “Drugs that possibly target Leukemia” (same query as slide 90). http://
  • 94. Tasks to be Solved … (2): Patterns of connections between people to understand the functioning of society.
  • 106. Topological properties of graphs can be used to identify patterns that reveal phenomena and anomalies and that can potentially lead to a discovery. Graph data in the form of social and biological information has increased significantly.
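Node degree is one of the simplest topological properties of a graph and a common starting point for spotting unusual nodes. The sketch below counts in- and out-degrees for a directed edge list and reports highly connected nodes. The edge list and the threshold are invented for illustration.

```python
from collections import Counter


def degrees(edges):
    """Return (in_degree, out_degree) Counters for a list of directed edges."""
    in_deg, out_deg = Counter(), Counter()
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    return in_deg, out_deg


def hubs(edges, min_degree):
    """Nodes whose total degree (in + out) reaches min_degree, sorted by name."""
    in_deg, out_deg = degrees(edges)
    total = in_deg + out_deg
    return sorted(node for node, deg in total.items() if deg >= min_degree)


if __name__ == "__main__":
    # Hypothetical graph: node "a" links to three others.
    edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
    print(degrees(edges))
    print(hubs(edges, 3))
```

A node whose degree is far from the rest of the distribution is a candidate for the kind of anomaly or pattern the slide refers to, though degree alone is rarely conclusive.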
  • 107. Annotation Graph
  • 108. Patterns or Signatures: Brentuximab vedotin and Catumaxomab
  • 109. Annotation similarity between two genes based on shared GO annotations. Figure: genes AtVHA-C5 and AtVHA-C connected through GO paths to their GO terms, including Vacuole, Vacuolar Membrane, Plant-type vacuole, Chloroplast, Golgi apparatus, and Vacuole proton-transporting V-type ATPase, V1 domain.
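One simple way to score similarity from shared annotations is the Jaccard coefficient: the number of shared terms divided by the number of distinct terms across both genes. The slide does not say which measure the author uses, so Jaccard is an assumption here, and the term sets assigned to each gene below are invented because the original figure does not survive in the transcript.

```python
def jaccard(terms_a, terms_b):
    """Jaccard similarity of two annotation sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical annotation sets using GO term names that appear on the slide.
    gene_1 = {"vacuole", "vacuolar membrane", "chloroplast", "Golgi apparatus"}
    gene_2 = {"vacuole", "vacuolar membrane", "chloroplast", "plant-type vacuole"}
    print(jaccard(gene_1, gene_2))
```

This set-based score ignores the GO hierarchy, so two closely related terms count as a complete mismatch; measures that follow the GO paths shown in the figure would treat them as partially similar.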
  • 110. Patterns or Signatures between genes AtVHA-C5 and AtVHA-C
  • 111. Drug-Target Interaction Network: patterns between interactions; potential new interaction
  • 112. Patterns of connections between people to understand the functioning of society.
  • 113. http://-screen/27036
  • 114. Conclusions. Open Data: transparency; interoperability; avoiding corruption; driving research and development; data quality. Linked Open Data: RDF data; linked to existing datasets; endpoints can be used to access data.
  • 115. Conclusions. Open Data Applications: citizens can develop applications to take control of their lives. (Linked) Open Data can be used for link prediction and for discovering complex patterns.
  • 116. Future Directions
  • 117. THANKS! QUESTIONS? Maria-Esther Vidal, Universidad Simón Bolívar. http:// Twitter: @Maria11576561. Skype: mevs2006