Presentation by Prof. Maria Esther Vidal. DataBootCampVE / 31 October 2013


The impact of open data in the world and in Venezuela. Professor Maria Esther Vidal, Universidad Simón Bolívar. Presentation given during the boot camp on data journalism in Venezuela.

Published in: Social Media, Technology

  1. 1. Impact of Open Data and Linked Open Data. Venezuela. Maria-Esther Vidal, Universidad Simón Bolívar. http:// Twitter: @Maria11576561. Skype: mevs2006
  2. 2. Lights around London's 2012 Olympic stadium describe Sir Tim Berners-Lee's invention, the World Wide Web. The Open Data Institute, which he co-founded, declares a mandate of 'Knowledge for Everyone'.
  3. 3. “The ODI announced 13 new nodes: US, Canada, France, Dubai, Italy, Russia, Sweden and Argentina.” Oct 29, 2013. Sir Tim Berners-Lee (right) and Sir Nigel Shadbolt (left)
  4. 4. Agenda: Open Data; Linked Open Data (Linked Open Data in Journalism); Linked Open Data Applications (Linked Open Data at USB); Conclusions and Future Directions
  5. 5. OPEN DATA
  6. 6. Open Data Definition: “A piece of data or content is open if anyone is free to use, reuse, and redistribute it, subject only, at most, to the requirement to attribute and/or share-alike.” Three principles: availability and access; reuse and distribution; universal participation.
  7. 7. Open Data. Availability and access: data should be available as a whole, preferably by download over the Internet, in a convenient format, and free of charge or at no more than the cost of reproduction.
  8. 8. Open Data. Reuse and distribution: data should be offered in a way that allows it to be reused, redistributed, and combined with other datasets.
  9. 9. Open Data. Universal participation: anyone should be able to use, reuse, and redistribute the data. No discrimination: commercial vs. non-commercial, educational vs. non-educational, for-profit vs. non-profit.
  10. 10. Types of Open Data
  11. 11. Why Open Data? Interoperability; Transparency
  12. 12. Why Open Data? Avoid corruption. Wealth: in Europe alone, over 140 billion euros per year. http://-making-official-data-publi could-spur-lots-innovation-new-goldmine
  13. 13. Why Open Data? Research and Development; Quality of Life
  14. 14. Why Open Data? Improve Public Administration; Data Quality
  15. 15. Why Open Data? Citizens can express themselves and unite so that their voices can be heard. http://
  16. 16. Open Licenses; Open Data; Open Participation; Open Source; Open Standards
  17. 17. What Is and What Is Not Open Data. Open Data: “A piece of content or data is open if you are free to use, reuse, and redistribute it, subject only, at most, to the requirement to attribute and/or share-alike.” The difference between open data and data that is merely publicly available lies in the use of formats that any citizen can read, use, and redistribute. Examples of public data that is not open data: data in spreadsheets, PDF files, etc. Open data is usually published as CSV. http://-“open-data”-means-–-and-what-it-doesn’t
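To make the point about machine-readable formats concrete, here is a minimal Python sketch that reads a small CSV table with the standard library and aggregates one column. The column names and values are invented for illustration and do not come from any dataset mentioned in the talk.

```python
import csv
import io

# Hypothetical open dataset published as CSV (invented values).
SAMPLE_CSV = """region,year,budget
North,2012,100
North,2013,120
South,2013,80
"""


def total_by_region(csv_text):
    """Sum the 'budget' column per region from CSV text."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        region = row["region"]
        totals[region] = totals.get(region, 0) + int(row["budget"])
    return totals


print(total_by_region(SAMPLE_CSV))
```

The same table locked inside a PDF would need manual re-keying or error-prone extraction before any such aggregation could run.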
  18. 18. Opening Up Data. Rules: keep it simple; engage early and engage often; address common fears and misunderstandings. Four steps: (1) choose your dataset(s); (2) apply an open license; (3) make the data available; (4) make it discoverable.
  19. 19. Open Data Conditions. Data providers' requirements: Attribution: data providers may require that they receive credit. Integrity: data providers may require that users indicate whether the data has been changed. Share-alike: data providers may require that any dataset created using their data is also open. Distributing open data: data is machine-readable; data is available in bulk rather than only through an API. http://
  22. 22. Some Open Data Applications. At least 77 countries meet level > 2. http://
  23. 23. Why Open Data? Citizens may unite so that their voices can be heard
  24. 24. Why Open Data? Monrovia, Africa. http://
  25. 25. Why Open Data? Tanzania. http://
  26. 26. Why Open Data? Tanzania. http://
  27. 27. Why Open Data? Tanzania. http://
  28. 28. http://-deprivation
  30. 30. Vaccines and Immunisation in Australia. http://-vaccination-australia-map
  31. 31. http://
  32. 32. Applications of Open Data. Kenya. http://-data-applications/
  33. 33. OPEN DATA AND SOCIETY
  34. 34. Applications of Open Data. http://
  35. 35. http://
  37. 37. Who Owns Who. http://
  38. 38. Who Owns Who. http://
  40. 40. Why Open Data? Improve Public Services; Smart Cities; Improve Public Administration
  41. 41. http://
  42. 42. http://-screen/27036
  43. 43. http://-screen/27036
  44. 44. Applications of Open Data. http://
  45. 45. LINKED OPEN DATA. http://
  46. 46. What to do with Open Data?
  47. 47. What to do with Open Data? At least 77 countries comply with level > 2. http://-en-el-ararteko
  48. 48. What to do with Open Data? At least 11 countries comply with level > 4. http://-en-el-ararteko
  49. 49. What to do with Open Data? http://-top-open-data-index-how-countries-compare#!
  50. 50. http://-top-open-data-index-how-countries-compare#!
  51. 51. Bottom 10 by Open Data Index Score. http://-top-open-data-index-how-countries-compare
  52. 52. Local governments must use Open Data to stay connected with citizens!
  53. 53. Motivation: Semantic Web Evolution. 80's: Arpanet, four servers connected; files were transferred; tools: ftp, telnet, e-mail. 90's: hyperlink-based systems; protocols: HTTP, URI, HTML; documents and data were published. 00's: published data are enhanced with semantics; standards to annotate and describe data: XML, RDF, RDFS, OWL; standards to query data: SPARQL; ontologies representing almost any domain. Now: the Linked Open Data cloud, using the Web to connect related data that was not previously linked! (IRMLs 2010, ESWC 2010)
  54. 54. The Linked Open Data Cloud. Explosion in the number of: Linking Open Data resources and databases; controlled vocabularies (MeSH, GO, PO…); highly interconnected data sources. Data sources differ in quality parameters, sizes, number of links, in- and out-degrees, etc. Biological Web: large datasets of linked data on genes, diseases, clinical drugs, proteins, and so on. Molecular databases: 1170, 95 more than in 2008 and 110 more than the year before! Services and tools published by these databases follow a similar progression! In October 2007, the Cloud of Linked Data datasets consisted of over two billion RDF triples, which were interlinked by over two million RDF links. By May 2009 this had grown to 4.2 billion RDF triples, interlinked by around 142 million RDF links! Today the Linked Open Data cloud has at least 295 datasets, 31,634,213,770 triples, and 503,998,829 links.
  55. 55. Statistics
  57. 57. Open Data in Journalism: it may be trendy, but it is not new; Open Data implies Open Data Journalism; data is not necessarily curated; bigger datasets and small things; Data Journalism is 80% perspiration, 10% great ideas, 10% output; long and short form; anyone can do it; visualization is important; data publishers do not have to be programmers; it is all about stories. http://-journalism
  58. 58. Open Data in Journalism. Sources: Breaking News; Open Data; Running Events; Shared Data
  59. 59. Open Data in Journalism. Sources: Breaking News; Open Data; Running Events; Shared Data. Data Integration: data cleansing, conflict resolution. Semantification: meta-data annotation, vocabularies. Publication: visualization, publishing the story.
  60. 60. Meta-Data: BBC News. This will help users to find news content about the stories they want to know about and ultimately help to open up references to the data contained in those stories. http://-Linked-Data-Ontology
  61. 61. Data Management Tools: BBC News. http://-Data-Connecting-together-the-BBCs-Online-Content
  62. 62. http://-data-on-the-bbc-2638734
  63. 63. More Ontologies to Represent Meta-Data
  65. 65. Challenges for Linked Data Visualization. Enabling user interaction: users must be able to navigate through the data by exploiting the connections between Linked Data resources; the user might edit the underlying data to enrich it by creating additional metadata, highlighting or correcting errors, and validating data. Supporting data reusability: the output (the plotted data or the visualization itself) might be encoded using standard ontologies and vocabularies. Scalability: Linked Data visualization techniques should support the display of large amounts of data in an efficient way. Source: EUCLID, Interaction with Linked Data.
  66. 66. Challenges for Linked Open Data Visualization. Extracting data from different repositories: a Linked Data set might be partitioned into several repositories; the region of interest (ROI) might include data from different data sets, requiring access to distributed repositories. Handling heterogeneous data: the same data (concepts) might be modeled differently, for example, using different vocabularies; certain values might have different formats, for example, dates represented as DD-MM-YYYY, MM-DD-YYYY or just YYYY. Dealing with missing values: because Linked Data is semi-structured, some instances might have missing values for certain properties.
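The date-format and missing-value problems named on this slide can be sketched in a few lines of Python. The function name `normalize_date` and the `day_first` flag are hypothetical. The flag is an assumption the caller has to supply, because a value such as 03-04-2013 cannot be disambiguated without knowing the source's convention.

```python
from datetime import datetime


def normalize_date(value, day_first=True):
    """Return an ISO 8601 string for DD-MM-YYYY, MM-DD-YYYY or YYYY input.

    Missing values (None or blank) are returned as None rather than guessed.
    """
    if value is None or not value.strip():
        return None
    value = value.strip()
    if len(value) == 4 and value.isdigit():
        return value  # year only: keep the reduced precision as-is
    fmt = "%d-%m-%Y" if day_first else "%m-%d-%Y"
    return datetime.strptime(value, fmt).date().isoformat()


print(normalize_date("31-10-2013"))
print(normalize_date("10-31-2013", day_first=False))
print(normalize_date("2013"))
print(normalize_date(""))
```

Returning `None` for a missing value keeps the gap visible to the visualization layer instead of silently inventing a date.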
  67. 67. Linked Open Data Visualization Techniques. View
  68. 68. Comparison of Attributes / Values. Bar/column chart: allows the comparison of values of different categories. Pie chart: useful for comparing percentages or proportions. Line chart: allows visualizing data as a series of data points, where the measurement points (x-axis) are ordered. Histogram: graphical representation of the distribution of the data. Image sources: http://
  69. 69. Analysis of Relationships and Hierarchies. Graph: the data entries are represented as nodes and the links as edges. Arc diagram: the nodes are displayed in one dimension, and the arcs represent the connections. Adjacency matrix diagram: the nodes are displayed as rows and columns, and the links between the nodes are entries in the matrix. Node-link visualizations: the data is organized in hierarchies. Source of images: http://
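As a small illustration of the adjacency matrix representation described on this slide, the following Python sketch turns a list of directed links into a matrix whose rows and columns are the nodes. The node labels and links are invented.

```python
def adjacency_matrix(nodes, edges):
    """Build a 0/1 matrix where entry [i][j] is 1 if nodes[i] links to nodes[j]."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return matrix


# Invented example: three resources and two directed links.
nodes = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C")]
matrix = adjacency_matrix(nodes, edges)

for label, row in zip(nodes, matrix):
    print(label, row)
```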
  70. 70. Analysis of Relationships and Hierarchies (2). Space-filling techniques. Treemaps: subdivide the area into rectangles. Icicles and sunburst: hierarchies are represented by adjacencies. Circle-packing: containment is used to represent the hierarchies. Rose diagrams: areas have equal angles and the data is represented by the extension of the area. Source of images: http://
  71. 71. Analysis of Temporal or Geographical Events. Timeline: continuous data in time; discrete data points in time. Maps. Location maps: display geo-points on a map. Choropleth maps: aggregate data by geographical area. Dorling cartograms: aggregate data and replace each area with a circle. Sources: http://, http://-movie-box-office-chart, Google Map API, http://, http://
  72. 72. Libraries. https://
  73. 73. APPLICATIONS  
  74. 74. Tasks to be Solved … Traverse and consume Linked Data from the LOD cloud or locally. SPARQL endpoints have been developed to access data from the LOD cloud.
  75. 75. SPARQL ENDPOINTS
  76. 76. select distinct * where {<http://> ?p ?o}
  77. 77. http:// All the information related to Venezuela. SPARQL Query
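A query of this shape is sent to an endpoint as an ordinary HTTP request, with the query text passed in the `query` parameter defined by the SPARQL Protocol. The sketch below only builds the request URL and performs no network call. The endpoint and resource URIs are placeholders, since the slides' own URLs are truncated.

```python
from urllib.parse import urlencode

# Placeholder values: the endpoint and resource URIs on the slides are cut off.
ENDPOINT = "http://example.org/sparql"
RESOURCE = "http://example.org/resource/Venezuela"


def build_query(resource_uri):
    """Return a query listing every property and value of one resource."""
    return "SELECT DISTINCT * WHERE { <%s> ?p ?o }" % resource_uri


def build_request_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, build_query(RESOURCE))
print(url)
```

Fetching `url` with any HTTP client would return the bindings for `?p` and `?o`, typically as XML or JSON depending on the `Accept` header.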
  78. 78. http://
  79. 79. SPARQL Endpoint URL; SPARQL Query
  80. 80. SPARQL  Query  
  81. 81. Data: dbpedia:The_Beatles foaf:made three records, each identified by a URI of the form <http:// record/...>, with dc:title "Help!", "Abbey Road", and "Let It Be" respectively.
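The graph on this slide can be read as a set of subject-predicate-object triples, and a query as patterns matched against them. The following Python sketch mimics that with plain tuples. The `ex:record1` style identifiers are invented stand-ins for the record URIs, which are truncated on the slide, and the matching function is an illustration rather than how a real triple store is implemented.

```python
# Triples as (subject, predicate, object); record identifiers are placeholders.
TRIPLES = [
    ("dbpedia:The_Beatles", "foaf:made", "ex:record1"),
    ("dbpedia:The_Beatles", "foaf:made", "ex:record2"),
    ("dbpedia:The_Beatles", "foaf:made", "ex:record3"),
    ("ex:record1", "dc:title", "Help!"),
    ("ex:record2", "dc:title", "Abbey Road"),
    ("ex:record3", "dc:title", "Let It Be"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every given (non-None) position."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def titles_made_by(triples, artist):
    """Follow foaf:made, then dc:title, like a two-pattern SPARQL query."""
    records = [o for _, _, o in match(triples, s=artist, p="foaf:made")]
    return [o for r in records for _, _, o in match(triples, s=r, p="dc:title")]


print(titles_made_by(TRIPLES, "dbpedia:The_Beatles"))
```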
  82. 82. SELECT ?x ?name ?mbox ?country ?reviewer ?product ?title WHERE { <http://www4.wiwiss.fu- Review2883011> rev:reviewer ?x . ?x <-rdf-syntax-ns#type> <> . ?x <> ?name . ?x <> ?mbox . ?x <http://www4.wiwiss.fu-> ?country . ?reviewer <> ?x . ?reviewer <http://www4.wiwiss.fu-> ?product . ?reviewer <> ?title }
  83. 83. Graph Databases
  85. 85. Federations of Endpoints: ANAPSID; SPARQL-DQP
  86. 86. Federated Queries. ANAPSID. https:// Life Sciences Query: “Genes and diseases that have been studied for drugs tested in clinical trials where Breast Cancer was studied” SELECT DISTINCT ?D1 ?TGD ?GN1 ?GN2 WHERE { ?CT1 <> ?C1 . ?CT1 <> ?I . ?CT1 <> ?I . ?I <> "Drug" . ?C1 <-schema#seeAlso> ?D1 . ?I <-schema#seeAlso> ?I1 . ?C <> "Breast Cancer" . ?CT <> ?I . ?CT <> ?A4 . ?I1 <http://www4.wiwiss.fu-> ?TGD . ?TGD <http://www4.wiwiss.fu-> ?GN1 . ?D1 <http://www4.wiwiss.fu-> ?GN2 . }
  87. 87. Federated Queries. ANAPSID. https:// SELECT DISTINCT ?D1 ?TGD ?GN1 ?GN2 WHERE { { SERVICE <> { ?C1 <> "Breast Cancer" . ?C1 <-schema#seeAlso> ?D1 . ?C3 <-schema#seeAlso> ?D1 . ?CT3 <> ?C3 }} . { SERVICE <> { ?C1 <> "Breast Cancer" . ?I <> "Drug" . ?CT1 <> ?C1 . ?CT1 <> ?I }} . { SERVICE <http://www4.wiwiss.fu-> { ?I1 <http://www4.wiwiss.fu-> ?TGD . ?TGD <http://www4.wiwiss.fu-> ?GN1 }} . { SERVICE <> { ?I <> "Drug" . ?I <-schema#seeAlso> ?I1 . ?CT3 <> ?I . ?CT3 <> ?C3 }} . } The four SERVICE blocks are labeled S1, S2, S3, and S4.
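In a federated query, each SERVICE block is evaluated at its own endpoint and the partial answers are then combined on the variables they share. The Python sketch below shows that combination step as a plain hash join over lists of variable bindings. It is a generic illustration and not ANAPSID's adaptive operators, and the binding values are invented.

```python
def hash_join(left, right, var):
    """Join two lists of bindings (dicts) on the shared variable `var`."""
    table = {}
    for binding in left:
        table.setdefault(binding[var], []).append(binding)
    joined = []
    for binding in right:
        for partner in table.get(binding[var], []):
            joined.append({**partner, **binding})
    return joined


# Invented partial answers, shaped like the results of two SERVICE blocks
# that share the variable ?I1.
answers_s3 = [
    {"I1": "drug:a", "TGD": "target:x"},
    {"I1": "drug:b", "TGD": "target:y"},
]
answers_s4 = [
    {"I": "trial_drug:1", "I1": "drug:a"},
]

result = hash_join(answers_s3, answers_s4, "I1")
print(result)
```

Only bindings that agree on `?I1` survive, which is why the order and grouping of SERVICE blocks strongly affects how much intermediate data a federated engine has to move.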
  88. 88. Federated Queries. ANAPSID. https:// S1, S2, S3, S4
  89. 89. ANAPSID. http://
  90. 90. “Drugs that possibly target Leukemia” SELECT DISTINCT ?drug1 WHERE { ?drug1 drugbank:possibleDiseaseTarget diseasome:673 . ?drug1 drugbank:target ?o . ?o drugbank:genbankIdGene ?g . ?o drugbank:locus ?l . ?o drugbank:molecularWeight ?mw . ?o drugbank:hprdId ?hp . ?o drugbank:swissprotName ?sn . ?o drugbank:proteinSequence ?ps . ?o drugbank:generalReference ?gr . ?drug drugbank:target ?o . ?drug drugbank:synonym ?o1 . OPTIONAL { ?drug owl:sameAs ?drug5 . ?drug5 rdf:type dbcategory:Drug . ?drug drugbank:keggCompoundId ?cpd . ?enzyme kegg:xSubstrate ?cpd . ?enzyme rdf:type kegg:Enzyme . ?reaction kegg:xEnzyme ?enzyme . ?reaction kegg:equation ?equation . } } http://
  94. 94. Tasks to be Solved … (2). Patterns of connections between people to understand the functioning of society.
  97. 97. Topological properties of graphs can be used to identify patterns that reveal phenomena and anomalies and that potentially lead to a discovery. There is a significant increase of graph data in the form of social and biological information.
  98. 98. Annotation Graph
  99. 99. Patterns or Signatures: Brentuximab_vedotin and Catumaxomab
  100. 100. Annotation similarity between two genes based on shared GO annotations. Genes: AtVHA-C5 and AtVHA-C. GO terms in the figure: vacuolar membrane; Golgi apparatus; plant-type vacuole; chloroplast; vacuole; vacuole proton-transporting V-type ATPase, V1 domain. The terms are connected through GO paths.
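The slide does not say which similarity measure is used. As one simple possibility, the sketch below scores two genes by the Jaccard index of their annotation sets, that is, shared terms divided by all terms. The measure is an assumption for illustration, and the term identifiers are placeholders rather than the real annotations of AtVHA-C5 and AtVHA-C.

```python
def jaccard(terms_a, terms_b):
    """Return |A ∩ B| / |A ∪ B| for two collections of annotation terms."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0  # two unannotated genes share nothing measurable
    return len(a & b) / len(a | b)


# Placeholder annotation sets (not the genes' actual GO annotations).
gene_1_terms = {"term:1", "term:2", "term:3", "term:4"}
gene_2_terms = {"term:2", "term:3", "term:5"}

similarity = jaccard(gene_1_terms, gene_2_terms)
print(similarity)
```

A set-overlap score like this ignores the GO hierarchy. Measures that follow the GO paths shown in the figure would also credit terms that are close in the ontology without being identical.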
  101. 101. Patterns or Signatures between genes AtVHA-C5 and AtVHA-C
  102. 102. Drug-Target Interaction Network. Patterns between interactions. Potential new interaction.
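One elementary way to surface a "potential new interaction" in such a network is a neighborhood heuristic: a drug is a candidate for a target if other drugs that share targets with it already hit that target. The Python sketch below implements this heuristic on invented data. It is a stand-in chosen for brevity, not the pattern-mining method presented in the talk.

```python
def candidate_score(interactions, drug, target):
    """Count drugs that share a target with `drug` and also interact with `target`.

    Returns 0 when the interaction is already known.
    """
    targets_of = {}
    for d, t in interactions:
        targets_of.setdefault(d, set()).add(t)
    own = targets_of.get(drug, set())
    if target in own:
        return 0
    return sum(
        1
        for other, targets in targets_of.items()
        if other != drug and target in targets and own & targets
    )


# Invented drug-target pairs.
INTERACTIONS = [
    ("d1", "t1"), ("d1", "t2"),
    ("d2", "t1"), ("d2", "t3"),
    ("d3", "t3"),
]

print(candidate_score(INTERACTIONS, "d1", "t3"))
```

Here `d1` and `d2` share target `t1`, and `d2` also hits `t3`, so the pair (`d1`, `t3`) receives support from one neighboring drug.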
  103. 103. Patterns of connections between people to understand the functioning of society.
  104. 104. http://-screen/27036
  105. 105. Conclusions. Open Data: transparency; interoperability; avoiding corruption; driving research and development; data quality. Linked Open Data: RDF data; linked to existing datasets; endpoints can be used to access data.
  106. 106. Conclusions. Open Data applications: citizens can develop applications to take control of their lives. (Linked) Open Data can be used for: link prediction; discovering complex patterns.
  107. 107. Future Directions
  108. 108. THANKS! QUESTIONS? Maria-Esther Vidal, Universidad Simón Bolívar. http:// Twitter: @Maria11576561. Skype: mevs2006