Paul Groth: Data Analysis in a Changing Discourse: The Challenges of Scholarly Communication

Paul Groth (Elsevier), “Data Analysis in a Changing Discourse: The Challenges of Scholarly Communication”.
Presentation at the KnoweScape workshop "Evolution and variation of classification systems", March 4-5, 2015, Amsterdam.

  1. Data Analysis in a Changing Discourse: The Challenges of Scholarly Communication. Paul Groth, @pgroth
  6. [Figure] Keyword co-occurrence network in WWW 2008: web, query, online, mobile
  7. [Figure] Keyword co-occurrence network in WWW 2010: search, social, data
  8. Figure 1. Evolution of the number of classes of the three branches of the Gene Ontology. Dameron O, Bettembourg C, Le Meur N (2013) Measuring the Evolution of Ontology Complexity: The Gene Ontology Case Study. PLoS ONE 8(10): e75993. doi:10.1371/journal.pone.0075993 http://127.0.0.1:8081/plosone/article?id=info:doi/10.1371/journal.pone.0075993
  9. Table 2. Gene Ontology complexity variations. Dameron O, Bettembourg C, Le Meur N (2013) Measuring the Evolution of Ontology Complexity: The Gene Ontology Case Study. PLoS ONE 8(10): e75993. doi:10.1371/journal.pone.0075993 http://127.0.0.1:8081/plosone/article?id=info:doi/10.1371/journal.pone.0075993
  10. Definitions change. The most recent changes to the GO term “apoptotic process” as displayed in QuickGO [20]. In total there have been 54 changes over the lifetime of the term. Huntley et al. GigaScience 2014 3:4 doi:10.1186/2047-217X-3-4
  11. Ramifications
  13. What happens to the long tail?
  14. ChEMBL 15: Targets are now proteins. http://chembl.blogspot.nl/2013/01/chembl-15-schema-changes.html
  16. Downstream effects
  17. The growth of data munging
  18. https://storify.com/chenghlee/dataformathell http://isps.yale.edu/sites/default/files/files/IDCC14_DQR_PeerGreenStephenson.pdf
  19. “60% of time is spent on data preparation” NASA, A.40 Computational Modeling Algorithms and Cyberinfrastructure, tech. report, NASA, 19 Dec. 2011
  20. Search target Oxidoreductase: 481 targets from different species. Selection of all the oxidoreductases and filtering bioactivities with the criterion IC50 < 100 (no units could be selected): 11497 data points obtained. Table exported to an Excel spreadsheet and manually filtered. From Mabel Loza - USC team
  21. The Seven Deadly Sins of Bioinformatics. Professor Carole Goble, carole.goble@manchester.ac.uk, The University of Manchester, UK. The myGrid project, OMII-UK
  22. Andy Law's Third Law • “The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study”... and is frequently many, many more. http://bioinformatics.roslin.ac.uk/lawslaws.html
  23. What Is Gleevec? Imatinib Mesylate. PubChem, DrugBank, ChemSpider
  24. Some Solutions
  25. Issue: Identifiers aren’t the same and we can’t agree on when one thing equals another. Solution: Adaptive identifier mapping based on profiles (Strict, Relaxed, Analysing, Browsing)
  26. Issue: There’s no one data model of science. Solution: Simple “common sense” driven data model primarily focused on user interface needs
  27. provbook.org
  29. My Questions (15/03/15)
  30. [Gray et al. ISWC 2014]
  32. We have to rely on computers
  33. Contact: Elsevier Labs • Paul Groth p.groth@elsevier.com • http://pgroth.com • @pgroth
  34. Questions • What is the interplay between data munging and concept drift? • What happens when humans are not in the loop? • What’s our tolerance for fuzziness? • Should we worry about the long tail?
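
Slide 25 proposes adaptive identifier mapping based on profiles. The slides do not show how this works, so the following is only a minimal sketch of one possible reading: each mapping between two identifiers carries a link type, and a profile is the set of link types a user is willing to treat as "the same thing". The link types, the two profiles, and every identifier below are hypothetical placeholders (they are not real PubChem, DrugBank, or ChemSpider IDs); only the facts that imatinib mesylate is a salt of imatinib and that Gleevec is a brand name are taken as given.

```python
from collections import defaultdict

# Hypothetical link types (assumption, not from the slides).
EXACT = "exact_structure"   # same chemical structure in two databases
SALT_FORM = "salt_form"     # parent compound vs. one of its salts
BRAND = "brand_name"        # marketed product vs. its active ingredient

# Hypothetical profiles: each names the link types it accepts as equality.
PROFILES = {
    "strict": {EXACT},
    "relaxed": {EXACT, SALT_FORM, BRAND},
}

# Hypothetical mappings between made-up identifiers.
LINKS = [
    ("pubchem:imatinib", "chemspider:imatinib", EXACT),
    ("pubchem:imatinib", "drugbank:imatinib-mesylate", SALT_FORM),
    ("drugbank:imatinib-mesylate", "label:gleevec", BRAND),
]


def equivalents(identifier, profile):
    """Return all identifiers reachable through links the profile accepts."""
    allowed = PROFILES[profile]
    graph = defaultdict(set)
    for left, right, link_type in LINKS:
        if link_type in allowed:
            graph[left].add(right)
            graph[right].add(left)
    # Depth-first walk over the accepted links only.
    seen, stack = {identifier}, [identifier]
    while stack:
        for neighbour in graph[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


strict_view = equivalents("pubchem:imatinib", "strict")
relaxed_view = equivalents("pubchem:imatinib", "relaxed")
```

Under the strict profile the query stays within exact structure matches, while the relaxed profile also pulls in the salt form and the brand name. Which of the two is appropriate depends on whether the user is analysing or browsing, which is the trade-off the slide points at.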

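
The oxidoreductase example on slide 20 filters bioactivities with IC50 < 100 although no units could be selected, and then falls back to manual filtering in a spreadsheet. The sketch below illustrates why that matters, assuming records that carry a unit field and assuming the intended threshold is in nanomolar. The records, the field names, and the unit table are all invented for illustration and do not reflect any real database schema.

```python
# Conversion factors to nanomolar; the supported unit set is an assumption.
TO_NANOMOLAR = {"nM": 1.0, "uM": 1_000.0, "mM": 1_000_000.0}

# Made-up bioactivity records with mixed or missing units.
RECORDS = [
    {"target": "T1", "ic50": 50.0, "unit": "nM"},
    {"target": "T2", "ic50": 50.0, "unit": "uM"},
    {"target": "T3", "ic50": 0.05, "unit": "uM"},
    {"target": "T4", "ic50": 20.0, "unit": None},
]


def naive_filter(records, threshold=100.0):
    """Compare raw numbers and ignore units, as in the slide's query."""
    return [r["target"] for r in records if r["ic50"] < threshold]


def unit_aware_filter(records, threshold_nm=100.0):
    """Normalise to nanomolar first; set aside records with unknown units."""
    kept, unresolved = [], []
    for r in records:
        factor = TO_NANOMOLAR.get(r["unit"])
        if factor is None:
            unresolved.append(r["target"])  # needs manual review
        elif r["ic50"] * factor < threshold_nm:
            kept.append(r["target"])
    return kept, unresolved


naive = naive_filter(RECORDS)
kept, unresolved = unit_aware_filter(RECORDS)
```

The naive filter accepts all four records, including one that is 50 micromolar. The unit-aware version keeps only the two records that are below 100 nM and flags the record without a unit, which shrinks the manual step to the cases that actually need a human.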