Building Data

Slides I used at BioHackathon 2012 in Toyama.

Usage Rights

CC Attribution License

Building Data: Presentation Transcript

  • Building Data
    Yasunori Yamamoto
  • 1994: 4 DBs [figure from Nucleic Acids Research, 1994, Vol. 22, No. 17, 3442; labels: NCBI Taxonomy, Database, 4,000 biomedical journals indexed at NLM, GenBank, SWISSPROT, PIR, EMBL, PRF, DDBJ, PDB, dbEST, dbSTS, LANL, Patent]
    2012: 35 DBs (http://www.ncbi.nlm.nih.gov/sites/gquery)
    Database Center for Life Science
  • NAR database issue: number of databases, 2008 to 2012 [chart: axis ticks 1100 to 1400; data labels 1078, 1170, 1230, 1330, 1380]
    92 databases added every year
    Source: Oxford University Press (photo: dullhunk)
  • Finding a relevant database is an important topic; at the same time, discussing what makes a database “good” is also significant.
  • Data before applications / services (photo: NASA Goddard Photo and Video)
  • Good fishes first (Yummy! Yummy!)
  • Nature provides good fishes
    Chef mashes up good materials
    (photos: Aziz T. Saltik, mrjorgen)
  • What should be considered, and how can these be assessed?
    Interesting, useful & reliable: reliable in terms of content and structure. Peer-reviewed → published in the NAR database issue or another scientific journal.
    Sustainable, reusable & discoverable: appropriate licenses, bulk downloadable via the Internet, Linked Data...
    Fresh & stable: frequent updates with the least amount of downtime.
  • We should focus on building “good” data, or on developing tools that help build it.
  • Allie: abbreviation / long form pairs in life sciences
    Japanese translation
    CC 2.1 (Japan)
    Monthly update
    http://allie.dbcls.jp/
    SPARQL endpoint / bulk downloadable (N-Triples or tab-delimited plain text)
    Links to PubMed and DBpedia (currently, RDF data only)
    Web search service: 7000+ unique visits / mo to the search service
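Since the slide above mentions a SPARQL endpoint, here is a minimal sketch of how a lookup request for one abbreviation could be assembled. The endpoint address is not given on the slide, so `ENDPOINT` is a placeholder; the `allie:` namespace is an assumption derived from the ontology URL shown later in the deck, and the property names are the ones that appear on the RDF excerpt slide. Only the `query` parameter of the standard SPARQL protocol is relied on.

```python
from urllib.parse import urlencode

# Placeholder: the slide does not state the SPARQL endpoint address.
ENDPOINT = "http://example.org/sparql"

# Assumed namespace for the allie: prefix (ontology URL plus '#').
QUERY_TEMPLATE = """\
PREFIX allie: <http://purl.org/allie/ontology/201108#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?pair ?longLabel WHERE {{
  ?pair allie:hasShortFormOf ?sf ;
        allie:hasLongFormOf ?lf .
  ?sf rdfs:label "{abbr}"@en .
  ?lf rdfs:label ?longLabel .
}} LIMIT {limit}
"""


def build_query(abbr, limit=10):
    """Return a SPARQL query listing long forms paired with one abbreviation."""
    # Reject characters that would break out of the quoted literal.
    if any(ch in abbr for ch in '"\\\n\r'):
        raise ValueError("abbreviation contains characters that need escaping")
    return QUERY_TEMPLATE.format(abbr=abbr, limit=int(limit))


def build_request_url(endpoint, abbr, limit=10):
    """Return a GET URL carrying the query in the standard 'query' parameter."""
    return endpoint + "?" + urlencode({"query": build_query(abbr, limit)})


if __name__ == "__main__":
    print(build_request_url(ENDPOINT, "SPF", limit=5))
```

Building the URL separately from sending it keeps the sketch independent of any particular HTTP client or result format.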
  • Allie data model [diagram]
    PairCluster (absorption of lexical variants): ShortForm "SPF", LongForm "specific pathogen-free"
    Pair: ShortForm "SPF", LongForm "specific pathogen-free"
    Pair: ShortForm "spf", LongForm "specified pathogen free"
    Relations shown: contains → PairList; appearsIn → PubMedIDList; cooccursWith → CoocurringShortFormList; inResearchAreaOf → ResearchArea; frequency
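One way to read the data model slide above is as a pair of record types. The sketch below is an interpretation of the flattened diagram, not Allie's actual schema: attaching `frequency` to each pair and the list-valued relations to the cluster are assumptions, and the counts in the usage example are made-up placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Pair:
    """One abbreviation / long form pair exactly as it is written."""
    short_form: str
    long_form: str
    frequency: int = 0  # assumption: the "frequency" label belongs to a pair


@dataclass
class PairCluster:
    """Groups lexical variants of a pair under one representative form."""
    short_form: str
    long_form: str
    pairs: List[Pair] = field(default_factory=list)        # contains PairList
    pubmed_ids: List[str] = field(default_factory=list)    # appearsIn PubMedIDList
    cooccurring_short_forms: List[str] = field(default_factory=list)  # cooccursWith
    research_area: Optional[str] = None                    # inResearchAreaOf

    def total_frequency(self):
        """Sum the frequencies of all variants absorbed into this cluster."""
        return sum(pair.frequency for pair in self.pairs)


def example_cluster():
    """Build the SPF cluster from the slide; the frequencies are placeholders."""
    return PairCluster(
        short_form="SPF",
        long_form="specific pathogen-free",
        pairs=[
            Pair("SPF", "specific pathogen-free", frequency=10),
            Pair("spf", "specified pathogen free", frequency=2),
        ],
    )


if __name__ == "__main__":
    cluster = example_cluster()
    print(cluster.long_form, cluster.total_frequency())
```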
  • Allie class hierarchy: http://purl.org/allie/ontology/201108
  • Allie RDF data (excerpt)
    Pair http://purl.org/allie/id/pair/1547869: rdf:type allie:EachPair; allie:hasLongFormOf (long form); allie:hasShortFormOf (abbreviation)
    Long form http://purl.org/allie/id/longform/1528191: rdf:type allie:LongForm; rdfs:label "specific pathogen-free"@en (English); rdfs:label "特定病原体除去の"@ja (Japanese)
    Abbreviation SPF: rdf:type allie:ShortForm; rdfs:label "SPF"@en
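The excerpt above can be written out as N-Triples, one of the bulk download formats mentioned earlier. The following is a minimal sketch under stated assumptions: the `allie:` namespace is guessed from the ontology URL, the short form URI is hypothetical because the slide does not show one, and the arrangement of nodes is a reading of the flattened diagram.

```python
from typing import List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
ALLIE = "http://purl.org/allie/ontology/201108#"  # assumed namespace

PAIR = "http://purl.org/allie/id/pair/1547869"
LONG_FORM = "http://purl.org/allie/id/longform/1528191"
SHORT_FORM = "http://example.org/shortform/SPF"  # hypothetical URI


def iri(value):
    """Wrap an IRI in angle brackets as N-Triples requires."""
    return "<" + value + ">"


def literal(text, lang):
    """Return a language-tagged literal with the basic escapes applied."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"@' + lang


def triple(subject, predicate, obj):
    """Join three already-serialized terms into one N-Triples line."""
    return subject + " " + predicate + " " + obj + " ."


def excerpt_as_ntriples() -> List[str]:
    """Serialize the pair, its long form and its short form."""
    return [
        triple(iri(PAIR), iri(RDF_TYPE), iri(ALLIE + "EachPair")),
        triple(iri(PAIR), iri(ALLIE + "hasLongFormOf"), iri(LONG_FORM)),
        triple(iri(PAIR), iri(ALLIE + "hasShortFormOf"), iri(SHORT_FORM)),
        triple(iri(LONG_FORM), iri(RDF_TYPE), iri(ALLIE + "LongForm")),
        triple(iri(LONG_FORM), iri(RDFS_LABEL), literal("specific pathogen-free", "en")),
        triple(iri(LONG_FORM), iri(RDFS_LABEL), literal("特定病原体除去の", "ja")),
        triple(iri(SHORT_FORM), iri(RDF_TYPE), iri(ALLIE + "ShortForm")),
        triple(iri(SHORT_FORM), iri(RDFS_LABEL), literal("SPF", "en")),
    ]


if __name__ == "__main__":
    print("\n".join(excerpt_as_ntriples()))
```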
  • Useful / reliable?
    Database, Vol. 2011, Article ID bar013, doi:10.1093/database/bar013
    Original article
    Allie: a database and a search service of abbreviations and long forms
    Yasunori Yamamoto1,*, Atsuko Yamaguchi1, Hidemasa Bono1 and Toshihisa Takagi2
    1 Database Center for Life Science, Bunkyo-ku, Tokyo and 2 Department of Computational Biology, University of Tokyo, Kashiwa, Chiba, Japan
    *Corresponding author: Tel: +81 (0)3 5841 0251; Fax: +81 (0)3 5841 8090; Email: yy@dbcls.rois.ac.jp
    Submitted 25 November 2010; Revised 25 March 2011; Accepted 28 March 2011
    Many abbreviations are used in the literature especially in the life sciences, and polysemous abbreviations appear frequently, making it difficult to read and understand scientific papers that are outside of a reader’s expertise. Thus, we have developed Allie, a database and a search service of abbreviations and their long forms (a.k.a. full forms or definitions). Allie searches for abbreviations and their corresponding long forms in a database that we have generated based on all titles and abstracts in MEDLINE. When a user query matches an abbreviation, Allie returns all potential long forms of the query along with their bibliographic data (i.e. title and publication year). In addition, for each candidate, co-occurring abbreviations and a research field in which it frequently appears in the MEDLINE data are displayed. This function helps users learn about the context in which an abbreviation appears. To deal with synonymous long forms, we use a dictionary called GENA that contains domain-specific terms such as gene, protein or disease names along with their synonymic information. Conceptually identical domain-specific terms are regarded as one term, and then conceptually identical abbreviation-long form pairs are grouped taking into account their appearance in MEDLINE. To keep up with new abbreviations that are continuously introduced, Allie has an automatic update system. In addition, the database of abbreviations and their long forms with their corresponding PubMed IDs is constructed and updated weekly.
    Database URL: The Allie service is available at http://allie.dbcls.jp/.
  • Discoverable? http://thedatahub.org/dataset/allie-abbreviation-and-long-form-database-in-life-science
  • Reliable? http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/validate.php
  • Reliable/stable? http://stats.lod2.eu/rdfdocs
  • Stable?
    http://labs.mondeca.com/sparqlEndpointsStatus/
    http://labs.mondeca.com/sparqlEndpointsStatus/details/allie-abbreviation-and-long-form-database-in-life-science.html
  • Allie can be considered to be on the right track.
  • Projects in this hackathon
  • RDFization of Life Science Dictionary
    Life Science Dictionary
    English - Japanese / Japanese - English dictionary in life sciences
    Thesaurus and concordance
    Project started in 1993.
    110k English words and 120k Japanese words (as of Mar. 2011)
    Can be used to inter- or intra-connect life science databases
    Bridge English-Japanese resources in life sciences
    Prefix would be http://purl.org/lsd/
  • http://lsd.pharm.kyoto-u.ac.jp/en/service/weblsd/index.html
  • RDFization of Colil
    Comments on Literature in Literature (Colil)
    Citation data extracted from PMC OA subset
    Citing comments on each cited literature (citation context)
    Relevant literature based on co-citation data
    Similar to the MS academic search service
    Can be used for a literature recommendation service and for curation/annotation assistance services
    Bulk downloadable
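The idea of finding relevant literature from co-citation data can be illustrated with a small sketch. This is the generic co-citation count (two papers are related when the same citing papers cite both), not Colil's actual implementation; the function names and the paper identifiers in the example are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citations):
    """Count how often each pair of papers is cited together.

    citations maps a citing paper ID to the IDs of the papers it cites.
    """
    counts = Counter()
    for cited in citations.values():
        # Each unordered pair in one reference list is one co-citation.
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts


def related(paper, counts, top_n=5):
    """Rank papers by how often they are co-cited with the given paper."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == paper:
            scores[b] += n
        elif b == paper:
            scores[a] += n
    # Highest count first; ties are broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_n]


if __name__ == "__main__":
    # Hypothetical citation data: P1..P3 cite A, B and C.
    example = {"P1": ["A", "B", "C"], "P2": ["A", "B"], "P3": ["B", "C"]}
    print(related("A", cocitation_counts(example)))
```

Counting over unordered pairs keeps the table symmetric, so a lookup for either paper of a pair returns the same count.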
  • Colil
  • Enjoy hack & Toyama! (photo: digicacy)