Text Mining and Environmental Metadata Suggestion - Flash Talk - GSC17

Text Mining and Environmental Metadata Suggestion: GSC17 flash talk ahead of the upcoming BioCreative V.

  1. Text Mining and Environmental Metadata Suggestion, GSC17 Flash Talk
     GSC17 – 5th May 2015 – Walnut Creek, CA
     Evangelos Pafilis
     Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Crete, Greece
     pafilis@hcmr.gr, http://epafilis.info
  2. Text sources: scientific web pages, literature (abstracts, full-text articles, legends) and in-house documents.
     •  Scientific web page: "Microbes are key players in both healthy and degraded coral reefs. A combination of metagenomics, microscopy, culturing, and water chemistry were used to characterize microbial communities on four coral atolls in the Northern Line Islands, central Pacific." Source: http://metagenomics.anl.gov/linkin.cgi?metagenome=4440039.3 ("Project Description", Dinsdale et al., 2008)
     •  In-house document: "Microbial mat samples were collected from the hydrothermal vent field located in the Kolumbo submarine volcanic crater, off the coast of the island of Santorini. The bacteria and archaea community composition was evaluated further via shotgun metagenomics analysis." Source: in-house HCMR document (Polymenakou, Oulas, et al.)
     •  Literature (figure legend): "Figure 1. Sampling sites on a cross-slope transect. […] Oceanographically, the stations represent the abyssal plain (GeoB12815), the continental rise (GeoB12808, GeoB12811), the continental slope (GeoB12803, GeoB12802), the shelf break (GeoB12807) and the shelf (GeoB12806). Surface sediments were recovered by gravity and multi-coring. […]" Source: http://onlinelibrary.wiley.com/doi/10.1111/1758-2229.12264/full (Lagostina et al., 2015)
  4. Resources:
     •  SPECIES: http://species.hcmr.gr, http://species.jensenlab.org
     •  ENVIRONMENTS: http://environments.hcmr.gr, http://environments.jensenlab.org
     •  Environment Ontology (ENVO): http://www.environmentontology.org/ (Buttigieg PL, et al. 2013, J Biomed Semant 4:43)
     •  NCBI Taxonomy: http://www.ncbi.nlm.nih.gov/Taxonomy (Benson DA, et al. 2009, NAR)
  5. Dictionary-based approaches:
     •  Flexible matching, e.g. hyphenation
     •  Orthographic dictionary expansion, e.g. adjective and plural forms, shorthand taxon name forms
     •  Manually curated stopword list
  6. Command line
  7. Term look-up assistant:
     •  Interactive
     •  Lightweight
     •  Standards-compliant term suggestions
  8. Prototype: http://environments.hcmr.gr/biocreative.html
  9. GOLD public metagenomic studies (panels A, B, C): https://gold.jgi-psf.org/studies?Study.Metagenomic+Study=Yes&Study.Is+Public=Yes
  10. Prototype: http://environments.hcmr.gr/biocreative.html
  11. Prototype: http://environments.hcmr.gr/biocreative.html
  12. Prototype: http://environments.hcmr.gr/biocreative.html (panels: retroactive, prospective)
  13. BioCreative V Track 5: Interactive Curation (IAT)
      Dr. L. Hirschman, Dr. C. Arighi et al., September 2015, Sevilla, Spain
      Under development:
      •  Entity highlighting
      •  Suggested terms: sorting, selection, addition, exporting
      •  Integration with metagenomics resources
  14. http://jensenlab.org/
      •  TISSUES: http://tissues.jensenlab.org/ (Santos A, et al., under review; preprint: http://biorxiv.org/content/early/2014/11/10/010975)
      •  DISEASES: http://diseases.jensenlab.org/ (Pletscher-Frankild S, et al. 2014, DISEASES: Text mining and data integration of disease–gene associations. Methods, 74, 83–89)
  15. This prototype is based on components from ENVIRONMENTS, SPECIES/ORGANISMS, SEQenv (https://bitbucket.org/seqenv) and Reflect (http://reflect.ws). It is developed by the LifeWatchGreece Research Infrastructure and the group of Dr. Lars Juhl Jensen (University of Copenhagen), with input from the Genomes OnLine Database (GOLD) and Virome / Metagenomes Online groups and from the group of Dr. Pier L. Buttigieg (AWI).
      BioCreative: Dr. L. Hirschman (MITRE), DoE Award No. DE-SC0010838
      Funding: DoE (BioCreative V), LifeWatch Greece, NNF-CPR, "SEQenv" hackathons (COST Action ES1103)
      Thank you!
      Amvrakikos Lagoons, May 2011
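Slide 5 lists the parts of a dictionary-based approach: flexible matching across hyphenation, orthographic expansion to plural, adjective and shorthand taxon forms, and a curated stopword list. The minimal Python sketch below illustrates these ideas. It is not the code of the ENVIRONMENTS or SPECIES taggers. The following are all hypothetical:

- the word tokenizer;
- the naive suffix rules (+s for plurals, +ic for adjectives);
- the dictionary entries and the stopword;
- the placeholder identifiers (ENVO:example-N, NCBITaxon:example-N).

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical dictionary entries: (name, placeholder identifier, is_taxon).
ENTRIES = [
    ("coral reef", "ENVO:example-1", False),
    ("ocean", "ENVO:example-2", False),
    ("well", "ENVO:example-3", False),
    ("hydrothermal vent", "ENVO:example-4", False),
    ("Escherichia coli", "NCBITaxon:example-5", True),
]
# Hypothetical curated stopword list: names too ambiguous to tag.
STOPWORDS = {"well"}

TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Lowercase word tokens with character offsets.

    Hyphens, periods and whitespace all act as separators, so "coral-reef",
    "Coral reef" and "coral  reef" yield the same token sequence.
    """
    return [(m.group().lower(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def key(name: str) -> Tuple[str, ...]:
    """Dictionary key: the token sequence of a name."""
    return tuple(tok for tok, _, _ in tokenize(name))


def expand(name: str, is_taxon: bool) -> List[str]:
    """Orthographic expansion using naive, assumed suffix rules."""
    variants = [name]
    words = name.split()
    if is_taxon:
        if len(words) == 2:
            # Shorthand binomial: "Escherichia coli" -> "E. coli"
            variants.append(f"{words[0][0]}. {words[1]}")
    else:
        variants.append(name + "s")  # plural form
        if len(words) == 1:
            variants.append(name + "ic")  # adjective form, e.g. "oceanic"
    return variants


def build_index(entries: Iterable[Tuple[str, str, bool]],
                stopwords: Set[str]) -> Dict[Tuple[str, ...], str]:
    """Map every expanded variant to its identifier, skipping stopwords."""
    blocked = {key(s) for s in stopwords}
    index: Dict[Tuple[str, ...], str] = {}
    for name, term_id, is_taxon in entries:
        for variant in expand(name, is_taxon):
            k = key(variant)
            if k and k not in blocked:
                index[k] = term_id
    return index


def tag(text: str, index: Dict[Tuple[str, ...], str]) -> List[Tuple[str, str]]:
    """Greedy left-to-right, longest-match-first dictionary lookup."""
    toks = tokenize(text)
    max_len = max((len(k) for k in index), default=0)
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(toks):
        for n in range(min(max_len, len(toks) - i), 0, -1):
            k = tuple(tok for tok, _, _ in toks[i:i + n])
            if k in index:
                start, end = toks[i][1], toks[i + n - 1][2]
                found.append((text[start:end], index[k]))
                i += n
                break
        else:  # no dictionary name starts at this token
            i += 1
    return found


if __name__ == "__main__":
    index = build_index(ENTRIES, STOPWORDS)
    text = ("Coral-reefs and oceanic waters were sampled near a hydrothermal vent; "
            "E. coli was absent as well.")
    for surface, term_id in tag(text, index):
        print(surface, term_id)
```

The tokenizer treats hyphens and periods as separators. This lets "Coral-reefs" match the expanded form of "coral reef" and "E. coli" match the generated shorthand. The stopword list is applied when the index is built, so an ambiguous word such as "well" is never tagged.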
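One lightweight way to build the interactive term look-up assistant of slide 7 is to rank ontology terms against a typed query. This is a hypothetical sketch, not the prototype's implementation. The `Term` record, the exact/prefix/substring ranking and the placeholder identifiers are all assumptions. Returning an identifier with each name, rather than free text, is one way to keep suggestions standards-compliant.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Term:
    term_id: str                    # placeholder ontology identifier
    name: str                       # preferred label
    synonyms: Tuple[str, ...] = ()  # alternative labels


def rank_label(query: str, label: str) -> Optional[int]:
    """0 = exact match, 1 = prefix match, 2 = substring match, None = no match."""
    label = " ".join(label.lower().split())
    if label == query:
        return 0
    if label.startswith(query):
        return 1
    if query in label:
        return 2
    return None


def lookup(query: str, terms: Sequence[Term], limit: int = 5) -> List[Term]:
    """Return up to `limit` terms whose name or synonym matches the query."""
    q = " ".join(query.lower().split())  # case- and whitespace-insensitive
    scored = []
    for term in terms:
        ranks = [rank_label(q, label) for label in (term.name, *term.synonyms)]
        hits = [r for r in ranks if r is not None]
        if hits:
            scored.append((min(hits), term.name.lower(), term))
    scored.sort(key=lambda item: (item[0], item[1]))  # best rank, then name
    return [term for _, _, term in scored[:limit]]


# Hypothetical vocabulary with placeholder identifiers.
TERMS = [
    Term("ENVO:example-1", "coral reef", ("reef",)),
    Term("ENVO:example-2", "marine sediment", ("sea sediment",)),
    Term("ENVO:example-3", "sediment"),
    Term("ENVO:example-4", "hydrothermal vent"),
]

if __name__ == "__main__":
    for term in lookup("sediment", TERMS):
        print(term.term_id, term.name)
```

Matching synonyms as well as preferred labels lets a curator type "reef" and still be offered the standard term "coral reef".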
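Slide 13 lists suggested-term features still under development: sorting, selection, addition and exporting. This Python sketch shows one way they might fit together. Every detail is an assumption for illustration:

- suggestions are ranked by how often each term was matched in the text;
- a manually added term gets a count of 0;
- the export format is tab-separated;
- the identifiers are placeholders.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, Set, Tuple

Suggestion = Tuple[str, str, int]  # (term_id, name, mention count)


def sort_suggestions(matches: Iterable[Tuple[str, str]]) -> List[Suggestion]:
    """Aggregate (term_id, name) matches; most frequent first, ties by name."""
    counts = Counter(matches)
    return sorted(((tid, name, n) for (tid, name), n in counts.items()),
                  key=lambda s: (-s[2], s[1]))


def select(suggestions: List[Suggestion], accepted_ids: Set[str]) -> List[Suggestion]:
    """Keep only the suggestions the curator accepted, preserving order."""
    return [s for s in suggestions if s[0] in accepted_ids]


def add(selection: List[Suggestion], term_id: str, name: str) -> List[Suggestion]:
    """Add a term the curator found manually (count 0: not matched in the text)."""
    if any(s[0] == term_id for s in selection):
        return list(selection)
    return [*selection, (term_id, name, 0)]


def export_tsv(selection: List[Suggestion]) -> str:
    """Serialize the final selection as tab-separated values with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["term_id", "name", "count"])
    writer.writerows(selection)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical matches produced by a tagger, with placeholder identifiers.
    matches = [("ENVO:example-1", "coral reef"), ("ENVO:example-2", "ocean"),
               ("ENVO:example-1", "coral reef")]
    suggestions = sort_suggestions(matches)
    chosen = add(select(suggestions, {"ENVO:example-1"}), "ENVO:example-9", "atoll")
    print(export_tsv(chosen), end="")
```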
