Buzzbang

EBI Bioschemas samples event about the alpha Bioschemas Buzzbang search engine

Published in: Health & Medicine

  1. Bioschemas crawl and search frontend (alpha)
     Justin Clark-Casey, Software Architect, Micklem Lab
  2. Buzzbang
     ● Motivation
       ○ InterMine (my main project) is a life sciences data integration platform
       ○ Humanmine.org, mousemine.org, yeastmine.org
         ■ See http://registry.intermine.org
       ○ InterMine will embed Bioschemas
       ○ But need to demonstrate cross-site findability
     ● Hence Buzzbang
       ○ A prototype crawler and search frontend for Bioschemas data
       ○ Google-like frontend
       ○ Explore the practicalities
       ○ Crawl everything
       ○ Python, open-source, open development, etc.
  3. Architecture - crawl
     sitemap.xml:
       <urlset>
         <url><loc>http://synbiomine.org/2000005</loc></url>
         <url><loc>http://synbiomine.org/2000011</loc></url>
         ...
     Sites.cfg:
       https://fairsharing.org
       https://www.ebi.ac.uk/biosamples/
       ...
     Diagram labels: 1:crawler, Crawl DB, 2:JSON-LD extractor, 3:indexer
  4. Architecture - frontend
  5. Live demo: http://buzzbang.science
  6. Future work
     ● Very proof-of-concept
     ● Scalability, reliability, security
       ○ Look at using Scrapy and Frontera
     ● Better understanding of the JSON-LD
       ○ PyLD, SHACL, ShEx
     ● Crawl on demand
     ● Common Crawl?
     ● Skunk-works
       ○ Need more hardware
       ○ Hope to have a GSoC student this summer
       ○ Collaboration very welcome
  7. Thank you!
     github: https://github.com/justinccdev/bsbang-crawler
     github: https://github.com/justinccdev/bsbang-frontend
     live: http://buzzbang.science
     ● Justin Clark-Casey (justincc@intermine.org, http://justincc.org)
     ● Pranjal Aswani (github:aswanipranjal)
     ● Hao Xiangpeng (github:HaoPatrick)
     ● Ankit Lohar (github:innovationchef)
     ● And others!
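
The crawl architecture slide names two steps that lend themselves to a small illustration: reading page URLs out of a sitemap.xml, and pulling JSON-LD out of each crawled page. The following is a minimal sketch of those two steps using only the Python standard library. It is not taken from the bsbang-crawler code; the names `sitemap_locs`, `JsonLdExtractor` and `extract_jsonld` are hypothetical, and it assumes the Bioschemas markup is embedded as `<script type="application/ld+json">` blocks. Fetching pages, the Crawl DB and the indexer are left out.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def sitemap_locs(sitemap_xml):
    """Return every <loc> URL in a sitemap, with or without an XML namespace."""
    root = ET.fromstring(sitemap_xml)
    locs = []
    for elem in root.iter():
        # ElementTree reports namespaced tags as "{namespace}loc".
        if elem.tag.rsplit("}", 1)[-1] == "loc" and elem.text:
            locs.append(elem.text.strip())
    return locs


class JsonLdExtractor(HTMLParser):
    """Collect parsed <script type="application/ld+json"> blocks (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # Skip malformed JSON-LD rather than abort the crawl.


def extract_jsonld(html):
    """Return the list of JSON-LD documents embedded in an HTML page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents
```

A crawler along these lines would call `sitemap_locs` once per site listed in its configuration, fetch each returned URL, and pass the page body to `extract_jsonld` before handing the results to an indexer.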
