BioTorrents: A File Sharing Service for Scientific Data
I present an overview of BioTorrents.net. This was presented at the Open Science Summit 2010 conference in Berkeley, CA.

Usage Rights

CC Attribution License

  • 2) Mention strict structure of existing dbs
  • Bandwidth limited because of intermediate links (geospatial)
  • 25-50% of all Internet traffic is BitTorrent traffic
  • Link to paper
  • Existing data providers (NCBI, EBI, JGI, etc.)
  • Scientists sharing manuscript supplementary data
    • All data is bundled together and given a unique id
    • Easier than setting up a Web/FTP server
  • Scientists that want to provide immediate access to their data and results
    • Pre/post publication
    • Data that might not be suitable for existing databases
    • Results that may not be sufficient for publication

BioTorrents: A File Sharing Service for Scientific Data Presentation Transcript

  • 1. Morgan Langille, PhD. Open Science Summit 2010, Berkeley, California, July 29th, 2010
  • 2. Acknowledgements
    • iSEEM project
    • Dr. Jonathan Eisen
    • UC Davis
    • Questions/Comments
    • Twitter: @BetaScience
  • 3. Motivation
    • Data in science is growing rapidly
    • Transfer times increasing
    • Reliability of data transfer
    • Sharing scientific data openly
  • 4. Personal Challenges
    • Improve download speed and reliability from large data providers
    • Encourage sharing of all data associated with a study
    • Allow easier sharing of unpublished data
  • 5. Traditional file transfer methods
    • Single source server
    • Bandwidth limitations
    • No data redundancy
    • No data verification
  • 6. Peer-to-peer file transfer: BitTorrent
    • Data is shared between all computers
    • Bandwidth grows as the number of users increases
    • Data redundancy
    • Data is verified
      • SHA-1 cryptographic hash
    • 25-50% of all Internet traffic is BitTorrent
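The "data is verified" point can be illustrated with a minimal sketch. It assumes a fixed piece size of 256 KiB and uses stand-in data; real torrents choose their own piece size, and the function names here are hypothetical rather than taken from any BitTorrent client.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # assumed piece size; real torrents pick their own


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify_piece(piece: bytes, expected_digest: bytes) -> bool:
    """Accept a downloaded piece only if its SHA-1 digest matches the expected one."""
    return hashlib.sha1(piece).digest() == expected_digest


if __name__ == "__main__":
    dataset = b"ACGT" * 200_000           # 800,000 bytes of stand-in data
    expected = piece_hashes(dataset)      # what the publisher would record
    good = dataset[:PIECE_LENGTH]
    corrupted = b"N" + good[1:]           # first byte damaged in transit
    print(len(expected), "pieces")
    print("intact piece accepted:", verify_piece(good, expected[0]))
    print("corrupted piece accepted:", verify_piece(corrupted, expected[0]))
```

Because each piece is checked independently, a client only needs to re-download the pieces that fail, not the whole dataset.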
  • 7. BitTorrent: How it works
    • User installs BitTorrent client software
    • User downloads a small “.torrent” descriptor file
    • Client software connects to “Tracker” to obtain a list of other “peers” with same data
    • Client begins downloading/uploading
    [Diagram: “.torrent” file and “Tracker” server]
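The step where the client contacts the tracker can be sketched as an HTTP request URL. This is only an illustration: the tracker address, peer id, and function name are hypothetical, and the query parameters (`info_hash`, `peer_id`, `port`, `uploaded`, `downloaded`, `left`) follow the basic BitTorrent announce convention as commonly documented, not anything specific to BioTorrents.

```python
import os
from urllib.parse import quote, urlencode


def build_announce_url(tracker_url: str, info_hash: bytes, peer_id: bytes,
                       port: int, left: int,
                       uploaded: int = 0, downloaded: int = 0) -> str:
    """Build the URL a client would request to ask a tracker for peers."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must both be 20 bytes")
    query = urlencode(
        {
            "info_hash": info_hash,    # raw 20-byte SHA-1, percent-encoded
            "peer_id": peer_id,        # 20-byte id chosen by the client
            "port": port,              # port the client listens on
            "uploaded": uploaded,
            "downloaded": downloaded,
            "left": left,              # bytes still needed
        },
        quote_via=quote,               # encode bytes as %XX, never as "+"
    )
    return f"{tracker_url}?{query}"


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    peer_id = b"-XX0001-" + os.urandom(6).hex().encode("ascii")
    url = build_announce_url("http://tracker.example.org/announce",
                             bytes(range(20)), peer_id, 6881, left=800_000)
    print(url)
```

The tracker's reply, which lists peers holding the same data, is not shown here.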
  • 8. Other BitTorrent Advantages
    • Every dataset is given a unique id (SHA-1 hash)
    • Distributed Hash Table (DHT) & Peer Exchange (PEX)
      • Tracker-less peer identification
    • Local Peer Discovery (LPD)
      • Finds peers on local area network (LAN) allowing much faster data transfer
    • Web Seeds
      • FTP or HTTP resources can be added to the torrent
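The unique id mentioned on this slide is, in standard BitTorrent, the SHA-1 digest of the bencoded "info" dictionary inside the .torrent file. The sketch below assumes that convention; the encoder is a minimal one written for illustration, and the dataset name, sizes, and piece digests are made up.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencode encoder for ints, bytes/str, lists, and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys must be byte strings and appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(info: dict) -> str:
    """Return the hex SHA-1 of the bencoded info dictionary."""
    return hashlib.sha1(bencode(info)).hexdigest()


if __name__ == "__main__":
    # Hypothetical info dictionary for a single-file dataset.
    info = {
        "name": "example_dataset.fasta",
        "length": 800_000,
        "piece length": 262_144,
        "pieces": b"\x00" * 20 * 4,   # placeholder for four piece digests
    }
    print(info_hash(info))
```

Since the id is derived from the content description itself, changing any file or piece digest yields a different id, which is what lets a dataset version be referenced unambiguously.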
  • 9. BitTorrent Trackers
    • Many trackers already exist
    • Almost all have legal issues related to copyright infringement
    • None are tailored to hosting scientific datasets
  • 10.
    • BioTorrents is a file sharing website for scientists
    • BioTorrents provides a central listing of datasets
    • Anyone can upload their own data
    • All data must be “open”; no illegal file sharing
    • Data is not hosted on BioTorrents**
    Langille & Eisen, 2010, PLoS ONE 5: e10071.
  • 11. BioTorrents: Advanced Features
    • Browse and search by
      • Keyword (dataset title and description)
      • Category (Genomics, Proteomics, Chemistry, etc.)
      • License (Public Domain, Creative Commons, GPL, etc.)
      • Username (mlangill, jeisen, NCBI, etc.)
    • RSS feeds and automatic downloading
    • Torrents linked into “Versions”
    • Upload script for bulk torrent creation
  • 12. BioTorrents progress
    • 1000 registered users
    • 43 datasets (107 GB)
    • 766 downloads
    • 1386 GB data transferred
  • 13. Real Example
    • Download GenBank (~230GB) from NCBI
    NCBI to UC Davis       Download speed   Time
    Max                    30 MB/s          2 hours
    FTP to other server    ~10 MB/s         6 hours
    FTP to NCBI            ~0.5 MB/s        5 days
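The times in this comparison follow from dividing the dataset size by the transfer speed. A short check, assuming 1 GB = 1000 MB and a constant speed for the whole transfer (both simplifications, and the function name is just for illustration):

```python
def transfer_hours(size_gb: float, speed_mb_per_s: float) -> float:
    """Hours needed to move size_gb at a constant speed, with 1 GB = 1000 MB."""
    seconds = size_gb * 1000 / speed_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    genbank_gb = 230  # approximate size quoted on the slide
    for label, speed in [("Max", 30), ("FTP to other server", 10), ("FTP to NCBI", 0.5)]:
        hours = transfer_hours(genbank_gb, speed)
        print(f"{label}: {hours:.1f} hours ({hours / 24:.1f} days)")
```

Under these assumptions the three rows come out to roughly 2.1 hours, 6.4 hours, and 5.3 days, which matches the rounded figures on the slide.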
  • 14. Who will use BioTorrents?
    • Existing large data providers
      • More reliable and faster downloads for users
      • Lower bandwidth requirements for provider
    • Scientists sharing published data
      • All data is bundled together and given a unique id
      • Easier than setting up a Web/FTP server
    • Scientists sharing unpublished data
      • Data that might not be suitable for existing databases
      • Results that may not be sufficient for publication
  • 15. Issues
    • BitTorrent works best for large, popular datasets
    • Long term seeding
      • At least 1 seeder has to exist
    • Many institutions block/limit BitTorrent activity
  • 16. Future
    • Metalink
      • XML Link Protocol
      • Combines multiple sources
        • FTP, HTTP, BitTorrent, etc.
    • Volunteer Storage
      • Parallel to volunteer computing
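The Metalink idea on this slide, one XML description that lists several sources for the same file, might look like the sketch below. The element names and namespace follow the Metalink 4 format as I understand it from RFC 5854; this is an assumption, since the slide does not say which Metalink version is meant, and every URL, file name, and hash here is a placeholder.

```python
import xml.etree.ElementTree as ET

METALINK_NS = "urn:ietf:params:xml:ns:metalink"  # assumed: Metalink 4 namespace


def build_metalink(name: str, size: int, sha256_hex: str,
                   mirror_urls: list, torrent_url: str) -> str:
    """Describe one file with several mirrors plus a torrent as alternative sources."""
    root = ET.Element("metalink", xmlns=METALINK_NS)
    file_el = ET.SubElement(root, "file", name=name)
    ET.SubElement(file_el, "size").text = str(size)
    ET.SubElement(file_el, "hash", type="sha-256").text = sha256_hex
    for url in mirror_urls:                      # FTP/HTTP mirrors
        ET.SubElement(file_el, "url").text = url
    # The torrent is referenced as a "metaurl" rather than a direct download.
    ET.SubElement(file_el, "metaurl", mediatype="torrent").text = torrent_url
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    print(build_metalink(
        name="example_dataset.fasta",
        size=800_000,
        sha256_hex="0" * 64,
        mirror_urls=["ftp://ftp.example.org/example_dataset.fasta",
                     "http://mirror.example.org/example_dataset.fasta"],
        torrent_url="http://tracker.example.org/example_dataset.torrent",
    ))
```

A download client that understands such a file can fall back from one source to another, which addresses the single-source weakness listed earlier for traditional transfers.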
  • 17. Final Message
    • Data transfer should be fast and easy
    • Scientific community should embrace existing technologies such as BitTorrent
    • BioTorrents uses the strengths of BitTorrent and provides features unique to scientific data