BioTorrents: A File Sharing Service for Scientific Data
I present an overview of BioTorrents.net. This was presented at the Open Science Summit 2010 conference in Berkeley, CA.

Published in: Technology

  • 2) Mention strict structure of existing dbs
  • Bandwidth limited because of intermediate links (geospatial)
  • 25-50% of all Internet traffic is BitTorrent traffic
  • Link to paper
  • Existing data providers (NCBI, EBI, JGI, etc.)
  • Scientists sharing manuscript supplementary data
    • All data is bundled together and given a unique id
    • Easier than setting up a Web/FTP server
  • Scientists that want to provide immediate access to their data and results
    • Pre/post publication
    • Data that might not be suitable for existing databases
    • Results that may not be sufficient for publication
  • Transcript

    • 1. Morgan Langille, PhD. Open Science Summit 2010, Berkeley, California, July 29th, 2010
    • 2. Acknowledgements
      • iSEEM project
      • Dr. Jonathan Eisen
      • UC Davis
      • Questions/Comments
      • Twitter: @BetaScience
    • 3. Motivation
      • Data in science is growing rapidly
      • Transfer times increasing
      • Reliability of data transfer
      • Sharing scientific data openly
    • 4. Personal Challenges
      • Improve download speed and reliability from large data providers
      • Encourage sharing of all data associated with a study
      • Allow easier sharing of unpublished data
    • 5. Traditional file transfer methods
      • Single source server
      • Bandwidth limitations
      • No data redundancy
      • No data verification
    • 6. Peer-to-peer file transfer: BitTorrent
      • Data is shared between all computers
      • Bandwidth grows as the number of users increases
      • Data redundancy
      • Data is verified
        • SHA-1 cryptographic hash
      • 25-50% of all Internet traffic is BitTorrent
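The "data is verified" point can be illustrated with a minimal sketch. BitTorrent splits a dataset into fixed-size pieces and records a SHA-1 digest for each one, so a downloader can check every piece independently. The piece size and the function names below are assumptions chosen for illustration, not values taken from the slides.

```python
import hashlib

# Assumed piece size for this sketch; real torrents choose their own value.
PIECE_LENGTH = 256 * 1024


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify_piece(piece: bytes, expected_digest: bytes) -> bool:
    """Return True if a downloaded piece matches its published SHA-1 digest."""
    return hashlib.sha1(piece).digest() == expected_digest
```

A piece that fails this check would be discarded and requested again from another peer, which is why a corrupted transfer does not silently corrupt the dataset.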
    • 7. BitTorrent: How it works
      • User installs BitTorrent client software
      • User downloads a small “.torrent” descriptor file
      • Client software connects to “Tracker” to obtain a list of other “peers” with same data
      • Client begins downloading/uploading
      [Diagram: “.torrent” file and “Tracker” server]
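The "client connects to the tracker" step can be sketched as building an HTTP announce request. The query parameter names follow the standard BitTorrent tracker protocol; the tracker URL and peer id used in the usage note are hypothetical placeholders, and the sketch only builds the URL without contacting any server.

```python
import hashlib
from urllib.parse import urlencode


def announce_url(tracker: str, info_hash: bytes, peer_id: bytes,
                 port: int, left: int) -> str:
    """Build the URL a client requests to ask a tracker for a peer list."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must both be 20 bytes")
    query = urlencode({
        "info_hash": info_hash,  # SHA-1 id of the torrent (raw bytes, percent-encoded)
        "peer_id": peer_id,      # id the client picks for itself
        "port": port,            # port the client listens on
        "uploaded": 0,
        "downloaded": 0,
        "left": left,            # bytes still needed
        "compact": 1,            # ask for the compact peer list format
    })
    return f"{tracker}?{query}"
```

For example, `announce_url("http://tracker.example.org/announce", hashlib.sha1(b"example").digest(), b"-XX0001-123456789012", 6881, 1000)` yields a URL whose response, from a real tracker, would list other peers holding the same data.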
    • 8. Other BitTorrent Advantages
      • Every dataset is given a unique id (SHA-1 hash)
      • Distributed Hash Table (DHT) & Peer Exchange (PEX)
        • Tracker-less peer identification
      • Local Peer Discovery (LPD)
        • Finds peers on local area network (LAN) allowing much faster data transfer
      • Web Seeds
        • FTP or HTTP resources can be added to the torrent
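The unique id mentioned above is the SHA-1 digest of the bencoded "info" dictionary inside the .torrent file. The sketch below shows a small bencode encoder, the id calculation, and a torrent that carries a web seed under the `url-list` key. All file names and URLs are hypothetical, and the encoder covers only the types needed here.

```python
import hashlib


def bencode(value) -> bytes:
    """Encode ints, bytes, str, lists and dicts in BitTorrent's bencode format."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode("utf-8"))
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys must be byte strings, written in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(info: dict) -> str:
    """Return the hex SHA-1 of the bencoded info dictionary (the dataset id)."""
    return hashlib.sha1(bencode(info)).hexdigest()


def build_torrent(info: dict, tracker: str, web_seeds: list[str]) -> bytes:
    """Assemble a .torrent file body with a tracker and HTTP/FTP web seeds."""
    return bencode({"announce": tracker, "info": info, "url-list": web_seeds})
```

Because the id depends only on the info dictionary, changing the tracker or adding web seeds leaves it unchanged, while changing the data description (for example the file length) produces a different id.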
    • 9. BitTorrent Trackers
      • Many trackers already exist
      • Almost all have legal issues related to copyright infringement
      • None are tailored to hosting scientific datasets
    • 10.
      • BioTorrents is a file sharing website for scientists
      • BioTorrents provides a central listing of datasets
      • Anyone can upload their own data
      • All data must be “open”; no illegal file sharing
      • Data is not hosted on BioTorrents
      Langille & Eisen, 2010, PLoS ONE 5: e10071.
    • 11. BioTorrents: Advanced Features
      • Browse and search by
        • Keyword (dataset title and description)
        • Category (Genomics, Proteomics, Chemistry, etc.)
        • License (Public Domain, Creative Commons, GPL, etc.)
        • Username (mlangill, jeisen, NCBI, etc.)
      • RSS feeds and automatic downloading
      • Torrents linked into “Versions”
      • Upload script for bulk torrent creation
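The "RSS feeds and automatic downloading" feature could be used roughly as follows. This is a sketch that assumes a plain RSS 2.0 feed in which each item carries a title and either an `enclosure` URL or a `link` pointing at the .torrent file; the actual BioTorrents feed layout is not described in the slides, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def matching_torrents(rss_text: str, keyword: str) -> list[tuple[str, str]]:
    """Return (title, url) pairs for feed items whose title contains keyword."""
    root = ET.fromstring(rss_text)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        enclosure = item.find("enclosure")
        # Prefer the enclosure URL; fall back to the item link.
        if enclosure is not None:
            url = enclosure.get("url", "")
        else:
            url = item.findtext("link", default="")
        if url and keyword.lower() in title.lower():
            matches.append((title, url))
    return matches
```

A BitTorrent client with RSS support does something similar internally: it polls the feed and starts downloading every new torrent that matches a saved filter.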
    • 12. BioTorrents progress
      • 1000 registered users
      • 43 datasets (107 GB)
      • 766 downloads
      • 1386 GB data transferred
    • 13. Real Example
      • Download GenBank (~230GB) from NCBI
      NCBI to UC Davis    | Download speed | Time
      Max                 | 30 MB/s        | 2 hours
      FTP to other server | ~10 MB/s       | 6 hours
      FTP to NCBI         | ~0.5 MB/s      | 5 days
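The times quoted for this example follow directly from the dataset size and the download speeds. The short calculation below reproduces them, assuming 1 GB = 1000 MB and a constant transfer rate for the whole download.

```python
def transfer_time_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_gb gigabytes at a constant rate in MB/s."""
    seconds = size_gb * 1000 / rate_mb_per_s  # assumes 1 GB = 1000 MB
    return seconds / 3600


GENBANK_GB = 230  # approximate size quoted on the slide

for label, rate in [("Max", 30), ("FTP to other server", 10), ("FTP to NCBI", 0.5)]:
    hours = transfer_time_hours(GENBANK_GB, rate)
    print(f"{label}: {hours:.1f} hours ({hours / 24:.1f} days)")
```

At 0.5 MB/s the transfer takes roughly 128 hours, which is where the "5 days" figure comes from.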
    • 14. Who will use BioTorrents?
      • Existing large data providers
        • More reliable and faster downloads for users
        • Lower bandwidth requirements for the provider
      • Scientists sharing published data
        • All data is bundled together and given a unique id
        • Easier than setting up a Web/FTP server
      • Scientists sharing unpublished data
        • Data that might not be suitable for existing databases
        • Results that may not be sufficient for publication
    • 15. Issues
      • BitTorrent works best for large, popular datasets
      • Long term seeding
        • At least 1 seeder has to exist
      • Many institutions block/limit BitTorrent activity
    • 16. Future
      • Metalink
        • XML Link Protocol
        • Combines multiple sources
          • FTP, HTTP, BitTorrent, etc.
      • Volunteer Storage
        • Parallel to volunteer computing
    • 17. Final Message
      • Data transfer should be fast and easy
      • Scientific community should embrace existing technologies such as BitTorrent
      • BioTorrents uses the strengths of BitTorrent and provides features unique to scientific data