Do you work with big data? Do you need to share your data and analyses with collaborators at multiple institutions? Does it take days to run your analyses on your laptop? The iPlant Collaborative (http://www.iplantcollaborative.org/) provides free cyberinfrastructure to biologists to address these very challenges. We are an NSF-funded initiative with the mission to facilitate the transformation of life sciences research and education by providing the computing infrastructure and expertise needed to answer biological questions that were previously difficult or impossible to address. Despite the name, iPlant’s scope includes any life sciences research, be it genomics or ecology; in plants, animals, or microbes; from single-researcher investigations to community-wide collaborations. Our cyberinfrastructure is suitable for ecological research that requires access to shared data storage, very large data sets, high performance computing, or cloud computing. iPlant also provides a platform for developers and informaticians to share their tools with the ecological community.
This presentation will provide an overview of the tools and services available through iPlant, with an emphasis on their utility to ecologists and ecological informaticians. These include: data storage, sharing, and metadata mark-up via the iPlant Data Store; data publishing and discovery through the iPlant Data Commons; cloud-based computing through Atmosphere; web-based access to dozens of applications through the Discovery Environment; iPlant Application Programming Interfaces (APIs); an image management and analysis system with a high performance computing back-end (Bisque); access to high-resolution global environmental layers; and educational and training resources. Projects that use iPlant’s infrastructure will be touched upon, including the iMicrobe project. iPlant’s flexible, open-source architecture should be of interest to anyone who needs to organize and analyze very large data sets, is using genomic or metagenomic methods to address ecological questions, or is developing or using ecological models that require large memory or parallel computations.
3. Cyberinfrastructure (CI) is:
• data storage
• software
• hardware
• high-performance computing
• people
…used to solve problems of size and scope that would not otherwise be solvable.
5. iPlant Tools and Services
https://community.lego.com/t5/LEGO-General/What-does-LEGO-mean/td-p/4318550
6. iPlant Data Store is the heart of our CI
All registered users have a free 100 GB allocation and can easily request up to 1 TB for shared projects
Cyberduck
7. Fast Parallel Data Transfers
Source              Time (sec)
CD                  320
Berkeley Server     150
External Drive      36*
USB2.0 Flash        30
iPlant Data Store   18*
My Computer         15
Time to move 1 GB of data from UC Berkeley to iPlant Data Store and other locations. Based on 100 GB transfer.
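To make the times above easier to compare, they can be converted to effective transfer rates. The following is a minimal sketch, not part of the original slides: the function names are hypothetical, the asterisks from the slide are dropped, and it assumes decimal units (1 GB = 1000 MB).

```python
# Transfer times in seconds for 1 GB, copied from the slide.
# The asterisks shown on the slide are dropped here.
TRANSFER_TIMES_SEC = {
    "CD": 320,
    "Berkeley Server": 150,
    "External Drive": 36,
    "USB2.0 Flash": 30,
    "iPlant Data Store": 18,
    "My Computer": 15,
}

MB_PER_GB = 1000  # assumption: decimal units, 1 GB = 1000 MB


def throughput_mb_per_sec(seconds_per_gb):
    """Effective rate in MB/s when 1 GB takes `seconds_per_gb` seconds."""
    return MB_PER_GB / seconds_per_gb


def throughput_table(times):
    """Map each source to its rate in MB/s, rounded to one decimal place."""
    return {src: round(throughput_mb_per_sec(t), 1) for src, t in times.items()}


if __name__ == "__main__":
    rates = throughput_table(TRANSFER_TIMES_SEC)
    # Print the fastest destinations first.
    for src, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{src:<18} {rate:>6.1f} MB/s")
```

Under these assumptions, the 18 seconds listed for the iPlant Data Store corresponds to roughly 56 MB/s.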
20. Interacting with iPlant
Extended Collaborative Support (ECS) – take advantage of iPlant’s existing components to build and share a new workflow.
Powered by iPlant – develop your own front end, use our authentication, storage, and permission services and APIs
iPlant Community Collaborations – use iPlant to host data and analyses for your research community
Thanks to the iPlant staff.
Community collaborations are the key to iPlant’s success. We don’t develop any analysis algorithms. We just make them available to people and help make them work better.
LEGO: “I put together” in Latin, or “play well” in Danish.
Can now use Docker to add applications
Run analyses remotely; share analysis set-ups with collaborators; publish software, parameters and data all on one VM for reproducibility
Images stored on iDS. Allows both graphical and textual annotations.
Automatic batch processing. Not yet interactive.
The Agave API runs in the cloud as a hosted, multi-tenant service, so there is nothing to install. It allows you to define your own compute and storage resources so you can interact with the resources you already have, as well as with the ones Agave provides. It allows you to build up your own app store of scientific codes and workflows and share them with anyone. And because Agave is about performance, it leverages the nation's fastest high-speed networks to move your data as fast as physics allows. Concerned about security? Agave gives you the flexibility to interact with your data and computation without ever leaving your own network. In short, the Agave API helps you do your science your way.
Queries!
Asking questions of data that can’t be answered with a spreadsheet.
show an example here
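As a purely hypothetical illustration of the kind of question meant here (not an iPlant service or the presenter's own example), the sketch below joins two small tables and aggregates across them, which is awkward to do in a spreadsheet. The table names, column names, and data are all invented for illustration; it uses only Python's standard sqlite3 module.

```python
import sqlite3

# In-memory database with two made-up tables (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (site_id INTEGER PRIMARY KEY, name TEXT, elevation_m REAL);
CREATE TABLE observations (site_id INTEGER, species TEXT, n_individuals INTEGER);
INSERT INTO sites VALUES (1, 'Ridge', 1800), (2, 'Valley', 400), (3, 'Meadow', 1200);
INSERT INTO observations VALUES
    (1, 'Pinus ponderosa', 12),
    (1, 'Quercus kelloggii', 3),
    (2, 'Quercus kelloggii', 20),
    (3, 'Pinus ponderosa', 7),
    (3, 'Quercus kelloggii', 5);
""")


def species_above(conn, min_elevation_m):
    """Total individuals per species at sites at or above an elevation."""
    query = """
        SELECT o.species, SUM(o.n_individuals) AS total
        FROM observations AS o
        JOIN sites AS s ON s.site_id = o.site_id
        WHERE s.elevation_m >= ?
        GROUP BY o.species
        ORDER BY total DESC, o.species
    """
    return conn.execute(query, (min_elevation_m,)).fetchall()


if __name__ == "__main__":
    for species, total in species_above(conn, 1000):
        print(species, total)
```

The question answered is "how many individuals of each species were seen at sites above 1000 m?", which needs data from both tables at once.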