Clouds, Clusters, and Containers: Tools for
responsible, collaborative computing
Matt Vaughn @mattdotvaughn, John Fonner
#cyverse #agaveapi #usetacc
Part One: Overview and Introductions
You should have a Cyverse user account (https://user.cyverse.org/) ready to go so that you can be
productive in the next sessions
AGENDA https://github.com/johnfonner/AKES2016
What is Cloud?
We generally care about reliably expanding our
capacity and capability
We generally don’t want to care about monitoring,
business models, developments in systems
architecture, or hardware
Cloud is a useful abstraction that means that the
things we don’t want to mess with are someone
else’s problem
But… it can bring its own challenges
• Reproducibility
• Need for high-level IT skills to use it
• Paying for it
Hammers, scalpels, and scopes
Hammers
• Leadership systems: Stampede, Comet
• Big clusters: Lonestar, Hikari, Bridges
Scalpels
• Data intensive systems: Wrangler, Rustler
• Architecture Experiments: Catapult, Fabric
• Viz and GPU compute: Maverick, Stallion, Lasso
Scopes
• User-provisioned cloud: Chameleon, Jetstream
• Global FS: Stockyard
• Specialized interfaces: APIs, SaaS
What kind of characteristics are commonly associated with Big Data?
1. Physical constraints
2. Big (meta)data volume
3. Big compute
4. Big memory
5. Slow networks
6. Bad algorithms
What does Big Data feel like?
• MapReduce: Hadoop, Storm
• Event & Streaming processing: Kinesis, Azure Stream Analytics, Camel,
Streambase
• Machine Learning: Watson, Azure BI, SAS
• In-memory processing: Kognitio, Apache Spark
• New data warehouse: Snowflake
• FauxSQL
How are people handling Big Data?
Today’s Big Data solutions strangely resemble distributed execution
frameworks with slightly different schedulers.
Mental challenges
• (Enterprise) Integration scenarios
• Software portability
• IT administration
• Performance tuning
• Security
• Provenance
• Reproducibility
• Technology changes
Scientific Big Data is a cultural problem
Social challenges
• Collaboration
• Publishing
• Ownership
• Attribution
• Team dynamics
Scientific Big Data is a cultural problem
Economic challenges
• Infrastructure operations
• Data preservation
• Software maintenance
• Copyright
Scientific Big Data is a cultural problem
Legal challenges
• Copyright
• Purchasing
• HIPAA (and other privacy frameworks)
• Export control
Scientific Big Data is a cultural problem
Impactful “Big Data” solutions won’t be found along a single axis. The next
silver bullet will look like a shotgun.
What is Agave?
Agave is a multi-tenant PaaS solution delivering
Science-as-a-Service
capabilities across hybrid cloud environments.
What does it do?
● Run application codes
your own or community provided codes
● ...on HPC, HTC, and cloud resources
your own, shared, or commercial systems
● ...and manage your data
reliable, multi-protocol, async data movement
● …in a collaborative way
fine-grained ACLs for working securely with others
● ...from the web
webhooks, REST, JSON, CORS, OAuth2
● ...and remember how you did it
deep provenance, history, and reproducibility built in
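
To make the "from the web" point concrete, here is a minimal sketch of what a JSON job request could look like when assembled in Python. It only builds and prints the request body; nothing is sent anywhere. The field names (`appId`, `inputs`, `parameters`, `notifications`), the app identifier, and the example.org URLs are assumptions for illustration and should be checked against the Agave documentation before use.

```python
import json


def build_job_request(name, app_id, input_url, parameters, webhook_url):
    """Assemble a hypothetical job description as a plain dict."""
    return {
        "name": name,
        "appId": app_id,                      # which registered app to run (hypothetical id)
        "inputs": {"inputFile": input_url},   # data to stage in before the run
        "parameters": parameters,             # app-specific settings
        "notifications": [
            # ask the platform to call back when the job finishes
            {"url": webhook_url, "event": "FINISHED"},
        ],
    }


request = build_job_request(
    name="wordcount-demo",
    app_id="wordcount-1.0",
    input_url="https://example.org/data/sample.txt",
    parameters={"ignoreCase": True},
    webhook_url="https://example.org/hooks/job-done",
)
body = json.dumps(request, indent=2, sort_keys=True)
print(body)
```

The same dict could be posted with any HTTP client once an OAuth2 token is in hand; that step is left out here because it depends on the tenant.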
No, seriously, what does it do?
White Label PaaS
• Build and brand for your organization
• Customize with your own services and features.
• Let us operate it
or host it yourself
• Interacts with existing compute & storage
• Leverages your existing workload manager(s)
• Delegates to your existing IdP & security
• Uses your existing apps
• Creates a cohesive platform for
your dev and user communities
Zero Install Deployment
Web friendly
• JSON in | JSON out
• Global ACLs on every resource
• Role-based management
• Public and private scopes for web publishing
• Sync and async interfaces
• Email & webhook notifications
• Event-driven design
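
One way to picture the webhook and event-driven bullets is a small dispatcher that parses a JSON notification and routes it by event name. This is a sketch under stated assumptions: the payload shape (`event`, `id`, `message`) and the event names are hypothetical, not a documented notification format.

```python
import json

# Registry mapping event names to handler functions.
HANDLERS = {}


def on(event):
    """Decorator that registers a handler for one event name."""
    def register(func):
        HANDLERS[event] = func
        return func
    return register


@on("FINISHED")
def job_finished(payload):
    return f"job {payload['id']} finished"


@on("FAILED")
def job_failed(payload):
    return f"job {payload['id']} failed: {payload.get('message', 'no details')}"


def handle_webhook(raw_body):
    """Parse a JSON webhook body and dispatch on its 'event' field."""
    payload = json.loads(raw_body)
    handler = HANDLERS.get(payload.get("event"))
    if handler is None:
        return "ignored"  # unknown events are skipped rather than treated as errors
    return handler(payload)


print(handle_webhook('{"event": "FINISHED", "id": "job-123"}'))
```

Registering handlers by name keeps the receiving side open to new event types without touching the dispatch logic.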
Reproducibility As A Feature
• Deep provenance on everything
• Auto-capture contextual
metadata
• Ability to re-run pipelines,
processes, and data transfers
baked in
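
As a rough illustration of provenance capture and re-running, the sketch below records the parameters and content hashes of a step, then re-runs it and checks that the output matches. It is not how Agave implements provenance; the function names and the toy `to_upper` step are hypothetical.

```python
import hashlib
import json


def run_with_provenance(func, input_bytes, parameters):
    """Run func and return (result, provenance record)."""
    result = func(input_bytes, **parameters)
    record = {
        "step": func.__name__,
        "parameters": parameters,
        # content hashes identify exactly which data went in and came out
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "output_sha256": hashlib.sha256(result).hexdigest(),
    }
    return result, record


def rerun(func, input_bytes, record):
    """Re-run a recorded step and check the output matches the record."""
    result = func(input_bytes, **record["parameters"])
    return hashlib.sha256(result).hexdigest() == record["output_sha256"]


def to_upper(data, strip=False):
    """Toy processing step standing in for a real analysis."""
    text = data.decode("utf-8")
    if strip:
        text = text.strip()
    return text.upper().encode("utf-8")


raw = b"  acgt  "
output, provenance = run_with_provenance(to_upper, raw, {"strip": True})
print(json.dumps(provenance, indent=2))
print("reproduced:", rerun(to_upper, raw, provenance))
```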
Containers for science
• Research is hard
• Coding is hard
• Research code is
• well designed,
• documented,
• leverages design patterns,
• highly reusable,
• portable,
• and usually open source.
Scientists, with few exceptions, are not trained programmers
Containers for science
• Truth be told, they don’t actually even care.
• The ROI of higher code quality ≈ 0
• No funding available for cleaning up code.
Despite the quality of the code, the science represented by the code is
valuable and necessary for future discovery.
Containers for science
Compute containers are the Magic 8 Ball of
science...
• Compartmentalize code
• Eliminate build and run complexities
• Introduce portability, reuse, & versioning
• Widgetize the creation of a scientific pipeline
...but better because results are reproducible.
Compute containers enable reproducible science via composition.
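
Composition can be sketched as a list of steps, each pinned to a specific image tag, rendered into `docker run` command lines. The code only builds the commands and does not execute them. The image names, tool commands, and paths are hypothetical.

```python
import shlex

# Each step pins an image tag so the same software runs every time.
# Image names and commands below are hypothetical.
PIPELINE = [
    {"image": "example/trim:1.2.0",
     "command": ["trim", "/data/reads.fq", "/data/trimmed.fq"]},
    {"image": "example/align:0.9.1",
     "command": ["align", "/data/trimmed.fq", "/data/aligned.sam"]},
]


def docker_command(step, workdir):
    """Build the `docker run` argument list for one pipeline step."""
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:/data",   # share one host directory between steps
        step["image"],
        *step["command"],
    ]


def render(pipeline, workdir):
    """Return the shell command lines for the whole pipeline, in order."""
    return [shlex.join(docker_command(step, workdir)) for step in pipeline]


for line in render(PIPELINE, "/tmp/run42"):
    print(line)
```

Because every step names an exact image version, rendering the same pipeline later yields the same commands, which is the property the slide is pointing at.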
Containers for science
Data containers can serve as universal adapters
between compute containers
• Transform data
• Bridge file systems
• Enable distributed data access
• Virtualize interfaces
Data containers enable clean integration between containers and standardize how
we interact with distributed data.
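
The adapter idea can be shown in miniature without containers at all: a small transform that sits between a step that emits CSV and a step that expects JSON Lines. This is an analogy in plain Python rather than an actual data container; the sample data and function names are made up for illustration.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Adapter: turn CSV text from one step into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def records_to_jsonl(records):
    """Adapter: serialize records as JSON Lines for the next step."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


def adapt(csv_text):
    """Compose the two adapters so a CSV producer can feed a JSONL consumer."""
    return records_to_jsonl(csv_to_records(csv_text))


upstream_output = "gene,count\nBRCA1,12\nTP53,7\n"
print(adapt(upstream_output))
```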
Containers are changing the landscape
• Cyverse has been an early adopter of container tech
• Magic wand for making scientific software
deployable and usable
• Pushbutton Interfaces
• Language-specific libraries
• Scriptable CLI tools
• Galaxy, NIH Cancer Cloud Pilots, and lots of other folks
are using them too
But they have their perils too...
• Managing and orchestrating
containers + data + networking
can be complicated
• There are a lot of emerging
solutions
• We won’t touch on this today,
but be careful in your technology
selections

More Related Content

What's hot

Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big dataSitaram Kotnis
 
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...exponential-inc
 
Interoperability and scalability with microservices in science
Interoperability and scalability with microservices in scienceInteroperability and scalability with microservices in science
Interoperability and scalability with microservices in scienceOla Spjuth
 
Grid computing Seminar PPT
Grid computing Seminar PPTGrid computing Seminar PPT
Grid computing Seminar PPTUpender Upr
 
Big Data in the Cloud
Big Data in the CloudBig Data in the Cloud
Big Data in the CloudNati Shalom
 
Relationship between cloud computing and big data
Relationship between cloud computing and big dataRelationship between cloud computing and big data
Relationship between cloud computing and big dataJazan University
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introductionFrans van Noort
 
2010 EGITF Amsterdam - Gap between GRID and Humanities
2010 EGITF Amsterdam - Gap between GRID and Humanities2010 EGITF Amsterdam - Gap between GRID and Humanities
2010 EGITF Amsterdam - Gap between GRID and HumanitiesDirk Roorda
 
Big Data Course - BigData HUB
Big Data Course - BigData HUBBig Data Course - BigData HUB
Big Data Course - BigData HUBAhmed Salman
 
The rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computingThe rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computingMinhazul Arefin
 
Introduction to Grid Computing
Introduction to Grid ComputingIntroduction to Grid Computing
Introduction to Grid Computingabhijeetnawal
 
Long Live Posix - HPC Storage and the HPC Datacenter
Long Live Posix - HPC Storage and the HPC DatacenterLong Live Posix - HPC Storage and the HPC Datacenter
Long Live Posix - HPC Storage and the HPC Datacenterinside-BigData.com
 
Big data and computing grid
Big data and computing gridBig data and computing grid
Big data and computing gridThang Nguyen
 
Security issues associated with big data in cloud
Security issues associated  with big data in cloudSecurity issues associated  with big data in cloud
Security issues associated with big data in cloudsornalathaNatarajan
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataJoey Li
 
BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTAmrit Chhetri
 
Top 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data SolutionTop 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data SolutionDataStax
 
Managing The Data Deluge By Optimizing Storage
Managing The Data Deluge By Optimizing StorageManaging The Data Deluge By Optimizing Storage
Managing The Data Deluge By Optimizing StorageDell World
 

What's hot (20)

Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
 
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
 
Interoperability and scalability with microservices in science
Interoperability and scalability with microservices in scienceInteroperability and scalability with microservices in science
Interoperability and scalability with microservices in science
 
Grid computing Seminar PPT
Grid computing Seminar PPTGrid computing Seminar PPT
Grid computing Seminar PPT
 
Big Data in the Cloud
Big Data in the CloudBig Data in the Cloud
Big Data in the Cloud
 
Relationship between cloud computing and big data
Relationship between cloud computing and big dataRelationship between cloud computing and big data
Relationship between cloud computing and big data
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introduction
 
2010 EGITF Amsterdam - Gap between GRID and Humanities
2010 EGITF Amsterdam - Gap between GRID and Humanities2010 EGITF Amsterdam - Gap between GRID and Humanities
2010 EGITF Amsterdam - Gap between GRID and Humanities
 
Big Data Course - BigData HUB
Big Data Course - BigData HUBBig Data Course - BigData HUB
Big Data Course - BigData HUB
 
The rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computingThe rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computing
 
Introduction to Grid Computing
Introduction to Grid ComputingIntroduction to Grid Computing
Introduction to Grid Computing
 
Long Live Posix - HPC Storage and the HPC Datacenter
Long Live Posix - HPC Storage and the HPC DatacenterLong Live Posix - HPC Storage and the HPC Datacenter
Long Live Posix - HPC Storage and the HPC Datacenter
 
Big data and computing grid
Big data and computing gridBig data and computing grid
Big data and computing grid
 
Grid computing
Grid computingGrid computing
Grid computing
 
Security issues associated with big data in cloud
Security issues associated  with big data in cloudSecurity issues associated  with big data in cloud
Security issues associated with big data in cloud
 
Grid Computing
Grid ComputingGrid Computing
Grid Computing
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRT
 
Top 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data SolutionTop 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data Solution
 
Managing The Data Deluge By Optimizing Storage
Managing The Data Deluge By Optimizing StorageManaging The Data Deluge By Optimizing Storage
Managing The Data Deluge By Optimizing Storage
 

Viewers also liked

Brochure servicios 2015 v.1.1
Brochure servicios 2015 v.1.1Brochure servicios 2015 v.1.1
Brochure servicios 2015 v.1.1Cencosud S.A.
 
Prime company profile
Prime company profilePrime company profile
Prime company profilePrimeplc
 
Japan Asia Group Limited Review: Corporate Philosophy
Japan Asia Group Limited Review: Corporate PhilosophyJapan Asia Group Limited Review: Corporate Philosophy
Japan Asia Group Limited Review: Corporate Philosophygoergehampton
 
Building a Marketing Team - Brand+Aid 2016
Building a Marketing Team - Brand+Aid 2016Building a Marketing Team - Brand+Aid 2016
Building a Marketing Team - Brand+Aid 2016Ryan Hegreness
 
Gobierno de it seguridad de la informacion v.1.1
Gobierno de it   seguridad de la informacion v.1.1Gobierno de it   seguridad de la informacion v.1.1
Gobierno de it seguridad de la informacion v.1.1Cencosud S.A.
 
Decentralising product development controls for sustainability and profit
Decentralising product development controls for sustainability and profitDecentralising product development controls for sustainability and profit
Decentralising product development controls for sustainability and profitVytautas Dauksa
 
RightFax 10.6 Feature Pack 2
RightFax 10.6 Feature Pack 2RightFax 10.6 Feature Pack 2
RightFax 10.6 Feature Pack 2The Fax Guys
 
remisi bagi koruptor
remisi bagi koruptorremisi bagi koruptor
remisi bagi koruptorIr. Soekarno
 
process efficiency review
process efficiency review process efficiency review
process efficiency review Santa Salvatore
 
SEJARAH KETATANEGARAAN INDONESIA
SEJARAH KETATANEGARAAN INDONESIASEJARAH KETATANEGARAAN INDONESIA
SEJARAH KETATANEGARAAN INDONESIABayu Rizky Aditya
 
Collaboration PowerPoint slides
Collaboration PowerPoint slidesCollaboration PowerPoint slides
Collaboration PowerPoint slideseisolomon
 
14 Simple Foods for Better Health
14 Simple Foods for Better Health14 Simple Foods for Better Health
14 Simple Foods for Better HealthDan Erickson
 

Viewers also liked (20)

Brochure servicios 2015 v.1.1
Brochure servicios 2015 v.1.1Brochure servicios 2015 v.1.1
Brochure servicios 2015 v.1.1
 
Catalogo appsweb
Catalogo appswebCatalogo appsweb
Catalogo appsweb
 
Prime company profile
Prime company profilePrime company profile
Prime company profile
 
Japan Asia Group Limited Review: Corporate Philosophy
Japan Asia Group Limited Review: Corporate PhilosophyJapan Asia Group Limited Review: Corporate Philosophy
Japan Asia Group Limited Review: Corporate Philosophy
 
Who are you?
Who are you?Who are you?
Who are you?
 
Gerakan Senam
Gerakan SenamGerakan Senam
Gerakan Senam
 
Building a Marketing Team - Brand+Aid 2016
Building a Marketing Team - Brand+Aid 2016Building a Marketing Team - Brand+Aid 2016
Building a Marketing Team - Brand+Aid 2016
 
Gobierno de it seguridad de la informacion v.1.1
Gobierno de it   seguridad de la informacion v.1.1Gobierno de it   seguridad de la informacion v.1.1
Gobierno de it seguridad de la informacion v.1.1
 
Decentralising product development controls for sustainability and profit
Decentralising product development controls for sustainability and profitDecentralising product development controls for sustainability and profit
Decentralising product development controls for sustainability and profit
 
RightFax 10.6 Feature Pack 2
RightFax 10.6 Feature Pack 2RightFax 10.6 Feature Pack 2
RightFax 10.6 Feature Pack 2
 
Nilai nilai toleransi
Nilai nilai toleransiNilai nilai toleransi
Nilai nilai toleransi
 
P point2
P point2P point2
P point2
 
remisi bagi koruptor
remisi bagi koruptorremisi bagi koruptor
remisi bagi koruptor
 
Enterprise Computing
Enterprise ComputingEnterprise Computing
Enterprise Computing
 
process efficiency review
process efficiency review process efficiency review
process efficiency review
 
SEJARAH KETATANEGARAAN INDONESIA
SEJARAH KETATANEGARAAN INDONESIASEJARAH KETATANEGARAAN INDONESIA
SEJARAH KETATANEGARAAN INDONESIA
 
Proposal nelayan
Proposal nelayanProposal nelayan
Proposal nelayan
 
Collaboration PowerPoint slides
Collaboration PowerPoint slidesCollaboration PowerPoint slides
Collaboration PowerPoint slides
 
Soaps and deteregnt
Soaps and deteregntSoaps and deteregnt
Soaps and deteregnt
 
14 Simple Foods for Better Health
14 Simple Foods for Better Health14 Simple Foods for Better Health
14 Simple Foods for Better Health
 

Similar to Clouds, Clusters, and Containers: Tools for responsible, collaborative computing

Cloud computing infrastructure
Cloud computing infrastructure Cloud computing infrastructure
Cloud computing infrastructure Dr. Anita Goel
 
The Reasons Why the Science Gateways Community Needs an Institute
The Reasons Why the Science Gateways Community Needs an InstituteThe Reasons Why the Science Gateways Community Needs an Institute
The Reasons Why the Science Gateways Community Needs an InstituteSandra Gesing
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...Mihai Criveti
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-HadoopNagarjuna D.N
 
Big data berlin
Big data berlinBig data berlin
Big data berlinkammeyer
 
Adoption of Cloud Computing in Scientific Research
Adoption of Cloud Computing in Scientific ResearchAdoption of Cloud Computing in Scientific Research
Adoption of Cloud Computing in Scientific ResearchYehia El-khatib
 
A Tight Ship: How Containers and SDS Optimize the Enterprise
 A Tight Ship: How Containers and SDS Optimize the Enterprise A Tight Ship: How Containers and SDS Optimize the Enterprise
A Tight Ship: How Containers and SDS Optimize the EnterpriseEric Kavanagh
 
Cloud computing 13 principal enabling technologies
Cloud computing 13 principal  enabling technologiesCloud computing 13 principal  enabling technologies
Cloud computing 13 principal enabling technologiesVaibhav Khanna
 
Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes John Archer
 
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...Boston Data Engineering
 
ISV Showcase: End-to-end Machine Learning using H2O on Azure
ISV Showcase: End-to-end Machine Learning using H2O on AzureISV Showcase: End-to-end Machine Learning using H2O on Azure
ISV Showcase: End-to-end Machine Learning using H2O on AzureMicrosoft Tech Community
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and PythonTravis Oliphant
 
SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3Simon Ambridge
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...David Wallom
 

Similar to Clouds, Clusters, and Containers: Tools for responsible, collaborative computing (20)

Cloud computing infrastructure
Cloud computing infrastructure Cloud computing infrastructure
Cloud computing infrastructure
 
Beyond the Science Gateway
Beyond the Science GatewayBeyond the Science Gateway
Beyond the Science Gateway
 
The Reasons Why the Science Gateways Community Needs an Institute
The Reasons Why the Science Gateways Community Needs an InstituteThe Reasons Why the Science Gateways Community Needs an Institute
The Reasons Why the Science Gateways Community Needs an Institute
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
Big data berlin
Big data berlinBig data berlin
Big data berlin
 
Adoption of Cloud Computing in Scientific Research
Adoption of Cloud Computing in Scientific ResearchAdoption of Cloud Computing in Scientific Research
Adoption of Cloud Computing in Scientific Research
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Bertenthal
BertenthalBertenthal
Bertenthal
 
A Tight Ship: How Containers and SDS Optimize the Enterprise
 A Tight Ship: How Containers and SDS Optimize the Enterprise A Tight Ship: How Containers and SDS Optimize the Enterprise
A Tight Ship: How Containers and SDS Optimize the Enterprise
 
Cloud computing 13 principal enabling technologies
Cloud computing 13 principal  enabling technologiesCloud computing 13 principal  enabling technologies
Cloud computing 13 principal enabling technologies
 
Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes
 
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
 
Cloud computingjun28
Cloud computingjun28Cloud computingjun28
Cloud computingjun28
 
Cloud computingjun28
Cloud computingjun28Cloud computingjun28
Cloud computingjun28
 
ISV Showcase: End-to-end Machine Learning using H2O on Azure
ISV Showcase: End-to-end Machine Learning using H2O on AzureISV Showcase: End-to-end Machine Learning using H2O on Azure
ISV Showcase: End-to-end Machine Learning using H2O on Azure
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud Computing
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...
 

More from Matthew Vaughn

On-Demand Cloud Computing for Life Sciences Research and Education
On-Demand Cloud Computing for Life Sciences Research and EducationOn-Demand Cloud Computing for Life Sciences Research and Education
On-Demand Cloud Computing for Life Sciences Research and EducationMatthew Vaughn
 
Towards a (united) federation of Bioinformatics resources
Towards a (united) federation of Bioinformatics resourcesTowards a (united) federation of Bioinformatics resources
Towards a (united) federation of Bioinformatics resourcesMatthew Vaughn
 
CYVERSE: TRANSFORMING LIFE SCIENCE RESEARCH VIA CYBERINFRASTRUCTURE
CYVERSE: TRANSFORMING LIFE SCIENCE RESEARCH VIA CYBERINFRASTRUCTURECYVERSE: TRANSFORMING LIFE SCIENCE RESEARCH VIA CYBERINFRASTRUCTURE
CYVERSE: TRANSFORMING LIFE SCIENCE RESEARCH VIA CYBERINFRASTRUCTUREMatthew Vaughn
 
Jetstream: Accessible cloud computing for the national science and engineerin...
Jetstream: Accessible cloud computing for the national science and engineerin...Jetstream: Accessible cloud computing for the national science and engineerin...
Jetstream: Accessible cloud computing for the national science and engineerin...Matthew Vaughn
 
How Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-useHow Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-useMatthew Vaughn
 
Packaging computational biology tools for broad distribution and ease-of-reuse
Packaging computational biology tools for broad distribution and ease-of-reusePackaging computational biology tools for broad distribution and ease-of-reuse
Packaging computational biology tools for broad distribution and ease-of-reuseMatthew Vaughn
 
Jetstream: Adding Cloud-based Computing to the National Cyberinfrastructure
Jetstream: Adding Cloud-based Computing to the National CyberinfrastructureJetstream: Adding Cloud-based Computing to the National Cyberinfrastructure
Jetstream: Adding Cloud-based Computing to the National CyberinfrastructureMatthew Vaughn
 
Scaling People, Not Just Systems, to Take On Big Data Challenges
Scaling People, Not Just Systems, to Take On Big Data ChallengesScaling People, Not Just Systems, to Take On Big Data Challenges
Scaling People, Not Just Systems, to Take On Big Data ChallengesMatthew Vaughn
 
Arabidopsis Information Portal: A Community-Extensible Platform for Open Data
Arabidopsis Information Portal: A Community-Extensible Platform for Open DataArabidopsis Information Portal: A Community-Extensible Platform for Open Data
Arabidopsis Information Portal: A Community-Extensible Platform for Open DataMatthew Vaughn
 
Developing Apps: Exposing Your Data Through Araport
Developing Apps: Exposing Your Data Through AraportDeveloping Apps: Exposing Your Data Through Araport
Developing Apps: Exposing Your Data Through AraportMatthew Vaughn
 
Dinosaur bioinformatics
Dinosaur bioinformaticsDinosaur bioinformatics
Dinosaur bioinformaticsMatthew Vaughn
 
aip-developer-intro_pag2015
aip-developer-intro_pag2015aip-developer-intro_pag2015
aip-developer-intro_pag2015Matthew Vaughn
 
iplant-highlights-pag2015
iplant-highlights-pag2015iplant-highlights-pag2015
iplant-highlights-pag2015Matthew Vaughn
 
aip-workshop1-dev-tutorial
aip-workshop1-dev-tutorialaip-workshop1-dev-tutorial
aip-workshop1-dev-tutorialMatthew Vaughn
 
aip_developer_overview_icar_2014
aip_developer_overview_icar_2014aip_developer_overview_icar_2014
aip_developer_overview_icar_2014Matthew Vaughn
 
Arabidopsis Information Portal overview from Plant Biology Europe 2014
Arabidopsis Information Portal overview from Plant Biology Europe 2014Arabidopsis Information Portal overview from Plant Biology Europe 2014
Arabidopsis Information Portal overview from Plant Biology Europe 2014Matthew Vaughn
 

More from Matthew Vaughn (16)

On-Demand Cloud Computing for Life Sciences Research and Education
On-Demand Cloud Computing for Life Sciences Research and EducationOn-Demand Cloud Computing for Life Sciences Research and Education
On-Demand Cloud Computing for Life Sciences Research and Education
 
Towards a (united) federation of Bioinformatics resources
Towards a (united) federation of Bioinformatics resourcesTowards a (united) federation of Bioinformatics resources
Towards a (united) federation of Bioinformatics resources
 
CYVERSE: TRANSFORMING LIFE SCIENCE RESEARCH VIA CYBERINFRASTRUCTURE
CYVERSE: TRANSFORMING LIFE SCIENCE RESEARCH VIA CYBERINFRASTRUCTURECYVERSE: TRANSFORMING LIFE SCIENCE RESEARCH VIA CYBERINFRASTRUCTURE
CYVERSE: TRANSFORMING LIFE SCIENCE RESEARCH VIA CYBERINFRASTRUCTURE
 
Jetstream: Accessible cloud computing for the national science and engineerin...
Jetstream: Accessible cloud computing for the national science and engineerin...Jetstream: Accessible cloud computing for the national science and engineerin...
Jetstream: Accessible cloud computing for the national science and engineerin...
 
How Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-useHow Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-use
 
Packaging computational biology tools for broad distribution and ease-of-reuse
Packaging computational biology tools for broad distribution and ease-of-reusePackaging computational biology tools for broad distribution and ease-of-reuse
Packaging computational biology tools for broad distribution and ease-of-reuse
 
Jetstream: Adding Cloud-based Computing to the National Cyberinfrastructure
Jetstream: Adding Cloud-based Computing to the National CyberinfrastructureJetstream: Adding Cloud-based Computing to the National Cyberinfrastructure
Jetstream: Adding Cloud-based Computing to the National Cyberinfrastructure
 
Scaling People, Not Just Systems, to Take On Big Data Challenges
Scaling People, Not Just Systems, to Take On Big Data ChallengesScaling People, Not Just Systems, to Take On Big Data Challenges
Scaling People, Not Just Systems, to Take On Big Data Challenges
 
Arabidopsis Information Portal: A Community-Extensible Platform for Open Data
Arabidopsis Information Portal: A Community-Extensible Platform for Open DataArabidopsis Information Portal: A Community-Extensible Platform for Open Data
Arabidopsis Information Portal: A Community-Extensible Platform for Open Data
 
Developing Apps: Exposing Your Data Through Araport
Developing Apps: Exposing Your Data Through AraportDeveloping Apps: Exposing Your Data Through Araport
Developing Apps: Exposing Your Data Through Araport
 
Dinosaur bioinformatics
Dinosaur bioinformaticsDinosaur bioinformatics
Dinosaur bioinformatics
 
aip-developer-intro_pag2015
aip-developer-intro_pag2015aip-developer-intro_pag2015
aip-developer-intro_pag2015
 
iplant-highlights-pag2015
iplant-highlights-pag2015iplant-highlights-pag2015
iplant-highlights-pag2015
 
aip-workshop1-dev-tutorial
aip-workshop1-dev-tutorialaip-workshop1-dev-tutorial
aip-workshop1-dev-tutorial
 
aip_developer_overview_icar_2014
aip_developer_overview_icar_2014aip_developer_overview_icar_2014
aip_developer_overview_icar_2014
 
Arabidopsis Information Portal overview from Plant Biology Europe 2014
Arabidopsis Information Portal overview from Plant Biology Europe 2014Arabidopsis Information Portal overview from Plant Biology Europe 2014
Arabidopsis Information Portal overview from Plant Biology Europe 2014
 

Clouds, Clusters, and Containers: Tools for responsible, collaborative computing

  • 1. Clouds, Clusters, and Containers: Tools for responsible, collaborative computing. Matt Vaughn @mattdotvaughn, John Fonner. #cyverse #agaveapi #usetacc. Part One: Overview and Introductions. You should have a CyVerse user account (https://user.cyverse.org/) ready to go in order to be productive in the next sessions. AGENDA: https://github.com/johnfonner/AKES2016
  • 2. What is Cloud? We generally care about reliably expanding our capacity and capability We generally don’t want to care about monitoring, business models, developments in systems architecture, hardware Cloud is a useful abstraction that means that the things we don’t want to mess with are someone else’s problem But… it can bring its own challenges • Reproducibility • Need for high-level IT skills to use it • Paying for it
  • 3.
  • 4. Hammers, scalpels, and scopes Hammers • Leadership systems: Stampede, Comet • Big clusters: Lonestar, Hikari, Bridges Scalpels • Data intensive systems: Wrangler, Rustler • Architecture Experiments: Catapult, Fabric • Viz and GPU compute: Maverick, Stallion, Lasso Scopes • User-provisioned cloud: Chameleon, Jetstream • Global FS: Stockyard • Specialized interfaces: APIs, SaaS
  • 5. What kind of characteristics are commonly associated with Big Data? 1. Physical constraints 2. Big (meta)data volume 3. Big compute 4. Big memory 5. Slow networks 6. Bad algorithms What does Big Data feel like?
  • 6. • MapReduce: Hadoop, Storm • Event & Streaming processing: Kinesis, Azure Stream Analytics, Camel, Streambase • Machine Learning: Watson, Azure BI, SAS • In-memory processing: Kognitio, Apache Spark • New data warehouse: Snowflake • FauxSQL How are people handling Big Data? Today’s Big Data solutions strangely resemble distributed execution frameworks with slightly different schedulers.
  • 7. Mental challenges • (Enterprise) Integration scenarios • Software portability • IT administration • Performance tuning • Security • Provenance • Reproducibility • Technology changes Scientific Big Data is a cultural problem
  • 8. Social challenges • Collaboration • Publishing • Ownership • Attribution • Team dynamics Scientific Big Data is a cultural problem
  • 9. Economic challenges • Infrastructure operations • Data preservation • Software maintenance • Copyright Scientific Big Data is a cultural problem
  • 10. Legal challenges • Copyright • Purchasing • HIPAA (and other privacy frameworks) • Export control Scientific Big Data is a cultural problem Impactful “Big Data” solutions won’t be found along a single axis. The next silver bullet will look like a shotgun.
  • 11.
  • 12. What is Agave? Agave is a multi-tenant PaaS solution delivering Science-as-a-Service capabilities across hybrid cloud environments.
  • 13. What does it do? ● Run application codes: your own or community-provided codes ● ...on HPC, HTC, and cloud resources: your own, shared, or commercial systems ● ...and manage your data: reliable, multi-protocol, async data movement ● ...in a collaborative way: fine-grained ACLs for working securely with others ● ...from the web: webhooks, REST, JSON, CORS, OAuth2 ● ...and remember how you did it: deep provenance, history, and reproducibility built in
  • 14. No, seriously, what does it do?
  • 15. White Label PaaS • Build and brand for your organization • Customize with your own services and features. • Let us operate it or host it yourself
  • 16. • Interacts with existing compute & storage • Leverages your existing workload manager(s) • Delegates to your existing IdP & security • Uses your existing apps • Creates a cohesive platform for your dev and user communities Zero Install Deployment
  • 17. Web friendly • JSON in | JSON out • Global ACLs on every resource • Role-based management • Public and private scopes for web publishing • Sync and async interfaces • Email & webhook notifications • Event-driven design
  • 18. Reproducibility As A Feature • Deep provenance on everything • Auto-capture contextual metadata • Ability to re-run pipelines, processes, and data transfers baked in
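The "JSON in | JSON out" and webhook-notification points above can be illustrated with a minimal sketch of what a job request body might look like. The field names (`appId`, `inputs`, `archive`, `notifications`), the event names, the app identifier, and every URL below are illustrative assumptions rather than details taken from the slides; check them against the Agave documentation before relying on them. The sketch only builds and prints the JSON and makes no network call.

```python
import json


def build_job_request(name, app_id, input_uri, callback_url):
    """Assemble a hypothetical JSON job request (field names are assumptions)."""
    return {
        "name": name,
        "appId": app_id,                     # hypothetical app identifier
        "inputs": {"inputFile": input_uri},  # hypothetical input name
        "archive": True,                     # ask the platform to keep outputs
        "notifications": [
            # One webhook per event of interest; event names are assumptions.
            {"url": callback_url, "event": "FINISHED"},
            {"url": callback_url, "event": "FAILED"},
        ],
    }


request_body = build_job_request(
    name="wordcount-demo",
    app_id="wordcount-1.0",
    input_uri="agave://example-storage/data/sample.txt",
    callback_url="https://example.org/hooks/job-status",
)

# JSON in: this string is what a client would send as the request body.
print(json.dumps(request_body, indent=2))
```

Keeping the whole request as one JSON document is also what makes re-running practical: the same document can be stored alongside the results and submitted again later.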
  • 19.
  • 20. Containers for science • Research is hard • Coding is hard • Research code is • well designed, • documented, • leverages design patterns, • highly reusable, • portable, • and usually open source. Scientists, with few exceptions, are not trained programmers
  • 21. Containers for science • Truth be told, they don’t actually even care. • The ROI of higher code quality ≈ 0 • No funding available for cleaning up code. Despite the quality of the code, the science represented by the code is valuable and necessary for future discovery.
  • 22. Containers for science Compute containers are the Magic 8 Ball of science... • Compartmentalize code • Eliminate build and run complexities • Introduce portability, reuse, & versioning • Widgetize the creation of a scientific pipeline ...but better because results are reproducible. Compute containers enable reproducible science via composition.
  • 23. Containers for science Data containers can serve as universal adapters between compute containers • Transform data • Bridge file systems • Enable distributed data access • Virtualize interfaces Data containers enable clean integration between containers and standardize how we interact with distributed data.
  • 24. Containers are changing the landscape • CyVerse has been an early adopter of container tech • A magic wand for getting scientific software deployed and usable • Pushbutton interfaces • Language-specific libraries • Scriptable CLI tools • Galaxy, NIH Cancer Cloud Pilots, and lots of other folks are using them too
  • 25. But they have their perils too... • Managing and orchestrating containers + data + networking can be complicated • There are a lot of emergent solutions • We won’t touch on this today, but be careful in your technology selections
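The idea from slide 22 that compute containers enable reproducible science via composition can be sketched as a list of pipeline stages, each pinned to an image tag and sharing one data directory. This is a minimal illustration, not the presenters' tooling: the image names, tool commands, and paths are hypothetical placeholders. It uses only the standard `docker run --rm -v host:container image command` form and builds the command lines without executing them.

```python
import shlex


def docker_run_command(image, command, host_dir, container_dir="/data"):
    """Return the argv list for running one pipeline stage in a container."""
    return [
        "docker", "run", "--rm",              # remove the container when it exits
        "-v", f"{host_dir}:{container_dir}",  # share one data directory across stages
        image,
    ] + shlex.split(command)


# Hypothetical two-stage pipeline; image names and commands are placeholders.
PIPELINE = [
    ("example/trim:1.0", "trim --in /data/raw.fq --out /data/trimmed.fq"),
    ("example/align:2.3", "align --in /data/trimmed.fq --out /data/aligned.bam"),
]

commands = [docker_run_command(img, cmd, "/tmp/work") for img, cmd in PIPELINE]
for argv in commands:
    print(shlex.join(argv))
```

On a machine with Docker installed, each `argv` could be passed to `subprocess.run(argv, check=True)` in order. Pinning the tag on every image is what gives the versioning and re-run properties the slide describes.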