Jetstream - Adding Cloud-based Computing to the National Cyberinfrastructure (inside-BigData.com)
Matt Vaughn from TACC presented this deck at the HPC User Forum in Tucson.
"Jetstream is the first user-friendly, scalable cloud environment for XSEDE. The system enables researchers working at the "long tail of science" and the creation of truly customized virtual machines and computing architectures. It has a web-based user interface integrated with XSEDE via Globus Auth. The architecture is derived from the team's collective experience with CyVerse Atmosphere, Chameleon and Quarry. The system also fosters reproducible, sharable computing with geographically isolated clouds located at Indiana University and TACC."
Watch the video presentation: http://wp.me/p3RLHQ-fcq
Learn more: https://www.tacc.utexas.edu/systems/jetstream and http://hpcuserforum.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Supporting Research through "Desktop as a Service" models of e-infrastructure... (David Wallom)
Keynote presentation given on 13 September 2016 at the ESA Earth Observation Open Science workshop 2016.
"The rise in cloud computing as an e-infrastructure model is one that has the power to democratise access to computational and data resources throughout the research communities. We have seen the difference that Infrastructure as a Service (IaaS) has made for different communities and are now only beginning to understand what different models further up the stack can make. It is also becoming clear that with the increase in research data volumes, the number of sources and the possibility of utilising data from different regulatory regimes that a different model of how analysis is performed on the data is possible. Utilising a "Desktop as a Service" model, with community focused applications installed on a common and well understood virtual system image that is directly connected to community relevant data allows the researcher to no longer have to consider moving data but only the final analysed results. This massively simplifies both the user model and the data and resource owner model. We will consider the specific example of the Environmental Ecomics Synthesis Cloud and how it could easily be generalised to other areas."
Tutorial at K-Cap 2015:
Knowledge Processing with Big Data and
Semantic Web Technologies.
Session 0: Motivation
Session 1: Infrastructure
Session 2: Data Curation
Session 3: Query Federation
Session 4: Analyze
Session 5: Visualization
Session 6: Hands On Session
On-Demand Cloud Computing for Life Sciences Research and Education (Matthew Vaughn)
The Jetstream cloud is a collaboration among CyVerse partners TACC and the University of Arizona, the University of Chicago, Johns Hopkins University, and Indiana University to bring the flexibility and ease of use of CyVerse Atmosphere to the entire community of science, at a much larger scale. Jetstream is a cloud resource operated as part of XSEDE and built from two independent OpenStack clusters, each capable of supporting thousands of virtual machines and data volumes. The clusters are integrated via the user-friendly "Atmosphere" interface developed by CyVerse, with authentication enabled by Globus, and, unlike the CyVerse cloud, offer full access to the OpenStack web service APIs. Jetstream features a diverse catalog of virtual machine templates: one can launch a personal Galaxy server, do advanced biostatistics, use MATLAB, or experiment with new technologies like Docker, all on Jetstream. This talk highlights the unique capabilities of Jetstream and explains how researchers everywhere can access it.
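Since the abstract notes that Jetstream, unlike the CyVerse cloud, exposes the standard OpenStack web service APIs, here is a rough sketch of what launching a virtual machine through the OpenStack Compute (Nova) `POST /servers` call involves. The image UUID and flavor name below are hypothetical placeholders, not actual Jetstream catalog values.

```python
import base64
import json

def make_server_request(name, image_ref, flavor_ref, user_data=None):
    """Build the JSON body for a Nova POST /servers request.

    image_ref and flavor_ref come from the cloud's image and flavor
    catalogs; the values used in the example below are made up.
    """
    server = {"name": name, "imageRef": image_ref, "flavorRef": flavor_ref}
    if user_data is not None:
        # Nova expects user data (e.g. a cloud-init script) base64-encoded.
        server["user_data"] = base64.b64encode(user_data.encode()).decode()
    return {"server": server}

# Hypothetical request for a personal Galaxy server:
body = make_server_request(
    "galaxy-demo",
    image_ref="11111111-2222-3333-4444-555555555555",  # placeholder image UUID
    flavor_ref="m1.medium",                            # placeholder flavor
)
print(json.dumps(body, indent=2))
```

In practice a client such as openstacksdk or the `openstack` CLI builds and sends this body for you; the point is that the full compute API surface, not just a web portal, is available to Jetstream users.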
Cloud Standards in the Real World: Cloud Standards Testing for Developers (Alan Sill)
Learn about standards studied in the US National Science Foundation Cloud and Autonomic Computing Industry/University Cooperative Research Center Cloud Standards Testing Lab and how you can get involved to extend the successes from these results in your own cloud software settings. Presented at the O'Reilly OSCON 2014 Open Cloud Day.
Video available at https://www.youtube.com/watch?v=eD2h0SqC7tY
Advanced Automated Analytics Using OSS Tools, GA Tech FDA Conference 2016 (Grid Protection Alliance)
The exponential increase in data available to analyze power system events is universally recognized, but in many cases the approach to using this data is to do what we already do but do it faster, or get more people to do it. Unfortunately, spinning the hamster wheel faster is not keeping up with the demand to make decisions faster in support of grid modernization. Open source software (OSS) tools offer tremendous opportunity for collaboration that encourages innovation, and the speed and flexibility of development to keep pace with these demands.
Interop ITX: Moving applications: From Legacy to Cloud-to-Cloud (Susan Wu)
Cloud computing provides an array of hosting and service options to fit your overall company strategy. Sometimes a public cloud is your best option and other times your data requirements demand a private cloud. As needs converge, a hybrid solution continues to gain popularity. Developers must consider if their applications might be run on either or both.
Hear about Midokura.com's journey from colocation facilities to cloud servers to AWS.
Opal: Simple Web Services Wrappers for Scientific Applications (Sriram Krishnan)
The grid-based infrastructure enables large-scale scientific applications to be run on distributed resources and coupled in innovative ways. However, in practice, grid resources are not very easy to use for the end-users, who have to learn how to generate security credentials, stage inputs and outputs, access grid-based schedulers, and install complex client software. There is an imminent need to provide transparent access to these resources so that the end-users are shielded from the complicated details and free to concentrate on their domain science. Scientific applications wrapped as Web services alleviate some of these problems by hiding the complexities of the back-end security and computational infrastructure, exposing only a simple SOAP API that can be accessed programmatically by application-specific user interfaces. However, writing the application services that access grid resources can be quite complicated, especially if it has to be replicated for every application. In this presentation, we present Opal, a toolkit for wrapping scientific applications as Web services in a matter of hours, providing features such as scheduling, standards-based grid security, and data management in an easy-to-use and configurable manner.
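The core pattern Opal implements, turning a typed service request into a command-line invocation of a scientific code, can be sketched in a few lines. This is not Opal's actual API (Opal is a Java/SOAP toolkit); it is a minimal illustration of the wrapping idea, with a made-up application name and arguments.

```python
import shlex
import subprocess

def build_command(binary, params):
    """Turn a request's key/value parameters into an argv list,
    shielding the user from shell syntax and quoting pitfalls."""
    argv = [binary]
    for flag, value in params.items():
        argv.append(f"--{flag}")
        argv.append(str(value))
    return argv

def run_job(binary, params):
    """Execute the wrapped application and return its exit code and
    stdout, as a service endpoint might before staging results back."""
    argv = build_command(binary, params)
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode, result.stdout

# Hypothetical request for an imaginary "fold_protein" application:
argv = build_command("fold_protein", {"input": "seq.fasta", "iterations": 500})
print(shlex.join(argv))
```

A real wrapper would add the pieces the abstract lists on top of this core: credential handling, input/output staging, and submission to a scheduler rather than a local `subprocess` call.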
HPC and cloud distributed computing, as a journey (Peter Clapham)
Introducing an internal cloud brings new paradigms, tools, and approaches to infrastructure management. When placed alongside traditional HPC, the new opportunities are significant. But getting to the new world of micro-services, autoscaling, and autodialing is a journey that cannot be achieved in a single step.
In this presentation from the DDN User Meeting at SC13, Erik Deumans from SSERCA describes how the institution is sharing data with WOS from DDN.
Watch the video presentation: http://insidehpc.com/2013/11/13/ddn-user-meeting-coming-sc13-nov-18/
AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD... (Amazon Web Services)
Data science is a key discipline in a data-driven organization. Through analytics, data scientists can uncover previously unknown relationships in data to help an organization make better decisions. However, data science is often performed from local machines with limited resources and multiple datasets on a variety of databases. Moving to the cloud can help organizations provide scalable compute and storage resources to data scientists, while freeing them from the burden of setting up and managing infrastructure.
In this session, FINRA, the Financial Industry Regulatory Authority, shares best practices and lessons learned from building a self-service, curated data science platform on AWS, a project that allowed us to remove the technology middleman and empower users to choose the best compute environment for their workloads. Understand the architecture and underlying data infrastructure services that provide a secure, self-service portal to data scientists, learn how we built consensus on tooling from our data science community, hear about the benefits of increased collaboration among the scientists due to the standardized tools, and learn how you can retain the freedom to experiment with the latest technologies while keeping information security boundaries within a virtual private cloud (VPC).
OpenStack at the speed of business with SolidFire & Red Hat (NetApp)
When it comes to OpenStack® and the enterprise, it’s critical that you can rapidly deploy a plug-and-play solution that delivers mixed workload capabilities on a shared infrastructure. Join Red Hat and SolidFire to see how Agile Infrastructure for OpenStack can help your cloud move at the speed of business.
Engage 2013 - Leveraging the cloud for ultimate flexibility (Avtex)
A case study of how the Twin Cities Marathon used the cloud to scale without significant infrastructure cost and deliver a great fan experience.
How Cyverse.org enables scalable data discoverability and re-use (Matthew Vaughn)
Cyverse.org designs, builds, and operates an innovative, integrated life sciences cyberinfrastructure. It provides data management and analysis capabilities with point and click, cloud, API, and command-line interfaces that engage users of any computing proficiency and is based on an extensible platform that integrates local and national-scale HPC, storage, and cloud resources. Cyverse directly supports thousands of users who store and access over 2PB of research data, use millions of compute hours annually, and participate in the platform's improvement, plus a secondary user community from partner projects that have built atop it. Cyverse is organized around "Data Store" and "App Catalog" services, each of which enables users to upload digital research assets that can be kept private, shared, or made public. Recently, Cyverse has been transitioning from passively enabling digital sharing towards active facilitation. It is partnering with repositories like NCBI SRA to enable direct submission from Cyverse applications, adopting commonly-used ontologies, enabling import/export of virtual machine images, developing metadata-driven persistent landing pages for data sets, and providing DOI (and other identifier) services. These new features are expected to further catalyze growth of an interoperable, interconnected network of shared research infrastructure across the biological sciences.
Clouds, Clusters, and Containers: Tools for responsible, collaborative computing (Matthew Vaughn)
Intro slides from AKES workshop at ISMB2016. This workshop addresses the challenges and requirements for working effectively on cloud computing and high performance computing resources, discusses the key principles that should guide responsible scientific computation and collaboration, and using hands-on sessions presents practical solutions using emergent software tools that are becoming widely adopted in the global scientific community. Specifically, we will look at using “containers” to bundle software applications and their full execution environment in a portable way. We will look at managing and sharing data across distributed resources. And finally, we will tackle how to orchestrate job execution across systems and capture metadata on the results (and the process) so that parameters and methodologies are not lost.
Packaging computational biology tools for broad distribution and ease-of-reuse (Matthew Vaughn)
A typical instance of computational biology software is composed of interpreted code, compiled binaries, shared libraries, and shell scripts, sometimes mixed in with web services or databases, running in the context of a complex computer operating system atop increasingly sophisticated physical resources. How can we expect computations to be shareable and reproducible, and how can we hope to train people to use such resources? This talk describes how the Texas Advanced Computing Center enables the distribution and use of scientific software via various approaches, including Jupyter notebooks, GitHub repositories, computation-oriented web service APIs, virtual machine images, and container technologies such as Docker, and how these approaches complement one another for training and education.
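One concrete way the container approach captures a tool's full execution environment is by recording the OS layer, system libraries, interpreter packages, and entry point in a single recipe. The sketch below generates such a recipe; the base image tag, package names, and entry point are illustrative placeholders, not a TACC-endorsed configuration.

```python
def make_dockerfile(base_image, apt_packages, pip_packages, entrypoint):
    """Render a Dockerfile capturing the layers the talk describes:
    operating system, shared libraries, interpreter packages, and the
    tool itself."""
    lines = [f"FROM {base_image}"]
    if apt_packages:
        lines.append(
            "RUN apt-get update && apt-get install -y --no-install-recommends "
            + " ".join(apt_packages)
        )
    if pip_packages:
        lines.append("RUN pip install --no-cache-dir " + " ".join(pip_packages))
    lines.append(f'ENTRYPOINT ["{entrypoint}"]')
    return "\n".join(lines) + "\n"

# Hypothetical recipe for an imaginary Python-based alignment tool:
recipe = make_dockerfile(
    base_image="python:3.11-slim",
    apt_packages=["libhdf5-dev"],
    pip_packages=["numpy", "biopython"],
    entrypoint="align.py",
)
print(recipe)
```

Because every dependency is pinned in one text file, the same environment can be rebuilt on a laptop, a cloud VM, or an HPC system, which is what makes the computation shareable and reproducible.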
Scaling People, Not Just Systems, to Take On Big Data Challenges (Matthew Vaughn)
Here, I describe how the Texas Advanced Computing Center has shifted its focus from traditional modeling and simulation towards fully embracing big data analytics performed by users with diverse technical backgrounds.
Arabidopsis Information Portal: A Community-Extensible Platform for Open Data (Matthew Vaughn)
Araport is an innovative model organism database resource that offers users the ability to bring their own visualizations, data sets, algorithms, and genome browser tracks and share them with their colleagues.
Arabidopsis Information Portal overview from Plant Biology Europe 2014 (Matthew Vaughn)
An overview of the design, technical decisions, and implementation of the Arabidopsis Information Portal community-extensible data sharing and analytics platform.
Jetstream: Adding Cloud-based Computing to the National Cyberinfrastructure
Funded by the National Science Foundation
Award #ACI-1445604
Matthew Vaughn (@mattdotvaughn)
ORCID 0000-0002-1384-4283
Director, Life Science Computing
co-PI, Jetstream | Cyverse | Araport
Texas Advanced Computing Center
What is Jetstream?
• A resource to expand the community of users who benefit from NSF
investment in shared cyberinfrastructure
• Production cloud system supporting all domains of science and
engineering research sponsored by the NSF
• Provide on-demand interactive computing and analysis
• Enable configurable environments and architectures
• Support computational reproducibility and sharing
• Democratize access to cloud-native technology and software
• Focus on ease of use while maintaining flexibility
Expanding NSF XD’s reach and impact
Around 299,000 researchers, educators, & learners received NSF support in
2012-2013
– Only 1.5% completed a computation, data analysis, or visualization
task on XD program resources
– Less than 3% had an XSEDE Portal account
– 70% of researchers surveyed* claimed to be resource constrained
Why aren’t they using XD systems?
– Activation energy is pretty high
– HPC resources are scarce and not well-matched to their needs
– They just don’t need that much capability
* https://www.xsede.org/xsede-nsf-release-cloud-survey-report
Expanding NSF XD’s reach and impact
(Diagram: Jetstream alongside capability-class machines and traditional HPC and HTC systems)
Who do we expect will use Jetstream?
• Researchers, developers, and scientists who…
– Require somewhere between 4 and a few hundred cores RIGHT
NOW and for the foreseeable future, but not forever
– Want the ability to fully customize the OS and configuration of
their computing setup
– Wish to move cloud-native workflows to/from an academic
environment
– Need interactive mode access to a computing and analysis
system
Who do we expect will use Jetstream?
• Also…
– Science gateway operators using Jetstream as either the
frontend or processor for scientific jobs
– Anyone who is evaluating or experimenting with software not
traditionally supported on XD systems
– STEM educators teaching a variety of subjects
Diverse research domains
• Biology: iPlant and Galaxy VMs, enabling access to and use of new
analytical codes in various modalities
• Earth Science: VMs capable of requesting NSIDC data and running
common routines to enable more effective research and better
analyses of data
• Field Station Research: VM-based data collection and analysis tools
to support data sharing and collaboration
• GIS: Deliver the CyberGIS toolkit and provide access to ArcGIS in a
VM using IU’s existing site license
• Network Science: Build VMs with CIShell tool builders to deliver
network analysis tools interactively
• Social Sciences: Create VMs that allow selection of data from the
Odum Institute in a way that retains provenance and version
information
21st century workforce development
• Specialized virtual Linux desktops and applications to enable
research and research education at small colleges and universities
– HBCUs (Historically Black Colleges and Universities)
– MSIs (Minority Serving Institutions)
– Tribal colleges
– Higher-education institutions in EPSCoR States
• Also, complete democratization of access to cloud-native
technologies and approaches
Systems Overview
Flavor     vCPUs  RAM (GB)  Storage (GB)  VMs per node
m.tiny       1       2          20             46
m.small      2       4          40             23
m.medium     6      16         130              7
m.large     10      30         230              4
m.xlarge    22      60         460              2
m.xxlarge   44     120         920              1
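As a rough illustration, the flavor table above can be encoded to pick the smallest flavor that fits a job. The flavor data is copied from the table (RAM and storage in GB); the helper function itself is hypothetical and not part of any Jetstream API.

```python
# Flavor data copied from the table above; tuples are
# (name, vcpus, ram_gb, storage_gb), ordered smallest to largest.
FLAVORS = [
    ("m.tiny",     1,   2,  20),
    ("m.small",    2,   4,  40),
    ("m.medium",   6,  16, 130),
    ("m.large",   10,  30, 230),
    ("m.xlarge",  22,  60, 460),
    ("m.xxlarge", 44, 120, 920),
]

def pick_flavor(vcpus, ram_gb, storage_gb=0):
    """Return the name of the smallest flavor meeting all requirements.

    Hypothetical helper for illustration only.
    """
    for name, f_cpu, f_ram, f_disk in FLAVORS:
        if f_cpu >= vcpus and f_ram >= ram_gb and f_disk >= storage_gb:
            return name
    return None  # no single flavor is big enough

print(pick_flavor(4, 8))  # a 4-core, 8 GB job fits m.medium
```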
Hardware Specifics
VM Host Configuration
• Dual Intel Xeon E5-2680 v3 “Haswell”
• 24 physical cores/node @ 2.5 GHz (hyperthreading on)
• 128 GB RAM
• Dual 1 TB local disks
• Dual 10 GbE uplink NICs
• Running the KVM hypervisor
CEPH Storage
• 20x Dell R730xd servers per cloud
• 2x 10 Gbps bonded NICs per R730xd
• Running Ceph 0.94.5 “Hammer”
• Configured as OpenStack storage
• Storage is XSEDE-allocated
• Implemented on the backend as OpenStack volumes
• Each user gets up to 10 volumes and up to 500 GB of total storage
• Object storage is also being explored for the future
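The per-user quota above (at most 10 volumes, 500 GB in total) can be sketched as a simple check. The function name and shape are illustrative only — the real enforcement happens inside OpenStack's quota system, not in user code.

```python
MAX_VOLUMES = 10    # per-user volume count limit (from the slide)
MAX_TOTAL_GB = 500  # per-user total storage limit (from the slide)

def can_create_volume(existing_sizes_gb, new_size_gb):
    """Check a hypothetical volume request against the stated quota."""
    if len(existing_sizes_gb) >= MAX_VOLUMES:
        return False  # already at the 10-volume limit
    if sum(existing_sizes_gb) + new_size_gb > MAX_TOTAL_GB:
        return False  # would exceed 500 GB of total storage
    return True

print(can_create_volume([100, 200], 150))  # True: 3rd volume, 450 GB total
print(can_create_volume([400], 200))       # False: 600 GB exceeds the cap
```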
Integration with XSEDE via Globus Auth
• The Atmosphere web app uses Globus Auth, which implements industry-standard OAuth2
• This leaves us flexibility on identity and access
• Globus Auth implements (in beta) the password-grant OAuth flow, which means Jetstream access can be entirely scripted
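For scale, a password-grant token request follows the standard OAuth2 shape defined in RFC 6749 §4.3. The token endpoint URL and client ID below are placeholders, not real Globus Auth values — consult the Globus Auth documentation for the actual endpoint and registration details.

```python
# Sketch of an OAuth2 resource-owner-password-credentials token request,
# the grant type Globus Auth's (beta) password flow is based on.
TOKEN_URL = "https://auth.example.org/oauth2/token"  # placeholder, not real

def build_password_grant(username, password, client_id, scope):
    """Assemble the form body for a standard OAuth2 password-grant request."""
    return {
        "grant_type": "password",  # fixed by RFC 6749 section 4.3
        "username": username,
        "password": password,
        "client_id": client_id,
        "scope": scope,
    }

body = build_password_grant("jdoe", "s3cret", "my-client", "openid")
# A script would POST `body` form-encoded to TOKEN_URL and read the
# "access_token" field out of the JSON response.
print(body["grant_type"])  # → password
```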
Programmatic Access
Web Service APIs
• OpenStack – official and unofficial clients and libraries (e.g., boto)
• EC2* – integration with AWS-specific code
• Atmosphere – very beta; language libraries coming “soon”
• Preview at http://docs.atmospherev2.apiary.io/
*Still quite finicky
Automation, Orchestration, and Workflow
• Marathon/Mesos
• https://www.youtube.com/v/VzZfwHLmcL0
• Docker Machine* + Swarm
• Apache Airavata
• CloudMan & Elasticluster*
Configuration Management Tools
• Vagrant & Terraform (HashiCorp)
• Chef
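To give a feel for the OpenStack route, a Nova “create server” request carries a small JSON body naming the instance, image, and flavor. The sketch below only assembles that body; the UUIDs are fabricated placeholders, and in practice you would go through an OpenStack client library rather than hand-building requests.

```python
import json

def nova_boot_body(name, image_ref, flavor_ref, key_name=None):
    """Assemble the JSON body of an OpenStack Nova 'create server' request.

    Sketch only -- the refs passed in below are placeholder values.
    """
    server = {"name": name, "imageRef": image_ref, "flavorRef": flavor_ref}
    if key_name:
        server["key_name"] = key_name  # SSH keypair to inject into the VM
    return {"server": server}

body = nova_boot_body(
    name="my-analysis-vm",
    image_ref="11111111-2222-3333-4444-555555555555",  # placeholder UUID
    flavor_ref="66666666-7777-8888-9999-000000000000",  # placeholder UUID
    key_name="my-ssh-key",
)
print(json.dumps(body, indent=2))
```

A client would POST this body to the Nova compute endpoint obtained from the OpenStack service catalog after authenticating.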
What comes next?
• Both cloud systems + all software components installed, configured,
and operational
• “Early operations mode” is underway
• End of March 2016: 38 XSEDE projects and 250+ users
• Acceptance review scheduled with NSF in early May
• Full, unrestricted operations after system is accepted
• Soliciting Research allocation requests NOW plus Startup and
Education allocations
The “Easy Button”
Allow any user with an active XSEDE User Portal account to use a small (but functional) slice of Jetstream:
• Sign up for an XUP account
• Sign into the User Portal
• Click the “Trial Jetstream Access” button
• Get (restricted) access to Jetstream in ~30 minutes or less
Analogous to the free tier that makes it so easy to get started on the public cloud.
Partners
• Construction
• Management & Operations
• Application / Community Leads
• Vendors
How can I use Jetstream?
• An XSEDE User Portal (XUP) account is required. They are free!
Get one at https://portal.xsede.org
• Read the Allocations Overview – https://portal.xsede.org/allocations-overview
• Write a successful allocation request – start with a Startup or
Education request - https://portal.xsede.org/successful-requests
Where can I get help or learn more?
• Production:
– User guides: https://portal.xsede.org/user-guides
– XSEDE KB: https://portal.xsede.org/knowledge-base
– Email: help@xsede.org
– Campus Champions: https://www.xsede.org/campus-champions
– Training Videos / Virtual Workshops (TBD)
• Early use:
– http://jetstream-cloud.org/
– Email: jethelp@iu.edu
Editor's Notes
Key distinction: cloud is someone else’s computer. Cloud computing in this context is someone else’s computer that YOU can reconfigure and automate.