JPL uses cloud computing to share live events like the Curiosity landing with the world. The cloud provides scalability, availability, and security. For the landing, JPL used Amazon Web Services for content delivery, computing, storage, and other services. JPL re-engineered systems for the cloud. The Curiosity landing saw a 5x increase in data served and a 214x increase in peak throughput compared to previous Mars rovers, allowing more people to experience the event.
This document discusses optimizing a ground station network for optical space-to-ground communication links. It aims to find a network of optical ground stations that maximizes the amount of data transmitted from low Earth orbit satellites while accounting for the impact of clouds, which affect optical links but not radio-frequency links. The document outlines advantages and disadvantages of optical communication compared to radio-frequency and discusses future work on improving the ground station database and using longer timeframes of cloud data.
The January 17, 2013 Children's Services Council of Broward County monthly meeting agenda included:
1) Calling the meeting to order and taking roll call.
2) The Chair's comments which included council reappointments and approving previous meeting minutes.
3) Election of officers.
4) Reports from the President, CPO, COO, and PAOD which included approving funding for various summer programs and leveraged funds requests.
5) A SNAC report and public comment.
Children's Services Council of Broward County Charter School Presentation (cscbroward)
Karen Swartzbaugh, Chief Program Officer of the Children's Services Council of Broward County, presented to the Florida Consortium of Public Charter Schools.
The Children's Services Council of Broward County provides leadership, advocacy and resources necessary to enhance children's lives and empower them to become responsible, productive adults. To learn more, visit us online at www.cscbroward.org and on social media at www.facebook.com/cscbroward; www.twitter.com/cscbroward; and www.youtube.com/cscbroward
Children's Services Council of Broward County, Systemic Model of Prevention (cscbroward)
Research Analyst Laura Ganci and Program Specialist Melissa Stanley of the Children's Services Council of Broward County hosted a webinar for the Florida Alcohol and Drug Abuse Association on Implementing a Collaborative Approach to Child Welfare.
The Children's Services Council of Broward County provides leadership, advocacy and resources necessary to enhance children's lives and empower them to become responsible, productive adults. To learn more, visit us online at www.cscbroward.org and on social media at www.facebook.com/cscbroward; www.twitter.com/cscbroward; and www.youtube.com/cscbroward
Supported Training & Employment Programs funded by Children's Services Counci... (cscbroward)
Supported Training & Employment Programs, or STEP, is a year-round program funded by the Children's Services Council of Broward County that offers youth development and employment experience to prepare youth with disabilities for post-secondary education, training and employment opportunities.
Children's Services Council of Broward County October 2014 Council Meeting (cscbroward)
The Monthly Council Meeting of the Children's Services Council of Broward County took place on October 16, 2014. Topics discussed included approving previous meeting minutes and cancelling the December meeting, receiving reports from the President/CEO and CPO on new grants and program adjustments, and committees providing updates on agency capacity and special needs programs. The meeting concluded with public comments, a roundtable discussion, and distribution of information items.
Big data and cloud computing are closely intertwined. The cloud is well-suited to handle big data challenges by providing massive scalability, flexible pay-as-you-go pricing, and removing the undifferentiated heavy lifting of managing infrastructure. This allows companies to focus on analyzing large and complex datasets. Examples show how companies use Amazon Web Services to collect petabytes of data from sources like sensors and social media, process it using services like EMR, and gain insights for applications in various industries.
Matt Wood discusses Amazon Web Services (AWS) and highlights several key services:
- AWS provides on-demand, pay-as-you-go cloud computing services including compute, storage, databases, and messaging.
- Customers can scale their infrastructure capacity up and down automatically based on real-time demand, avoiding over- or under-provisioning.
- New and updated AWS services highlighted include Virtual Private Cloud, CloudFormation for infrastructure automation, Simple Email Service, and Windows 2008 R2 on EC2.
Analyzing Data Movements and Identifying Techniques for Next-generation Networks (balmanme)
Jan 28th, 2013 - 10:00 am
UC Davis
Title: Analyzing Data Movements and Identifying Techniques for Next-generation Networks
Abstract: The large bandwidth provided by today’s networks requires careful evaluation in order to eliminate system overheads and to bring the anticipated high performance to the application layer. As part of the Advanced Networking Initiative (ANI) project, we have conducted a large number of experiments in the initial evaluation of the 100Gbps network prototype.
We needed intense fine-tuning, both in network and application layers, to take advantage of the higher network capacity. Instead of explicit improvements in every application as we keep changing the underlying link technology, we require novel data movement mechanisms and abstract layers for end-to-end processing of data. Based on our experience in 100Gbps network, we have developed an experimental prototype, called MemzNet: Memory-mapped Zero-copy Network Channel. MemzNet defines new data access methods in which applications map memory blocks for remote data, in contrast to the send/receive semantics. In one of the early demonstrations of 100Gbps network applications, we used the initial implementation of MemzNet that takes the approach of aggregating files into blocks and providing dynamic data channel management. We observed that MemzNet showed better results in terms of performance and efficiency,
than the current state-of-the-art file-centric data transfer tools for the transfer of climate datasets with many small files. In this talk, I will mainly describe our experience in 100Gbps tests and present results from the 100Gbps demonstration. I will briefly explain the ANI testbed environment and highlight future research plans.
Bio: Mehmet Balman is a researcher working as a computer engineer in the Computational Research Division at Lawrence Berkeley National Laboratory. His recent work
particularly deals with efficient data transfer mechanisms, high-performance network protocols, bandwidth reservation, network virtualization, scheduling and resource management for large-scale applications. He received his doctoral degree in computer science from Louisiana State University (LSU) in 2010. He has several years of industrial experience as a system administrator and R&D specialist at various software companies before joining LSU. He also worked as a summer intern at Los Alamos National Laboratory.
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability (Ben Stopford)
In 2009 RBS set out to build a single store of trade and risk data that all applications in the bank could use. This talk discusses a number of novel techniques that were developed as part of this work. Based on Oracle Coherence, the ODC departs from the trend set by most caching solutions by holding its data in a normalised form, making it both memory efficient and easy to change. However, it does this in a novel way that supports most arbitrary queries without the usual problems associated with distributed joins. We'll be discussing these patterns as well as others that allow linear scalability, fault tolerance and millisecond latencies.
Cloud Architectures - Jinesh Varia - GrepTheWeb (jineshvaria)
- Cloud computing platforms like Amazon Web Services allow companies to focus on innovation rather than infrastructure maintenance by providing scalable, pay-as-you-go cloud services.
- Amazon's cloud services like EC2, S3, and SQS were used to build GrepTheWeb, a distributed text search service that can quickly search very large datasets by distributing work across elastic compute resources.
- GrepTheWeb coordinates distributed processing using SQS, stores input files in S3, runs jobs on EC2 instances, and stores results in SimpleDB to provide fast, scalable text searches without having to manage physical infrastructure.
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ... (balmanme)
The document discusses Mehmet Balman's work on network-aware data management for large-scale distributed applications. It provides background on Balman, including his employment at VMware and affiliations. The presentation outline discusses VSAN and VVOL storage performance in virtualized environments, data streaming in high-bandwidth networks, the Climate100 100Gbps networking demo, and other topics related to network-aware data management.
The document discusses real-time web analytics company LiveStats' transition from conventional hosting to Amazon Web Services (AWS) cloud hosting. It provides reasons for choosing AWS, such as flexibility, scalability, and pay-as-you-use pricing. It also discusses the challenges of moving to the cloud as well as advantages like full control and lower barriers to entry. The document outlines LiveStats' architecture on AWS, including load balancing, auto-scaling, and decoupling of services, and how they monitor systems and implement best practices like scaling only when needed.
The Open Science Data Cloud: Empowering the Long Tail of Science (Robert Grossman)
The Open Cloud Consortium operates the Open Science Data Cloud, a not-for-profit cloud computing infrastructure that supports scientific research. The Open Cloud Consortium manages cloud computing testbeds and resources donated by universities, companies, government agencies, and international partners. Its goal is to democratize access to data and computing power for scientific discovery through its Open Science Data Cloud.
Architecting Virtualized Infrastructure for Big Data (Richard McDougall)
This document discusses architecting virtualized infrastructure for big data. It notes that data is growing exponentially and that the value of data now exceeds hardware costs. It advocates using virtualization to simplify and optimize big data infrastructure, enabling flexible provisioning of workloads like Hadoop, SQL, and NoSQL clusters on a unified analytics cloud platform. This platform leverages both shared and local storage to optimize performance while reducing costs.
The emergence of computing clouds has put a renewed emphasis on the issue of scale in computing. The enormous size of the Web, together with ever-more demanding requirements such as freshness (results in seconds, not weeks), means that massive resources are required to handle enormous datasets in a timely fashion. Datacenters are now considered to be the new units of computing power, e.g. Google's Warehouse-Scale Computer. The number of organizations able to deploy such resources is ever shrinking. Wowd aims to demonstrate that there is an even bigger scale of computing than yet imagined: planetary-sized distributed clouds. Such clouds can be deployed by motivated collections of users, instead of a handful of gigantic organizations.
This document discusses strategies for scaling application and database infrastructure to meet increasing demands on an online learning platform. It recommends using cloud-based hardware with load balancers and scaling applications horizontally across Nginx or Apache web servers running PHP or Node.js. For databases, it suggests a sharded MongoDB cluster with 3 replica sets for storage, 3 config servers, and 3 mongos processes, using WiredTiger for up to 3-4x compression compared to MMap. The document also notes upcoming free online courses on xAPI and learning data.
OSCON Data 2011 -- NoSQL @ Netflix, Part 2 (Sid Anand)
The document discusses translating concepts from relational databases to key-value stores. It covers normalizing data to avoid issues like data inconsistencies and loss. While key-value stores don't support relations, transactions, or SQL, the relationships can be composed in the application layer for smaller datasets. Picking the right data for key-value stores involves accessing data primarily by key lookups.
ppbench - A Visualizing Network Benchmark for Microservices (Nane Kratzke)
Companies like Netflix, Google, Amazon, and Twitter have successfully exemplified elastic and scalable microservice architectures for very large systems. Microservice architectures are often realized by deploying services as containers on container clusters. Containerized microservices often use lightweight, REST-based mechanisms. However, this lightweight communication is often routed by container clusters through heavyweight software defined networks (SDN). Services are often implemented in different programming languages, adding additional complexity to a system, which might end in decreased performance. Astonishingly, it is quite complex to figure out these impacts up front in a microservice design process due to missing and specialized benchmarks. This contribution proposes a benchmark intentionally designed for this microservice setting. We advocate that it is more useful to reflect fundamental design decisions and their performance impacts up front in microservice architecture development, and not in the aftermath. We present some findings regarding performance impacts of some TIOBE TOP 50 programming languages (Go, Java, Ruby, Dart), containers (Docker as type representative) and SDN solutions (Weave as type representative).
Processing Big Data (Chapter 3, SC 11 Tutorial) (Robert Grossman)
This chapter discusses different methods for processing large amounts of data across distributed systems. It introduces MapReduce as a programming model used by Google to process vast amounts of data across thousands of servers. MapReduce allows for distributed processing of large datasets by dividing work into independent tasks (mapping) and collecting/aggregating the results (reducing). The chapter also discusses scaling computation by launching many independent virtual machines and assigning tasks via a messaging queue. Overall it provides an overview of approaches for parallel and distributed processing of big data across cloud infrastructures.
This document discusses cloud computing research and examples of using cloud infrastructure for various tasks. It provides an overview of key cloud concepts like elastic on-demand infrastructure, pay-as-you-go pricing, and data processing examples using Amazon Web Services. Specific research examples discussed include genome analysis, biomarker data warehousing, and high energy physics simulations. Security aspects of cloud computing like shared responsibility models and access controls are also summarized.
The magnitude of data being stored and processed in the Cloud is quickly increasing due to advancements in areas that rely on cloud computing, e.g. Big Data, the Internet of Things and mobile code offloading. Concurrently, cloud services are getting more global and geographically distributed. To handle such changes in its usage scenario, the Cloud needs to transform into a completely decentralized, federated and ubiquitous environment, similar to the historical transformation of the Internet. Indeed, research ideas for the transformation have already started to emerge, including but not limited to Cloud Federations, Multi-Clouds, Fog Computing, Edge Computing, Cloudlets, Nano data centers, etc.
Standardization and resource management come up as the most significant issues for the realization of the distributed cloud paradigm. The focus in this thesis is the latter: efficient management of limited computing and network resources to adapt to the decentralization. Specifically, cloud services that consist of several virtual machines, dedicated network connections and databases are mapped to a multi-provider, geographically distributed and dynamic cloud infrastructure. The objective of the mapping is to improve quality of service in a cost-effective way. To that end, network latency and bandwidth as well as the cost of storage and computation are subjected to a multi-objective optimization.
The first phase of the resource mapping optimization is the topology mapping. In this phase, the virtual machines and network connections (i.e. the virtual cluster) of the cloud service are mapped to the physical cloud infrastructure. The hypothesis is that mapping the virtual cluster to a group of data centers with a similar topology would be the optimal solution.
Replication management is the second phase, where the focus is on data storage. Data objects that constitute the database are replicated and mapped to storage-as-a-service providers and end devices. The hypothesis for this phase is that an objective function adapted from the facility location problem optimizes the replica placement.
Detailed experiments under real-world as well as synthetic workloads show that the hypotheses of both phases hold.
SQL Azure Database provides SQL Server database technology as a cloud service, addressing issues with on-premises databases like high maintenance costs and difficulty achieving high availability. It allows databases to automatically scale out elastically with demand. SQL Azure Database uses multiple physical replicas of a single logical database to provide automatic fault tolerance and high availability without complex configuration. Developers can access SQL Azure using standard SQL client libraries and tools from any application.
David Loureiro - Presentation at HP's HPC & OSL TES (SysFera)
Distributed Interactive Engineering Toolbox (DIET) is a middleware for distributed computing that provides a simple interface for solving computationally intensive problems across heterogeneous platforms. It uses a client-agent-server model and plug-in schedulers to optimize resource usage and performance. DIET has been deployed on large supercomputing platforms like Grid'5000 and has been used for applications in fields like cosmology, climatology, robotics, and bioinformatics.
Using Cloud Computing to Share Curiosity's Landing
1. Using Cloud Computing to Share Curiosity’s Landing with the World
Jonathan Chiang
IT Chief Engineer
2. Agenda
What is the Cloud to JPL?
Requirements for Curiosity Landing
Re-engineered for the Cloud
Comparison with Mars Exploration Rovers
3. What is the Cloud to JPL?
o The cloud is a natural extension of our own data centers
o The cloud can be as secure or more secure than some of our own data centers
o The landscape is changing – we continuously monitor new cloud providers, technologies and services
4. Outreach for the Curiosity Landing
Requirements for sharing the landing with the world
A robust Content Distribution Network (CDN)
Scalability and elasticity of resources
High availability and resiliency
5. Why Amazon Web Services?
CloudFront – Content Delivery Network
EC2 – Elastic Compute Cloud
S3 – Simple Storage Service
EBS – Elastic Block Storage
RDS – Relational Database Service
Route53 – Dynamic DNS
7. Comparison with MER
[Chart: Peak Throughput (Gbps) – MER 0.7, MSL 150; Total Data Served (TB) – MER 31, MSL 154]
8. Comparison with MER
                             MSL    MER    Increase
Total Data Served (TB)       154    31     5x
  Streaming                  123    -
  Mars Sites                 9      -
  Eyes on the Solar System   22     -
Peak request rate / sec      80
Peak throughput (Gbps)       150    0.7    214x
Peak hits per minute (M)     8
Peak hits per hour (M)       50
Introduction – Jonathan Chiang, IT Chief Engineer and Technical Project Manager for Mars Public Outreach Web Applications.
With 2.5 billion dollars, 8 years of preparation, and the future of JPL at stake, we knew the world would be watching the night of August 6, 2012. This was also the most technologically advanced robotic probe ever imagined. The landing sequence consisted of techniques that had never been attempted before and were fraught with risk – especially during the final 7 minutes it took for the rover to descend through the Martian atmosphere and land on the surface of Mars. We knew from our previous experience with the Mars Exploration Rovers (Spirit and Opportunity) that hundreds of thousands of fascinated people would be visiting our websites. The experience with MER was highly successful, but costly. That was also nearly 10 years ago. Since then, the Internet has grown immensely, largely due to mobile devices and global connectivity. Events like the final Space Shuttle launch brought millions of concurrent visitors to the nasa.gov web page. We had to prepare for countless challenges and knew that the answer was in the cloud!
Three years ago, our CIO, Jim Rinaldi, gave my team and me the mandate to stop buying servers and storage. No more hardware! We took this as a challenge to explore how to utilize different cloud vendors, leverage their capabilities, and understand the costs of developing and deploying JPL applications.
What is the cloud to JPL? To us, the cloud is a natural extension of our data centers. It provides us seemingly endless compute and storage capabilities. It offers a level of availability and resiliency that would be impossible to duplicate. The cloud also extends the JPL network to the edges of the globe with geographically dispersed compute, storage, and networks. And if done correctly, it does all of this at a fraction of traditional IT costs. Our missions are utilizing the cloud for embarrassingly parallel compute jobs, image processing, content and software distribution, multimedia… the list continues to grow. We also believe with great confidence that the cloud can be as secure as or more secure than some of our own data centers.
In the past year, JPL has been focused on operationalizing the cloud. These efforts include implementing additional security controls, large-scale configuration and property management capabilities, and improved forensics and auditing capabilities. In many cases, we have more insight into what is running in the cloud than in our own data centers. Currently, JPL has granted Authorities to Operate in three different cloud venues. Presently, we are the only FFRDC or NASA Center that has granted ATOs to run workloads in the cloud.
But we can't limit ourselves to three cloud venues. The cloud landscape is constantly changing, so we continue to evaluate new providers, technologies and services. With leadership from our CTO, Tom Soderstrom, we created the Cloud Computing Commodity Board at JPL, whose mission is to evaluate and rapidly on-board new cloud vendors so JPL missions can gain benefit at the time they need it most.
So it was an obvious decision to utilize the cloud to deliver to the world the multimedia content of our greatest endeavor yet: landing Curiosity on Mars. We had some very challenging requirements. The media being presented would be consumed by millions of people around the globe. In order to increase performance and reduce immense loads on our infrastructure, we needed a robust Content Distribution Network to help deliver our message. The infrastructure needed to be highly scalable and elastic. We would only use the capacity we needed for landing night, then gradually reduce our infrastructure to meet demand. The solution would also need to provide very large storage capacity for images from the spacecraft, telemetry data, and hi-resolution video streaming. We also had to prepare for the unimaginable. The availability, scalability, and performance of the Mars websites were of the utmost importance, especially during the landing event.
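To make the CDN requirement concrete, here is a minimal sketch, using Python's boto3, of putting a CloudFront distribution in front of an S3 origin so static content is served from edge locations. This is an illustration under assumptions, not JPL's actual configuration: the bucket name, caller reference, and cache settings are hypothetical.

```python
import boto3

s3 = boto3.client("s3")
cloudfront = boto3.client("cloudfront")

# Hypothetical bucket for static landing-night content (images, video, downloads).
bucket = "example-mars-static-content"
s3.create_bucket(Bucket=bucket)

# Minimal CloudFront distribution with the S3 bucket as its only origin.
resp = cloudfront.create_distribution(
    DistributionConfig={
        "CallerReference": "mars-landing-demo-001",  # must be unique per request
        "Comment": "Illustrative CDN for static landing-night content",
        "Enabled": True,
        "Origins": {
            "Quantity": 1,
            "Items": [{
                "Id": "s3-static-origin",
                "DomainName": f"{bucket}.s3.amazonaws.com",
                "S3OriginConfig": {"OriginAccessIdentity": ""},
            }],
        },
        "DefaultCacheBehavior": {
            "TargetOriginId": "s3-static-origin",
            "ViewerProtocolPolicy": "allow-all",
            "ForwardedValues": {"QueryString": False, "Cookies": {"Forward": "none"}},
            "TrustedSigners": {"Enabled": False, "Quantity": 0},
            "MinTTL": 0,
        },
    }
)
# Clients would fetch static assets from this edge-cached domain instead of the origin.
print(resp["Distribution"]["DomainName"])
```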
We evaluated a number of cloud providers, but chose Amazon based on its capabilities and cost. Currently, Amazon by far offers the most robust set of tools for engineers and application developers. AWS is available in multiple regions in the US, Europe, Asia, and South America, which allowed us to design a highly available system that is geographically dispersed in the event of outages. Another great feature is that all AWS services can be accessed over HTTP using REST and SOAP protocols. All AWS services are also utility based, providing the elasticity we need to optimize cost.
Here's a brief overview of how these services were assembled to meet our requirements. Amazon CloudFront, a content delivery network (CDN), distributes objects to so-called "edge locations" near the requester; it allowed us to extend our static content – videos, images, downloads, etc. – to storage resources in Europe, Asia, and South America. Amazon Elastic Compute Cloud (EC2) provides scalable virtual private servers using Xen; EC2 gave us the highly scalable server and networking capabilities to implement our infrastructure and content platform. Amazon Simple Storage Service (S3) provides web-service-based storage; we utilized S3 to store and serve static content, including images, telemetry, and videos, which removed a great amount of load from our system since a large portion of our visits were for the newest images and videos. Amazon Elastic Block Store (EBS) provides persistent block-level storage volumes for EC2. Amazon Relational Database Service (RDS) provides a scalable database server with MySQL. Amazon Route 53 provides a highly available and scalable Domain Name System (DNS) web service, which allowed us to dynamically route traffic to AWS regions in the event of outages.
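To illustrate the dynamic DNS routing mentioned for Route 53, the sketch below (again boto3) creates a primary/secondary failover pair so traffic shifts to another region when a health check fails. The hosted zone ID, record name, ELB endpoints, and health check ID are placeholders, not values from the talk.

```python
import boto3

route53 = boto3.client("route53")

# Hypothetical identifiers; replace with a real hosted zone and endpoints.
HOSTED_ZONE_ID = "Z0000000000000"
RECORD_NAME = "mars.example.org."

route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Comment": "Failover between two regional load balancers",
        "Changes": [
            {   # Primary record: used while its health check passes.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": RECORD_NAME,
                    "Type": "CNAME",
                    "SetIdentifier": "us-east-primary",
                    "Failover": "PRIMARY",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": "elb-us-east.example.amazonaws.com"}],
                    "HealthCheckId": "00000000-0000-0000-0000-000000000000",
                },
            },
            {   # Secondary record: answered only when the primary is unhealthy.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": RECORD_NAME,
                    "Type": "CNAME",
                    "SetIdentifier": "us-west-secondary",
                    "Failover": "SECONDARY",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": "elb-us-west.example.amazonaws.com"}],
                },
            },
        ],
    },
)
```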
The legacy Mars outreach websites were built long before the advent of cloud computing. They were designed to be hosted on physical servers attached to local storage, with database servers in the same data center. To meet the paradigm shift, we leveraged cloud design principles to re-engineer the system for the cloud. To avoid large licensing costs in a scalable and elastic environment, we had to port the legacy software to the open source equivalent of ColdFusion – Railo. We also had to migrate to a file system that could be distributed at cloud scale, so we chose Gluster, an open source file storage software platform. We used RDS to dynamically scale our MySQL databases – and did so during the landing event. We utilized CloudFront and S3 buckets to distribute our static content, and Elastic Load Balancers and a large farm of application servers to deliver our dynamic content.
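As a rough sketch of the elastic scaling described here, the boto3 snippet below grows a hypothetical application-server Auto Scaling group ahead of landing night and moves a MySQL database to a larger instance class. The group name, database identifier, capacity, and instance class are illustrative assumptions, not JPL's configuration.

```python
import boto3

autoscaling = boto3.client("autoscaling")
rds = boto3.client("rds")

# Hypothetical resource names, for illustration only.
ASG_NAME = "mars-app-servers"
DB_INSTANCE = "mars-mysql"

# Grow the application-server farm behind the load balancers before the
# landing event; the same call with a smaller number shrinks it afterwards.
autoscaling.set_desired_capacity(
    AutoScalingGroupName=ASG_NAME,
    DesiredCapacity=40,
    HonorCooldown=False,
)

# Scale the MySQL database vertically by moving it to a larger instance class.
# ApplyImmediately performs the change now instead of waiting for the
# maintenance window (a brief failover/restart may occur).
rds.modify_db_instance(
    DBInstanceIdentifier=DB_INSTANCE,
    DBInstanceClass="db.m5.2xlarge",
    ApplyImmediately=True,
)
```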