Bursting on-premise analytic workloads to Amazon EMR using Alluxio

Data Orchestration Summit 2020, organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
Bursting on-premise analytic workloads to Amazon EMR using Alluxio
Roy Hasson, AWS
About Alluxio: alluxio.io
Engage with the open source community on Slack: alluxio.io/slack

Bursting on-premise analytics workloads
to Amazon EMR using Alluxio
Roy Hasson
Principal Analytics Specialist
LinkedIn: /in/royhasson
Twitter: royhasson

© 2020, Amazon Web Services, Inc. or its Affiliates.
Customers want more value from their data
• Growing exponentially
• From new sources
• Increasingly diverse
• Used by many people
• Analyzed by many applications

On-premise Hadoop is rigid and costly
• Difficult to integrate with latest tech
• Costly to maintain and scale
• Difficult to manage and upgrade
• Inhibits rapid experimentation

Modernization is a journey
[Diagram labels]
• Sources: Devices, Web, Sensors, Social
• On-premise sources
• Hadoop silo
• Business intelligence
• Machine learning
• BI + analytics
• Data warehousing
• Data lakes
• Open file formats
• Central catalog/governance

Amazon EMR
Easily run Spark, Hadoop, Hive, Presto, HBase, and more big data apps on AWS
• Low cost: 50–80% reduction in costs with EC2 Spot and Reserved Instances; per-second billing for flexibility
• Use S3 storage: process data in S3 securely with high performance using the EMRFS connector
• Latest versions: updated with latest open source frameworks within 30 days
• Easy: fully managed, with no cluster setup, node provisioning, or cluster tuning

Optimized Runtime
Runtime built on an optimized version of Spark
Best performance
• 2.6x faster than Spark on EMR without runtime
• 1.6x faster than 3rd party Managed Spark (with their runtime)
Lowest price
• 1/10th the cost of 3rd party Managed Spark (with their runtime)
100% compliant with Spark APIs
*Based on TPC-DS 3TB benchmarking running 6-node c4.8xlarge clusters and EMR 5.28, Spark 2.4
Runtime total on 104 queries (seconds, lower is better)
• Spark with EMR (with runtime): 10,164
• 3rd party Managed Spark (with their runtime): 16,478
• Spark with EMR (without runtime): 26,478
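
The speedup figures quoted on this slide can be checked against the chart's runtime totals. The following is a minimal sketch of that arithmetic; the dictionary keys and the speedup helper are illustrative names, not part of the original deck.

```python
# Total runtimes in seconds for the 104 TPC-DS queries, as read off the chart.
runtimes_s = {
    "emr_with_runtime": 10_164,
    "third_party_managed_spark": 16_478,
    "emr_without_runtime": 26_478,
}


def speedup(baseline_s: float, optimized_s: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_s / optimized_s


vs_plain_emr = speedup(runtimes_s["emr_without_runtime"], runtimes_s["emr_with_runtime"])
vs_third_party = speedup(runtimes_s["third_party_managed_spark"], runtimes_s["emr_with_runtime"])

print(f"vs. Spark on EMR without runtime: {vs_plain_emr:.1f}x")   # 2.6x
print(f"vs. 3rd party Managed Spark:      {vs_third_party:.1f}x")  # 1.6x
```

Both ratios round to the values stated in the bullets (2.6x and 1.6x). The "1/10th the cost" claim cannot be reproduced from the chart alone, since the deck gives no pricing inputs.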

Managed scaling improves ease of use
Automatically scale the cluster to meet workload demand in < 10 sec and save up to 60% on cost
[Chart: Requested Resize]
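
As a rough illustration of how managed scaling is configured, the sketch below builds a policy with lower and upper capacity bounds in the shape accepted by the boto3 EMR client's put_managed_scaling_policy call. The helper name, the capacity values, and the cluster ID placeholder are assumptions for illustration; the deck itself does not show any configuration.

```python
def managed_scaling_policy(min_units: int, max_units: int) -> dict:
    """Build a managed scaling policy that bounds cluster size by instance count."""
    if not 1 <= min_units <= max_units:
        raise ValueError("expected 1 <= min_units <= max_units")
    return {
        "ComputeLimits": {
            "UnitType": "Instances",            # limits are counted in instances
            "MinimumCapacityUnits": min_units,  # floor the cluster never shrinks below
            "MaximumCapacityUnits": max_units,  # ceiling the cluster never grows above
        }
    }


# Hypothetical bounds: never fewer than 2 instances, never more than 20.
policy = managed_scaling_policy(2, 20)

# With boto3 installed and AWS credentials configured, the policy could be attached with:
#   boto3.client("emr").put_managed_scaling_policy(
#       ClusterId="j-XXXXXXXXXXXXX",  # placeholder cluster ID
#       ManagedScalingPolicy=policy,
#   )
```

Only the bounds are user-supplied; EMR decides when to resize within them, which is the ease-of-use point the slide makes.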

How to approach modernization

3 key modernization approaches
• Lift & Shift: less time and effort
• Rearchitect: gain maximum value from cloud
• Hybrid: burst the new, rearchitect the old

Burst workloads to Amazon EMR
[Architecture diagram labels]
• On-premise cluster: Hive Metastore; Hadoop compute & HDFS storage
• Catalog Service; Unified FS
• Amazon EMR; Amazon S3; AWS Glue Data Catalog
• Data flows: Automatic sync; Move on-demand
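
One way to picture the "Unified FS" and "Move on-demand" labels is as a path translation: a job on Amazon EMR addresses on-premise HDFS data through a single namespace instead of copying it up front. The sketch below assumes the unified file system is an Alluxio namespace with the on-premise HDFS mounted under it. The host names, the mount table, and the to_unified_uri helper are hypothetical; 19998 is Alluxio's default master RPC port.

```python
# Hypothetical mount table: on-premise HDFS prefix -> mount point in the unified namespace.
MOUNTS = {
    "hdfs://onprem-namenode:8020/warehouse": "/onprem/warehouse",
}

# Hypothetical master host; 19998 is Alluxio's default master RPC port.
ALLUXIO_AUTHORITY = "alluxio-master:19998"


def to_unified_uri(hdfs_uri: str) -> str:
    """Rewrite an on-premise HDFS URI to its location in the unified namespace."""
    for prefix, mount_point in MOUNTS.items():
        # Match the mount root itself or anything strictly beneath it.
        if hdfs_uri == prefix or hdfs_uri.startswith(prefix + "/"):
            suffix = hdfs_uri[len(prefix):]
            return f"alluxio://{ALLUXIO_AUTHORITY}{mount_point}{suffix}"
    raise ValueError(f"no mount covers {hdfs_uri}")


print(to_unified_uri("hdfs://onprem-namenode:8020/warehouse/sales/part-0.parquet"))
```

A Spark or Presto job on the EMR side would then read the rewritten URI, and data would be fetched from the on-premise cluster only when first accessed. This is a sketch of the idea under the stated assumptions, not the configuration used in the talk.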

Amazon EMR on AWS Outposts
Launch data applications on-premise using EMR on AWS Outposts
• Ideal for:
  • Highly sensitive data and workloads
  • Edge computing of high volume data (AV)
• Same user experience as on the cloud
• Simple to manage and stay on latest version

The future state

The Lake House – Integrated and simple to use
Key Benefits
• Unified analytics experience
• Managed and governed
• Scalable & Elastic
• Flexible & Agile
• Cost effective
• Easy to use
[Architecture diagram labels]
• Amazon S3: single source of truth for data
• AWS Glue Data Catalog: single source of truth for metadata
• AWS Lake Formation: central governance and authorization; secure access layer
• Amazon Redshift, Amazon EMR, Amazon Athena, Amazon SageMaker
• Amazon QuickSight: BI & visualization
• SageMaker Studio: unified data & ML experience
• AWS Glue Studio / DataBrew: visual ETL / data prep
• Federation & caching
• AWS Data Exchange

What did we learn
• Modernization is a journey – Burst to get value quicker
• Amazon EMR – Fully managed service to run big data workloads
• Amazon EMR + Alluxio – Makes bursting big data workloads easier
• Lake House – Future state architecture combining pace of
innovation, separation of concerns, elasticity, portability and cost.

Thank you and Q&A
Roy Hasson
Principal Analytics Specialist
LinkedIn: /in/royhasson
Twitter: royhasson
https://www.alluxio.io/products/aws/


Optimizing Data Access for Analytics And AI with Alluxio

Alluxio, Inc.

Speed Up Presto at Uber with Alluxio Caching

Alluxio, Inc.

Correctly Loading Incremental Data at Scale

Alluxio, Inc.

Alluxio x Tobiko - ETL Happy Hour April 16, 2024 For more Alluxio events: https://alluxio.io/events/ Speaker: Toby Mao (CTO @ Tobiko Data) Writing efficient and correct incremental pipelines is challenging. Data practitioners who take on this challenge are viewed as performing an "advanced" function, which discourages broader teams from adopting incremental loads. In this lightning talk, CTO of Tobiko Data, Toby Mao, will demystify incremental loading data at scale.

Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML

Alluxio, Inc.

Big Data Bellevue Meetup March 21, 2024 For more Alluxio events: https://alluxio.io/events/ Speakers: Bin Fan (VP of Open Source, Alluxio) In this presentation, Bin Fan (VP of Open Source @ Alluxio) will address a critical challenge of optimizing data loading for distributed Python applications within AI/ML workloads in the cloud, focusing on popular frameworks like Ray and Hugging Face. Integration of Alluxio’s distributed caching for Python applications is accomplished using the fsspec interface, thus greatly improving data access speeds. This is particularly useful in machine learning workflows, where repeated data reloading across slow, unstable or congested networks can severely affect GPU efficiency and escalate operational costs. Attendees can look forward to practical, hands-on demonstrations showcasing the tangible benefits of Alluxio’s caching mechanism across various real-world scenarios. These demos will highlight the enhancements in data efficiency and overall performance of data-intensive Python applications. This presentation is tailored for developers and data scientists eager to optimize their AI/ML workloads. Discover strategies to accelerate your data processing tasks, making them not only faster but also more cost-efficient.

Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...

Alluxio, Inc.

Alluxio Monthly Webinar Feb. 27, 2024 For more Alluxio Events: https://www.alluxio.io/events/ Speaker: - Tarik Bennett (Senior Solutions Engineer, Alluxio) As GenAI and AI continue to transform businesses, scaling these workloads requires optimized underlying infrastructure. A multi-cloud architecture allows organizations to leverage different cloud services to meet diverse workload demands while maximizing efficiency, reducing costs, and avoiding vendor lock-in. However, achieving a multi-cloud vision can be challenging. In this webinar, Tarik will share how an agonistic data layer, like Alluxio, allows you to embrace the separation of storage from compute and simplify the adoption of multi-cloud for AI. - Learn why leveraging multiple cloud providers is critical for balancing performance, scalability, and cost of your AI platform - Discover how an agnostic data layer like Alluxio provides seamless data access in multi-cloud that bridges storage and compute without data replication - Gain insights into real-world examples and best practices for deploying AI across on-prem, hybrid, and multi-cloud environments

Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...

Alluxio, Inc.

Alluxio Monthly Webinar Jan. 30, 2024 For more Alluxio Events: https://www.alluxio.io/events/ Speaker: - Kevin Petrie (VP of Research, Eckerson Group) - Omid Razavi (SVP of Customer Success, Alluxio) 2024 is gearing up to be an impactful year for AI and analytics. Join us on January 30, as Kevin Petrie (VP of Research at Eckerson Group) and Omid Razavi (SVP of Customer Success at Alluxio) share key trends that data and AI leaders should know. This event will efficiently guide you with market data and expert insights to drive successful business outcomes. - Assess current and future trends in data and AI with industry experts - Discover valuable insights and practical recommendations - Learn best practices to make your enterprise data more accessible for both analytics and AI applications

Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction

Alluxio, Inc.

Data Infra Meetup Jan. 25, 2024 Organized by Alluxio For more Alluxio Events: https://www.alluxio.io/events/ Speaker: - Juncheng Yang(Ph.D Candidate, @CMU) As a cache eviction algorithm, FIFO has a lot of attractive properties, such as simplicity, speed, scalability, and flash-friendliness. The most prominent criticism of FIFO is its low efficiency (high miss ratio). In this talk, I will describe a simple, scalable FIFO-based algorithm with three static queues (S3-FIFO). Evaluated on 6594 cache traces from 14 datasets, we show that S3- FIFO has lower miss ratios than state-of-the-art algorithms across traces. Moreover, S3-FIFO’s efficiency is robust — it has the lowest mean miss ratio on 10 of the 14 datasets. FIFO queues enable S3-FIFO to achieve good scalability with 6× higher throughput compared to optimized LRU at 16 threads. Our insight is that most objects in skewed workloads will only be accessed once in a short window, so it is critical to evict them early (also called quick demotion). The key of S3-FIFO is a small FIFO queue that filters out most objects from entering the main cache, which provides a guaranteed demotion speed and high demotion precision.

Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge

Alluxio, Inc.

Data Infra Meetup Jan. 25, 2024 Organized by Alluxio For more Alluxio Events: https://www.alluxio.io/events/ Speaker: - Jingwen Ouyang (Product Manager, @Alluxio) In this session, Jingwen presents an overview of using Alluxio Edge caching to accelerate Trino or Presto queries. She offers practical best practices for using distributed caching with compute engines. In addition, this session also features insights from real-world examples.

Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud

Alluxio, Inc.

Data Infra Meetup Jan. 25, 2024 Organized by Alluxio For more Alluxio Events: https://www.alluxio.io/events/ Speaker: - Siyuan Sheng (Senior Software Engineer, @Alluxio) - Chunxu Tang (Research Scientist, @Alluxio) In this session, cloud optimization specialists Chunxu and Siyuan break down the challenges and present a fresh architecture designed to optimize I/O across the data pipeline, ensuring GPUs function at peak performance. The integrated solution of PyTorch/Ray + Alluxio + S3 offers a promising way forward, and the speakers delve deep into its practical applications. Attendees will not only gain theoretical insights but will also be treated to hands-on instructions and demonstrations of deploying this cutting-edge architecture in Kubernetes, specifically tailored for Tensorflow/PyTorch/Ray workloads in the public cloud.

Data Infra Meetup | ByteDance's Native Parquet Reader

Alluxio, Inc.

Data Infra Meetup | Uber's Data Storage Evolution

Alluxio, Inc.

Data Infra Meetup Jan. 25, 2024 Organized by Alluxio For more Alluxio Events: https://www.alluxio.io/events/ Speaker: - Jing Zhao (Principal Engineer, @Uber) Uber builds one of the biggest data lakes in the industry, which stores exabytes of data. In this talk, we will introduce the evolution of our data storage architecture, and delve into multiple key initiatives during the past several years. Specifically, we will introduce: - Our on-prem HDFS cluster scalability challenges and how we solved them - Our efficiency optimizations that significantly reduced the storage overhead and unit cost without compromising reliability and performance - The challenges we are facing during the ongoing Cloud migration and our solutions

Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...

Alluxio, Inc.

Alluxio Monthly Webinar Nov. 15, 2023 For more Alluxio Events: https://www.alluxio.io/events/ Speaker: - Tarik Bennett (Senior Solutions Engineer) - Beinan Wang (Senior Staff Engineer & Architect) Many companies are working with development architectures for AI platforms but have concerns about efficiency at scale as data volumes increase. They use centralized cloud data lakes, like S3, to store training data for AI platforms. However, GPU shortages add more complications. Storage and compute can be separate, or even remote, making data loading slow and expensive: 1) Optimizing a developmental setup can include manual copies, which are slow and error-prone 2) Directly transferring data across regions or from cloud to on-premises can incur expensive egress fees This webinar covers solutions to improve data loading for model training. You will learn: - The data loading challenges with distributed infrastructure - Typical solutions, including NFS/NAS on object storage, and why they are not the best options - Common architectures that can improve data loading and cost efficiency - Using Alluxio to accelerate model training and reduce costs

AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...

Alluxio, Inc.

AI Infra Day Oct. 25, 2023 Organized by Alluxio For more Alluxio Events: https://www.alluxio.io/events/ Speaker: - Adit Madan (Director of Product Management, @Alluxio) In this session, Adit Madan, Director of Product Management at Alluxio, presents an overview of using distributed caching to accelerate model training and serving. He explores the requirements of data access patterns in the ML pipeline and offers practical best practices for using distributed caching in the cloud. This session features insights from real-world examples, such as AliPay, Zhihu, and more.

AI Infra Day | The AI Infra in the Generative AI Era

Alluxio, Inc.

AI Infra Day Oct. 25, 2023 Organized by Alluxio For more Alluxio Events: https://www.alluxio.io/events/ Speaker: - Bin Fan (Cheif Architect, VP of Open Source, @Alluxio) As the AI landscape rapidly evolves, the advancements in generative AI technologies, such as ChatGPT, are driving a need for a robust AI infra stack. This opening keynote will explore the key trends of the AI infra stack in the generative AI era.

Bursting on-premise analytic workloads to Amazon EMR using Alluxio

1. Bursting on-premise analytics workloads to Amazon EMR using Alluxio. Roy Hasson, Principal Analytics Specialist. LinkedIn: /in/royhasson. Twitter: royhasson.

2. © 2020, Amazon Web Services, Inc. or its Affiliates. Customers want more value from their data, which is: growing exponentially; from new sources; increasingly diverse; used by many people; analyzed by many applications.

3. On-premise Hadoop is rigid and costly: difficult to integrate with latest tech; costly to maintain and scale; difficult to manage and upgrade; inhibits rapid experimentation.

4. Modernization is a journey. Diagram labels: on-premise sources; devices; web; sensors; social; Hadoop silo; data lakes; open file formats; central catalog/governance; business intelligence; machine learning; BI + analytics; data warehousing.

5. Amazon EMR: easily run Spark, Hadoop, Hive, Presto, HBase, and more big data apps on AWS. Easy: fully managed, with no cluster setup, node provisioning, or cluster tuning. Low cost: 50–80% reduction in costs with EC2 Spot and Reserved Instances; per-second billing for flexibility. Use S3 storage: process data in S3 securely with high performance using the EMRFS connector. Latest versions: updated with latest open source frameworks within 30 days.

6. Optimized Runtime: runtime built on an optimized version of Spark, 100% compliant with Spark APIs. Best performance: • 2.6x faster than Spark on EMR without runtime • 1.6x faster than 3rd party Managed Spark (with their runtime). Lowest price: • 1/10th the cost of 3rd party Managed Spark (with their runtime). Chart: runtime total on 104 queries (seconds, lower is better): Spark with EMR (with runtime) 10,164; 3rd party Managed Spark (with their runtime) 16,478; Spark with EMR (without runtime) 26,478. *Based on TPC-DS 3TB benchmarking running 6-node c4.8xlarge clusters and EMR 5.28, Spark 2.4.
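
The speedup figures on slide 6 can be checked against the chart totals with a little arithmetic. The sketch below is not from the talk; it assumes the three totals pair with the three chart labels in the order they appear in the transcript, and the names `runtimes` and `speedup` are purely illustrative.

```python
# Total runtime over 104 TPC-DS queries, in seconds, as read from slide 6.
# Assumption: values and labels pair up in the order the transcript lists them.
runtimes = {
    "Spark with EMR (with runtime)": 10_164,
    "3rd party Managed Spark (with their runtime)": 16_478,
    "Spark with EMR (without runtime)": 26_478,
}


def speedup(baseline_seconds, optimized_seconds):
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_seconds / optimized_seconds


# Compare every configuration against Spark with the EMR runtime enabled.
optimized = runtimes["Spark with EMR (with runtime)"]
speedups = {
    name: round(speedup(seconds, optimized), 1)
    for name, seconds in runtimes.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor}x the runtime of Spark with EMR (with runtime)")
```

Under that pairing, the ratios round to the 2.6x and 1.6x figures quoted on the slide.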

9. 3 key modernization approaches. Lift & Shift: less time and effort. Rearchitect: gain maximum value from cloud. Hybrid: burst the new, rearchitect the old.

10. Burst workloads to Amazon EMR. Diagram labels: on-premise cluster; Hive Metastore; Hadoop compute & HDFS storage; Catalog Service; Unified FS; Amazon S3; AWS Glue Data Catalog; Amazon EMR; automatic sync; move on-demand.

11. Amazon EMR on AWS Outposts: launch data applications on-premise using EMR on AWS Outposts. • Ideal for highly sensitive data and workloads, and for edge computing of high-volume data (AV) • Same user experience as in the cloud • Simple to manage and stay on the latest version

13. The Lake House: integrated and simple to use. Key benefits: • Unified analytics experience • Managed and governed • Scalable & elastic • Flexible & agile • Cost effective • Easy to use. Diagram labels: Amazon S3 (single source of truth for data); AWS Glue Data Catalog (single source of truth for metadata); AWS Lake Formation (central governance and authorization); secure access layer; Amazon Redshift; Amazon EMR; Amazon Athena; Amazon SageMaker; Amazon QuickSight (BI & visualization); SageMaker Studio (unified data & ML experience); AWS Glue Studio / DataBrew (visual ETL / data prep); federation & caching; AWS Data Exchange.

14. What did we learn? • Modernization is a journey: burst to get value quicker • Amazon EMR: fully managed service to run big data workloads • Amazon EMR + Alluxio: makes bursting big data workloads easier • Lake House: future state architecture combining pace of innovation, separation of concerns, elasticity, portability, and cost.
