Twister4Azure - Iterative MapReduce for Azure Cloud

Twister4Azure is an iterative MapReduce framework that supports the development and execution of both iterative and traditional MapReduce applications in the Microsoft Azure cloud.

Twister4Azure: Iterative MapReduce for Azure Cloud. Thilina Gunarathne, Judy Qiu, Geoffrey Fox, {tgunarat, xqiu, gcf}@indiana.edu. CCA 2011, April 12–13, 2011.

MapReduceRoles for Azure: familiar MapReduce programming model; built using highly available and scalable Azure cloud services; coexists with the eventual consistency and high latency of cloud services; decentralized control; no single point of failure; supports dynamically scaling compute resources up and down; MapReduce fault tolerance.

Cache-aware hybrid scheduling using queues as well as a bulletin board (a special table).
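The scheduling idea can be illustrated with a minimal sketch. It is not Twister4Azure code: the function name `pick_task`, the dictionary standing in for the bulletin-board table, and the `deque` standing in for the queue are all assumptions made for illustration. A worker first looks on the bulletin board for a task whose input it already holds in cache, and only falls back to the queue when it finds none.

```python
from collections import deque


def pick_task(worker_cache, bulletin_board, queue):
    """Pick the next task for one worker (hypothetical helper, not a real API).

    worker_cache   -- set of data ids this worker already holds in memory
    bulletin_board -- dict mapping task id -> id of the data the task needs
    queue          -- deque of (task id, data id) pairs used as the fallback
    """
    # Prefer a bulletin-board task whose input data is already cached locally.
    for task_id, data_id in list(bulletin_board.items()):
        if data_id in worker_cache:
            del bulletin_board[task_id]  # claim the task (not atomic in this sketch)
            return task_id, data_id, "cache"
    # No cache hit: fall back to the next task in the shared queue.
    if queue:
        task_id, data_id = queue.popleft()
        return task_id, data_id, "queue"
    return None  # nothing this worker can take right now
```

In this sketch the board and the queue hold disjoint sets of tasks and a claim is a plain dictionary delete. A real deployment on cloud tables and queues would need an atomic or conditional claim so that two workers do not run the same task.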

Performance – KMeans Clustering: performance with and without data caching; speedup gained using the data cache; increasing number of iterations; scaling speedup.

Performance Comparisons: BLAST sequence search; Smith-Waterman sequence alignment; Cap3 sequence assembly.

5. In-Memory Caching of static data

9. Conclusion: Twister4Azure enables users to easily and efficiently perform large-scale iterative data analysis and scientific computations on the Azure cloud. It uses a novel hybrid scheduling mechanism to provide caching of static data across iterations. It uses cloud infrastructure services effectively to deliver robust and efficient applications. http://salsahpc.indiana.edu/twister4azure

Editor's Notes

We use Azure Queues for scheduling, Tables to store metadata and monitoring data, and Blobs for input, output, and intermediate data storage.
In our current work, we extend MR4Azure to support iterative MapReduce computations. We added a merge step to the programming model, in which the computation decides whether or not to start a new iteration. We also support in-memory caching of static data between iterations, and we developed a hybrid scheduling strategy to perform cache-aware scheduling.
We don't have a master node with global knowledge of the cached data. Hence, in each iteration, tasks are posted to the bulletin board, which workers check first to identify any tasks that require a data product they have in cache. If there are none, they fall back to the queue.
KMeans iterative MapReduce performance. 16 Azure Small instances, 6 iterations, 8 to 48 million 20-D data points. Left: Performance with and without data caching. Right: Speedup obtained from using the data cache. Left: Scaling speedup with increasing number of instances (Azure Small) and data for 10 iterations. Right: Increasing number of iterations using 16 million data points with caching using 16 Azure Small instances.
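The map, reduce, and merge steps from these notes can be sketched for KMeans in a single process. This is not the Twister4Azure API: every function name, the convergence tolerance `tol`, and the `max_iters` limit are assumptions for illustration, and plain Python lists stand in for data blocks cached on workers. Only the centroids change between iterations, while the point partitions stay in memory.

```python
def kmeans_map(points, centroids):
    """Map: assign each point to its nearest centroid and emit partial sums."""
    partial = {}
    for p in points:
        idx = min(range(len(centroids)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
        total, count = partial.get(idx, ([0.0] * len(p), 0))
        partial[idx] = ([t + x for t, x in zip(total, p)], count + 1)
    return partial


def kmeans_reduce(partials, old_centroids):
    """Reduce: combine the partial sums of all map tasks into new centroids."""
    new_centroids = []
    for i, old in enumerate(old_centroids):
        total, count = [0.0] * len(old), 0
        for part in partials:
            if i in part:
                s, c = part[i]
                total = [t + x for t, x in zip(total, s)]
                count += c
        # Keep the old centroid if no point was assigned to it.
        new_centroids.append([t / count for t in total] if count else list(old))
    return new_centroids


def merge(old, new, tol=1e-6):
    """Merge: return True if another iteration is needed (tol is an assumption)."""
    shift = max(sum((a - b) ** 2 for a, b in zip(o, n)) for o, n in zip(old, new))
    return shift > tol


def run(partitions, centroids, max_iters=10):
    """Drive the iterations; returns the final centroids and iterations used."""
    cache = list(partitions)  # static input stays in memory across iterations
    iteration = 0
    for iteration in range(1, max_iters + 1):
        partials = [kmeans_map(block, centroids) for block in cache]
        new_centroids = kmeans_reduce(partials, centroids)
        keep_going = merge(centroids, new_centroids)
        centroids = new_centroids
        if not keep_going:
            break
    return centroids, iteration
```

Calling `run([[[0.0], [1.0]], [[10.0], [11.0]]], [[0.0], [10.0]])` runs two partitions of one-dimensional points until the merge step reports that the centroids have stopped moving.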
