CUDA performance study on Hadoop MapReduce Cluster

•

13 likes•9,710 views

This document summarizes a study on using GPUs (CUDA) to accelerate Hadoop MapReduce workloads. It introduces CUDA into Hadoop clusters, evaluates the performance speedup and power efficiency on matrix multiplication and molecular dynamics simulations, and concludes that GPU acceleration provides up to 20x speedup and reduces power consumption by up to 19/20, making it a cost-effective approach compared to CPU-only upgrades. Future work is outlined to port more applications and support heterogeneous GPU/CPU clusters.

Technology

CUDA Performance Study on
Hadoop MapReduce Clusters
CSE 930 Advanced Computer Architecture @ Fall 2010

Chen He
che@cse.unl.edu

Peng Du
pdu@cse.unl.edu

University of Nebraska-Lincoln

Overview
•
•
•
•
•

Introduction
Methodology
Evaluation
Conclusions
Future work

Introduction
CPU+GPU Architecture

5.devC[]=devA[]+devB[]
G P U

1.Malloc a[],b[],c[]

Main Mem

C P U

2.cudaMalloc
(devA[],
devB[],
devC[])

6.store

8.recycle(devA[],
devB[],devC[])
BUS
3.copy a[],b[]
To devA[].devB[]

7.Copy devC[] to c[]

4.load

Introduction
• Questions
– Can we introduce CUDA into Hadoop MapReduce
Clusters?
• Mechanism and implementation

– Is this reasonable?
• Effects and Costs

Methodology
• Question-1:Can we introduce CUDA into Hadoop ?

Methodology
• Test cases
– SDK programs
• Data intensive: Matrix Multiplication
• Computation intensive: Monte Carlo

– MDMR (Molecular Dynamics simulation based on
MapReduce)
• Pure Java program
• Introduce JCUDA

Methodology
• Port CUDA programs onto Hadoop
– GPU (CUDA-C) vs CPU (C)
– Approach
• MapRed (processHadoopData & cudaCompute)
• Main
(Hadoop Pipes)
• Scripts (runbase.sh, run-<prog>-CPU/GPU.sh)
• Input data generators

Methodology
Hadoop DFS
Input Generator

generates

.c
.c
data

void generate(..)

Hadoop-enabled MonteCarlo

CUDA
MonteCarlo

MonteCarlo.cpp
extracted

.c
.c
.c
.cu
.cu
.cu

void processHadoopData(..)
void cudaCompute(..)

extracted

MapRed.cpp
class Mapper
class Reducer

Methodology
• MDMR (Molecular Dynamics simulation based
on MapReduce)
– Time Complexity by using CPU
T (n)  c1n2  c2 n  c3

– We can simply employ GPU to parallel the n-squre
portion and reduce the time complexity to linear
(within the limit of GPU threads)

T ' (n)  c1 (dn)  c2 n  c3  c4

Evaluation
• Environment
– Head: 2xAMD 2.2GHz, 4GB DDR400 RAM, 800GB HD
– Slaves: 3 PCs (AMD 2.3G CPU, 2G DDR2-667 RAM, 400GB HD, 1Gbps
Ethernet)
– GPU: XFX 9400GT 64bit 512MB DDR3
– CUDA 3.2 Toolkit
– Hadoop 0.20.3
– ServerTech CWG-CDU power distribution unit (for the power
consumption monitoring)
• Factors
– Speedup
– Power consumption
– Cost

Evaluation
• Matrix Multiplication (Execution time)

Evaluation
• Matrix Multiplication (Power consumption)

Conclusions
• Introduced GPU into MapReduce cluster and
obtained up to 20 times speedup.
• Reduced up to 19/20 power consumption with the
current preliminary solution and work load.
• Compared with upgrading CPUs and adding more
nodes, deploying GPU on Hadoop has high cost-tobenefit ratio.
• Provided practical implementations for people
wanting to construct MapReduce clusters with GPUs.

Future Work
• Port more CUDA programs onto Hadoop.
• Incorporate reducers into the experiments
• Support heterogeneous clusters which mixed
GPU-nodes and non-GPU nodes.

Reference
• nVIDIA CUDA
http://developer.nvidia.com/object/cuda-3.2/
• Hadoop, http://www.hadoop.com.
• J. Polo, D. Carrera, Y. Becerra, V. Beltran, J. Torres and
E. Ayguadé Performance Management of Accelerated
MapReduce Workloads in Heterogeneous Clusters,
ICPP2010, (2010), 654-662.
• C. He, D. Swanson. Molecular Dynamics simulation
based on MapReduce, poster section, LCI 2010, (2010).

There has been growing interest in harnessing the parallelism of Graphics Processing Units (GPUs) to accelerate analytics workloads. GPUs have become the standard platform for many machine learning algorithms, particularly in the field of deep neural networks (DNNs), while making increasing inroads into more traditional domains such as analytics databases and visual analytics. However there is a strong need to couple these new platforms with Apache Spark, which has emerged as the de facto analytics platform for data scientists. In this talk we discuss how we built a connector from Spark to the open source GPU-powered MapD Analytics Platform, and the use cases such a connector enables around being able to pull high value data from Spark and cache it on the GPU for subsequent interactive visual analysis and machine learning. We will conclude with a brief demo of an end-to-end Spark-to-MapD pipeline.

GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale

Spark Summit

Easy and High Performance GPU Programming for Java Programmers

Kazuaki Ishizaki

Methods that scale with available computation are the future of AI. Distributed deep learning is one such method that enables data scientists to massively increase their productivity by (1) running parallel experiments over many devices (GPUs/TPUs/servers) and (2) massively reducing training time by distributing the training of a single network over many devices. Apache Spark is a key enabling platform for distributed deep learning, as it enables different deep learning frameworks to be embedded in Spark workflows in a secure end-to-end pipeline. In this talk, we examine the different ways in which Tensorflow can be included in Spark workflows to build distributed deep learning applications. We will analyse the different frameworks for integrating Spark with Tensorflow, from Horovod to TensorflowOnSpark to Databrick’s Deep Learning Pipelines. We will also look at where you will find the bottlenecks when training models (in your frameworks, the network, GPUs, and with your data scientists) and how to get around them. We will look at how to use Spark Estimator model to perform hyper-parameter optimization with Spark/TensorFlow and model-architecture search, where Spark executors perform experiments in parallel to automatically find good model architectures. The talk will include a live demonstration of training and inference for a Tensorflow application embedded in a Spark pipeline written in a Jupyter notebook on the Hops platform. We will show how to debug the application using both Spark UI and Tensorboard, and how to examine logs and monitor training. The demo will be run on the Hops platform, currently used by over 450 researchers and students in Sweden, as well as at companies such as Scania and Ericsson.

Deploying Accelerators At Datacenter Scale Using Spark

Jen Aman

20140708hcjHatayama Hideharu

GPU Computing With Apache Spark And Python

Jen Aman

Parallel Linear Regression in Interative Reduce and YARN

DataWorks Summit

Online learning techniques, such as Stochastic Gradient Descent (SGD), are powerful when applied to risk minimization and convex games on large problems. However, their sequential design prevents them from taking advantage of newer distributed frameworks such as Hadoop/MapReduce. In this session, we will take a look at how we parallelized linear regression parameter optimization on the next-gen YARN framework Iterative Reduce.

Enterprise Scale Topological Data Analysis Using Spark

Alpine Data

Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...

Databricks

You will learn how CERN has implemented an Apache Spark-based data pipeline to support deep learning research work in High Energy Physics (HEP). HEP is a data-intensive domain. For example, the amount of data flowing through the online systems at LHC experiments is currently of the order of 1 PB/s, with particle collision events happening every 25 ns. Filtering is applied before storing data for later processing. Improvements in the accuracy of the online event filtering system are key to optimize usage and cost of compute and storage resources. A novel prototype of event filtering system based on a classifier trained using deep neural networks has recently been proposed. This presentation covers how we implemented the data pipeline to train the neural network classifier using solutions from the Apache Spark and Big Data ecosystem, integrated with tools, software, and platforms familiar to scientists and data engineers at CERN. Data preparation and feature engineering make use of PySpark, Spark SQL and Python code run via Jupyter notebooks. We will discuss key integrations and libraries that make Apache Spark able to ingest data stored using HEP data format (ROOT) and the integration with CERN storage and compute systems. You will learn about the neural network models used, defined using the Keras API, and how the models have been trained in a distributed fashion on Spark clusters using BigDL and Analytics Zoo. We will discuss the implementation and results of the distributed training, as well as the lessons learned.

Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...

Databricks

Recently, there has been increased interest in running analytics and machine learning workloads on top of serverless frameworks in the cloud. The serverless execution model provides fine-grained scaling and unburdens users from having to manage servers, but also adds substantial performance overheads due to the fact that all data and intermediate state of compute task is stored on remote shared storage. In this talk I first provide a detailed performance breakdown from a machine learning workload using Spark on AWS Lambda. I show how the intermediate state of tasks — such as model updates or broadcast messages — is exchanged using remote storage and what the performance overheads are. Later, I illustrate how the same workload performs on-premise using Apache Spark and Apache Crail deployed on a high-performance cluster (100Gbps network, NVMe Flash, etc.). Serverless computing simplifies the deployment of machine learning applications. The talk shows that performance does not need to be sacrificed.

Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...

Databricks

Data is the key ingredient to building high-quality, production AI applications. It comes in during the training phase, where more and higher-quality training data enables better models, as well as during the production phase, where understanding the model’s behavior in production and detecting changes in the predictions and input data are critical to maintaining a production application. However, so far most data management and machine learning tools have been largely separate. In this presentation, we’ll talk about several efforts from Databricks, in Apache Spark, as well as other open source projects, to unify data and AI in order to make it significantly simpler to build production AI applications.

Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...

Jen Aman

Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...

Databricks

Graphics Processing Units (GPUs) are becoming popular for achieving high performance of computation intensive workloads. The GPU offers thousands of cores for floating point computation. This is beneficial to machine learning algorithms that are computation intensive and are parallelizable on the Spark platform. While the current execution strategy of Spark is to execute computations for the workload across nodes, only CPUs on each node execute computation. If Spark could use GPUs on each node, users benefit from GPUs that can reduce the execution time of the algorithm and reduce the number of nodes in a cluster. Recently, while application programmers use DataFrame APIs for their application, machine learning algorithms work with RDDs that keep data across nodes for distributed computation on CPU cores. A RDD keeps data as a Scala collection class in a row-based format. The computation model of GPU can achieve high performance for contiguous data in a column-based format. For enabling efficient GPU computation on Spark, we present a column-based RDD that can keep data as an array. When we execute them on the GPU, our implementation simply copies data in the column-based RDD to the GPU device memory. Then, each GPU cores execute operations faster on the device memory. CPU cores can execute existing functions on the column-based RDD. In this session, we will give the following contribution to the Spark community: (1) we give a brief design overview of transparent GPU exploitations from programmers (2) we show our APIs to build a GPU-accelerated library using column-based RDD and show the performance gain of some programs (3) we discuss current work for transparent GPU code generation from DataFrame APIs The package for (2) is available at http://github.com/IBMSparkGPU/GPUEnabler

Spark Summit 2016: Connecting Python to the Spark Ecosystem

Daniel Rodriguez

Big Data Analytics-Open Source ToolkitsDataWorks Summit

Hadoop for Scientific Workloads__HadoopSummit2010

Yahoo Developer Network

Rapids: Data Science on GPUs

inside-BigData.com

In this deck from FOSDEM'19, Christoph Angerer from NVIDIA presents: Rapids - Data Science on GPUs. "The next big step in data science will combine the ease of use of common Python APIs, but with the power and scalability of GPU compute. The RAPIDS project is the first step in giving data scientists the ability to use familiar APIs and abstractions while taking advantage of the same technology that enables dramatic increases in speed in deep learning. This session highlights the progress that has been made on RAPIDS, discusses how you can get up and running doing data science on the GPU, and provides some use cases involving graph analytics as motivation. GPUs and GPU platforms have been responsible for the dramatic advancement of deep learning and other neural net methods in the past several years. At the same time, traditional machine learning workloads, which comprise the majority of business use cases, continue to be written in Python with heavy reliance on a combination of single-threaded tools (e.g., Pandas and Scikit-Learn) or large, multi-CPU distributed solutions (e.g., Spark and PySpark). RAPIDS, developed by a consortium of companies and available as open source code, allows for moving the vast majority of machine learning workloads from a CPU environment to GPUs. This allows for a substantial speed up, particularly on large data sets, and affords rapid, interactive work that previously was cumbersome to code or very slow to execute. Many data science problems can be approached using a graph/network view, and much like traditional machine learning workloads, this has been either local (e.g., Gephi, Cytoscape, NetworkX) or distributed on CPU platforms (e.g., GraphX). We will present GPU-accelerated graph capabilities that, with minimal conceptual code changes, allows both graph representations and graph-based analytics to achieve similar speed ups on a GPU platform. By keeping all of these tasks on the GPU and minimizing redundant I/O, data scientists are enabled to model their data quickly and frequently, affording a higher degree of experimentation and more effective model generation. Further, keeping all of this in compatible formats allows quick movement from feature extraction, graph representation, graph analytic, enrichment back to the original data, and visualization of results. RAPIDS has a mission to build a platform that allows data scientist to explore data, train machine learning algorithms, and build applications while primarily staying on the GPU and GPU platforms." Learn more: https://rapids.ai/ and https://fosdem.org/2019/ Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter

Prediction as a service with ensemble model in SparkML and Python ScikitLearn

Josef A. Habdank

Watch the recording of the speech done at Spark Summit Brussles 2016 here: https://www.youtube.com/watch?v=wyfTjd9z1sY Data Science with SparkML on DataBricks is a perfect platform for application of Ensemble Learning on massive a scale. This presentation describes Prediction-as-a-Service platform which can predict trends on 1 billion observed prices daily. In order to train ensemble model on a multivariate time series in thousands/millions dimensional space, one has to fragment the whole space into subspaces which exhibit a significant similarity. In order to achieve this, the vastly sparse space has to undergo dimensionality reduction into a parameters space which then is used to cluster the observations. The data in the resulting clusters is modeled in parallel using machine learning tools capable of coefficient estimation at the massive scale (SparkML and Scikit Learn). The estimated model coefficients are stored in a database to be used when executing predictions on demand via a web service. This approach enables training models fast enough to complete the task within a couple of hours, allowing daily or even real time updates of the coefficients. The above machine learning framework is used to predict the airfares used as support tool for the airline Revenue Management systems.

Making Hardware Accelerator Easier to Use

Kazuaki Ishizaki

RAPIDS – Open GPU-accelerated Data Science

Data Works MD

RAPIDS – Open GPU-accelerated Data Science RAPIDS is an initiative driven by NVIDIA to accelerate the complete end-to-end data science ecosystem with GPUs. It consists of several open source projects that expose familiar interfaces making it easy to accelerate the entire data science pipeline- from the ETL and data wrangling to feature engineering, statistical modeling, machine learning, and graph analysis. Corey J. Nolet Corey has a passion for understanding the world through the analysis of data. He is a developer on the RAPIDS open source project focused on accelerating machine learning algorithms with GPUs. Adam Thompson Adam Thompson is a Senior Solutions Architect at NVIDIA. With a background in signal processing, he has spent his career participating in and leading programs focused on deep learning for RF classification, data compression, high-performance computing, and managing and designing applications targeting large collection frameworks. His research interests include deep learning, high-performance computing, systems engineering, cloud architecture/integration, and statistical signal processing. He holds a Masters degree in Electrical & Computer Engineering from Georgia Tech and a Bachelors from Clemson University.

Overview of Scientific Workflows - Why Use Them?

inside-BigData.com

Scott Callaghan from the Southern California Earthquake Center presented this deck in a recent Blue Waters Webinar. "I will present an overview of scientific workflows. I'll discuss what the community means by "workflows" and what elements make up a workflow. We'll talk about common problems that users might be facing, such as automation, job management, data staging, resource provisioning, and provenance tracking, and explain how workflow tools can help address these challenges. I'll present a brief example from my own work with a series of seismic codes showing how using workflow tools can improve scientific applications. I'll finish with an overview of high-level workflow concepts, with an aim to preparing users to get the most out of discussions of specific workflow tools and identify which tools would be best for them." Watch the video: http://wp.me/p3RLHQ-gtH Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter

Hadoop Scheduling - a 7 year perspective

Joydeep Sen Sarma

3d printing technologyPrachi Agarwal

Energy conservation week celebrationSudha Arun

What's hot

Distributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling

Databricks

Deploying Accelerators At Datacenter Scale Using Spark

Jen Aman

20140708hcjHatayama Hideharu

GPU Computing With Apache Spark And Python

Jen Aman

Parallel Linear Regression in Interative Reduce and YARN

DataWorks Summit

Enterprise Scale Topological Data Analysis Using Spark

Alpine Data

Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...

Databricks

Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...

Databricks

Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...

Databricks

Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...

Jen Aman

Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...

Databricks

Spark Summit 2016: Connecting Python to the Spark Ecosystem

Daniel Rodriguez

Big Data Analytics-Open Source ToolkitsDataWorks Summit

Hadoop for Scientific Workloads__HadoopSummit2010

Yahoo Developer Network

Rapids: Data Science on GPUs

inside-BigData.com

Prediction as a service with ensemble model in SparkML and Python ScikitLearn

Josef A. Habdank

Making Hardware Accelerator Easier to Use

Kazuaki Ishizaki

RAPIDS – Open GPU-accelerated Data Science

Data Works MD

Overview of Scientific Workflows - Why Use Them?

inside-BigData.com

Hadoop Scheduling - a 7 year perspective

Joydeep Sen Sarma

What's hot (20)

Distributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling

Deploying Accelerators At Datacenter Scale Using Spark

20140708hcj

GPU Computing With Apache Spark And Python

Parallel Linear Regression in Interative Reduce and YARN

Enterprise Scale Topological Data Analysis Using Spark

Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...

Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...

Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...

Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...

Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...

Spark Summit 2016: Connecting Python to the Spark Ecosystem

Big Data Analytics-Open Source Toolkits

Hadoop for Scientific Workloads__HadoopSummit2010

Rapids: Data Science on GPUs

Prediction as a service with ensemble model in SparkML and Python ScikitLearn

Making Hardware Accelerator Easier to Use

RAPIDS – Open GPU-accelerated Data Science

Overview of Scientific Workflows - Why Use Them?

Hadoop Scheduling - a 7 year perspective

Viewers also liked

3d printing technologyPrachi Agarwal

Energy conservation week celebrationSudha Arun

Data Warehouse OptimizationCloudera, Inc.

Cloud Computing v.s. Cyber Security

Bahtiyar Bircan

This presentation surveys relation between cloud computing and cyber security. Cloud computing is one of trends enabling technology in all areas of life. This includes cyber security. Presentation review relationship of cyber security and cloud in context of cyber defense and cyber offense. Last section is dedicated to experiment how easy it is to conduct cyber attack using cloud, completely anonymously. Presentation was first given at NATO IST-125 Panel meeting, METU,Ankara TURKIYE 2015.06.11

Export-Oriented Industrialization (EOI): Arguments For and Against What Have ...Dr.Choen Krainara

Making Display Advertising Work for Auto Dealers

Speed Shift Media

Consumer's buying behaviors have changed — the majority of the buyer cycle is now done online. Get in front of the consumer throughout the full cycle and take the initiative to capture your market by providing what they want and are actively searching for. There are many variations of display advertising and it's important to understand the difference between static ads — static display ads delivered dynamically — and truly dynamic ads. Technology is evolving everyday and so is display advertising. There are now so many opportunities available with in-image ads or "native advertising" and the capability to pull live dealership inventory directly into display advertising. In this session you will also learn how display advertising works with SEO and SEM activity and what key metrics to focus on when calculating ROI - like the direct VDP Click.

Real-World Data Governance: Data Governance Roles & Responsibilities

DATAVERSITY

Well thought out data governance roles and responsibilities lie at the heart of successful data governance programs. All activities focus on the roles. From how we recognize stewards and apply governance, to how we engage and communicate with the people in the roles – the roles become the operating model for how governance works. Join Bob Seiner for this month’s installment of the DATAVERSITY Real-World Data Governance webinar series focused on defining an operating model that can be assimilated to your organization. This model includes an easy-to-explain set of roles and responsibilities aligned with how your organization functions. The session will cover: Operational, Tactical, Strategic and Support Roles How to recognize your stewards and other roles How to apply roles consistently through all facets of your program Providing incentive for active involvement

Top 10 heavy duty diesel mechanic interview questions and answerstonychoper8206

Seminar datawarehousing

Kavisha Uniyal

Lab Report on copper cycle

Karanvir Sidhu

Equity derivativesRahul Sane

How to perform an efficient Cold Chain Compliance and Gap Analysis

Alternatives Technologie Pharma

The purpose of Gap Analysis is to assist pharmaceutical manufacturers, distributors or 3 PLs (3rd Party Logistics Providers) to help them identify gaps in their cold chain supply chain network or systems. What you will learn: * The regulatory aspects related to the cold chain * Responsibilities in the supply chain * Requirements for the storage and handling of drug products * Packaging, transportation and distribution of drug products * Performing a gap analysis to know what needs to be done in order to fully comply with regulations and optimize your processes * Ways and means to develop an executable action plan How you will benefit: * Understand how to execute a cold chain regulatory gap analysis * Discover what should be covered when looking at cold chain compliance * Gap analysis: The first step to develop a cold chain compliance program * Uncover the requirements for the storage and distribution of drug products * Sharing the responsibilities for a good cold chain compliance

Financial Management Best Practices

Autotask

Do you have the right tools to measure your financial performance? Do you know what elements are necessary to guide your business? Based on last year's rave reviews, Autotask's own Chief Financial Officer, Vince Zumbo, will return to lay out the fundamentals of planning and monitoring your financials for success. Vince will be aided by Autotask Product Manager Joe Rourke who will demonstrate how you can apply what you've learned by leveraging Autotask to support your business' optimal financial health. This session is full of tips, templates and insights that are used by financial professionals today and can be used by organizations of all sizes. [Presenters: Vince Zumbo & Patrick Burns, Autotask]

AWS 클라우드 서비스 소개 및 사례 (방희란) - AWS 101 세미나

Amazon Web Services Korea

Churn management

Mohammed Akram Ayyubi

Consulting Company Valuation Model

Tony Rice

Lecture 1 introduction to construction procurement process.Aszahari Aie

Bài 1: Làm quen với ASP.NET - Giáo trình FPT - Có ví dụ kèm theo

MasterCode.vn

Energy management final ppt

EcoEvents

Top 10 electrical project engineer interview questions and answersrobin26331

Viewers also liked (20)

3d printing technology

Energy conservation week celebration

Data Warehouse Optimization

Cloud Computing v.s. Cyber Security

Export-Oriented Industrialization (EOI): Arguments For and Against What Have ...

Making Display Advertising Work for Auto Dealers

Real-World Data Governance: Data Governance Roles & Responsibilities

Top 10 heavy duty diesel mechanic interview questions and answers

Seminar datawarehousing

Lab Report on copper cycle

Equity derivatives

How to perform an efficient Cold Chain Compliance and Gap Analysis

Financial Management Best Practices

AWS 클라우드 서비스 소개 및 사례 (방희란) - AWS 101 세미나

Churn management

Consulting Company Valuation Model

Lecture 1 introduction to construction procurement process.

Bài 1: Làm quen với ASP.NET - Giáo trình FPT - Có ví dụ kèm theo

Energy management final ppt

Top 10 electrical project engineer interview questions and answers

Similar to CUDA performance study on Hadoop MapReduce Cluster

project--2 nd review_2aswini pilli

project--2 nd review_2Aswini Ashu

Hybrid Map Task Scheduling for GPU-based Heterogeneous Clusters

Koichi Shirahata

Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015

Deanna Kosaraju

Optimal Execution Of MapReduce Jobs In Cloud Anshul Aggarwal, Software Engineer, Cisco Systems Session Length: 1 Hour Tue March 10 21:30 PST Wed March 11 0:30 EST Wed March 11 4:30:00 UTC Wed March 11 10:00 IST Wed March 11 15:30 Sydney Voices 2015 www.globaltechwomen.com We use MapReduce programming paradigm because it lends itself well to most data-intensive analytics jobs run on cloud these days, given its ability to scale-out and leverage several machines to parallel process data. Research has demonstrates that existing approaches to provisioning other applications in the cloud are not immediately relevant to MapReduce -based applications. Provisioning a MapReduce job entails requesting optimum number of resource sets (RS) and configuring MapReduce parameters such that each resource set is maximally utilized. Each application has a different bottleneck resource (CPU :Disk :Network), and different bottleneck resource utilization, and thus needs to pick a different combination of these parameters based on the job profile such that the bottleneck resource is maximally utilized. The problem at hand is thus defining a resource provisioning framework for MapReduce jobs running in a cloud keeping in mind performance goals such as Optimal resource utilization with Minimum incurred cost, Lower execution time, Energy Awareness, Automatic handling of node failure and Highly scalable solution.

Cloud Services for Big Data Analytics

Geoffrey Fox

We present a software model built on the Apache software stack (ABDS) that is well used in modern cloud computing, which we enhance with HPC concepts to derive HPC-ABDS. We discuss layers in this stack We give examples of integrating ABDS with HPC We discuss how to implement this in a world of multiple infrastructures and evolving software environments for users, developers and administrators We present Cloudmesh as supporting Software-Defined Distributed System as a Service or SDDSaaS with multiple services on multiple clouds/HPC systems. We explain the functionality of Cloudmesh as well as the 3 administrator and 3 user modes supported

Cloud Services for Big Data Analytics

Geoffrey Fox

HP - Jerome Rolia - Hadoop World 2010

Cloudera, Inc.

S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf

DLow6

Optimal Chain Matrix Multiplication Big Data Perspective

পল্লব রায়

RAPIDS cuGraph – Accelerating all your Graph needs

Connected Data World

The relationships between data sets matter. Discovering, analyzing, and learning those relationships is a central part to expanding our understand, and is a critical step to being able to predict and act upon the data. Unfortunately, these are not always simple or quick tasks. To help the analyst we introduce RAPIDS, a collection of open-source libraries, incubated by NVIDIA and focused on accelerating the complete end-to-end data science ecosystem. Graph analytics is a critical piece of the data science ecosystem for processing linked data, and RAPIDS is pleased to offer cuGraph as our accelerated graph library. Simply accelerating algorithms only addressed a portion of the problem. To address the full problem space, RAPIDS cuGraph strives to be feature-rich, easy to use, and intuitive. Rather than limiting the solution to a single graph technology, cuGraph supports Property Graphs, Knowledge Graphs, Hyper-Graphs, Bipartite graphs, and the basic directed and undirected graph. A Python API allows the data to be manipulated as a DataFrame, similar and compatible with Pandas, with inputs and outputs being shared across the full RAPIDS suite, for example with the RAPIDS machine learning package, cuML. This talk will present an overview of RAPIDS and cuGraph. Discuss and show examples of how to manipulate and analyze bipartite and property graph, plus show how data can be shared with machine learning algorithms. The talk will include some performance and scalability metrics. Then conclude with a preview of upcoming features, like graph query language support, and the general RAPIDS roadmap.

High Performance Deep learning with Apache Spark

Rui Liu

Dato vs GraphX

Keira Zhou

Distributed Processing Frameworks

Antonios Katsarakis

How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu

Yahoo Developer Network

5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop

Databricks

In this session, learn how to quickly supplement your on-premises Hadoop environment with a simple, open, and collaborative cloud architecture that enables you to generate greater value with scaled application of analytics and AI on all your data. You will also learn five critical steps for a successful migration to the Databricks Lakehouse Platform along with the resources available to help you begin to re-skill your data teams.

Scheduling scheme for hadoop clusters

Amjith Singh

How to Puppetize Google Cloud Platform - PuppetConf 2014

Puppet

How @twitterhadoop chose google cloud

lohitvijayarenu

A Container-based Sizing Framework for Apache Hadoop/Spark Clusters

DataWorks Summit/Hadoop Summit

Finalprojectpresentation

SANTOSH WAYAL

Similar to CUDA performance study on Hadoop MapReduce Cluster (20)

project--2 nd review_2

Hybrid Map Task Scheduling for GPU-based Heterogeneous Clusters

Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015

Cloud Services for Big Data Analytics

HP - Jerome Rolia - Hadoop World 2010

S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf

Optimal Chain Matrix Multiplication Big Data Perspective

RAPIDS cuGraph – Accelerating all your Graph needs

High Performance Deep learning with Apache Spark

Dato vs GraphX

Distributed Processing Frameworks

How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu

5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop

Scheduling scheme for hadoop clusters

How to Puppetize Google Cloud Platform - PuppetConf 2014

How @twitterhadoop chose google cloud

A Container-based Sizing Framework for Apache Hadoop/Spark Clusters

Finalprojectpresentation

Recently uploaded

zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs

Alex Pruden

This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second). Paper: https://eprint.iacr.org/2023/1886

Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx

nkrafacyberclub

20240605 QFM017 Machine Intelligence Reading List May 2024

Matthew Sinclair

GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...

Neo4j

Leonard Jayamohan, Partner & Generative AI Lead, Deloitte This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.

Mind map of terminologies used in context of Generative AI

Kumud Singh

Monitoring Java Application Security with JDK Tools and JFR Events

Ana-Maria Mihalceanu

Epistemic Interaction - tuning interfaces to provide information for AI support

Alan Dix

Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024 https://alandix.com/academic/papers/synergy2024-epistemic/ As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.

Removing Uninteresting Bytes in Software Fuzzing

Aftab Hussain

Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process. In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds. - These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.

Essentials of Automations: The Art of Triggers and Actions in FME

Safe Software

In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation. We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios. Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!

Climate Impact of Software Testing at Nordic Testing Days

Kari Kakkonen

My slides at Nordic Testing Days 6.6.2024 Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.

GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...

Neo4j

Dr. Sean Tan, Head of Data Science, Changi Airport Group Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.

Uni Systems Copilot event_05062024_C.Vlachos.pdf

Uni Systems S.M.S.A.

Free Complete Python - A step towards Data Science

RinaMondal9

Video Streaming: Then, Now, and in the Future

Alpen-Adria-Universität

In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Aggregage

Large Language Model (LLM) and it’s Geospatial Applications

Rohit Gautam

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...

DanBrown980551

Do you want to learn how to model and simulate an electrical network from scratch in under an hour? Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)! During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook. PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides: - A fully editable and extendable library for grid component modelling; - Visualization tools to display your network; - Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses; The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well. What you will learn during the webinar: - For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills; - For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.

Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...

James Anderson

Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management. The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM). Speakers: Bob Boule Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle. Gopinath Rebala Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.

Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...

SOFTTECHHUB

The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing. One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.

Artificial Intelligence for XMLDevelopment

Octavian Nadolu

In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject. We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup. Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved. The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring. The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise. By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.

Recently uploaded (20)

zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs

Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx

20240605 QFM017 Machine Intelligence Reading List May 2024

GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...

Mind map of terminologies used in context of Generative AI

Monitoring Java Application Security with JDK Tools and JFR Events

Epistemic Interaction - tuning interfaces to provide information for AI support

Removing Uninteresting Bytes in Software Fuzzing

Essentials of Automations: The Art of Triggers and Actions in FME

Climate Impact of Software Testing at Nordic Testing Days

GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...

Uni Systems Copilot event_05062024_C.Vlachos.pdf

Free Complete Python - A step towards Data Science

Video Streaming: Then, Now, and in the Future

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Large Language Model (LLM) and it’s Geospatial Applications

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...

Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...

Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...

Artificial Intelligence for XMLDevelopment

CUDA performance study on Hadoop MapReduce Cluster

1. CUDA Performance Study on Hadoop MapReduce Clusters CSE 930 Advanced Computer Architecture @ Fall 2010 Chen He che@cse.unl.edu Peng Du pdu@cse.unl.edu University of Nebraska-Lincoln

2. Overview • • • • • Introduction Methodology Evaluation Conclusions Future work

3. Introduction • Hadoop MapReduce

4. Introduction CPU+GPU Architecture 5.devC[]=devA[]+devB[] G P U 1.Malloc a[],b[],c[] Main Mem C P U 2.cudaMalloc (devA[], devB[], devC[]) 6.store 8.recycle(devA[], devB[],devC[]) BUS 3.copy a[],b[] To devA[].devB[] 7.Copy devC[] to c[] 4.load

5. Introduction • Questions – Can we introduce CUDA into Hadoop MapReduce Clusters? • Mechanism and implementation – Is this reasonable? • Effects and Costs

6. Methodology • Question-1:Can we introduce CUDA into Hadoop ?

7. Methodology • Test cases – SDK programs • Data intensive: Matrix Multiplication • Computation intensive: Monte Carlo – MDMR (Molecular Dynamics simulation based on MapReduce) • Pure Java program • Introduce JCUDA

8. Methodology • Port CUDA programs onto Hadoop – GPU (CUDA-C) vs CPU (C) – Approach • MapRed (processHadoopData & cudaCompute) • Main (Hadoop Pipes) • Scripts (runbase.sh, run-<prog>-CPU/GPU.sh) • Input data generators

9. Methodology Hadoop DFS Input Generator generates .c .c data void generate(..) Hadoop-enabled MonteCarlo CUDA MonteCarlo MonteCarlo.cpp extracted .c .c .c .cu .cu .cu void processHadoopData(..) void cudaCompute(..) extracted MapRed.cpp class Mapper class Reducer

10. Methodology • MDMR (Molecular Dynamics simulation based on MapReduce) – Time Complexity by using CPU T (n)  c1n2  c2 n  c3 – We can simply employ GPU to parallel the n-squre portion and reduce the time complexity to linear (within the limit of GPU threads) T ' (n)  c1 (dn)  c2 n  c3  c4

11. Evaluation • Environment – Head: 2xAMD 2.2GHz, 4GB DDR400 RAM, 800GB HD – Slaves: 3 PCs (AMD 2.3G CPU, 2G DDR2-667 RAM, 400GB HD, 1Gbps Ethernet) – GPU: XFX 9400GT 64bit 512MB DDR3 – CUDA 3.2 Toolkit – Hadoop 0.20.3 – ServerTech CWG-CDU power distribution unit (for the power consumption monitoring) • Factors – Speedup – Power consumption – Cost

12. Evaluation • Matrix Multiplication (Execution time)

13. Evaluation • Matrix Multiplication (Power consumption)

14. Evaluation

15. Evaluation

16. Evaluation • MDMR – Execution time

17. Evaluation • MDMR – Power consumption

18. Conclusions • Introduced GPU into MapReduce cluster and obtained up to 20 times speedup. • Reduced up to 19/20 power consumption with the current preliminary solution and work load. • Compared with upgrading CPUs and adding more nodes, deploying GPU on Hadoop has high cost-tobenefit ratio. • Provided practical implementations for people wanting to construct MapReduce clusters with GPUs.

19. Future Work • Port more CUDA programs onto Hadoop. • Incorporate reducers into the experiments • Support heterogeneous clusters which mixed GPU-nodes and non-GPU nodes.

20. Reference • nVIDIA CUDA http://developer.nvidia.com/object/cuda-3.2/ • Hadoop, http://www.hadoop.com. • J. Polo, D. Carrera, Y. Becerra, V. Beltran, J. Torres and E. Ayguadé Performance Management of Accelerated MapReduce Workloads in Heterogeneous Clusters, ICPP2010, (2010), 654-662. • C. He, D. Swanson. Molecular Dynamics simulation based on MapReduce, poster section, LCI 2010, (2010).

CUDA performance study on Hadoop MapReduce Cluster

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to CUDA performance study on Hadoop MapReduce Cluster

Similar to CUDA performance study on Hadoop MapReduce Cluster (20)

Recently uploaded

Recently uploaded (20)

CUDA performance study on Hadoop MapReduce Cluster