IS-4011, Accelerating Analytics on HADOOP using OpenCL, by Zubin Dowlaty and Krishnaraj Gharpure
Presentation IS-4011, Accelerating Analytics on HADOOP using OpenCL, by Zubin Dowlaty and Krishnaraj Gharpure at the AMD Developer Summit (APU13) November 11-13, 2013.

Presentation Transcript

  • ACCELERATING ANALYTICS ON HADOOP USING OPENCL | November 2013 | Zubin Dowlaty, Krishnaraj Gharpure, Rohit Sarewar, Prasanna Lalingkar, Austin CV | Mu Sigma Inc.
  • AGENDA High Level Trends in the Applied Analytics Arena ‒Computational Enablement ‒Usability & Visualization ‒Intelligent Systems Computational Enablement ‒Big Data & Hadoop ‒Reference Diagram ‒Open CL and Hadoop Research 2 | Accelerating Analytics on HADOOP using Open CL | NOVEMBER 19, 2013 | CONFIDENTIAL
  • THE INFORMATION EXPLOSION IS LEADING TO AN UNPRECEDENTED ABILITY TO STORE, MANAGE AND ANALYZE DATA; WE NEED TO RUN AHEAD
    ‒ Data source explosion: click-stream data, purchase intent data, social media and blogs, intra/inter-firewall data, RFID & geospatial information, events, email archives
    ‒ Scale markers: digital data in 2011 ~1,750 exabytes; Facebook: 100+ million users per day; mobile phone subscriptions: 15 million; Bing recommendation records: 4.6 billion; RFID tags: 40 billion
    ‒ Expanding applications: offer optimization, opinion mining, social graph targeting, influencer marketing, clinical data analysis, fraud detection, SKU rationalization, geospatial predictive modeling
    ‒ Advanced analytical techniques: CART, radial basis functions, adaptive regression splines, text machine learning, support vector machines, neural networks, complex event processing
    ‒ Technology evolution: software as a service, cloud computing, in-memory analytics, search technologies, interactive visualization, massively large databases
  • MACRO TRENDS: THREE KEY MACRO ANALYTICS TRENDS TARGETING CONSUMPTION
    Computational Enablement
    ‒ Scalable analytics: Big Data & NoSQL technologies, GP-GPU, in-memory & high-performance computing
    ‒ Master data management, cloud, and federated data – the end of the central EDW?
    ‒ Analytics as a Service and the cloud
    ‒ Open source maturity
    ‒ Rise of agile self-service
    ‒ Data model disruption: ETL vs. ELT
    Usability and Visualization
    ‒ Dashboards will evolve into cockpits
    ‒ Improved interactive data visualization & aesthetics
    ‒ Augmented/virtual reality and natural user interfaces
    ‒ Graph analytics sparked by social data
    ‒ Discovery with computational geometry
    ‒ Mobile BI and the integration of consumer & enterprise devices
    ‒ Gamification: do you want to play a game?
    Intelligent Systems (“Anticipation Denotes Intelligence”)
    ‒ Operational smart systems are back-propagating – pervasive analytics
    ‒ Real-time analytics and event streams – beyond batch
    ‒ Artificial intelligence (AI) is not what we expected
    ‒ Don’t forget the 4th tier – it’s juicy
    ‒ Metaheuristics and the ants
    ‒ Operational analytics – you are the exception
    ‒ Experiments in the enterprise and the quasi-experiment
    ‒ Just say no to OLS
    ‒ Energy informatics is radiating
    ‒ Intelligent search
  • WHAT IS BIG DATA? “BIG DATA”, THE TERM, IS ABOUT APPLYING NEW TECHNOLOGIES TO MEET UNFULFILLED NEEDS: ANALYTICS AT SCALE
    ‒ Big computation & big data, scaling economically
    ‒ Big computation via MapReduce: computationally intensive analysis not supported by the traditional data warehouse stack and architecture
  • BIG DATA TECHNOLOGY EVOLUTION: BIG DATA’S RECENT ENTERPRISE POPULARITY CAN BE ATTRIBUTED TO TWO KEY FACTORS
    Technology trigger
    ‒ Robust open-source technology for parallel computation on commodity hardware (Hadoop/NoSQL)
    Frustration trigger
    ‒ Tyranny of the data model; cost & complexity of data integration; agility, impact, bureaucracy; existing EDW technology difficult and costly to scale
    Timeline highlights
    ‒ 2003–2005 (early research): Google releases the “MapReduce” and “GFS” papers; Apache Hadoop begins as a Lucene subproject; Google Bigtable paper
    ‒ 2006–2008 (open-source dev momentum): Yahoo runs a 10K-core cluster
    ‒ 2009–2010 (initial success stories): Cloudera named most promising startup funded in 2009; Amazon Elastic MapReduce; EMC acquires Greenplum
    ‒ 2011–2012 (commercialization & adoption): Teradata buys Aster; HP acquires Vertica; Oracle, Microsoft, SAS and IBM enter the market
  • BIG DATA – WHAT IS IT? A MINDSET FOR ENABLING DECISION SCIENCES AT SCALE: DATASET -> TOOLSET -> SKILLSET -> MINDSET
    Skillset
    ‒ Computer scientist: Map-Reduce, HDFS, HIVE, R, PIG, NoSQL, unstructured data; parallel computation & stream processing
    ‒ Data & visual explorer: math/stats/data mining/machine learning
    ‒ Math & technology & business: data “artisans”/design thinking; systems theory, generalizations
    Mindset
    ‒ Automation, scale, agility
    ‒ Higher utilization of machines for decisions
    ‒ Learning vs. knowing
    ‒ Consumption focused
    ‒ Algorithmic and heuristic (man & machine)
    ‒ Intrapreneur, curious, persuasive, soft skills
  • THE BIG DATA ECOSYSTEM IS A CHALLENGE TO KEEP UP WITH. MINDSET: KNOWING VS. LEARNING
    “Formulating a right question is always hard, but with big data, it is an order of magnitude harder, because you are blazing the trail (not grazing on the green field).” – Gartner
    (Ecosystem landscape diagram – Source: The Big Data Group LLC)
  • USABILITY AND VISUALIZATION: EVOLUTION. TREND: DASHBOARDS WILL EVOLVE TO BE MORE ACTIONABLE AND FORWARD LOOKING
    Dashboredom? http://www.informationmanagement.com/issues/20_7/dashboards_data_management_quality_business_intelligence-10019101-1.html
    “As we move from the information age into the intelligence age…”
  • USABILITY AND VISUALIZATION. OPEN SOURCE: D3 (DATA-DRIVEN DOCUMENTS), HTTP://D3JS.ORG/
    1. Helps bring data to life using HTML5, SVG and CSS.
    2. It’s an extension of “Protovis”, a very popular JavaScript library.
    3. With minimal overhead, D3 is extremely fast, supporting large datasets and dynamic behavior for interaction and animation.
    4. It provides many built-in reusable functions and function factories, such as graphical primitives for area, line and pie charts.
  • USABILITY AND VISUALIZATION. OPEN SOURCE: R
    ‒ R is the premier language for data scientists, with particular strength in data visualization
    ‒ http://www.r-project.org/
  • USABILITY AND VISUALIZATION: GRAPH NETWORK VISUALIZATION APPLIED TO METRIC DATA – CORRELATION NETWORKS
  • THE CURRENT BUSINESS SCENARIO DEMANDS AUTOMATING AND INJECTING INTELLIGENCE INTO VARIOUS ENTERPRISE TOUCH POINTS
    (Illustration: a shopper’s journey – search, webpage, email, add to cart, checkout – annotated with the analytics running behind the scenes at each step)
    ‒ Techniques behind the scenes: taxonomy, associations, nearest neighbor; customer segmentation; propensity scoring; marketing attribution methods to optimize media spend; collaborative filtering and recommender systems; popup optimization; need-state personalization; revenue management; CRM; pricing elasticity; offer optimization
    Add intelligent analytics everywhere you can to unlock the value in information
  • WE ARE IN THE ‘INFORMATION AGE’ AND IT’S TIME THAT DATA AND HUMAN ANALYTICS CAPABILITIES ARE AUGMENTED WITH MACHINE INTELLIGENCE
    Intelligent systems: the next wave of innovation and change (the Iron Man suit)
    ‒ Convergence of the global industrial system with the power of advanced computing, analytics, low-cost sensing and new levels of connectivity permitted by the Internet
    ‒ Intelligent machines + advanced analytics + people at work: data mining finally meets the operation, powered by big data scale & value
    * Source: GE (Industrial Internet – Pushing the Boundaries of Minds and Machines)
  • CURRENT NEWS ON INTELLIGENT SYSTEMS
    ‒ CEP vendors acquired, June 2013: StreamBase (TIBCO) and Progress Apama (Software AG)
    ‒ USA Today, September 16, 2012: http://www.usatoday.com/money/business/story/2012/09/16/review-automate-this-probes-math-in-global-trading/57782600/1
    ‒ The Two-Second Advantage (published 2011)
      – Wayne Gretzky’s edge: he predicted, a little better than anyone else, where the hockey puck would be
      – Enterprise 2.0: every transaction became a bit of digital data
      – Enterprise 3.0: every EVENT can become a bit of digital data
      – There will always be value in making projections over days, months and years – predictive analytics, data mining, and analytics systems on historical data (batch systems)
      – But in today’s world, enterprises need something more: an instantaneous, proactive, predictive capability (real-time systems)
    ‒ GE Minds and Machines, November 2012
      » Industrial Internet, intelligent systems, and intelligent machines
      » http://www.ge.com/docs/chapters/Industrial_Internet.pdf
      » http://www.youtube.com/watch?v=SvI3Pmv-DhE
  • AGENDA High Level Trends in the Applied Analytics Arena ‒Computational Enablement ‒Usability & Visualization ‒Intelligent Systems Computational Enablement ‒Big Data & Hadoop ‒Open CL and Hadoop Research 16 | Accelerating Analytics on HADOOP using Open CL | NOVEMBER 19, 2013 | CONFIDENTIAL
  • COMPUTATIONAL ENABLEMENT: THE DIVERSE WORLD OF HADOOP
    ‒ Manage complex data – relational and non-relational – in a single repository: the “data reservoir/lake”
    ‒ Fault-tolerant distributed processing; self-healing, distributed storage; commodity hardware – inexpensive storage
    ‒ Store massive data sets; scale horizontally with inexpensive commodity nodes
    ‒ Process at the source – eliminate data movement; an abstraction for parallel computing
    ‒ Maintenance: Zookeeper, Puppet; monitoring: Nagios, Ganglia, Splunk
    ‒ Data management and analytics: HBase, Pig, Hive, RHIPE, RHadoop, Mahout
    ‒ Complements your existing relational database technology, which excels at structured data manipulation
    **NEWS FLASH**
    ‒ Hadoop as the data “OS”
    ‒ Hadoop 2.0 and YARN supporting other parallel frameworks such as MPI (SAS HPC announcement)
    ‒ High-speed SQL (Strata 2013): Impala, Dremel, Shark, Hadapt
    ‒ ETL & EDW disruption; GigaOM: think bigger; schema on read, ELT
  • THE HPC LANDSCAPE IS VAST, AND CONTINUOUS RESEARCH TO UNDERSTAND THE STRENGTHS AND WEAKNESSES OF DIFFERENT COMPONENTS IS KEY
    (Landscape diagram of techniques for writing parallel programs and techniques supporting parallelization on the infrastructure, spanning data-driven vs. computation-driven and distributed vs. shared memory: Map-Reduce, SPARK, HPCC, STORM, MPI, HAMA, OpenMP, Intel TBBs, JumboMem, agent/actor models, GP-GPU)
    Legend: already in use by Mu Sigma / should be explored / not necessary in current context; size indicates our comfort level with the technique/tool
  • HADOOP 2.0: “THE OS FOR DATA” (Source: Hortonworks)
  • CASE STUDY: CRM CLIENT. MU SIGMA IMPLEMENTED A BIG DATA SOLUTION THAT REDUCED SCORING TIME FROM 18 HOURS TO 2 HOURS
    ‒ Problem: the marketer’s process to score a customer list was taking 18 hours
    ‒ Pipeline (data sourcing -> data management -> data processing -> data consumption): ingest customer data into the big data infrastructure; data scientists add propensity scores, segmentation codes, LTV, etc.; score all customers based on the prioritization logic; load the scored customer data, in the specified format, back to the EDW
  • APPLICATION: MACHINE LOGS. GLM ON MAP-REDUCE: ATTRIBUTION MODELING ON CLICKSTREAM DATA
    Sample data: one row per IP (w.x.y.z, a.b.c.d, e.f.g.h) records the time each of three creatives was viewed (e.g. 3:15 AM, 6:16 AM; “-” where a creative was not viewed) and a purchase flag (1, 0, 1)
    Background
    ‒ E-commerce sites are constantly collecting clickstream data, on the order of terabytes, on attributes such as users, the creatives they’ve viewed, their transactions, etc.
    ‒ Frequently, when an online customer makes a purchase, the sale is attributed to the most recent creative
    ‒ However, this inferred causality is not necessarily correct: a previous creative, or a sequence of creatives, might actually have triggered the customer’s decision to purchase
    Solution
    ‒ Statistical techniques such as logistic regression can help quantify the contributions of different creatives towards purchases and identify key purchase-driving creatives
    ‒ Hadoop can store and process large volumes of data, and R is a powerful statistical programming language; leveraging R to perform statistical analysis on large clickstream data stored in Hadoop using map-reduce is the natural next step
    ‒ Least-squares regression, a crucial component of the iterative methods used to solve logistic regression problems, has been implemented here in R, using map-reduce
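To make the modeling setup concrete, here is a minimal preprocessing sketch (our own illustration, not the presenters' code; the class and method names are hypothetical) that turns one clickstream record of the kind shown above into a row of the regression design matrix: a leading 1 for the intercept, then a 0/1 indicator per creative depending on whether a view timestamp is present.

```java
// Hypothetical sketch: build a design-matrix row from one clickstream record.
// A creative's feature is 1 if the user viewed it (a timestamp is present)
// and 0 if not ("-"); the leading 1 is the intercept column.
public class AttributionFeatures {
    static double[] featureRow(String[] creativeViewTimes) {
        double[] row = new double[creativeViewTimes.length + 1];
        row[0] = 1.0; // intercept column of 1s
        for (int k = 0; k < creativeViewTimes.length; k++) {
            row[k + 1] = "-".equals(creativeViewTimes[k]) ? 0.0 : 1.0;
        }
        return row;
    }

    public static void main(String[] args) {
        // e.g. a user who viewed creatives 1 and 2 but not creative 3
        double[] row = featureRow(new String[] {"3:15 AM", "6:16 AM", "-"});
        System.out.println(java.util.Arrays.toString(row)); // [1.0, 1.0, 1.0, 0.0]
    }
}
```

Stacking such rows (with the purchase flags as the dependent variable) yields exactly the N × (M + 2) layout the regression experiment below assumes.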
  • SPEEDING UP HADOOP WITH THE GPU: DOES THE GPU HELP?
    ‒ Hadoop jobs are designed to run in batch, with data being read off disks and streamed into ‘mappers’ and ‘reducers’
    ‒ The execution time of a Hadoop job can be viewed as: fixed overhead + variable overhead (dependent mainly on data size) + processing time (processing complexity × data size)
    ‒ For a given size of data, the most computationally trivial job in Hadoop (an identity job, which is all overhead and no processing) is usually measurable in tens of seconds
    ‒ The testing objective was to measure the speedups achievable by using GPUs within MapReduce for relatively complex data processing, i.e. by shrinking the processing term while the overheads stay fixed
  • SETTING IT UP AND MAKING IT WORK: CONFIG AND TEETHING TROUBLES
    Server details
    ‒ CPU: 4-core Intel(R) Xeon(R) CPU E5-1603 0 @ 2.80GHz
    ‒ RAM: 16 GB
    ‒ GPU: AMD/ATI Radeon HD 7970 with 3 GB GDDR5 memory
    ‒ Base software: Ubuntu 12.04.3 LTS, Java 1.6.0_31
    ‒ Hadoop + ecosystem: Hadoop 2.0.0-cdh4.4.0, MapReduce 0.20
    ‒ OpenCL + ecosystem: OpenCL 1.2, JOCL
    ‒ Since mappers/reducers run in parallel, an ideal Hadoop cluster with a GPU per node was simulated by running mappers/reducers on one machine with one GPU
    It was found that
    ‒ Hadoop doesn’t recognize the GPU card out of the box: it needs access to GUI services provided by the X server to do so*
    ‒ A few modifications to the user’s shell configuration were required to get around this
    * AMD is currently working on getting OpenCL working without the X server
  • EXPERIMENTAL DETAILS: THE USE-CASE
    ‒ Linear regression was written in Java MapReduce, with and without a GPU
    ‒ It uses N rows and (M + 2)* columns of data to fit the expected value of the dependent variable as a linear combination of a set of M predictor variables, such that for the i-th row:
        E[y_i] = β0 + Σ_{j=1..M} β_j x_{i,j}
    ‒ Regression coefficients are computed by solving the Normal Equations:
        β = (XᵀX)⁻¹ (Xᵀy)
      where X is a matrix of dimension N × (M + 1) and y is a column vector of length N
    ‒ Data in Hadoop is distributed across multiple nodes into B blocks that are mutually exclusive and collectively exhaustive (notwithstanding Hadoop’s replication factor)
    ‒ The X_bᵀX_b and X_bᵀy_b terms** are computed and appended in mappers for each input block
    ‒ The augmented terms output from each of the mappers are summed, and the inversion and multiplication of terms is performed in the reducer
    ‒ Using GPUs, each 2D input*** to the mappers is flattened by row and sent as a 1D array to the GPU kernel functions
    ‒ The kernels compute dot products of all sub-arrays formed by elements e_h such that h mod (M + 2) = l, for l from 0 to M + 1
    * The dependent variable, M predictor variables and a column of 1s used to compute the intercept coefficient, β0
    ** b_k for k from 1 to B indicating the blocks of data that constitute the input data
    *** 1D in case data is processed one row at a time
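The Normal Equations step above can be sketched on a single machine with a tiny in-memory data set (the class name and the Gaussian-elimination solver are ours, not the presenters' code; the deck's implementation runs this inside MapReduce):

```java
// A minimal sketch of fitting beta = (X^T X)^{-1} (X^T y), computed as the
// solution of the linear system (X^T X) beta = X^T y. X already contains the
// column of 1s for the intercept.
public class NormalEquationsSketch {

    // Solve A x = b by Gaussian elimination with partial pivoting (mutates A, b).
    static double[] solve(double[][] A, double[] b) {
        int n = b.length;
        for (int col = 0; col < n; col++) {
            int pivot = col; // pick the row with the largest entry in this column
            for (int r = col + 1; r < n; r++)
                if (Math.abs(A[r][col]) > Math.abs(A[pivot][col])) pivot = r;
            double[] tr = A[col]; A[col] = A[pivot]; A[pivot] = tr;
            double tb = b[col]; b[col] = b[pivot]; b[pivot] = tb;
            for (int r = col + 1; r < n; r++) {
                double f = A[r][col] / A[col][col];
                for (int c = col; c < n; c++) A[r][c] -= f * A[col][c];
                b[r] -= f * b[col];
            }
        }
        double[] x = new double[n];
        for (int r = n - 1; r >= 0; r--) { // back-substitution
            double s = b[r];
            for (int c = r + 1; c < n; c++) s -= A[r][c] * x[c];
            x[r] = s / A[r][r];
        }
        return x;
    }

    // Accumulate X^T X and X^T y, then solve for the coefficients.
    static double[] fit(double[][] X, double[] y) {
        int p = X[0].length;
        double[][] xtx = new double[p][p];
        double[] xty = new double[p];
        for (int i = 0; i < X.length; i++)
            for (int j = 0; j < p; j++) {
                xty[j] += X[i][j] * y[i];
                for (int k = 0; k < p; k++) xtx[j][k] += X[i][j] * X[i][k];
            }
        return solve(xtx, xty);
    }

    public static void main(String[] args) {
        // y = 2 + 3x exactly, so the fit should recover beta = (2, 3)
        double[][] X = {{1, 0}, {1, 1}, {1, 2}, {1, 3}};
        double[] y = {2, 5, 8, 11};
        double[] beta = fit(X, y);
        System.out.printf("beta0=%.3f beta1=%.3f%n", beta[0], beta[1]);
    }
}
```

In the deck's setup the accumulation loop in fit() is what the mappers do per block, and solve() is what the reducer does once on the summed matrices.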
  • USING THE GPU FROM MAPPERS: DATA MOVEMENT AND COMPUTATIONS
    Worked example of computing the XᵀX term (reconstructed from the slide’s figure):
        X = | 1 0 0 |        Xᵀ = | 1 1 1 |
            | 1 1 2 |             | 0 1 3 |
            | 1 3 4 |             | 0 2 4 |
        XᵀX = |  3  4  6 |
              |  4 10 14 |
              |  6 14 20 |
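The flattening the slide describes can be shown in plain Java (class and method names are ours; the X used below is our reconstruction of the slide's worked example): X is copied row-major into a 1D buffer, as it would be for an OpenCL kernel, and the (l, m) entry of XᵀX is the dot product of the sub-arrays of elements e_h with h mod cols = l and h mod cols = m.

```java
// Sketch of computing X^T X directly from a row-major flattened copy of X,
// mirroring the index arithmetic a GPU kernel would use on the 1D buffer.
public class FlattenedXtX {
    static double[][] xtx(double[] flat, int cols) {
        int rows = flat.length / cols;
        double[][] out = new double[cols][cols];
        for (int l = 0; l < cols; l++)
            for (int m = 0; m < cols; m++)
                for (int r = 0; r < rows; r++)
                    // element (r, l) of X lives at flat[r * cols + l]
                    out[l][m] += flat[r * cols + l] * flat[r * cols + m];
        return out;
    }

    public static void main(String[] args) {
        // X = (1,0,0), (1,1,2), (1,3,4), flattened by row
        double[] flat = {1, 0, 0, 1, 1, 2, 1, 3, 4};
        double[][] g = xtx(flat, 3);
        for (double[] row : g) System.out.println(java.util.Arrays.toString(row));
    }
}
```

Running this reproduces the XᵀX matrix from the worked example: rows (3, 4, 6), (4, 10, 14), (6, 14, 20).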
  • MAPREDUCE ON CPU: CPU OLS CODE LOGIC
    Mapper class uses methods
    ‒ setup(): called once at the beginning of the map task
    ‒ map(): called for each (K,V) pair; creates a list of all the values in the input split
    ‒ cleanup(): called once at the end of the map task; the OLS code (calculating XᵀX and Xᵀy for the input block) is implemented in this method
    Reducer class uses methods
    ‒ setup(): called once at the beginning of the reduce task
    ‒ reduce(): called for each key; the sum of the intermediate matrices is computed in this method
    ‒ cleanup(): called once at the end of the reduce task; the matrix inversion and β values are computed here
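The mapper/reducer split works because XᵀX decomposes into a sum over blocks. A single-JVM sketch (names are ours, not the presenters' Hadoop code) of the two roles: each "mapper" computes XᵀX over its block, as in cleanup() above, and the "reducer" sums the partial matrices:

```java
import java.util.Arrays;

// Sketch: the sum of per-block X^T X matrices equals X^T X over all rows,
// which is what makes the Normal Equations decomposable across HDFS blocks.
public class BlockwiseXtX {
    // "Mapper" side: accumulate X^T X over one block of rows.
    static double[][] partial(double[][] block) {
        int p = block[0].length;
        double[][] g = new double[p][p];
        for (double[] row : block)
            for (int j = 0; j < p; j++)
                for (int k = 0; k < p; k++) g[j][k] += row[j] * row[k];
        return g;
    }

    // "Reducer" side: element-wise sum of two partial matrices.
    static double[][] reduce(double[][] a, double[][] b) {
        double[][] s = new double[a.length][a.length];
        for (int j = 0; j < a.length; j++)
            for (int k = 0; k < a.length; k++) s[j][k] = a[j][k] + b[j][k];
        return s;
    }

    public static void main(String[] args) {
        double[][] all = {{1, 2}, {1, 4}, {1, 6}, {1, 8}};
        double[][] block1 = {{1, 2}, {1, 4}}; // first HDFS block
        double[][] block2 = {{1, 6}, {1, 8}}; // second HDFS block
        double[][] summed = reduce(partial(block1), partial(block2));
        System.out.println(Arrays.deepEquals(summed, partial(all))); // true
    }
}
```

The same identity holds for the Xᵀy vectors, so the reducer only ever sees a few small matrices per block regardless of N.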
  • MAPREDUCE ON GPU + CPU: GPU OLS CODE LOGIC
    Mapper class uses methods
    ‒ setup(): called once at the beginning of the map task
    ‒ map(): called for each (K,V) pair; creates a list of all the values in the input split
    ‒ cleanup(): called once at the end of the map task; the OLS code is implemented in this method using the JOCL library
    Reducer class uses methods
    ‒ setup(): called once at the beginning of the reduce task
    ‒ reduce(): called for each key; the sum of the intermediate matrices is computed in this method
    ‒ cleanup(): called once at the end of the reduce task; the matrix inversion and β values are computed here
  • EXPERIMENTAL OUTCOMES: KEY RESULTS
    ‒ Processing data one row at a time is not efficient: runtimes were therefore measured for ‘chunks’ of data of increasing size
    ‒ Performing linear regression using GPUs within MapReduce consistently outperformed the same exercise using ‘vanilla’ MapReduce code
    (Charts: job execution time for multiple-row vs. row-by-row processing*, and for increasing data chunk sizes** on CPU and GPU, with chunk sizes from 1,000 to 100,000 rows)
    * Processing times for 1,000 rows and 256 columns, one row at a time and all together
    ** Processing times for 100,000 rows and 256 columns, processed in chunks of ‘number of rows’
  • EXPERIMENTAL OUTCOMES: CPU AND GPU EXECUTION TIME
    ‒ CPU execution times show a sharp rise on data beyond 16 columns and 100,000 rows
    ‒ GPU execution time does not show a significant rise for rows from 1,000 to 1,000,000 and columns from 4 to 256
    ‒ Beyond 1,000,000 rows, GPU execution times grow far more slowly than CPU execution times
    (Surface plots of CPU and GPU execution time* vs. number of rows and columns)
    #1 CPU execution times above 600 seconds, not clearly visible in the graph: 1M rows × 256 columns = 804 s; 10M rows × 128 columns = 1,863 s; 10M rows × 256 columns = 7,701 s
    #2 The two bounds of the sharp increase in CPU execution times are 256 columns × 100K rows (= 51.2 MB) and 16 columns × 10M rows (= 320 MB)
    * Processing time for data sets where the number of rows ranges from 1K to 10M and columns from 4 to 256
  • EXPERIMENTAL OUTCOMES: COMPARING CPU AND GPU
    ‒ For very small data sizes, CPU and GPU have comparable runtimes
    ‒ As the number of rows increases, CPU execution time increases drastically compared to the GPU
    ‒ As the number of columns increases, CPU execution time shows a gradual increase compared to the GPU
    ‒ For larger data sizes, the ratio of CPU to GPU execution time shows a steep increase
    (Surface plot of the ratio of CPU to GPU execution time*, ranging from below 1 up to roughly 18)
    * For data sets where the number of rows ranges from 1K to 10M and columns from 4 to 256
  • TAKEAWAYS: RECOMMENDATIONS AND NEXT STEPS
    ‒ GPUs can be effectively used within MapReduce to speed up relatively computationally heavy tasks like matrix operations
      – Since copying data to/from a GPU is expensive, a large amount of data should be copied and processed ‘in one go’
      – Computation time asymptotically decreases with increasing data chunk size
    ‒ Multiple mappers/reducers running on a node can share the GPU
      – The size of the input data, the Hadoop block size, the data chunk size and memory allocation are key factors, and must be chosen carefully so that all processes run successfully, and faster than they otherwise would, when one GPU is shared across mappers/reducers
      – Using a GPU is not necessarily more ‘efficient’ for data with a small number of columns
      – Smaller blocks increase the degree of MapReduce parallelism but also lead to more mappers using the GPU simultaneously
      – Improper memory allocation in Hadoop and on the GPU can lead to either unused memory, or more memory being required than is available
    ‒ Future exploration includes
      – Measuring changes in job execution time with the GPU being used in the reducer as well, for OLS
      – Empirically determining optimal values for data size and memory allocation with one or more jobs running multiple mappers/reducers on a node that share a GPU
  • APPENDIX
  • EXPERIMENTAL RESULTS: RUNTIMES OF CPU AND GPU

    Execution time for CPU OLS code in seconds
    Cols \ Rows   1K    10K   100K   1M    10M
    4             10    9     10     14    33
    8             11    10    11     14    54
    16            10    10    12     20    62
    32            10    14    13     44    466
    64            12    12    20     120   1073
    128           12    12    46     232   1863
    256           14    17    157    804   7701

    Execution time for GPU OLS code in seconds
    Cols \ Rows   1K    10K   100K   1M    10M
    4             12    13    14     12    41
    8             11    13    12     16    49
    16            12    13    12     18    51
    32            12    11    13     22    73
    64            12    13    13     36    119
    128           11    13    16     39    202
    256           13    12    18     57    423

    Ratio of CPU to GPU execution time
    Cols \ Rows   1K     10K    100K   1M     10M
    4             0.83   0.69   0.71   1.17   0.80
    8             1.00   0.77   0.92   0.88   1.10
    16            0.83   0.77   1.00   1.11   1.22
    32            0.83   1.27   1.00   2.00   6.38
    64            1.00   0.92   1.54   3.33   9.02
    128           1.09   0.92   2.88   5.95   9.22
    256           1.08   1.42   8.72   14.11  18.21

    Data size for different numbers of rows and columns
    Cols \ Rows   1K      10K      100K     1M      10M
    4             8 KB    80 KB    800 KB   8 MB    80 MB
    8             16 KB   160 KB   1.6 MB   16 MB   160 MB
    16            32 KB   320 KB   3.2 MB   32 MB   320 MB
    32            64 KB   640 KB   6.4 MB   64 MB   640 MB
    64            128 KB  1.28 MB  12.8 MB  128 MB  1.28 GB
    128           256 KB  2.56 MB  25.6 MB  256 MB  2.56 GB
    256           512 KB  5.12 MB  51.2 MB  512 MB  5.12 GB
  • Questions? Zubin.Dowlaty@Mu-Sigma.com | Twitter: @crunchdata