- 1. ACCELERATING MACHINE LEARNING ALGORITHMS BY INTEGRATING GPUS INTO MAPREDUCE CLUSTERS. Sergio Herrero-Lopez, Intelligent Engineering Systems Laboratory (IESL), November 30, 2011. Accelerating ML algorithms by integrating GPUs in MR Clusters
- 2. INTRODUCTION. ABOUT ME: Ph.D. (December 2011), Massachusetts Institute of Technology (USA); M.Sc. (2007) and B.Sc. (2005) in Electrical Engineering, University of Navarra (Spain); Microsoft Research (Redmond, WA, 2008), Tampere University of Technology (Finland, 2005) and IKUSI (Spain, 2003). ABOUT PROF. WILLIAMS' RESEARCH GROUP (ENGINEERING SYSTEMS DIVISION): High Performance Price Analytics for the Smart Grid (2008-2009); Large-Scale Simulator for Global Data Infrastructure Optimization (2009-2011); Music Event Detection from Tweets in New York (2010-2011); Accelerating Machine Learning Algorithms by Integrating GPUs into MapReduce Clusters.
- 3. AGENDA
  - PROBLEM STATEMENT: Big Data and the need for scale and/or speed.
  - PROPOSITION: Modify the MapReduce runtime to (a) satisfy the particular requirements of ML algorithms and (b) integrate massively parallel processors into the system.
  - PREVIOUS WORK: MapReduce for ML on multicore, single-GPU, multi-GPU, GPU-cluster and FPGA platforms.
  - IMPLEMENTATION of a new MR runtime using Port abstractions.
  - PERFORMANCE results running SVMs on the proposed system.
  - CONCLUSIONS: contributions, limitations and lessons learned.
  - FUTURE WORK.
- 4. MACHINE LEARNING PARALLELIZATION. Training set $\{(x_i, y_i)\}$, $i = 1,\dots,n$, with $x_i \in \mathbb{R}^d$ and $y_i \in Y = \{1,\dots,k\}$. Problems at scale: 1. the data does not fit in the available resources; 2. training takes too long; 3. accuracy was sacrificed. Common reductions: $n$, take a representative sample; $d$, apply feature selection; $k$, consolidate classes. Levels of parallelization:
  - L1, Independent runs (cluster): each worker (X, Y, …) runs the whole algorithm on its own.
  - L2, Summation form (MapReduce): workers compute partial sums of the algorithm's statistics, which are then combined.
  - L3, Structural parallelism (MPPs): the algorithm's internal operations run on massively parallel processors.
  Machine learning algorithms decomposable into MR primitives: Naïve Bayes, K-means, Expectation Maximization, Neural Networks, Support Vector Machine classification, Principal Component Analysis, Hidden Markov Models.
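As a rough illustration of the L2 "summation form" level, the sketch below computes the sufficient statistics of a Bernoulli Naïve Bayes classifier (class counts and per-class feature counts) as partial sums over data shards and then adds them. The names `partial_counts` and `summation_form` are hypothetical, and Python threads stand in for cluster workers; it is a sketch of the idea, not the runtime described in these slides.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_counts(shard):
    """Map: sufficient statistics of one shard of (binary_features, label) pairs."""
    counts = Counter()
    for features, label in shard:
        counts[("class", label)] += 1
        for j, value in enumerate(features):
            if value:
                counts[("feature", label, j)] += 1
    return counts


def summation_form(data, n_workers=4):
    """Split the data, compute partial sums in parallel, then reduce by adding them."""
    shards = [data[w::n_workers] for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_counts, shards))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Reduce: element-wise sum of the partial counts
    return total
```

Because every statistic is a sum over samples, the result does not depend on how the data is sharded, which is what makes this class of algorithms a natural fit for MapReduce.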
- 5. MAPREDUCE PRIMITIVES & RUNTIME. Map: $M: [k_1, v_1] \rightarrow [k_2, v_2]$. Reduce: $R: [k_2, \{v_{2,i}\}_{k_{2,i} = k_2}] \rightarrow v_3$. Runtime flow: Input → Split → Map (workers 1…M) → Sort → Reduce (workers 1…N) → Merge → Output.
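A single-process sketch of the two primitives and the Split → Map → Sort → Reduce flow may help fix the notation. `run_mapreduce`, `wc_map` and `wc_reduce` are illustrative names, and grouping values in a dictionary stands in for the distributed sort.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """In-memory MapReduce: M: [k1, v1] -> [k2, v2]; R: [k2, {v2}] -> v3."""
    # Map phase: each input record may emit any number of (k2, v2) pairs.
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)  # Sort/shuffle: group values by intermediate key
    # Reduce phase: one call per distinct k2, visited in sorted key order.
    return {k2: reduce_fn(k2, groups[k2]) for k2 in sorted(groups)}


def wc_map(doc_id, text):
    """Word-count Map: emit (word, 1) for every word in the document."""
    for word in text.split():
        yield word.lower(), 1


def wc_reduce(word, counts):
    """Word-count Reduce: add up the 1s emitted for one word."""
    return sum(counts)
```

For example, `run_mapreduce([(1, "a b a"), (2, "B c")], wc_map, wc_reduce)` groups the intermediate pairs by word before reducing them.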
- 6. MAPREDUCE REPRESENTATION OF K-MEANS. Map: each point is re-keyed with its nearest centroid,
  $$M: [k_i^t, x_i] \rightarrow [k_i'^t, x_i], \qquad k_i'^t = \arg\min_{k'=1,\dots,k} \lVert x_i - m_{k'}^t \rVert .$$
  Reduce: each centroid is recomputed as the mean of the points assigned to it,
  $$R: [k'^t, \{x_i\}_{k_i'^t = k'^t}] \rightarrow m_{k'}^{t+1}, \qquad m_{k'}^{t+1} = \frac{1}{\lvert \{x_i\}_{k_i'^t = k'^t} \rvert} \sum_{x \in \{x_i\}_{k_i'^t = k'^t}} x .$$
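One K-means iteration written as the Map and Reduce above might look like the sketch below (sequential, with hypothetical function names). Keeping an empty cluster's centroid unchanged is an assumption; the slide does not say how empty clusters are handled.

```python
import math
from collections import defaultdict


def kmeans_map(x, centroids):
    """Map: re-key point x with the index of its nearest centroid."""
    nearest = min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))
    return nearest, x


def kmeans_reduce(points):
    """Reduce: the new centroid is the mean of the points sharing a key."""
    n, d = len(points), len(points[0])
    return tuple(sum(p[c] for p in points) / n for c in range(d))


def kmeans_iteration(X, centroids):
    groups = defaultdict(list)
    for x in X:
        key, value = kmeans_map(x, centroids)
        groups[key].append(value)
    # Assumption: a centroid with no assigned points is kept unchanged.
    return [
        kmeans_reduce(groups[k]) if groups[k] else tuple(centroids[k])
        for k in range(len(centroids))
    ]
```

The points $x_i$ are the static data reused every iteration, while the centroids are the small variable state passed from one iteration to the next.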
- 7. MAPREDUCE REPRESENTATION OF EM FOR MIXTURE OF GAUSSIANS. Map (E-step): the responsibility of component $k$ for $x_i$,
  $$M: [(i, k), x_i] \rightarrow [(i, k), p_{i,k}], \qquad p_{i,k} = \frac{\alpha_k^t\, f(x_i \mid \mu_k^t, \Sigma_k^t)}{\sum_{k'=1}^{K} \alpha_{k'}^t\, f(x_i \mid \mu_{k'}^t, \Sigma_{k'}^t)} .$$
  Reduce (M-step), one per parameter:
  $$R: [k, \{p_{i,k'}\}_{k'=k}] \rightarrow \alpha_k^{t+1}, \qquad \alpha_k^{t+1} = \frac{1}{n}\sum_{i=1}^{n} p_{i,k}$$
  $$R: [k, \{x_i, p_{i,k'}\}_{k'=k}] \rightarrow \mu_k^{t+1}, \qquad \mu_k^{t+1} = \frac{\sum_{i=1}^{n} x_i\, p_{i,k}}{n\,\alpha_k^{t+1}}$$
  $$R: [k, \{x_i, p_{i,k'}\}_{k'=k}] \rightarrow \Sigma_k^{t+1}, \qquad \Sigma_k^{t+1} = \frac{\sum_{i=1}^{n} p_{i,k}\,(x_i - \mu_k^{t+1})(x_i - \mu_k^{t+1})^T}{n\,\alpha_k^{t+1}}$$
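A minimal sketch of one EM iteration following the Map and the three Reduces above, assuming one-dimensional data so that $\Sigma_k$ is a scalar variance. `gaussian` and `em_step` are hypothetical names, and the loops run sequentially.

```python
import math


def gaussian(x, mean, var):
    """1-D normal density f(x | mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_step(X, alpha, mu, var):
    """One EM iteration for a 1-D mixture of K Gaussians, written as Map + Reduce."""
    n, K = len(X), len(alpha)
    # Map: [(i, k), x_i] -> [(i, k), p_ik], the responsibility of component k for x_i.
    p = []
    for x in X:
        weighted = [alpha[k] * gaussian(x, mu[k], var[k]) for k in range(K)]
        total = sum(weighted)
        p.append([w / total for w in weighted])
    # Reduce (keyed by k): mixing weights, means and variances.
    new_alpha = [sum(p[i][k] for i in range(n)) / n for k in range(K)]
    new_mu = [sum(X[i] * p[i][k] for i in range(n)) / (n * new_alpha[k]) for k in range(K)]
    new_var = [
        sum(p[i][k] * (X[i] - new_mu[k]) ** 2 for i in range(n)) / (n * new_alpha[k])
        for k in range(K)
    ]
    return new_alpha, new_mu, new_var
```

Note that $\mu_k^{t+1}$ and $\Sigma_k^{t+1}$ both divide by $n\,\alpha_k^{t+1}$, so the $\alpha$ Reduce has to finish first.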
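- 8. MAPREDUCE REPRESENTATION OF SVM (SMO). Map (update the optimality indicators), with $\Delta\alpha_j = \alpha_j' - \alpha_j$:
  $$M: [i, f_i] \rightarrow [i, f_i'], \qquad f_i' = f_i + \Delta\alpha_{I_{up}}\, y_{I_{up}}\, k(x_{I_{up}}, x_i) + \Delta\alpha_{I_{low}}\, y_{I_{low}}\, k(x_{I_{low}}, x_i)$$
  Map (assign each index to the up and/or low set):
  $$M: [i, \alpha_i] \rightarrow [i, k_i], \qquad k_i \in \{k_{up}, k_{low}\}$$
  $$I_0 = \{i : y_i \in \{1, -1\},\ 0 < \alpha_i < C\}$$
  $$I_1 = \{i : y_i = 1,\ \alpha_i = 0\} \cup \{i : y_i = -1,\ \alpha_i = C\}$$
  $$I_2 = \{i : y_i = 1,\ \alpha_i = C\} \cup \{i : y_i = -1,\ \alpha_i = 0\}$$
  $$k_{up} = \{i \in I_0 \cup I_1\}, \qquad k_{low} = \{i \in I_0 \cup I_2\}$$
  Reduce (find the maximal violating pair):
  $$R: [k, \{f_i\}_{k_i = k}] \rightarrow (b, I)$$
  $$b_{up} = \min\{f_i : k_i = k_{up}\},\quad I_{up} = \arg\min_{k_i = k_{up}} f_i, \qquad b_{low} = \max\{f_i : k_i = k_{low}\},\quad I_{low} = \arg\max_{k_i = k_{low}} f_i$$
  Map (update the two multipliers):
  $$M: [i, \alpha_i] \rightarrow [i, \alpha_i']$$
  $$\alpha_{I_{up}}' = \alpha_{I_{up}} - \frac{y_{I_{up}}\,(f_{I_{low}} - f_{I_{up}})}{2k(x_{I_{low}}, x_{I_{up}}) - k(x_{I_{low}}, x_{I_{low}}) - k(x_{I_{up}}, x_{I_{up}})}, \qquad \alpha_{I_{low}}' = \alpha_{I_{low}} + y_{I_{low}}\, y_{I_{up}}\,(\alpha_{I_{up}} - \alpha_{I_{up}}')$$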
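The loop below is a minimal, sequential sketch of SMO expressed with the Maps and the Reduce above, assuming an RBF kernel and $f_i = \sum_j \alpha_j y_j k(x_i, x_j) - y_i$ (so $f_i = -y_i$ when all $\alpha = 0$). It updates $\alpha_{I_{low}}$ first and derives $\alpha_{I_{up}}$ from the equality constraint, which gives the same unclipped step as the slide's formulas. Two details are assumptions not stated on the slide: the step is clipped to $0 \le \alpha_i \le C$ with Platt-style bounds, and $b = -(b_{up} + b_{low})/2$ so that $f(x) = \sum_i y_i \alpha_i k(x, x_i) + b$. `rbf`, `train_smo` and `decision` are hypothetical names.

```python
import math


def rbf(a, b, beta):
    """k(a, b) = exp(-beta * ||a - b||^2)."""
    return math.exp(-beta * sum((p - q) ** 2 for p, q in zip(a, b)))


def train_smo(X, y, C=10.0, beta=1.0, tau=1e-3, max_iter=10_000):
    n = len(X)
    alpha = [0.0] * n
    f = [-float(yi) for yi in y]  # f_i = sum_j alpha_j y_j k(x_i, x_j) - y_i with alpha = 0
    for _ in range(max_iter):
        # Map: membership in k_up / k_low. Reduce: min f over k_up, max f over k_low.
        up = [i for i in range(n) if (y[i] == 1 and alpha[i] < C) or (y[i] == -1 and alpha[i] > 0)]
        low = [i for i in range(n) if (y[i] == 1 and alpha[i] > 0) or (y[i] == -1 and alpha[i] < C)]
        i_up = min(up, key=lambda i: f[i])
        i_low = max(low, key=lambda i: f[i])
        b_up, b_low = f[i_up], f[i_low]
        if b_low <= b_up + 2 * tau:  # optimality reached within tolerance
            break
        eta = rbf(X[i_up], X[i_up], beta) + rbf(X[i_low], X[i_low], beta) - 2 * rbf(X[i_up], X[i_low], beta)
        eta = max(eta, 1e-12)
        # Map: two-variable update, clipped to the feasible segment inside [0, C].
        s = y[i_up] * y[i_low]
        a_up, a_low = alpha[i_up], alpha[i_low]
        if s == 1:
            lo, hi = max(0.0, a_up + a_low - C), min(C, a_up + a_low)
        else:
            lo, hi = max(0.0, a_low - a_up), min(C, C + a_low - a_up)
        new_low = min(hi, max(lo, a_low + y[i_low] * (b_up - b_low) / eta))
        new_up = a_up + s * (a_low - new_low)  # keeps sum_i y_i alpha_i unchanged
        d_up, d_low = new_up - a_up, new_low - a_low
        alpha[i_up], alpha[i_low] = new_up, new_low
        # Map: f_i' = f_i + d_up y_up k(x_up, x_i) + d_low y_low k(x_low, x_i)
        for i in range(n):
            f[i] += d_up * y[i_up] * rbf(X[i_up], X[i], beta) + d_low * y[i_low] * rbf(X[i_low], X[i], beta)
    b = -(b_up + b_low) / 2
    return alpha, b


def decision(x, X, y, alpha, b, beta):
    """f(x) = sum_i y_i alpha_i k(x, x_i) + b."""
    return sum(a * yi * rbf(x, xi, beta) for a, yi, xi in zip(alpha, y, X)) + b
```

Each pass needs only the two kernel rows for $I_{up}$ and $I_{low}$, which is why the Map over $f_i$ parallelizes well across $i$.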
- 9. MAPREDUCE FOR ML WISHLIST
  - Static vs. variable data: static data is the largest, is fixed and is used in every iteration (e.g. $x_i$, or $(x_i, y_i)$ for the SVM); variable data are the results of each iteration, consumed by the next one (e.g. $m_k$ for K-means, $(\alpha_k^{t+1}, \mu_k^{t+1}, \Sigma_k^{t+1})$ for EM, $(f_i, \alpha_i)$ for the SVM).
  - Iterate until convergence: avoid reloading static data from the DFS between iterations.
  - Memory: utilize the memory hierarchy as opposed to the DFS or the local file system (LFS).
  - Massively threaded MapReduce tasks (CPU or MPP): Map is embarrassingly parallel and Reduce is highly parallelizable.
  - Dimensionality & algebra: Map tasks may encapsulate high-dimensional matrix-vector or matrix-matrix operations, e.g. the kernel evaluations $k(x_i, x_j) = e^{-\beta \lVert x_i - x_j \rVert^2}$ for $i = 1,\dots,n$, $j \in \{I_{up}, I_{low}\}$; interleave multithreaded BLAS operations using the static data.
  - Sparse data structures.
- 10. COMPUTING ECOSYSTEM. Diagram spanning commodity computing, high-performance/supercomputing and data appliance/warehouse computing. Technologies shown: relational DBs, column DBs, Hadoop, Bigtable, Dryad, Cassandra, Dynamo, OpenMPI, GPUs, FPGAs, SSDs, 1/10 Gb Ethernet and 20 Gb InfiniBand.
- 11. MAPREDUCE CLUSTER: ARCHITECTURE. 1) Distributed File System (DFS): stores unstructured data, scales to thousands of nodes and achieves high reliability through replication; a NameNode manages the file system. 2) MapReduce Framework (MRF) runtime: a batch processing system with load balancing; the JobTracker receives jobs from the client and assigns tasks to the TaskTrackers. Each node (DataNode 1, 2, 3) stores DFS blocks and runs a TaskTracker that executes tasks on them.
- 12. MAPREDUCE CLUSTER: LIMITATIONS. One (or two) tasks per node; one task per data block; one core and one thread per Map or Reduce task; synchronization by materialization of intermediate results (Map tasks write to local disk, Reduce tasks write DFS blocks); no support for iterative jobs.
- 13. MASSIVELY PARALLEL PROCESSORS: NVIDIA TESLA ARCHITECTURE. The device contains N stream multiprocessors; each has M scalar processors (SP 1…SP M) with their own registers, one instruction unit, a shared memory, a constant cache and a texture cache. Approximate access latencies: registers, 0 cycles; shared memory, 1 cycle coalesced and ~10 cycles uncoalesced; constant/texture cache, ~10 cycles on a cache hit; device memory (including constant and texture memory), ~400 cycles. Device memory bandwidth is 102 GB/s, and host memory connects to device memory through PCI-E 16x (8 GB/s).
- 14. NVIDIA TESLA: REPRESENTATIONS. Logical to physical mapping: a thread runs on a processor, a block on a multiprocessor and a grid on the device. Block dimensions can be at most (512, 512, 64), but a block may hold at most 512 threads in total; grid dimensions can be at most (65535, 65535).
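The limits above constrain how a "one thread per training point" Map is launched. The sketch below checks a configuration against them and folds a long 1-D grid into two dimensions when needed; it hard-codes the limits as stated on the slide (compute capability 1.x devices such as the Tesla C1060), and `is_valid_launch` and `launch_for` are hypothetical helpers, not part of the CUDA API.

```python
import math

# Limits stated on the slide (compute capability 1.x devices such as the Tesla C1060).
MAX_BLOCK_DIM = (512, 512, 64)
MAX_THREADS_PER_BLOCK = 512
MAX_GRID_DIM = (65535, 65535)


def is_valid_launch(grid, block):
    """Check a 2-D grid and 3-D block configuration against the limits above."""
    if len(grid) != 2 or len(block) != 3:
        return False
    if any(b < 1 or b > m for b, m in zip(block, MAX_BLOCK_DIM)):
        return False
    if block[0] * block[1] * block[2] > MAX_THREADS_PER_BLOCK:
        return False
    return all(1 <= g <= m for g, m in zip(grid, MAX_GRID_DIM))


def launch_for(n, threads_per_block=256):
    """One thread per element: 1-D blocks, grid folded into a second dimension if needed."""
    blocks = max(1, math.ceil(n / threads_per_block))
    grid_x = min(blocks, MAX_GRID_DIM[0])
    grid_y = math.ceil(blocks / grid_x)
    return (grid_x, grid_y), (threads_per_block, 1, 1)
```

For instance, one thread per RCV1 training point (518,571 points) at 256 threads per block needs 2,026 blocks, well inside a single grid dimension.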
- 15. PROPOSED RUNTIME: MR + GPU. Per-node pipeline driven by the TaskTracker on DFS blocks, with host state (HState) in host memory (HMem) and device state (DState) in device memory (DMem): Split → H->D transfers → Pre-Map BLAS → GPU Map → D->H transfers → Post-Map → cross-node Sort → H->D transfers → Pre-Reduce BLAS → local GPU Reduce → D->H transfers → Post-Reduce → cross-node global Reduce. A state snapshot is written to the DFS every x iterations.
- 16. PROPOSED RUNTIME: MR + GPU. Features: multiple tasks per node; multithreaded MR tasks; interleaved multithreaded BLAS; local/global reduction; static/variable data; long-running iterative jobs; stateful nodes; shared-memory GPU; fault-tolerance relaxation. (Diagram: HState/HMem on the host and DState/DMem on each GPU, cross-node exchange through host memory, DFS blocks at input and output.)
- 17. PREVIOUS WORK
  - MapReduce on a single GPU / single FPGA: Mars (He et al., PACT 2008); NVIDIA (Catanzaro et al., STMCS 2008); Cell (de Kruijf and Sankaralingam, IBM Journal of R&D 2009).
  - MapReduce on multicore: Phoenix (Ranger et al., HPCA 2007); Phoenix 2 (Yoo et al., IISWC 2009); Phoenix++ (Talbot et al., MAPREDUCE 2011).
  - MapReduce on multi-GPU/GPU clusters: CellMR (Rafique et al., IPDPS 2009); GPMR (Stuart and Owens, IPDPS 2011).
  - MapReduce for machine learning: Mahout (Apache); Multicore (Chu et al., NIPS 2006); FPGA (Xu, NIPS 2009); Twister (Ekanayake et al., MAPREDUCE 2010); SystemML (Ghoting et al., ICDE 2011).
  Features highlighted alongside this work: interleaved multithreaded BLAS; massively multithreaded MR tasks; shared memory; fault-tolerance relaxation; intermediate data in memory; local/global reduction; long-running (iterative) tasks; static vs. variable data.
- 18. PORT-BASED PROGRAMMING: ABSTRACTION. Components: messages posted to ports; receivers (single item, multiple item, join, choice) coordinated by arbiters; handlers wrapped as tasks in a dispatcher queue and executed by a dispatcher; concurrent, exclusive and teardown handler groups; state handlers; scatter and gather.
- 19. SCATTER-GATHER USING GPU-PORTS. The TaskTracker (MRF) posts (Task, Block, Response Port) messages to the master port. An arbiter and dispatcher on the master (C#/Java) scatter the tasks to handlers, each backed by a CPU thread that calls a C++ CUDA 3.2 kernel operating on the device state (DState); results are gathered back into the host state (HState).
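A rough analogue of this push-style scatter-gather, with `queue.Queue` standing in for ports and Python threads standing in for the per-GPU handler threads. `Port`, `gpu_worker` and `scatter_gather` are hypothetical names, not the CCR API or the runtime from the slides, and `kernel` is any Python function in place of a CUDA call.

```python
import queue
import threading


class Port:
    """Hypothetical stand-in for a port: a thread-safe message queue."""

    def __init__(self):
        self._messages = queue.Queue()

    def post(self, message):
        self._messages.put(message)

    def receive(self):
        return self._messages.get()  # blocks until a message arrives


def gpu_worker(task_port, response_port, kernel):
    """Handler bound to one device: runs `kernel` on each block it receives."""
    while True:
        message = task_port.receive()
        if message is None:  # teardown message
            return
        block_id, block = message
        response_port.post((block_id, kernel(block)))


def scatter_gather(blocks, kernel, n_devices=4):
    response_port = Port()
    task_ports = [Port() for _ in range(n_devices)]
    workers = [
        threading.Thread(target=gpu_worker, args=(p, response_port, kernel)) for p in task_ports
    ]
    for w in workers:
        w.start()
    for block_id, block in enumerate(blocks):  # scatter: round-robin over device ports
        task_ports[block_id % n_devices].post((block_id, block))
    results = [None] * len(blocks)
    for _ in blocks:  # gather: one response per block, in completion order
        block_id, value = response_port.receive()
        results[block_id] = value
    for p in task_ports:  # tear down the workers
        p.post(None)
    for w in workers:
        w.join()
    return results
```

Because each block is pushed to a fixed device port, a device that receives expensive blocks can become the straggler.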
- 20. H-DISPATCH ALTERNATIVE. The TaskTracker posts (Task, Block, Response Port) messages to the master, and worker threads pull tasks through H-Dispatch (scatter/gather).
  + Load balancing for non-uniform workloads.
  + Local variable reuse, which avoids garbage collection blocking threads.
  + Runs when HState > sum(DMem), i.e. when the host state exceeds the combined device memory.
  - State and port are detached, so DState must be loaded/unloaded.
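For contrast with the push-based ports, the sketch below shows the pull idea behind H-Dispatch: idle workers fetch the next block themselves, so faster workers simply process more blocks. It is a simplified assumption-laden model (`h_dispatch` is a hypothetical name); worker-local variable reuse and the DState load/unload cost are not modeled.

```python
import queue
import threading


def h_dispatch(blocks, kernel, n_workers=4):
    """Pull-based dispatch: each idle worker fetches the next block itself."""
    work = queue.Queue()
    for block_id, block in enumerate(blocks):
        work.put((block_id, block))
    results = [None] * len(blocks)
    processed = [0] * n_workers  # number of blocks each worker pulled

    def worker(w):
        while True:
            try:
                block_id, block = work.get_nowait()
            except queue.Empty:
                return  # no work left
            results[block_id] = kernel(block)  # each slot is written by exactly one worker
            processed[w] += 1

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, processed
```

The per-worker counts in `processed` vary with block cost, which is the load-balancing effect listed above.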
- 21. BINARY SVM. Binary classification: given $l$ samples $(x_1, y_1), \dots, (x_l, y_l)$ with $x_i \in \mathbb{R}^n$, $y_i \in Y\ \forall i$ and $Y = \{-1, 1\}$, a binary classifier predicts the label $y \in Y$ of an unseen sample $x \in \mathbb{R}^n$. Kernel: $k(x_i, x_j) = e^{-\beta \lVert x_i - x_j \rVert^2}$. (Figure: separating function $f^*$ and its margin.)
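- 22. PRIMAL & DUAL FORM OF THE SVM. Find the function $f$ that solves the regularization problem
  $$\min_{f \in \mathcal{H}} \; C \sum_{i=1}^{l} \ell\big(y_i f(x_i)\big) + \frac{1}{2}\lVert f \rVert_{\mathcal{H}}^2, \qquad \ell(k) = \max(1 - k, 0), \quad C > 0.$$
  Slack variables $\xi_i$ are then introduced to classify non-separable data.
  Primal form:
  $$\min_{f \in \mathcal{H}} \; C \sum_{i=1}^{l} \xi_i + \frac{1}{2}\lVert f \rVert_{\mathcal{H}}^2 \quad \text{s.t.} \quad y_i f(x_i) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; i = 1,\dots,l.$$
  Dual form:
  $$\max_{\alpha \in \mathbb{R}^l} \; \sum_{i=1}^{l} \alpha_i - \frac{1}{2}\alpha^T K \alpha \quad \text{s.t.} \quad \sum_{i=1}^{l} y_i \alpha_i = 0,\;\; 0 \le \alpha_i \le C,\;\; i = 1,\dots,l,$$
  where $K_{ij} = y_i y_j k(x_i, x_j)$ and $k$ is the kernel function. Solving the dual gives $f(x) = \sum_{i=1}^{l} y_i \alpha_i k(x, x_i) + b$, where $b$ is an unregularized bias term.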
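To make the dual concrete, the sketch below evaluates the dual objective $\sum_i \alpha_i - \frac{1}{2}\alpha^T K \alpha$ with $K_{ij} = y_i y_j k(x_i, x_j)$, checks the dual constraints, and evaluates the resulting $f(x)$. It assumes the RBF kernel of the previous slide; `dual_objective`, `is_feasible` and `decision` are hypothetical names, and the $O(l^2)$ double loop is for illustration only.

```python
import math


def rbf(xi, xj, beta):
    """k(x_i, x_j) = exp(-beta * ||x_i - x_j||^2)."""
    return math.exp(-beta * sum((a - b) ** 2 for a, b in zip(xi, xj)))


def dual_objective(alpha, X, y, beta):
    """sum_i alpha_i - 1/2 alpha^T K alpha, with K_ij = y_i y_j k(x_i, x_j)."""
    l = len(alpha)
    quad = sum(
        alpha[i] * alpha[j] * y[i] * y[j] * rbf(X[i], X[j], beta)
        for i in range(l)
        for j in range(l)
    )
    return sum(alpha) - 0.5 * quad


def is_feasible(alpha, y, C, tol=1e-9):
    """Dual constraints: sum_i y_i alpha_i = 0 and 0 <= alpha_i <= C."""
    balanced = abs(sum(a * yi for a, yi in zip(alpha, y))) <= tol
    return balanced and all(-tol <= a <= C + tol for a in alpha)


def decision(x, X, y, alpha, b, beta):
    """f(x) = sum_i y_i alpha_i k(x, x_i) + b."""
    return sum(a * yi * rbf(x, xi, beta) for a, yi, xi in zip(alpha, y, X)) + b
```

With one sample per class at $x = 0$ and $x = 4$ and $\alpha = (1, 1)$, the objective is $1 + e^{-16\beta}$, and $f$ is negative near the first sample and positive near the second.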
- 23. MULTICLASS CLASSIFICATION. Given $l$ samples $(x_1, y_1), \dots, (x_l, y_l)$ with $x_i \in \mathbb{R}^n$, $y_i \in Y\ \forall i$ and $Y = \{1, \dots, M\}$, a multiclass classifier predicts the label $y \in Y$ of an unseen sample $x \in \mathbb{R}^n$. A multiclass SVM combines $N$ independent binary classification tasks, defined by an output code matrix $R$ of size $M \times N$ with $R_{ij} \in \{-1, 0, 1\}$. All vs All (AVA): $N = \binom{M}{2}$; each column sets one class to $1$, another to $-1$ and the rest to $0$. One vs All (OVA): $N = M$; $R$ has $1$ on the diagonal and $-1$ elsewhere.
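The two output code matrices can be generated as below. The slide does not say how the $N$ binary outputs are combined, so `decode` uses one simple rule as an assumption: pick the row of $R$ whose entries best agree with the real-valued binary outputs (zeros ignored). All function names are hypothetical.

```python
from itertools import combinations


def ova_matrix(M):
    """One vs All: M x M, +1 on the diagonal and -1 elsewhere."""
    return [[1 if i == j else -1 for j in range(M)] for i in range(M)]


def ava_matrix(M):
    """All vs All: M x C(M, 2); column n sets class a to +1 and class b to -1."""
    pairs = list(combinations(range(M), 2))
    R = [[0] * len(pairs) for _ in range(M)]
    for n, (a, b) in enumerate(pairs):
        R[a][n] = 1
        R[b][n] = -1
    return R


def decode(R, outputs):
    """Assumed decoding: class whose code row maximizes sum_j R_ij * output_j."""
    def agreement(row):
        return sum(r * o for r, o in zip(row, outputs))

    return max(range(len(R)), key=lambda i: agreement(R[i]))
```

For OVA this rule reduces to picking the binary classifier with the largest output; the experiments later in the deck use One vs All.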
- 24. BINARY SVM AS MAPREDUCE PRIMITIVES IN A SINGLE GPU. Work is spread over GPU processors 1…P. A Pre-Map step computes the kernel values $k(x_i, x_j) = e^{-\beta \lVert x_i - x_j \rVert^2}$ for $i = 1,\dots,n$, $j \in \{I_{up}, I_{low}\}$; a Map updates $f_i$; a second Map assigns $k_i$ from $\alpha_i$; a local reduce of $(k_i, f_i)$ on each processor is followed by a global reduce that produces $(b_{up}, I_{up})$ and $(b_{low}, I_{low})$; a final Map updates $\alpha_{I_{up}}$ and $\alpha_{I_{low}}$. Device state: static $(x_i, y_i)$; variable $(f_i, \alpha_i, k_i, b, I, K)$, with kernel values kept in an LRU cache.
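Since each iteration only needs the kernel rows for $I_{up}$ and $I_{low}$, and the same indices tend to recur, an LRU cache of kernel rows avoids recomputing them. The sketch below is a host-side illustration under that assumption; `KernelRowCache` is a hypothetical class, and a real implementation would keep the rows in device memory.

```python
import math
from collections import OrderedDict


class KernelRowCache:
    """LRU cache of RBF kernel rows k(x_j, x_i), i = 1..n, keyed by the row index j."""

    def __init__(self, X, beta, capacity):
        self.X, self.beta, self.capacity = X, beta, capacity
        self.rows = OrderedDict()
        self.hits = 0
        self.misses = 0

    def row(self, j):
        if j in self.rows:
            self.rows.move_to_end(j)  # mark as most recently used
            self.hits += 1
            return self.rows[j]
        self.misses += 1
        xj = self.X[j]
        r = [math.exp(-self.beta * sum((a - b) ** 2 for a, b in zip(xj, xi))) for xi in self.X]
        self.rows[j] = r
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)  # evict the least recently used row
        return r
```

In each SMO iteration one would request `row(I_up)` and `row(I_low)`; the capacity plays the role of the fixed kernel-cache budget (1 GB in the experiments).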
- 25. BINARY SVM AS MAPREDUCE PRIMITIVES IN 4 GPUS. A master thread coordinates GPUs 1 to 4. Each GPU runs MAP and LOCAL REDUCE on its share of the data; a single GLOBAL REDUCE combines the four local results, after which every GPU runs the next MAP.
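The local/global split can be sketched as a two-level reduction for $(b_{up}, I_{up})$ and $(b_{low}, I_{low})$: each "GPU" reduces its contiguous shard, and the master combines the four partial results. Shards here are plain index ranges and the function names are hypothetical; ties are broken by index through tuple comparison.

```python
import math


def local_reduce(shard, f, in_up, in_low):
    """Per-GPU reduce: (min f, index) over k_up and (max f, index) over k_low in one shard."""
    up = min(((f[i], i) for i in shard if in_up[i]), default=(math.inf, -1))
    low = max(((f[i], i) for i in shard if in_low[i]), default=(-math.inf, -1))
    return up, low


def global_reduce(partials):
    """Master: combine the per-GPU results into (b_up, I_up) and (b_low, I_low)."""
    b_up, i_up = min(p[0] for p in partials)
    b_low, i_low = max(p[1] for p in partials)
    return (b_up, i_up), (b_low, i_low)


def two_level_reduce(f, in_up, in_low, n_gpus=4):
    n = len(f)
    shards = [range(g * n // n_gpus, (g + 1) * n // n_gpus) for g in range(n_gpus)]
    return global_reduce([local_reduce(s, f, in_up, in_low) for s in shards])
```

Only one small tuple per GPU crosses to the master, which keeps the global step cheap compared with the Maps.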
- 26. EXPERIMENTS AND HARDWARE
  Host: Ubuntu 8.10 64-bit; dual-socket Intel Xeon E5520 (2.26 GHz cores, 145 GFlops); 32 GB DDR3 memory with 25.6 GB/s bandwidth.
  Device: 4x Tesla C1060, each with 240 stream processors at 1.3 GHz (933 GFlops), 4 GB DDR3 memory and 102 GB/s bandwidth.
  Host <-> device: PCIe x16 (8 GB/s).
  Runtimes compared:
  - LIBSVM: single-threaded; double precision; sparse.
  - Hadoop: 4 VMs with one DataNode each; Pegasos SVM; double precision; sparse.
  - Multicore: 8 worker threads in H-Dispatch; 1 block - 1 thread; double precision; dense.
  - Single GPU: 1 worker thread; 1 GPU; single precision; dense-sparse.
  - Multi GPU: 4 worker threads; 4 GPUs; single precision; dense-sparse.
- 27. PERFORMANCE RESULTS: DATASETS
  | Dataset | # Training points | # Testing points | (# Features, # Classes) | (C, β) |
  |---|---|---|---|---|
  | WEB | 49749 | 14951 | (300, 2) | (64, 7.8125) |
  | MNIST | 60000 | 10000 | (780, 10) | (10, 0.125) |
  | RCV1 | 518571 | 15564 | (47236, 53) | (1, 0.1) |
  | PROTEIN | 17766 | 6621 | (357, 3) | (10, 0.05) |
  | SENSIT | 78823 | 19705 | (100, 3) | (1, 0.7) |
  SVM experiment setup: same kernel type (RBF); same regularization parameter C; same stopping criterion (0.001); SMO-based solvers (except the Hadoop version); One vs All for multiclass problems; 1 GB kernel cache.
- 28. PERFORMANCE RESULT COMPARISON
  | Dataset (non-zero %) | Metric | LIBSVM | Hadoop | Multicore | Single GPU (Dense) | Multi GPU (Dense) |
  |---|---|---|---|---|---|---|
  | WEB (3%) | Time (s) | 2364.2 | 1698.7 | 912.81 | 154.3 | 73.6 |
  | | Gain (x) | 1.00 | 1.39 | 2.59 | 15.32 | 32.12 |
  | | Accuracy (%) | 82.69 | 82.69 | 82.69 | 82.69 | 82.69 |
  | MNIST (19%) | Time (s) | 118943.5 | 66753.5 | 22873.75 | 2010.3 | 726.9 |
  | | Gain (x) | 1.00 | 1.78 | 5.20 | 59.17 | 163.63 |
  | | Accuracy (%) | 95.76 | 95.76 | 95.76 | 95.76 | 95.76 |
  | RCV1 (0.1%) | Time (s) | 710664 | 231486 | N/A | N/A | N/A |
  | | Gain (x) | 1.00 | 3.07 | N/A | N/A | N/A |
  | | Accuracy (%) | 94.67 | 94.67 | 94.67 | 94.67 | 94.67 |
  | PROTEIN (29%) | Time (s) | 861 | 717.5 | 260.12 | 32.93 | 16.06 |
  | | Gain (x) | 1.00 | 1.20 | 3.31 | 26.15 | 53.61 |
  | | Accuracy (%) | 70.03 | 70.03 | 70.03 | 70.03 | 70.03 |
  | SENSIT (100%) | Time (s) | 8162 | 4295.78 | 2005.4 | 134.67 | 58.29 |
  | | Gain (x) | 1.00 | 1.90 | 4.07 | 60.61 | 140.02 |
  | | Accuracy (%) | 83.46 | 83.46 | 83.46 | 83.46 | 83.46 |
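The "Gain (x)" rows are consistent with speedup relative to LIBSVM, i.e. LIBSVM time divided by each runtime's time. The snippet below recomputes them from the table's times (None marks N/A); `TIMES` and `gains` are illustrative names.

```python
# Training times (s) from the comparison table, in column order:
# LIBSVM, Hadoop, Multicore, Single GPU (Dense), Multi GPU (Dense). None marks N/A.
TIMES = {
    "WEB": [2364.2, 1698.7, 912.81, 154.3, 73.6],
    "MNIST": [118943.5, 66753.5, 22873.75, 2010.3, 726.9],
    "RCV1": [710664, 231486, None, None, None],
    "PROTEIN": [861, 717.5, 260.12, 32.93, 16.06],
    "SENSIT": [8162, 4295.78, 2005.4, 134.67, 58.29],
}


def gains(times):
    """Gain (x) = LIBSVM time / runtime time, rounded to two decimals."""
    base = times[0]
    return [None if t is None else round(base / t, 2) for t in times]
```

For WEB this reproduces the row 1.00, 1.39, 2.59, 15.32, 32.12.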
- 29. ELLPACK-R (Vazquez et al., IEEE CIT 2010)
  | Dataset (non-zero %) | Metric | Single GPU (Sparse) | Multi GPU (Sparse) |
  |---|---|---|---|
  | WEB (3%) | Time (s) | 107.35 | 57.3 |
  | | Gain (x) | 22.02 (1.43) | 41.26 (1.26) |
  | | Accuracy (%) | 82.69 | 82.69 |
  | RCV1 (0.1%) | Time (s) | N/A | 3686 |
  | | Gain (x) | N/A | 192.80 |
  | | Accuracy (%) | 94.67 | 94.67 |
  On RCV1, training time drops from ~8.2 days (LIBSVM) to ~1 hour (Multi GPU, sparse).
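ELLPACK-R stores a sparse matrix in ELLPACK layout (each row padded to the longest row, stored column-major so that consecutive rows are adjacent in memory) plus an array of true row lengths, so each GPU thread can stop at the end of its own row. The sketch below shows that layout and a sparse matrix-vector product in plain Python; in an SVM, $\lVert x_i - x_j \rVert^2 = \lVert x_i \rVert^2 + \lVert x_j \rVert^2 - 2\,x_i \cdot x_j$, so a kernel row can be built from one such product. Function names are hypothetical, and the loop over rows stands in for one thread per row.

```python
def to_ellpack_r(rows):
    """rows: list of {column: value} dicts. Returns (values, cols, row_lengths, width)."""
    n = len(rows)
    row_lengths = [len(r) for r in rows]
    width = max(row_lengths, default=0)
    values = [0.0] * (n * width)
    cols = [0] * (n * width)
    for i, r in enumerate(rows):
        for slot, (c, v) in enumerate(sorted(r.items())):
            # Column-major padding: entry `slot` of row i lives at slot * n + i, so
            # neighbouring threads (rows) read neighbouring addresses.
            values[slot * n + i] = v
            cols[slot * n + i] = c
    return values, cols, row_lengths, width


def spmv_ellpack_r(values, cols, row_lengths, x):
    """y = A x; on a GPU each iteration of the outer loop would be one thread."""
    n = len(row_lengths)
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for slot in range(row_lengths[i]):  # stop at the row's true length, skip padding
            acc += values[slot * n + i] * x[cols[slot * n + i]]
        y[i] = acc
    return y
```

The row-length array is what distinguishes ELLPACK-R from plain ELLPACK: padded entries are never read, which matters for very sparse data such as RCV1.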
- 30. CONCLUSIONS. CONCLUSIONS: Constructed an MR runtime that satisfies the requirements of many ML algorithms and integrates GPUs: iterative stateful jobs; multithreaded BLAS to prepare Map or Reduce tasks; static/variable data. Tested the runtime on popular classification problems, delivering up to two orders of magnitude of acceleration with 4 GPUs, and compared different runtimes. LIMITATIONS: H-Dispatch (pull) depends on H->D state transfers; the relaxation of fault tolerance must be acceptable; when d >> n, MapReduce brings little benefit.
- 31. FUTURE WORK
  - GPU technology: concurrent kernel execution to maximize utilization; GPUDirect to facilitate the Sort operation; distributed memory for intermediate results; a shared memory space for CPU-GPU communication; cross-node performance.
  - GPU-Port abstraction: in-node, cross-thread pointer exchange; out-of-node, MVAPICH2 and MVAPICH2-GPU.
  - Algorithms: requirements for incremental classification and clustering.
- 32. CONCURRENT KERNEL EXECUTION. CPU threads 1 and 2 post tasks to a port and its task queue, which feed concurrent kernels on the GPU. CUDA compute capability 2.0 allows up to sixteen concurrent kernels, and concurrent kernels must run in the same context.
- 33. INTEGRATING THE MPP IN THE MR CLUSTER ARCHITECTURE. Same per-node pipeline as the proposed runtime (DFS blocks, TaskTracker, HState/HMem on the host, DState/DMem on the GPU, cross-node exchange, and a state snapshot written to the DFS every x iterations), extended with GPUDirect for GPU-to-GPU memory copies and communication with network devices, so that communication to HState is minimal.
- 34. PIPELINING/MEMCACHED. On each DataNode, Map tasks pass their output through in-memory Memcached nodes to the Reduce tasks instead of materializing it; DFS blocks are used only for input and output.
- 35. QUESTIONS
- 36. APPLICATION I: EVENT DETECTION USING TWEETS. Sakaki et al. detect tweet outbreaks about large-scale, infrequent events: natural disasters (earthquakes, floods) and accidents (fires, road accidents).
- 37. APPLICATION I: EVENT DETECTION USING TWEETS. Goal: detect popular events at locations with a high volume of tweets. Example tweets: "Listening to the New York Philarmonic, amazing performance"; "Lots of people trying to enter the MSG for the Alice in Chains concert. I wish I had tickets."; "Nassau County Museum of Art is looking for volunteers to greet, work in gift shop or perform clerical support."
- 38. APPLICATION I: FEATURE VECTOR. Tweets are part-of-speech tagged, e.g. "It/PRP is/VBZ a/DT good/JJ day/NN when/WRB the/DT CEO/NN of/IN a/DT multinational/JJ ,/, multi-million/JJ dollar/NN company/NN tells/VBZ you/PRP you/PRP re/VBP a/DT genius/NN ./. :/: D/NNP" and "Lots/NNS of/IN people/NNS trying/VBG to/TO enter/VB the/DT MSG/NNP for/IN the/DT Alice/NNP in/IN Chains/NNP concert/NN ./. I/PRP wish/VBP I/PRP had/VBD tickets/NNS ./.". Binary features: $h_i(x, y) = 1$ if $(x, y)$ contains ___, and $0$ otherwise, for example: has a unigram with POS; has a bigram with POSs; has a trigram with POSs; X1 is the subject of X2; ….
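A minimal sketch of the n-gram-with-POS indicator features listed above, assuming the tweet is already tagged as (token, POS) pairs. The feature naming (`word/TAG` joined by `|`), the lowercasing, and the helper names are hypothetical; the label part of $h_i(x, y)$ and the dependency feature ("X1 is the subject of X2") are left out.

```python
def ngram_features(tagged, max_n=3):
    """Binary features: every unigram, bigram and trigram together with its POS tags."""
    features = set()
    for n in range(1, max_n + 1):
        for i in range(len(tagged) - n + 1):
            gram = tagged[i:i + n]
            features.add("|".join(f"{word.lower()}/{tag}" for word, tag in gram))
    return features


def to_vector(features, vocabulary):
    """h_i = 1 if the i-th vocabulary feature is present, 0 otherwise."""
    return [1 if name in features else 0 for name in vocabulary]
```

For example, the tagged fragment "Alice/NNP in/IN Chains/NNP concert/NN" yields four unigram, three bigram and two trigram features, including `alice/NNP|in/IN|chains/NNP`.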
- 39. APPLICATION I: EXPERIMENT. Used the NYC.com event calendar (Oct 9-11, 2009) and extracted ~400 features from each event's title, location and description. Example: Title: Alice in Chains; Location: Madison Square Garden, 2 Penn Plaza, New York, NY, 10001; Description: "Alice in Chains has sold more than twenty million albums in the United States (and an estimated 40 million worldwide), released two number-one albums and 19 top 40 singles, and has received six Grammy nominations…"
  EXPERIMENT 1: 2000 tweets from the same weekend (160 (8%) "Concert", 1840 (92%) "Background"); RBF kernel (C = 10, gamma = 1.0); testing on 20% gives an accuracy of 97%; observed errors: "false positives".
  EXPERIMENT 2: 2000 tweets from the next weekend (160 (8%) "Concert", 1840 (92%) "Background"); RBF kernel (C = 10, gamma = 1.0); testing on 100% gives an accuracy of 93%; observed errors: "false positives" and "false negatives". After using NYC.com again, accuracy rises to 96%.
- 40. APPLICATION II: PRICE CALCULATIONS FOR EACH HOUSEHOLD. 30 x 96 = 2880 values.
- 41. APPLICATION II: PRICE CALCULATIONS FOR EACH HOUSEHOLD
