Presentation on the experimental setup for verifying "Slow Learners are Fast"
1. Machine Learning on Cell Processor
Supervisor: Dr. Eric McCreath
Student: Robin Srivastava
2. Background and Motivation
[Figure: Machine learning splits into batch learning and online learning. A stream of emails (Email-1, Email-2, ..., Email-N) is classified as HAM or SPAM.]
3. Background and Motivation
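[Figure: Machine learning splits into batch learning and online learning, with online learning annotated as sequential in nature. A stream of emails (Email-1, Email-2, ..., Email-N) is classified as HAM or SPAM.]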
4. Objective
Performance evaluation of a parallel online machine
learning algorithm (Langford et al. [1])
Target Machines
Cell Processor: one 3 GHz 64-bit IBM PowerPC core and six
specialized co-processors (SPEs)
Intel Dual Core Machine: 2 GHz dual-core processor, 1.86 GB
of main memory
5. Stochastic Gradient Descent
Step 1: Initialize the weight vector $w_0$ with arbitrary
values
Step 2: Update the weight vector as follows
$$w_{t+1} = w_t - \eta \nabla E(w_t)$$
where $\nabla E$ is the gradient of the error function and $\eta$ is the
learning rate
Step 3: Repeat Step 2 for every unit of data
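The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide leaves the error function E generic, so the sketch assumes the logistic loss (the model named later in the deck), starts from zero weights as one possible "arbitrary" choice, and uses dense feature lists. The names `sigmoid`, `gradient`, and `sgd` are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gradient(w, x, y):
    # Gradient of the logistic loss for one example.
    # Assumption: x is a dense feature list and y is 0 or 1.
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]

def sgd(data, dim, eta=0.1):
    w = [0.0] * dim                 # Step 1: initial weights (zeros assumed)
    for x, y in data:               # Step 3: visit every unit of data
        g = gradient(w, x, y)
        # Step 2: w_{t+1} = w_t - eta * grad E(w_t)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

# Toy usage: the first feature is a constant bias, the second decides the label.
toy = [([1.0, 1.0], 1), ([1.0, -1.0], 0)] * 50
weights = sgd(toy, dim=2, eta=0.5)
```

On the toy data the second weight becomes positive, so examples with a positive second feature are scored above 0.5 and the others below.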
6. Delayed Stochastic Gradient Descent
Step 1: Initialize the weight vector $w_0$ with arbitrary
values
Step 2: Update the weight vector as follows
$$w_{t+1} = w_t - \eta \nabla E(w_{t-\tau})$$
where $\nabla E$ is the gradient of the error function, $\eta$ is the
learning rate, and $\tau$ is the delay, so the gradient is evaluated at
the weights from $\tau$ steps earlier
Step 3: Repeat Step 2 for every unit of data
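A serial simulation can make the delayed update concrete. The sketch below is not the parallel implementation evaluated in this presentation; it only mimics the delay with a queue of pending gradients. It assumes the logistic loss, zero initial weights, and dense feature lists, and the names `logistic_gradient` and `delayed_sgd` are hypothetical.

```python
import math
from collections import deque

def logistic_gradient(w, x, y):
    # Gradient of the logistic loss for one example (y is 0 or 1).
    z = sum(wi * xi for wi, xi in zip(w, x))
    z = max(-30.0, min(30.0, z))    # clamp to keep exp() in range
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * xi for xi in x]

def delayed_sgd(data, dim, eta=0.1, tau=2):
    w = [0.0] * dim                 # Step 1 (zeros assumed)
    pending = deque()               # gradients computed but not yet applied
    for x, y in data:
        # The gradient is computed with the weights available now ...
        pending.append(logistic_gradient(w, x, y))
        if len(pending) > tau:
            # ... but the one applied was computed tau steps ago:
            # w_{t+1} = w_t - eta * grad E(w_{t - tau})
            g = pending.popleft()
            w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

toy = [([1.0, 1.0], 1), ([1.0, -1.0], 0)] * 50
w_plain = delayed_sgd(toy, dim=2, eta=0.5, tau=0)    # tau = 0 is ordinary SGD
w_delayed = delayed_sgd(toy, dim=2, eta=0.5, tau=2)
```

With `tau=0` the queue never holds more than one gradient, so the loop reduces to the undelayed update. During the first `tau` examples no update is applied, and gradients still in the queue at the end of the data are discarded in this sketch.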
8. Implementation
Dataset – TREC 2007 Public Corpus
Number of emails: 75,419
Each email classified as either ‘ham’ or ‘spam’
Pre-processing
Total number of features extracted: 2,218,878
Pre-processed email format
<Number of features><space><index>:<count><space>…………..<index>:<count>
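A reader for this line format might look like the sketch below. Several details are assumptions, since the slide does not state them: the leading number is taken to be the count of index:count pairs on the line, counts are taken to be integers, and the ham/spam label is assumed to be stored elsewhere. The function name `parse_email_line` is hypothetical.

```python
def parse_email_line(line):
    # Parse "<n> <index>:<count> ... <index>:<count>" into {index: count}.
    tokens = line.split()
    declared = int(tokens[0])
    pairs = tokens[1:]
    if len(pairs) != declared:
        raise ValueError(
            f"expected {declared} features, found {len(pairs)}"
        )
    features = {}
    for pair in pairs:
        index, count = pair.split(":")
        features[int(index)] = int(count)
    return features

example = parse_email_line("3 5:2 17:1 2218877:4")
```

Keeping each email as a sparse dictionary avoids materializing a dense vector of 2,218,878 entries per message.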
9. Memory Requirement
Algorithm Implemented
Online Logistic Regression with delayed update
Requirement per level of parallelization
Two private copies of the weight vector
Two shared copies of the weight vector
Two error gradients
Required Dimension for each = Number of features = 2,218,878
Data type: float (4 bytes on Cell)
Total = (6 x 2,218,878) x 4 = 53,253,072 bytes ≈ 50.79 MB,
plus the size occupied by other auxiliary variables
Alternatively
Make only the shared copies use the full dimension
Total size = (2 x 2,218,878) x 4 = 17,751,024 bytes ≈ 16.9 MB + others
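The arithmetic on this slide can be reproduced with a short script. It only restates the figures given above (six or two vectors, 2,218,878 features, 4-byte floats) and takes 1 MB to mean 2^20 bytes, which is the convention the slide's totals appear to use. The helper name `vectors_size_bytes` is hypothetical.

```python
NUM_FEATURES = 2_218_878
BYTES_PER_FLOAT = 4          # size of a float on Cell, per the slide

def vectors_size_bytes(num_vectors, dim=NUM_FEATURES, elem=BYTES_PER_FLOAT):
    # Memory needed for num_vectors dense float vectors of length dim.
    return num_vectors * dim * elem

def to_mb(num_bytes):
    # Assumption: 1 MB = 2**20 bytes.
    return num_bytes / 2**20

all_full = vectors_size_bytes(6)      # two private, two shared, two gradients
shared_only = vectors_size_bytes(2)   # only the shared copies at full size

print(all_full, "bytes")       # 53253072 bytes
print(shared_only, "bytes")    # 17751024 bytes
print(f"{to_mb(all_full):.1f} MB vs {to_mb(shared_only):.1f} MB")
```

Neither total includes the auxiliary variables mentioned on the slide.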
10. Limitations on Cell
Memory limitation of SPE
Available: 256 KB
Required: approx. 51 MB
Workaround:
Reduced the number of features
Added one more level of pre-processing
SIMD limitation
The time spent preparing the data for SIMD outweighed its
benefits for this implementation
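To see how severe the memory limit is, the sketch below estimates how many float features could fit in one 256 KB SPE local store. It is a rough upper bound only: the local store also holds program code and buffers, which the hypothetical `reserved_bytes` parameter stands in for, and the slides do not state the feature count actually used after the reduction.

```python
LOCAL_STORE_BYTES = 256 * 1024   # SPE local store size from the slide
BYTES_PER_FLOAT = 4

def max_features(num_vectors, reserved_bytes=0):
    # Largest dimension such that num_vectors float vectors fit in the
    # local store after setting aside reserved_bytes for code and buffers.
    usable = max(LOCAL_STORE_BYTES - reserved_bytes, 0)
    return usable // (num_vectors * BYTES_PER_FLOAT)

print(max_features(6))   # 10922 features if all six vectors live on the SPE
print(max_features(1))   # 65536 features for a single vector
```

Even a single full-dimension vector of 2,218,878 floats exceeds this budget by more than an order of magnitude, which is consistent with the need to reduce the feature count.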
11. Results
The serial implementation of logistic regression on the Intel
dual-core machine took 36.93 s and 36.45 s in two
consecutive executions.
Parallel implementation using stochastic gradient process
13. References
① John Langford, Alexander J. Smola and Martin Zinkevich.
Slow Learners are Fast. Journal of Machine
Learning Research 1 (2009).
② Michael Kistler, Michael Perrone, Fabrizio Petrini. Cell
Multiprocessor Communication Network: Built for Speed.
③ Thomas Chen, Ram Raghavan, Jason Dale and Eiji Iwata. Cell
Broadband Engine Architecture and its first implementation.
④ Jonathan Bartlett. Programming high-performance
applications on the Cell/B.E. processor, Part 6: Smart buffer
management with DMA transfers.
⑤ Introduction to Statistical Machine Learning, 2010 course
assignment 1
⑥ Christopher Bishop, Pattern Recognition and Machine
Learning.