k-NN Text Classification using an FPGA-Based Sparse
Matrix Vector Multiplication Accelerator
Kevin R. Townsend, Song Sun, Tyler Johnson, Osama G. Attia,
Phillip H. Jones, and Joseph Zambreno
Reconfigurable Computing Laboratory
Iowa State University
EIT’15
Townsend et al. (RCL@ISU) kNN Text Classification EIT’15 1 / 11
Outline
1 What is k-NN text Classification?
2 Example
3 Mapping to an Accelerator
4 Results
What is k-NN text Classification?
k-NN Text Classification
[Figure: documents D1 to D6 plotted against the axes autumn, leaves, and butterfly, grouped into class a and class b.]
Text classification is the machine learning task of assigning documents to classes.
Examples include spam filters, classifying books in library catalogs, and determining the subtopic of a conference paper.
The problem can be simplified by converting documents into vectors, also known as term-document vectors.
Each dimension in the model represents a word.
Each vector has a classification.
To classify a test document, the document is converted into a vector; then the k nearest training vectors ‘vote’ to determine the classification of the test document.
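As a minimal sketch of the conversion step, the snippet below counts how often each word of a small vocabulary occurs in a document. The function name `term_counts`, the regular-expression tokenizer, and the three-word vocabulary (the three words of the figure) are assumptions made for illustration; the slides do not say how tokenization was done.

```python
import re
from collections import Counter

def term_counts(text, vocabulary):
    """Count how often each vocabulary word occurs in `text` (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in vocabulary)
    # One entry per vocabulary word, in vocabulary order.
    return [counts[w] for w in vocabulary]

vocabulary = ["autumn", "leaves", "butterfly"]  # hypothetical three-word model
d3 = "butterfly, butterfly fly in the sky butterfly, butterfly flies so high"
vector = term_counts(d3, vocabulary)
print(vector)  # [0, 0, 4]
```

Each position of the resulting list is one dimension of the model, so a document that never mentions a word simply gets a zero there.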
Example
Dataset
name  class  text
Training
D1    a      Autumn was it when we first met
             Autumn is it what I can’t forget
             Autumn have made me alive
D2    a      Grinning pumpkins, falling leaves,
             Dancing scarecrows, twirling breeze,
             Color, color everywhere,
             Autumn dreams are in the air!
             Autumn is a woman growing old
D3    b      butterfly, butterfly
             fly in the sky
             butterfly, butterfly
             flies so high
D4    b      Hoping to catch your eye
             Circling around you, oh my
             Butterfly, butterfly, come into the light
             Oh, what a beautiful sight
Testing
D5    a      Its autumn again
             Leaves whisper the sound of our past
             In loss they pay a descent
             To the ground we fall
D6    b      Butterfly; butterfly fly away,
             teach me how to be as free as free can be.
             Butterfly; butterfly I see you there
Each document (poem) belongs to either class a (poem about autumn) or class b (poem about butterflies).
In order to test the algorithm there needs to be a training set and a testing set.
Example
Converting into vectors
Word counts per document (rows are words, columns are documents; D1 to D4 form the training matrix A, D5 and D6 form the testing matrix B):

word        D1  D2  D3  D4  D5  D6
class        a   a   b   b   a   b
autumn       3   2   0   0   1   0
met          1   0   0   0   0   0
alive        1   0   0   0   0   0
leaves       0   1   0   0   1   0
color        0   2   0   0   0   0
growing      0   1   0   0   0   0
butterfly    0   0   4   2   0   4
fly          0   0   1   0   0   1
sky          0   0   1   0   0   0
flies        0   0   1   0   0   0
high         0   0   1   0   0   0
hoping       0   0   0   1   0   0
sight        0   0   0   1   0   0
whisper      0   0   0   0   1   0
fall         0   0   0   0   1   0
teach        0   0   0   0   0   1
free         0   0   0   0   0   2
By converting the documents into term-document vectors, two sparse matrices are created: the training matrix A and the testing matrix B.
Now we can compare any two vectors. We use dot products as the measure: a larger dot product means a smaller distance.
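To make the dot-product measure concrete, here is a small sketch that stores each document as a sparse `{word: count}` dictionary and multiplies only the words two documents share. The dictionary layout and the name `sparse_dot` are illustrative assumptions, not the storage format used in the talk; the counts are taken from the example.

```python
def sparse_dot(u, v):
    """Dot product of two sparse vectors stored as {word: count} dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(count * v[word] for word, count in u.items() if word in v)

# Nonzero counts of four of the example documents.
d1 = {"autumn": 3, "met": 1, "alive": 1}
d3 = {"butterfly": 4, "fly": 1, "sky": 1, "flies": 1, "high": 1}
d5 = {"autumn": 1, "leaves": 1, "whisper": 1, "fall": 1}
d6 = {"butterfly": 4, "fly": 1, "teach": 1, "free": 2}

print(sparse_dot(d5, d1))  # 3  (shared word: autumn)
print(sparse_dot(d6, d3))  # 17 (shared words: butterfly, fly)
print(sparse_dot(d5, d3))  # 0  (no shared words)
```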
Example
Distances and Sorting
Finding the distance between every test document and every training document equates to matrix-matrix multiplication.
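The equivalence can be checked on the example with a short sketch. It stores one document per row (a layout chosen here for illustration) and computes every training-by-testing dot product, which is the product of the training matrix with the transposed testing matrix. The function name `similarity_matrix` is hypothetical.

```python
# Rows are documents; columns follow the word order of the example
# (autumn, met, alive, leaves, color, growing, butterfly, fly, sky,
#  flies, high, hoping, sight, whisper, fall, teach, free).
A = [  # training matrix: D1..D4
    [3, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [2, 0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 4, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
]
B = [  # testing matrix: D5, D6
    [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 4, 1, 0, 0, 0, 0, 0, 0, 0, 1, 2],
]

def similarity_matrix(train, test):
    """Return S with S[i][j] = dot(train[i], test[j])."""
    return [[sum(a * b for a, b in zip(row, col)) for col in test] for row in train]

S = similarity_matrix(A, B)
for name, row in zip(["D1", "D2", "D3", "D4"], S):
    print(name, row)
```

Each column of `S` holds the scores of one test document against all training documents, which is the input to the sorting step.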
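Dot products (rows: training documents; columns: testing documents):

         D5  D6
D1 (a)    3   0
D2 (a)    3   0
D3 (b)    0  17
D4 (b)    0   8

Sorted columns (document, class, dot product); the first k entries of each are kept:

D5: D1,a,3  D2,a,3  D3,b,0  D4,b,0
D6: D3,b,17  D4,b,8  D1,a,0  D2,a,0

Sum by class:

D5: a=6, b=0
D6: b=25, a=0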
We sort the values in each column while keeping track of the
documents.
We discard everything except the k = 2 largest dot products (smallest
distances).
Then add the values by class. The class with the largest sum is the
classification of the test document.
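The sort, cut, and vote steps can be sketched as follows. This is an illustrative version using `heapq.nlargest` as the partial sort; the function name `classify` and the tie handling are assumptions, and the scores are the two columns of dot products from the example.

```python
import heapq
from collections import defaultdict

def classify(scores, labels, k):
    """Pick the class with the largest summed dot product among the k best scores."""
    # Partial sort: keep only the indices of the k largest scores.
    top = heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])
    votes = defaultdict(int)
    for i in top:
        votes[labels[i]] += scores[i]
    return max(votes, key=votes.get), dict(votes)

labels = ["a", "a", "b", "b"]                # classes of D1..D4
result_d5 = classify([3, 3, 0, 0], labels, k=2)   # column for D5
result_d6 = classify([0, 0, 17, 8], labels, k=2)  # column for D6
print(result_d5[0], result_d6[0])
```

Only the k best entries need to be found, so a full sort of each column is unnecessary; that is why the later slides speak of partial sorting.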
Mapping to an Accelerator
Profiling
We need a larger dataset to test performance.
[Figure: the larger dataset. Horizontal axis: words, 0 to 261,976. Vertical axis: documents (index), marked by year: 33,652 (1979), 112,359 (1989), 213,221 (1999), 328,692 (2009), 374,989 (2014).]
Profiling reveals that SpMV takes 90% of the runtime.
[Figure: percent of runtime (0% to 100%) for the Naïve and Parallel versions, split into SpMV, Partial Sorting, and Other.]
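For readers unfamiliar with the operation that dominates the profile, the following is a plain software sketch of SpMV, y = A x, with A in compressed sparse row (CSR) form. CSR is an assumption chosen for illustration; the slides do not state which format the profiled code used. The data is the training matrix of the small example and the dense vector of test document D6.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A x for a matrix A stored in compressed sparse row (CSR) form."""
    y = [0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        # Nonzeros of this row live in values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y

# Training matrix A of the small example (rows D1..D4, 17 word columns).
values  = [3, 1, 1,  2, 1, 2, 1,  4, 1, 1, 1, 1,  2, 1, 1]
col_idx = [0, 1, 2,  0, 3, 4, 5,  6, 7, 8, 9, 10,  6, 11, 12]
row_ptr = [0, 3, 7, 12, 15]

x = [0] * 17                       # dense vector for test document D6
x[6], x[7], x[15], x[16] = 4, 1, 1, 2
y = spmv_csr(values, col_idx, row_ptr, x)
print(y)  # [0, 0, 17, 8]
```

The inner loop does one multiply-add per stored nonzero while reading `x` at scattered positions, so little arithmetic is done per byte fetched from memory.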
Mapping to an Accelerator
Dataflow with Accelerator
We have developed an FPGA-based SpMV accelerator called R3.
For the training phase, the matrix is converted into a new format.
[Figure: training-phase dataflow between Host and Coprocessor. Blocks: Training Documents, Rainbow, Matrix, R3 Formatter, R3 Formatted Matrix.]
[Figure: testing-phase dataflow between Host and Coprocessor. Blocks: Testing Documents, Rainbow, Testing Matrix, Zeroing, 0 Vector, Vector filler, x Vector, R3 Formatted Training Matrix, R3 SpMV, y Vector, Partial sort, Indices and values of k nearest documents, Classify.]
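The testing-phase dataflow can be imitated in software to show how the blocks fit together. This is only a sketch under assumptions: the SpMV step is an ordinary Python loop standing in for R3, the sparse rows are lists of `(word index, count)` pairs, and resetting the dense vector after each document is one guess at what the Zeroing block is for. The name `knn_indices` is hypothetical.

```python
import heapq

def knn_indices(train_rows, test_docs, num_words, k):
    """For each sparse test document, return the k best (train index, score) pairs."""
    x = [0] * num_words                  # "Zeroing": dense 0 vector, created once
    results = []
    for doc in test_docs:                # doc is a list of (word index, count)
        for j, count in doc:             # "Vector filler": scatter nonzeros into x
            x[j] = count
        # Stand-in for the R3 SpMV step: y = A x, one dot product per training row.
        y = [sum(v * x[j] for j, v in row) for row in train_rows]
        # "Partial sort": indices and values of the k nearest documents.
        results.append(heapq.nlargest(k, enumerate(y), key=lambda pair: pair[1]))
        for j, _ in doc:                 # restore the 0 vector for the next document
            x[j] = 0
    return results

train_rows = [                           # D1..D4 from the example
    [(0, 3), (1, 1), (2, 1)],
    [(0, 2), (3, 1), (4, 2), (5, 1)],
    [(6, 4), (7, 1), (8, 1), (9, 1), (10, 1)],
    [(6, 2), (11, 1), (12, 1)],
]
test_docs = [                            # D5, D6
    [(0, 1), (3, 1), (13, 1), (14, 1)],
    [(6, 4), (7, 1), (15, 1), (16, 2)],
]
nearest = knn_indices(train_rows, test_docs, num_words=17, k=2)
print(nearest)
```

Processing one test document at a time is what turns the matrix-matrix product into a series of matrix-vector products.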
Results
Results
[Figure: runtime relative to naïve (0% to 100%) for the Naïve, Parallel, and FPGA versions.]
Runtime in seconds:
Naïve     0.16
Parallel  0.014
FPGA      0.0097
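The relative runtimes and speedups implied by these numbers can be worked out with a few lines. The sketch assumes the three values belong to the Naïve, Parallel, and FPGA bars in that order, which is how they appear on the slide; the helper name `speedup` is illustrative.

```python
def speedup(baseline_seconds, seconds):
    """How many times faster a run is than the baseline."""
    return baseline_seconds / seconds

# Runtimes in seconds, assuming the three values map to the bars in order.
runtimes = {"naive": 0.16, "parallel": 0.014, "fpga": 0.0097}

for name, seconds in runtimes.items():
    percent = 100 * seconds / runtimes["naive"]
    factor = speedup(runtimes["naive"], seconds)
    print(f"{name:8s} {percent:6.2f}% of naive, {factor:5.2f}x")
```

Under that assumption the parallel version runs in under a tenth of the naïve time, and the FPGA version is faster again by roughly a factor of 1.4.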
Results
New profile
[Figure: percent of runtime (0% to 100%) for the Naïve, Parallel, and FPGA versions, split into SpMV, PCIe Communication, Partial Sorting, and Other.]
SpMV still takes the majority of the runtime, so reducing the PCIe communication time that the accelerator introduces is not a high priority.
Results
Future Work
Currently we perform sparse matrix-sparse matrix multiplication as a series of sparse matrix (dense) vector multiplication operations. We could use bitmaps to reduce the memory bandwidth. (SpMV is memory bound.)
Integration into existing programs like Rainbow.
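The slides do not say how bitmaps would be used. One common reading, sketched here purely as an assumption, is to replace the mostly-zero dense vector by one bit per position plus a packed list of the nonzero values, so that zeros cost one bit each instead of a full value. The names `to_bitmap` and `lookup` are hypothetical.

```python
def to_bitmap(dense):
    """Encode a vector as (bitmap, packed nonzero values)."""
    bitmap, packed = 0, []
    for i, value in enumerate(dense):
        if value != 0:
            bitmap |= 1 << i      # mark position i as nonzero
            packed.append(value)
    return bitmap, packed

def lookup(bitmap, packed, i):
    """Return element i: 0 if its bit is clear, else the matching packed value."""
    if not (bitmap >> i) & 1:
        return 0
    # Rank = number of set bits below position i.
    rank = bin(bitmap & ((1 << i) - 1)).count("1")
    return packed[rank]

x = [0] * 17                      # dense vector of test document D6
x[6], x[7], x[15], x[16] = 4, 1, 1, 2
bitmap, packed = to_bitmap(x)
print(packed, [lookup(bitmap, packed, i) for i in (0, 6, 16)])
```

For this 17-element vector the encoding keeps 17 bits and four values instead of 17 values; whether that pays off on real data depends on how sparse the test vectors are.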