Optimizing Data Mining Process Using Graphic Processors
Data Mining: An Interdisciplinary Field
(Diagram: data mining at the intersection of machine learning, database systems, pattern recognition, statistics, and information science)
“Extracting Knowledge from the Data”
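CRISP-DM: CRoss Industry Standard Process for Data Mining
Six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment
http://www.crisp-dm.org/ (founded in 1996)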
Application areas:
• Telecommunications
• Financial data analysis
• Retail industry
• Healthcare and biomedical research
• Web data mining
Challenges:
• Scalability
• Dimensionality
• Complex data
• Data quality
• Data ownership
Architectural differences between GPU and CPU
• More transistors devoted to data processing rather than to caching and flow control
• Many-core (hundreds of cores)
General-purpose computation on the GPU: using the GPU in applications “other than 3D graphics”
• Flexible and programmable
• Fully supports vectorized floating-point operations at IEEE single precision
• Additional levels of programmability are emerging with every GPU generation (about every 18 months)
• An attractive platform for general-purpose computation
Thread block: “a batch of threads that can cooperate together by efficiently sharing data through some fast shared memory and synchronizing their execution to coordinate memory accesses.”
Example of block ID: a block (x, y) in a grid of dimension (X, Y) has block ID x + y·X.
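The block-ID formula above can be checked with a small Python sketch. It mirrors CUDA's convention, in which a block's linear index is `blockIdx.x + blockIdx.y * gridDim.x` and a thread's global index in a 1-D block is `blockIdx.x * blockDim.x + threadIdx.x`. The function names are hypothetical, and the grid is only simulated on the CPU, not launched on a GPU.

```python
def block_id(x: int, y: int, grid_x: int) -> int:
    """Linear ID of block (x, y) in a grid whose x-dimension is grid_x: x + y*X."""
    return x + y * grid_x


def global_thread_id(block: int, thread: int, block_size: int) -> int:
    """Global index of a thread inside a 1-D block of block_size threads."""
    return block * block_size + thread


def enumerate_grid(grid_x: int, grid_y: int) -> list[int]:
    """Block IDs of every block in a grid_x-by-grid_y grid, row by row."""
    return [block_id(x, y, grid_x) for y in range(grid_y) for x in range(grid_x)]


if __name__ == "__main__":
    # A 4 x 3 grid: block (1, 2) has ID 1 + 2*4 = 9.
    print(block_id(1, 2, grid_x=4))  # 9
    # Every block gets a distinct ID from 0 to 11.
    print(enumerate_grid(4, 3))
```

Because the IDs run from 0 to X·Y − 1 without gaps, each block can use its ID to pick a distinct slice of the input data.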
• Data Mining on Cloud (Nov 22nd ’10)
• GPU Miner: http://code.google.com/p/gpuminer/
• SVM for Estimation of Aqueous Solubility
An itemset is frequent if its support is not less than a threshold specified by the user.
Thresholds:
• Minimum confidence (in %): the strength of the bond between the two sides of an association rule; for a rule A ⇒ B it is support(A ∪ B) / support(A) × 100
• Minimum support count (a number): how many times an itemset occurs in the database
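The two thresholds can be made concrete with a minimal Python sketch over a toy transaction list (the data and function names are hypothetical). Support is counted as the number of transactions that contain every item of the itemset, and the confidence of a rule A ⇒ B is support(A ∪ B) / support(A), returned here as a percentage.

```python
def support_count(itemset: frozenset, transactions: list[set]) -> int:
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def is_frequent(itemset: frozenset, transactions: list[set], min_support_count: int) -> bool:
    """An itemset is frequent if its support is not less than the user threshold."""
    return support_count(itemset, transactions) >= min_support_count


def confidence(antecedent: frozenset, consequent: frozenset, transactions: list[set]) -> float:
    """Confidence (in %) of the rule antecedent => consequent."""
    both = support_count(antecedent | consequent, transactions)
    base = support_count(antecedent, transactions)
    return 100.0 * both / base if base else 0.0


if __name__ == "__main__":
    db = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
    print(support_count(frozenset({"bread", "milk"}), db))            # 2
    print(confidence(frozenset({"bread"}), frozenset({"milk"}), db))  # about 66.7
```

Support is a property of a single itemset, while confidence compares two supports, which is why it only makes sense for a rule.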
Apriori property: “if an itemset is not frequent, none of its supersets can be frequent”
Proposed by Agrawal & Srikant @ VLDB’94, Apriori is an influential algorithm for mining frequent itemsets for association rules.
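A minimal Python sketch of the level-wise Apriori loop follows, assuming transactions are Python sets of hashable items and the threshold is an absolute support count. The function name is hypothetical, and for brevity the join step unions any two frequent (k-1)-itemsets that form a k-itemset, rather than using the prefix-based join of the original paper; the prune step then applies the Apriori property to discard every candidate with an infrequent (k-1)-subset.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Level-wise Apriori: returns {frozenset(itemset): support_count}."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the database per level; keep candidates meeting the threshold.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support_count}

    # Level 1: frequent single items.
    frequent = count({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step (simplified): unions of two frequent (k-1)-itemsets of size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    db = [{1, 2, 5}, {2, 4}, {2, 3}, {1, 2, 4}, {1, 3},
          {2, 3}, {1, 3}, {1, 2, 3, 5}, {1, 2, 3}]
    found = apriori(db, 2)
    for itemset, n in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), n)
    print(len(found))  # 13
```

In this example {1, 3, 5} is never counted: its subset {3, 5} occurs only once, so the prune step removes it before the database pass.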
(Figures: vertical data layout, horizontal data layout, and bitmap representation)
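The layouts named above can be sketched in Python. The horizontal layout lists the items of each transaction; the vertical layout maps each item to the IDs of the transactions that contain it; and one common bitmap encoding stores that vertical layout as one bit per transaction, so that support counting reduces to a bitwise AND followed by a population count. This is an illustrative sketch with hypothetical function names: Python integers stand in for the fixed-width words a GPU implementation would use.

```python
def to_vertical(horizontal):
    """Horizontal layout (items per transaction) -> vertical layout (item -> set of TIDs)."""
    vertical = {}
    for tid, items in enumerate(horizontal):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def to_bitmaps(vertical):
    """Vertical layout -> bitmaps: bit `tid` of an item's integer is 1 if the item is in that transaction."""
    bitmaps = {}
    for item, tids in vertical.items():
        bits = 0
        for tid in tids:
            bits |= 1 << tid
        bitmaps[item] = bits
    return bitmaps


def bitmap_support(itemset, bitmaps, n_transactions):
    """Support count = number of 1 bits in the AND of the items' bitmaps."""
    acc = (1 << n_transactions) - 1  # start with every transaction selected
    for item in itemset:
        acc &= bitmaps.get(item, 0)
    return bin(acc).count("1")


if __name__ == "__main__":
    horizontal = [{1, 2, 5}, {2, 4}, {2, 3}, {1, 2, 4}]
    vertical = to_vertical(horizontal)
    bitmaps = to_bitmaps(vertical)
    print(sorted(vertical[2]))                               # [0, 1, 2, 3]
    print(bin(bitmaps[4]))                                   # 0b1010: transactions 1 and 3
    print(bitmap_support({1, 2}, bitmaps, len(horizontal)))  # 2
```

The bitmap form suits SIMD hardware because every transaction is handled by the same AND and count instructions, with no data-dependent branching.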
Agrawal & Srikant @ VLDB’94
o We have presented a GPU-based implementation of the Apriori algorithm for frequent itemset mining.
o This implementation employs a bitmap data structure to encode the transaction database on the GPU and utilizes the GPU's SIMD parallelism for support counting.
o Our implementation stores the itemsets in a bitmap and runs entirely on the GPU.
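The data-parallel support-counting step can be imitated on the CPU to show the pattern. This is a hypothetical Python simulation, not the authors' GPU code. It assumes that each item's bitmap is cut into 32-bit words, that one simulated thread handles one word (AND across the itemset's items, then a population count), and that each thread block sums its threads' counts in a scratch array standing in for shared memory before the host adds up the block totals.

```python
WORD_BITS = 32  # assumption: one 32-bit word per simulated GPU thread


def to_words(bits: int, n_transactions: int) -> list[int]:
    """Split an item bitmap into 32-bit words; word w covers transactions 32w .. 32w+31."""
    n_words = (n_transactions + WORD_BITS - 1) // WORD_BITS
    mask = (1 << WORD_BITS) - 1
    return [(bits >> (w * WORD_BITS)) & mask for w in range(n_words)]


def thread_work(word_lists: list[list[int]], w: int) -> int:
    """Work of one simulated thread: AND word w of every item, then count the 1 bits."""
    acc = (1 << WORD_BITS) - 1
    for words in word_lists:
        acc &= words[w]
    return bin(acc).count("1")


def parallel_support(item_bitmaps: list[int], n_transactions: int, block_size: int = 4) -> int:
    """Simulated kernel: threads count words, blocks reduce partial sums, host sums blocks."""
    if not item_bitmaps:
        return n_transactions  # the empty itemset is contained in every transaction
    word_lists = [to_words(b, n_transactions) for b in item_bitmaps]
    n_words = len(word_lists[0])
    n_blocks = (n_words + block_size - 1) // block_size
    block_sums = []
    for block in range(n_blocks):
        partial = [0] * block_size  # per-block scratch, playing the role of shared memory
        for thread in range(block_size):
            w = block * block_size + thread  # global thread index
            if w < n_words:  # guard for the last, partially filled block
                partial[thread] = thread_work(word_lists, w)
        block_sums.append(sum(partial))  # block-level reduction
    return sum(block_sums)  # final reduction on the host


if __name__ == "__main__":
    n = 100
    a = sum(1 << t for t in range(n) if t % 2 == 0)  # item A: even-numbered transactions
    b = sum(1 << t for t in range(n) if t % 3 == 0)  # item B: every third transaction
    print(parallel_support([a, b], n))  # 17 (multiples of 6 below 100)
```

On a real GPU the inner `for thread` loop is what runs concurrently; the guard `w < n_words` corresponds to the bounds check a kernel needs when the data size is not a multiple of the block size.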