SlideShare a Scribd company logo
1 of 1
Download to read offline
Software-Engineered Library Development to Support a
High Performance Machine Learning Visualization System
Introduction
The Northeastern Interactive Clustering Engine (NICE):
● Interactive open source data analysis tool for researchers
● GPU acceleration for quick results
Current Application:
● Study of link between environmental factors and preterm births
● Data sets generated studying a range of environmental factors
Machine Learning
● Clustering provides meaningful insight into previously unknown
relationships between the data’s attributes
● Two machine learning algorithms to cluster the users’ data:
Leslie Chase, Jason Booth, Anne Demosthene, Tory Leo, Collin Purcell,
Selean Ridley, Andrew Tu, Xiangyu Li, Shi Dong, David Kaeli
Operations Conclusion
● Develop an open source clustering engine
● Use machine learning algorithms
○ Accelerate on CPU using Eigen and KMlocal libraries
○ Accelerate parallelizable code on GPUs
● Provide visual data representation to user
○ Leads to insights into underlying correlations of the data
Acknowledgements
This research has been generously supported by the following grants:
● Grant Award ACI – 1559894 from the National Science Foundation
● Grant Award Number P42ES017198 from the National Institute of
Environmental Health Sciences
● NSF IIS-1546428 BIGDATA: IA: Exploring Analysis of Environment and Health
Through Multiple Alternative Clustering
Future Work
● Further accelerating algorithms with GPU implementations
● More machine learning algorithms based on beta testing
● Implement a recommendation engine based on users’ interests in
the different forms of clustering
○ May use a utility matrix that holds the views for each model
○ Data will be normalized to find similarity between the users
CPU Operations
● CPU (Central Processing Unit)
● Optimized for complicated
operations
● Written in C++
● Eigen Library for Matrix
operations
● KMlocal for K-Means library
● Few cores
GPU Operations
● GPU (Graphics Processing Unit)
● Optimized for simple operations
● Used in place of CPU operations
● Written in CUDA
● Uses CuSolver and Cublas
libraries
● Has thousands of cores for
parallel thread processing
How It Works
● User uploads their data to the NICE
server and selects model
● Using the CPU and GPU operations,
the server clusters the data in the
way that the user selected
● Front end uses clustering data to
create a visual environment for the
client
Images: Scikits Learn,. Comparing Different Clustering
Algorithms On Toy Datasets. 2016. Web. 2 Aug. 2016.
Pick a value for k
Choose k random points and
initialize them as cluster
centers
Assign points to cluster
centers by distance
Recalculate Centers
Reach Convergence
K-Means
● Clustering is based on points’
distance from the cluster centers
Spectral Clustering/
Alternative Views
● Clusters by density; distance
from point to point
● Alternative clusterings
made by dissimilarity
Website GitHub

More Related Content

What's hot

Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Sc...
Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Sc...Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Sc...
Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Sc...Dawn Wright
 
Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics iosrjce
 
Making Elasticity Testing of Cloud-Based Systems Reproducible
Making Elasticity Testing of Cloud-Based Systems ReproducibleMaking Elasticity Testing of Cloud-Based Systems Reproducible
Making Elasticity Testing of Cloud-Based Systems ReproducibleMichel Albonico
 
"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...
"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler..."Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...
"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...Dataconomy Media
 
Map Reduce in the Clouds (http://salsahpc.indiana.edu/mapreduceroles4azure/)
Map Reduce in the Clouds (http://salsahpc.indiana.edu/mapreduceroles4azure/)Map Reduce in the Clouds (http://salsahpc.indiana.edu/mapreduceroles4azure/)
Map Reduce in the Clouds (http://salsahpc.indiana.edu/mapreduceroles4azure/)Thilina Gunarathne
 
Improving resource utilisation in the cloud environment using multivariate pr...
Improving resource utilisation in the cloud environment using multivariate pr...Improving resource utilisation in the cloud environment using multivariate pr...
Improving resource utilisation in the cloud environment using multivariate pr...Shrabanee Swagatika
 
Interactive Latency in Big Data Visualization
Interactive Latency in Big Data VisualizationInteractive Latency in Big Data Visualization
Interactive Latency in Big Data Visualizationbigdataviz_bay
 
Weather Data Analytics Using Hadoop
Weather Data Analytics Using HadoopWeather Data Analytics Using Hadoop
Weather Data Analytics Using HadoopNajima Begum
 
V like Velocity, Predicting in Real-Time with Azure ML
V like Velocity, Predicting in Real-Time with Azure MLV like Velocity, Predicting in Real-Time with Azure ML
V like Velocity, Predicting in Real-Time with Azure MLBarbara Fusinska
 
A Survey on Resource Allocation & Monitoring in Cloud Computing
A Survey on Resource Allocation & Monitoring in Cloud ComputingA Survey on Resource Allocation & Monitoring in Cloud Computing
A Survey on Resource Allocation & Monitoring in Cloud ComputingMohd Hairey
 
Briefing - Dynamic Workers for Scheduling
Briefing - Dynamic Workers for SchedulingBriefing - Dynamic Workers for Scheduling
Briefing - Dynamic Workers for SchedulingBernie Chiu
 
Combining efficiency, fidelity, and flexibility in resource information services
Combining efficiency, fidelity, and flexibility in resource information servicesCombining efficiency, fidelity, and flexibility in resource information services
Combining efficiency, fidelity, and flexibility in resource information servicesieeepondy
 
Big Data Visualization Problem in IT Management
Big Data Visualization Problem in IT ManagementBig Data Visualization Problem in IT Management
Big Data Visualization Problem in IT Managementbigdataviz_bay
 
Job sequence scheduling for cloud computing
Job sequence scheduling for cloud computingJob sequence scheduling for cloud computing
Job sequence scheduling for cloud computingSamruddhi Gaikwad
 
Using Elastiknn for exact and approximate nearest neighbor search
Using Elastiknn for exact and approximate nearest neighbor searchUsing Elastiknn for exact and approximate nearest neighbor search
Using Elastiknn for exact and approximate nearest neighbor searchFaithWestdorp
 
REVIEW PAPER on Scheduling in Cloud Computing
REVIEW PAPER on Scheduling in Cloud ComputingREVIEW PAPER on Scheduling in Cloud Computing
REVIEW PAPER on Scheduling in Cloud ComputingJaya Gautam
 

What's hot (20)

Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Sc...
Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Sc...Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Sc...
Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Sc...
 
Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics
 
Making Elasticity Testing of Cloud-Based Systems Reproducible
Making Elasticity Testing of Cloud-Based Systems ReproducibleMaking Elasticity Testing of Cloud-Based Systems Reproducible
Making Elasticity Testing of Cloud-Based Systems Reproducible
 
STDCS
STDCSSTDCS
STDCS
 
"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...
"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler..."Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...
"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...
 
Mick
MickMick
Mick
 
WTIA Cloud Computing Series - Part I: The Fundamentals
WTIA Cloud Computing Series - Part I: The FundamentalsWTIA Cloud Computing Series - Part I: The Fundamentals
WTIA Cloud Computing Series - Part I: The Fundamentals
 
Map Reduce in the Clouds (http://salsahpc.indiana.edu/mapreduceroles4azure/)
Map Reduce in the Clouds (http://salsahpc.indiana.edu/mapreduceroles4azure/)Map Reduce in the Clouds (http://salsahpc.indiana.edu/mapreduceroles4azure/)
Map Reduce in the Clouds (http://salsahpc.indiana.edu/mapreduceroles4azure/)
 
Improving resource utilisation in the cloud environment using multivariate pr...
Improving resource utilisation in the cloud environment using multivariate pr...Improving resource utilisation in the cloud environment using multivariate pr...
Improving resource utilisation in the cloud environment using multivariate pr...
 
Interactive Latency in Big Data Visualization
Interactive Latency in Big Data VisualizationInteractive Latency in Big Data Visualization
Interactive Latency in Big Data Visualization
 
Weather Data Analytics Using Hadoop
Weather Data Analytics Using HadoopWeather Data Analytics Using Hadoop
Weather Data Analytics Using Hadoop
 
V like Velocity, Predicting in Real-Time with Azure ML
V like Velocity, Predicting in Real-Time with Azure MLV like Velocity, Predicting in Real-Time with Azure ML
V like Velocity, Predicting in Real-Time with Azure ML
 
A Survey on Resource Allocation & Monitoring in Cloud Computing
A Survey on Resource Allocation & Monitoring in Cloud ComputingA Survey on Resource Allocation & Monitoring in Cloud Computing
A Survey on Resource Allocation & Monitoring in Cloud Computing
 
Briefing - Dynamic Workers for Scheduling
Briefing - Dynamic Workers for SchedulingBriefing - Dynamic Workers for Scheduling
Briefing - Dynamic Workers for Scheduling
 
Combining efficiency, fidelity, and flexibility in resource information services
Combining efficiency, fidelity, and flexibility in resource information servicesCombining efficiency, fidelity, and flexibility in resource information services
Combining efficiency, fidelity, and flexibility in resource information services
 
Big Data Visualization Problem in IT Management
Big Data Visualization Problem in IT ManagementBig Data Visualization Problem in IT Management
Big Data Visualization Problem in IT Management
 
Job sequence scheduling for cloud computing
Job sequence scheduling for cloud computingJob sequence scheduling for cloud computing
Job sequence scheduling for cloud computing
 
Scheduling in cloud
Scheduling in cloudScheduling in cloud
Scheduling in cloud
 
Using Elastiknn for exact and approximate nearest neighbor search
Using Elastiknn for exact and approximate nearest neighbor searchUsing Elastiknn for exact and approximate nearest neighbor search
Using Elastiknn for exact and approximate nearest neighbor search
 
REVIEW PAPER on Scheduling in Cloud Computing
REVIEW PAPER on Scheduling in Cloud ComputingREVIEW PAPER on Scheduling in Cloud Computing
REVIEW PAPER on Scheduling in Cloud Computing
 

Similar to Poster

Memory Efficient Graph Convolutional Network based Distributed Link Prediction
Memory Efficient Graph Convolutional Network based Distributed Link PredictionMemory Efficient Graph Convolutional Network based Distributed Link Prediction
Memory Efficient Graph Convolutional Network based Distributed Link Predictionmiyurud
 
Panel: Building the NRP Ecosystem with the Regional Networks on their Campuses;
Panel: Building the NRP Ecosystem with the Regional Networks on their Campuses;Panel: Building the NRP Ecosystem with the Regional Networks on their Campuses;
Panel: Building the NRP Ecosystem with the Regional Networks on their Campuses;Larry Smarr
 
Anurag Awasthi - Machine Learning applications for CloudStack
Anurag Awasthi - Machine Learning applications for CloudStackAnurag Awasthi - Machine Learning applications for CloudStack
Anurag Awasthi - Machine Learning applications for CloudStackShapeBlue
 
How to Develop and Operate Cloud Native Data Platforms and Applications
How to Develop and Operate Cloud Native Data Platforms and ApplicationsHow to Develop and Operate Cloud Native Data Platforms and Applications
How to Develop and Operate Cloud Native Data Platforms and ApplicationsAlluxio, Inc.
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to ChainerShunta Saito
 
Implementing load balancing algorithm in middleware system of volunteer cloud...
Implementing load balancing algorithm in middleware system of volunteer cloud...Implementing load balancing algorithm in middleware system of volunteer cloud...
Implementing load balancing algorithm in middleware system of volunteer cloud...Gargee Hiray
 
Ci Connect: A service for building multi-institutional cluster environments
Ci Connect: A service for building multi-institutional cluster environmentsCi Connect: A service for building multi-institutional cluster environments
Ci Connect: A service for building multi-institutional cluster environmentsRob Gardner
 
Vertex Perspectives | AI Optimized Chipsets | Part II
Vertex Perspectives | AI Optimized Chipsets | Part IIVertex Perspectives | AI Optimized Chipsets | Part II
Vertex Perspectives | AI Optimized Chipsets | Part IIVertex Holdings
 
Pioneering and Democratizing Scalable HPC+AI at PSC
Pioneering and Democratizing Scalable HPC+AI at PSCPioneering and Democratizing Scalable HPC+AI at PSC
Pioneering and Democratizing Scalable HPC+AI at PSCinside-BigData.com
 
Raul sena - Apresentação Analiticsemtudo - Scientific Applications using GPU
Raul sena - Apresentação Analiticsemtudo - Scientific Applications using GPURaul sena - Apresentação Analiticsemtudo - Scientific Applications using GPU
Raul sena - Apresentação Analiticsemtudo - Scientific Applications using GPUEduardo Gaspar
 
Rack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC SupercomputerRack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC SupercomputerRebekah Rodriguez
 
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...Data Con LA
 
Build User-Facing Analytics Application That Scales Using StarRocks (DLH).pdf
Build User-Facing Analytics Application That Scales Using StarRocks (DLH).pdfBuild User-Facing Analytics Application That Scales Using StarRocks (DLH).pdf
Build User-Facing Analytics Application That Scales Using StarRocks (DLH).pdfAlbert Wong
 
Satwik Mishra resume
Satwik Mishra resumeSatwik Mishra resume
Satwik Mishra resumeSatwik Mishra
 
Dynamic Resource Allocation Algorithm using Containers
Dynamic Resource Allocation Algorithm using ContainersDynamic Resource Allocation Algorithm using Containers
Dynamic Resource Allocation Algorithm using ContainersIRJET Journal
 
Graph Data Science at Scale
Graph Data Science at ScaleGraph Data Science at Scale
Graph Data Science at ScaleNeo4j
 

Similar to Poster (20)

Memory Efficient Graph Convolutional Network based Distributed Link Prediction
Memory Efficient Graph Convolutional Network based Distributed Link PredictionMemory Efficient Graph Convolutional Network based Distributed Link Prediction
Memory Efficient Graph Convolutional Network based Distributed Link Prediction
 
Panel: Building the NRP Ecosystem with the Regional Networks on their Campuses;
Panel: Building the NRP Ecosystem with the Regional Networks on their Campuses;Panel: Building the NRP Ecosystem with the Regional Networks on their Campuses;
Panel: Building the NRP Ecosystem with the Regional Networks on their Campuses;
 
Anurag Awasthi - Machine Learning applications for CloudStack
Anurag Awasthi - Machine Learning applications for CloudStackAnurag Awasthi - Machine Learning applications for CloudStack
Anurag Awasthi - Machine Learning applications for CloudStack
 
How to Develop and Operate Cloud Native Data Platforms and Applications
How to Develop and Operate Cloud Native Data Platforms and ApplicationsHow to Develop and Operate Cloud Native Data Platforms and Applications
How to Develop and Operate Cloud Native Data Platforms and Applications
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to Chainer
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to Chainer
 
Implementing load balancing algorithm in middleware system of volunteer cloud...
Implementing load balancing algorithm in middleware system of volunteer cloud...Implementing load balancing algorithm in middleware system of volunteer cloud...
Implementing load balancing algorithm in middleware system of volunteer cloud...
 
Resume
ResumeResume
Resume
 
Ci Connect: A service for building multi-institutional cluster environments
Ci Connect: A service for building multi-institutional cluster environmentsCi Connect: A service for building multi-institutional cluster environments
Ci Connect: A service for building multi-institutional cluster environments
 
resume
resumeresume
resume
 
Vertex Perspectives | AI Optimized Chipsets | Part II
Vertex Perspectives | AI Optimized Chipsets | Part IIVertex Perspectives | AI Optimized Chipsets | Part II
Vertex Perspectives | AI Optimized Chipsets | Part II
 
Pioneering and Democratizing Scalable HPC+AI at PSC
Pioneering and Democratizing Scalable HPC+AI at PSCPioneering and Democratizing Scalable HPC+AI at PSC
Pioneering and Democratizing Scalable HPC+AI at PSC
 
Raul sena - Apresentação Analiticsemtudo - Scientific Applications using GPU
Raul sena - Apresentação Analiticsemtudo - Scientific Applications using GPURaul sena - Apresentação Analiticsemtudo - Scientific Applications using GPU
Raul sena - Apresentação Analiticsemtudo - Scientific Applications using GPU
 
Rack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC SupercomputerRack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC Supercomputer
 
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...
 
Build User-Facing Analytics Application That Scales Using StarRocks (DLH).pdf
Build User-Facing Analytics Application That Scales Using StarRocks (DLH).pdfBuild User-Facing Analytics Application That Scales Using StarRocks (DLH).pdf
Build User-Facing Analytics Application That Scales Using StarRocks (DLH).pdf
 
Satwik Mishra resume
Satwik Mishra resumeSatwik Mishra resume
Satwik Mishra resume
 
Future of hpc
Future of hpcFuture of hpc
Future of hpc
 
Dynamic Resource Allocation Algorithm using Containers
Dynamic Resource Allocation Algorithm using ContainersDynamic Resource Allocation Algorithm using Containers
Dynamic Resource Allocation Algorithm using Containers
 
Graph Data Science at Scale
Graph Data Science at ScaleGraph Data Science at Scale
Graph Data Science at Scale
 

Poster

  • 1. Software-Engineered Library Development to Support a High Performance Machine Learning Visualization System Introduction The Northeastern Interactive Clustering Engine (NICE): ● Interactive open source data analysis tool for researchers ● GPU acceleration for quick results Current Application: ● Study of link between environmental factors and preterm births ● Data sets generated studying a range of environmental factors Machine Learning ● Clustering provides meaningful insight into previously unknown relationships between the data’s attributes ● Two machine learning algorithms to cluster the users’ data: Leslie Chase, Jason Booth, Anne Demosthene, Tory Leo, Collin Purcell, Selean Ridley, Andrew Tu, Xiangyu Li, Shi Dong, David Kaeli Operations Conclusion ● Develop an open source clustering engine ● Use machine learning algorithms ○ Accelerate on CPU using Eigen and KMlocal libraries ○ Accelerate parallelizable code on GPUs ● Provide visual data representation to user ○ Leads to insights into underlying correlations of the data Acknowledgements This research has been generously supported by the following grants: ● Grant Award ACI – 1559894 from the National Science Foundation ● Grant Award Number P42ES017198 from the National Institute of Environmental Health Sciences ● NSF IIS-1546428 BIGDATA: IA: Exploring Analysis of Environment and Health Through Multiple Alternative Clustering Future Work ● Further accelerating algorithms with GPU implementations ● More machine learning algorithms based on beta testing ● Implement a recommendation engine based on users’ interests in the different forms of clustering ○ May use a utility matrix that holds the views for each model ○ Data will be normalized to find similarity between the users CPU Operations ● CPU (Central Processing Unit) ● Optimized for complicated operations ● Written in C++ ● Eigen Library for Matrix operations ● KMlocal for K-Means library ● Few cores GPU Operations ● GPU (Graphics Processing Unit) ● Optimized for simple operations ● Used in place of CPU operations ● Written in CUDA ● Uses CuSolver and Cublas libraries ● Has thousands of cores for parallel thread processing How It Works ● User uploads their data to the NICE server and selects model ● Using the CPU and GPU operations, the server clusters the data in the way that the user selected ● Front end uses clustering data to create a visual environment for the client Images: Scikits Learn,. Comparing Different Clustering Algorithms On Toy Datasets. 2016. Web. 2 Aug. 2016. Pick a value for k Choose k random points and initialize them as cluster centers Assign points to cluster centers by distance Recalculate Centers Reach Convergence K-Means ● Clustering is based on points’ distance from the cluster centers Spectral Clustering/ Alternative Views ● Clusters by density; distance from point to point ● Alternative clusterings made by dissimilarity Website GitHub