Poster

•

0 likes•36 views

Collin Purcell

Software-Engineered Library Development to Support a
High Performance Machine Learning Visualization System
Introduction
The Northeastern Interactive Clustering Engine (NICE):
● Interactive open source data analysis tool for researchers
● GPU acceleration for quick results
Current Application:
● Study of link between environmental factors and preterm births
● Data sets generated studying a range of environmental factors
Machine Learning
● Clustering provides meaningful insight into previously unknown
relationships between the data’s attributes
● Two machine learning algorithms to cluster the users’ data:
Leslie Chase, Jason Booth, Anne Demosthene, Tory Leo, Collin Purcell,
Selean Ridley, Andrew Tu, Xiangyu Li, Shi Dong, David Kaeli
Operations Conclusion
● Develop an open source clustering engine
● Use machine learning algorithms
○ Accelerate on CPU using Eigen and KMlocal libraries
○ Accelerate parallelizable code on GPUs
● Provide visual data representation to user
○ Leads to insights into underlying correlations of the data
Acknowledgements
This research has been generously supported by the following grants:
● Grant Award ACI – 1559894 from the National Science Foundation
● Grant Award Number P42ES017198 from the National Institute of
Environmental Health Sciences
● NSF IIS-1546428 BIGDATA: IA: Exploring Analysis of Environment and Health
Through Multiple Alternative Clustering
Future Work
● Further accelerating algorithms with GPU implementations
● More machine learning algorithms based on beta testing
● Implement a recommendation engine based on users’ interests in
the different forms of clustering
○ May use a utility matrix that holds the views for each model
○ Data will be normalized to find similarity between the users
CPU Operations
● CPU (Central Processing Unit)
● Optimized for complicated
operations
● Written in C++
● Eigen Library for Matrix
operations
● KMlocal for K-Means library
● Few cores
GPU Operations
● GPU (Graphics Processing Unit)
● Optimized for simple operations
● Used in place of CPU operations
● Written in CUDA
● Uses CuSolver and Cublas
libraries
● Has thousands of cores for
parallel thread processing
How It Works
● User uploads their data to the NICE
server and selects model
● Using the CPU and GPU operations,
the server clusters the data in the
way that the user selected
● Front end uses clustering data to
create a visual environment for the
client
Images: Scikits Learn,. Comparing Different Clustering
Algorithms On Toy Datasets. 2016. Web. 2 Aug. 2016.
Pick a value for k
Choose k random points and
initialize them as cluster
centers
Assign points to cluster
centers by distance
Recalculate Centers
Reach Convergence
K-Means
● Clustering is based on points’
distance from the cluster centers
Spectral Clustering/
Alternative Views
● Clusters by density; distance
from point to point
● Alternative clusterings
made by dissimilarity
Website GitHub

What's hot

Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Sc...Dawn Wright

Leveraging Map Reduce With Hadoop for Weather Data Analytics iosrjce

Making Elasticity Testing of Cloud-Based Systems ReproducibleMichel Albonico

STDCSyos_agg_acct_1

"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...Dataconomy Media

MickMick_Lin

WTIA Cloud Computing Series - Part I: The FundamentalsWashington Technology Industry Association

Map Reduce in the Clouds (http://salsahpc.indiana.edu/mapreduceroles4azure/)Thilina Gunarathne

Improving resource utilisation in the cloud environment using multivariate pr...Shrabanee Swagatika

Interactive Latency in Big Data Visualizationbigdataviz_bay

Weather Data Analytics Using HadoopNajima Begum

V like Velocity, Predicting in Real-Time with Azure MLBarbara Fusinska

A Survey on Resource Allocation & Monitoring in Cloud ComputingMohd Hairey

Briefing - Dynamic Workers for SchedulingBernie Chiu

Combining efficiency, fidelity, and flexibility in resource information servicesieeepondy

Big Data Visualization Problem in IT Managementbigdataviz_bay

Job sequence scheduling for cloud computingSamruddhi Gaikwad

Scheduling in cloudDr.Manjunath Kotari

Using Elastiknn for exact and approximate nearest neighbor searchFaithWestdorp

REVIEW PAPER on Scheduling in Cloud ComputingJaya Gautam

What's hot (20)

Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Sc...

Leveraging Map Reduce With Hadoop for Weather Data Analytics

Making Elasticity Testing of Cloud-Based Systems Reproducible

STDCS

"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...

Mick

WTIA Cloud Computing Series - Part I: The Fundamentals

Map Reduce in the Clouds (http://salsahpc.indiana.edu/mapreduceroles4azure/)

Improving resource utilisation in the cloud environment using multivariate pr...

Interactive Latency in Big Data Visualization

Weather Data Analytics Using Hadoop

V like Velocity, Predicting in Real-Time with Azure ML

A Survey on Resource Allocation & Monitoring in Cloud Computing

Briefing - Dynamic Workers for Scheduling

Combining efficiency, fidelity, and flexibility in resource information services

Big Data Visualization Problem in IT Management

Job sequence scheduling for cloud computing

Scheduling in cloud

Using Elastiknn for exact and approximate nearest neighbor search

REVIEW PAPER on Scheduling in Cloud Computing

Similar to Poster

Memory Efficient Graph Convolutional Network based Distributed Link Predictionmiyurud

Panel: Building the NRP Ecosystem with the Regional Networks on their Campuses;Larry Smarr

Anurag Awasthi - Machine Learning applications for CloudStackShapeBlue

How to Develop and Operate Cloud Native Data Platforms and ApplicationsAlluxio, Inc.

Introduction to ChainerPreferred Networks

Introduction to ChainerShunta Saito

Implementing load balancing algorithm in middleware system of volunteer cloud...Gargee Hiray

ResumeKanika Midha

Ci Connect: A service for building multi-institutional cluster environmentsRob Gardner

resumeRahul Kumar Wadbude

Vertex Perspectives | AI Optimized Chipsets | Part IIVertex Holdings

Pioneering and Democratizing Scalable HPC+AI at PSCinside-BigData.com

Raul sena - Apresentação Analiticsemtudo - Scientific Applications using GPUEduardo Gaspar

Rack Cluster Deployment for SDSC SupercomputerRebekah Rodriguez

Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...Data Con LA

Build User-Facing Analytics Application That Scales Using StarRocks (DLH).pdfAlbert Wong

Satwik Mishra resumeSatwik Mishra

Future of hpcPutchong Uthayopas

Dynamic Resource Allocation Algorithm using ContainersIRJET Journal

Graph Data Science at ScaleNeo4j

Poster

1. Software-Engineered Library Development to Support a High Performance Machine Learning Visualization System Introduction The Northeastern Interactive Clustering Engine (NICE): ● Interactive open source data analysis tool for researchers ● GPU acceleration for quick results Current Application: ● Study of link between environmental factors and preterm births ● Data sets generated studying a range of environmental factors Machine Learning ● Clustering provides meaningful insight into previously unknown relationships between the data’s attributes ● Two machine learning algorithms to cluster the users’ data: Leslie Chase, Jason Booth, Anne Demosthene, Tory Leo, Collin Purcell, Selean Ridley, Andrew Tu, Xiangyu Li, Shi Dong, David Kaeli Operations Conclusion ● Develop an open source clustering engine ● Use machine learning algorithms ○ Accelerate on CPU using Eigen and KMlocal libraries ○ Accelerate parallelizable code on GPUs ● Provide visual data representation to user ○ Leads to insights into underlying correlations of the data Acknowledgements This research has been generously supported by the following grants: ● Grant Award ACI – 1559894 from the National Science Foundation ● Grant Award Number P42ES017198 from the National Institute of Environmental Health Sciences ● NSF IIS-1546428 BIGDATA: IA: Exploring Analysis of Environment and Health Through Multiple Alternative Clustering Future Work ● Further accelerating algorithms with GPU implementations ● More machine learning algorithms based on beta testing ● Implement a recommendation engine based on users’ interests in the different forms of clustering ○ May use a utility matrix that holds the views for each model ○ Data will be normalized to find similarity between the users CPU Operations ● CPU (Central Processing Unit) ● Optimized for complicated operations ● Written in C++ ● Eigen Library for Matrix operations ● KMlocal for K-Means library ● Few cores GPU Operations ● GPU (Graphics Processing Unit) ● Optimized for simple operations ● Used in place of CPU operations ● Written in CUDA ● Uses CuSolver and Cublas libraries ● Has thousands of cores for parallel thread processing How It Works ● User uploads their data to the NICE server and selects model ● Using the CPU and GPU operations, the server clusters the data in the way that the user selected ● Front end uses clustering data to create a visual environment for the client Images: Scikits Learn,. Comparing Different Clustering Algorithms On Toy Datasets. 2016. Web. 2 Aug. 2016. Pick a value for k Choose k random points and initialize them as cluster centers Assign points to cluster centers by distance Recalculate Centers Reach Convergence K-Means ● Clustering is based on points’ distance from the cluster centers Spectral Clustering/ Alternative Views ● Clusters by density; distance from point to point ● Alternative clusterings made by dissimilarity Website GitHub

Poster

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Poster

Similar to Poster (20)

Poster