SlideShare a Scribd company logo
1 of 26
Download to read offline
Applications of HPCC Systems at Clemson University Amy Apon, PhD ● Linh Ngo, PhD ● Michael Payne Big Data Systems Laboratory Clemson University
Applications of HPCC Systems at Clemson University Clemson Strengths and Opportunities 
PhD-level faculty & research staff Talented students Significant industry collaborators 
Palmetto – Top 5 in US Academic Supercomputers ~2000 nodes, 20K cores, 600 GPUs 100Gb Internet connectivity 
Facilities 
People
Applications of HPCC Systems at Clemson University Big Data Systems Lab Overview 
Perform World Class Research on the Systems and Enabling Information Technology for Advanced Data Analytics 
Big Data Systems Lab Research Areas 
Systems and Architectures 
Tools and Operations 
Data Analytics and Applications 
Big Data Systems Lab Vision
Applications of HPCC Systems at Clemson University Effect of High Performance Computing on Academic Research Productivity 
Motivation: There is a lot of pressure on federal funding 
We propose efficiency as a measure from which to gain insights on return on investment 
We show that locally- available HPC has a positive effect on the ability of a university to do research
Motivation: Government and business need information about public sentiment. Research: We develop and apply methods to analyze large amounts of textual data to enable inquiry of social and business problems. 
Applications of HPCC Systems at Clemson University Text mining of news reports and social media for business intelligence
Shared Execution Environment 
Temporary Local Storage 
User Privileges Only 
Applications of HPCC Systems at Clemson University Shared Computing Resources among Researchers
Applications of HPCC Systems at Clemson University 
Linh Ngo, PhD 
HPCC Systems in a Shared Research Computing Environment
Shared Execution Environment 
Temporary Local Storage 
User Privileges Only 
Applications of HPCC Systems at Clemson University Shared Computing Resources among Researchers 
How to provision and configure an HPCC cluster dynamically for research purposes? 
• 
Step 1: Configure, install, and deploy HPCC as a non-root user 
• 
Step 2: Dynamically provision HPCC cluster in a shared research environment
binutils ICU XALAN APR … 
/ 
/usr 
/lib64 
/lib 
… 
/opt 
/ 
/home 
$USER 
… 
/parallel_scratch 
/local_scratch 
Applications of HPCC Systems at Clemson University Installation and Configuration of Dependencies
Administrative privileges 
Non-administrative privileges 
… 
/ 
etc 
init.d 
HPCCSystems 
opt 
HPCCSystems 
var 
lib 
HPCCSystems 
log 
HPCCSystems 
configmgr 
mydafilesrv 
mydafilesrv 
… 
/ 
home 
$USER 
hpcc 
local_scratch 
hpcc 
parallel_scratch 
lib 
HPCCSystems 
log 
$USER 
lock 
pid 
$USER 
Applications of HPCC Systems at Clemson University Resolving Non-default Installation Path Conflicts
Remove/relax root-level settings: 
i.e.: is_root 
Reduce default configuration settings for resource requirements: 
depended on resource allocation requests 
Applications of HPCC Systems at Clemson University Non-root Deployment
mydafilesrv mydafile myeclc mythor myroxie … 
PBS_NODEFILE 
environment.xml 
1 
2 
3 
4 
5 
user.palmetto.clemson.edu 
Applications of HPCC Systems at Clemson University 
Dynamic Provisioning 
Deploy to /local_scratch or /parallel_scratch?
Applications of HPCC Systems at Clemson University 
Michael Payne Using HPCC Systems to Manage Academic Data LexisNexis Summer 2014 Internship
• 
Research in Scholarly Data requires academic data from many different sources, which store data under various formats 
• 
Aggregating these sources into a useful and cohesive structure requires a data-intensive approach to preprocessing, integration and analysis 
• 
HPCC Systems is a platform to streamline this process 
Applications of HPCC Systems at Clemson University 
Using HPCC Systems to Manage Academic Data
Higher Education Institutions 
Research 
High Performance Computing Capability 
Funding Support 
Applications of HPCC Systems at Clemson University Categories of Scholarly Data
Higher Education Institutions 
Research 
High Performance Computing Capability 
Funding Support 
Detailed Award (XML) Federal Funding (tab- delimited) Expenditures (tab-delimited) Institution Patent (Excel) Degrees Conferred (Excel) 
NIH Award Data (CSV/Excel) 
Top500 Supercomputer affiliated with academic institutions (XML) 
Institutions from States with EPSCoR status (tab-delimited) 
Institutional Information with Carnegie’s Research Classifications (multi-sheet Excel) 
Enrollment (multi-sheet Excel) Financial (multi-sheet Excel) Faculty (multi-sheet Excel) 
Detailed list of articles by discipline with abstracts, references, and disambiguated authors. (XML) 
Applications of HPCC Systems at Clemson University Scholarly Data Description
Institution name/email 
Name similarity Address 
PI/Author name Institution name 
Name similarity 
Match name with WoS ‘s Organization-Enhanced name 
Acknowledgment attributes (2008 on ward, automatic) 
Applications of HPCC Systems at Clemson University Examples of Scholarly Data Links
• 
Porting data analytic processes to ECL 
• 
Applying Machine Learning techniques for article abstract classification 
Applications of HPCC Systems at Clemson University 
Ongoing Work
LexisNexis Internship Machine Learning Manager Timothy Humphrey Mentor Arjuna Chala 
Applications of HPCC Systems at Clemson University Summer 2014 Internship - Logistic Regression for Dense Matrices
• 
Prediction using continuous and discrete values 
• 
No distributional assumptions on the predictors 
• 
May not be normally distributed or linearly related 
• 
Relationship between the discrete variable and the predictor is non-linear 
Applications of HPCC Systems at Clemson University Logistic Regression
Matrices can be partitioned Schemes must be compatible There are multiple choices! 
X 
= 
4 x 4 
4 x 1 
4 x 1 
2 x 3 
X 
= 
2 x 1 
1 x 1 
2 x 1 
3 x 1 
Applications of HPCC Systems at Clemson University Parallel Block Basic Linear Algebra Subprograms (PB-BLAS)
• 
Logistic Runtimes 
• 
Hard Coded Mapping 
• 
Full Higgs Dataset 11,000,000 x 28 
0 
5 
10 
15 
20 
25 
30 
35 
Higgs 1,000 
Higgs 10,000 
Higgs 100,000 
Time in Minutes 
PB-BLAS 
Non PB-BLAS 
Applications of HPCC Systems at Clemson University Machine Learning in ECL
• 
Logistic Runtimes 
• 
Auto Mapping 
• 
Full Elsevier Dataset 100,000 x 3,291 
0 
5 
10 
15 
20 
25 
Elsevier 100 
Time in Minutes 
PB-BLAS 
Non PB-BLAS 
Applications of HPCC Systems at Clemson University Machine Learning in ECL
0 
50 
100 
150 
200 
250 
300 
350 
400 
450 
Elsevier 1,000 
Time in Minutes 
PB-BLAS 
Non PB-BLAS 
Applications of HPCC Systems at Clemson University Machine Learning in ECL 
• 
Logistic Runtimes 
• 
Auto Mapping 
• 
Full Elsevier Dataset 100,000 x 3,291
• 
Logistic Regression code and supporting functions have been documented and merged to ECL-ML GitHub repository 
• 
Auto block vector mapping function for any user that wants to use PB-BLAS 
• 
Ready to use element wise multiplication in PB-BLAS 
• 
Updated debugging statements that a clear understanding of errors 
• 
Test functions for both block vector mapping function 
• 
Sample code for using logistic regression 
• 
Currently working on K-means implementation that utilizes PB-BLAS 
Applications of HPCC Systems at Clemson University 
Project Summary
Linh Ngo, PhD ● Alex Herzog, PhD ● Michael Payne ● Amy Apon, PhD {lngo, aherzog, mpayne3, aapon}@clemson.edu Big Data Systems Laboratory Clemson University

More Related Content

Viewers also liked

The importance of Exchange 2013 CAS in Exchange 2013 coexistence | Part 1/2 |...
The importance of Exchange 2013 CAS in Exchange 2013 coexistence | Part 1/2 |...The importance of Exchange 2013 CAS in Exchange 2013 coexistence | Part 1/2 |...
The importance of Exchange 2013 CAS in Exchange 2013 coexistence | Part 1/2 |...Eyal Doron
 
ระบบงานเฝ้าระวังโรคทางระบาดวิทยาสำนักงานสาธารณสุขอำเภอป่าบอน จังหวัดพัทลุง
ระบบงานเฝ้าระวังโรคทางระบาดวิทยาสำนักงานสาธารณสุขอำเภอป่าบอน จังหวัดพัทลุงระบบงานเฝ้าระวังโรคทางระบาดวิทยาสำนักงานสาธารณสุขอำเภอป่าบอน จังหวัดพัทลุง
ระบบงานเฝ้าระวังโรคทางระบาดวิทยาสำนักงานสาธารณสุขอำเภอป่าบอน จังหวัดพัทลุงเคี่ยม พัทลุง
 
Affordable Homes India
Affordable Homes IndiaAffordable Homes India
Affordable Homes Indiashagungoel87
 
2013 qld pga championship sponsorship invitation
2013 qld pga championship   sponsorship invitation2013 qld pga championship   sponsorship invitation
2013 qld pga championship sponsorship invitationAndrew Allpass
 
Top 10 lời khuyên SEO thành công vào năm 2014
Top 10 lời khuyên SEO thành công vào năm 2014Top 10 lời khuyên SEO thành công vào năm 2014
Top 10 lời khuyên SEO thành công vào năm 2014Đào tạo Seo
 
Analyzing Big Data's Weakest Link (hint: it might be you)
Analyzing Big Data's Weakest Link  (hint: it might be you)Analyzing Big Data's Weakest Link  (hint: it might be you)
Analyzing Big Data's Weakest Link (hint: it might be you)HPCC Systems
 
Aaaaa english trabajo 11 de maritza,tatiana ,sofia,maria,leydi
Aaaaa english trabajo 11  de maritza,tatiana ,sofia,maria,leydiAaaaa english trabajo 11  de maritza,tatiana ,sofia,maria,leydi
Aaaaa english trabajo 11 de maritza,tatiana ,sofia,maria,leydiMaritza Campaz Valencia
 
Webinar 2013 11-21-campanile_esri_italia
Webinar 2013 11-21-campanile_esri_italiaWebinar 2013 11-21-campanile_esri_italia
Webinar 2013 11-21-campanile_esri_italiasmespire
 
backup prosess
backup prosessbackup prosess
backup prosessIkhwan Wey
 
Game-Changing NFL Analytics with KEL
Game-Changing NFL Analytics with KELGame-Changing NFL Analytics with KEL
Game-Changing NFL Analytics with KELHPCC Systems
 
Khóa học internet marketing, đào tạo bán hàng trực tuyến
Khóa học internet marketing, đào tạo bán hàng trực tuyếnKhóa học internet marketing, đào tạo bán hàng trực tuyến
Khóa học internet marketing, đào tạo bán hàng trực tuyếnĐào tạo Seo
 

Viewers also liked (17)

The importance of Exchange 2013 CAS in Exchange 2013 coexistence | Part 1/2 |...
The importance of Exchange 2013 CAS in Exchange 2013 coexistence | Part 1/2 |...The importance of Exchange 2013 CAS in Exchange 2013 coexistence | Part 1/2 |...
The importance of Exchange 2013 CAS in Exchange 2013 coexistence | Part 1/2 |...
 
ระบบงานเฝ้าระวังโรคทางระบาดวิทยาสำนักงานสาธารณสุขอำเภอป่าบอน จังหวัดพัทลุง
ระบบงานเฝ้าระวังโรคทางระบาดวิทยาสำนักงานสาธารณสุขอำเภอป่าบอน จังหวัดพัทลุงระบบงานเฝ้าระวังโรคทางระบาดวิทยาสำนักงานสาธารณสุขอำเภอป่าบอน จังหวัดพัทลุง
ระบบงานเฝ้าระวังโรคทางระบาดวิทยาสำนักงานสาธารณสุขอำเภอป่าบอน จังหวัดพัทลุง
 
Affordable Homes India
Affordable Homes IndiaAffordable Homes India
Affordable Homes India
 
Ta1 7º ano p1
Ta1 7º ano p1Ta1 7º ano p1
Ta1 7º ano p1
 
Uu ite
Uu iteUu ite
Uu ite
 
Trabajo conflicto armado en colombia
Trabajo conflicto armado en colombiaTrabajo conflicto armado en colombia
Trabajo conflicto armado en colombia
 
Perfect your pitch
Perfect your pitchPerfect your pitch
Perfect your pitch
 
2013 qld pga championship sponsorship invitation
2013 qld pga championship   sponsorship invitation2013 qld pga championship   sponsorship invitation
2013 qld pga championship sponsorship invitation
 
Top 10 lời khuyên SEO thành công vào năm 2014
Top 10 lời khuyên SEO thành công vào năm 2014Top 10 lời khuyên SEO thành công vào năm 2014
Top 10 lời khuyên SEO thành công vào năm 2014
 
Analyzing Big Data's Weakest Link (hint: it might be you)
Analyzing Big Data's Weakest Link  (hint: it might be you)Analyzing Big Data's Weakest Link  (hint: it might be you)
Analyzing Big Data's Weakest Link (hint: it might be you)
 
Builders Bridge
Builders Bridge   Builders Bridge
Builders Bridge
 
Aaaaa english trabajo 11 de maritza,tatiana ,sofia,maria,leydi
Aaaaa english trabajo 11  de maritza,tatiana ,sofia,maria,leydiAaaaa english trabajo 11  de maritza,tatiana ,sofia,maria,leydi
Aaaaa english trabajo 11 de maritza,tatiana ,sofia,maria,leydi
 
Author guidelines
Author guidelinesAuthor guidelines
Author guidelines
 
Webinar 2013 11-21-campanile_esri_italia
Webinar 2013 11-21-campanile_esri_italiaWebinar 2013 11-21-campanile_esri_italia
Webinar 2013 11-21-campanile_esri_italia
 
backup prosess
backup prosessbackup prosess
backup prosess
 
Game-Changing NFL Analytics with KEL
Game-Changing NFL Analytics with KELGame-Changing NFL Analytics with KEL
Game-Changing NFL Analytics with KEL
 
Khóa học internet marketing, đào tạo bán hàng trực tuyến
Khóa học internet marketing, đào tạo bán hàng trực tuyếnKhóa học internet marketing, đào tạo bán hàng trực tuyến
Khóa học internet marketing, đào tạo bán hàng trực tuyến
 

Similar to HPCC Systems Engineering Summit - Applications of HPCC Systems at Clemson University

Dynamic Provisioning of Data Intensive Computing Middleware Frameworks
Dynamic Provisioning of Data Intensive Computing Middleware FrameworksDynamic Provisioning of Data Intensive Computing Middleware Frameworks
Dynamic Provisioning of Data Intensive Computing Middleware FrameworksLinh Ngo
 
CSense: A Stream-Processing Toolkit for Robust and High-Rate Mobile Sensing A...
CSense: A Stream-Processing Toolkit for Robust and High-Rate Mobile Sensing A...CSense: A Stream-Processing Toolkit for Robust and High-Rate Mobile Sensing A...
CSense: A Stream-Processing Toolkit for Robust and High-Rate Mobile Sensing A...Farley Lai
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningHPCC Systems
 
Linking Scientific Instruments and Computation
Linking Scientific Instruments and ComputationLinking Scientific Instruments and Computation
Linking Scientific Instruments and ComputationIan Foster
 
Alliance 2017 3891-University of California | Office of The President People...
Alliance 2017  3891-University of California | Office of The President People...Alliance 2017  3891-University of California | Office of The President People...
Alliance 2017 3891-University of California | Office of The President People...Smart ERP Solutions, Inc.
 
Studies of HPCC Systems from Machine Learning Perspectives
Studies of HPCC Systems from Machine Learning PerspectivesStudies of HPCC Systems from Machine Learning Perspectives
Studies of HPCC Systems from Machine Learning PerspectivesHPCC Systems
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Spark Summit
 
Building ML Pipelines with DCOS
Building ML Pipelines with DCOSBuilding ML Pipelines with DCOS
Building ML Pipelines with DCOSQAware GmbH
 
ASMUG February 2015 Knowledge Event
ASMUG February 2015 Knowledge EventASMUG February 2015 Knowledge Event
ASMUG February 2015 Knowledge Eventjmustac
 
1 introduction
1 introduction1 introduction
1 introductionUtkarsh De
 
Scientific
Scientific Scientific
Scientific marpierc
 
ELMSLN: Rethinking System Architecture
ELMSLN: Rethinking System ArchitectureELMSLN: Rethinking System Architecture
ELMSLN: Rethinking System ArchitectureBryan Ollendyke
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...Robert Grossman
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeGeoffrey Fox
 
Developing a Viable ASP Model for Distributed Learning
Developing a Viable ASP Model for Distributed LearningDeveloping a Viable ASP Model for Distributed Learning
Developing a Viable ASP Model for Distributed Learningwebhostingguy
 
Developing a Viable ASP Model for Distributed Learning
Developing a Viable ASP Model for Distributed LearningDeveloping a Viable ASP Model for Distributed Learning
Developing a Viable ASP Model for Distributed Learningwebhostingguy
 

Similar to HPCC Systems Engineering Summit - Applications of HPCC Systems at Clemson University (20)

Dynamic Provisioning of Data Intensive Computing Middleware Frameworks
Dynamic Provisioning of Data Intensive Computing Middleware FrameworksDynamic Provisioning of Data Intensive Computing Middleware Frameworks
Dynamic Provisioning of Data Intensive Computing Middleware Frameworks
 
Research (HPC on Cloud)
Research (HPC on Cloud)Research (HPC on Cloud)
Research (HPC on Cloud)
 
CSense: A Stream-Processing Toolkit for Robust and High-Rate Mobile Sensing A...
CSense: A Stream-Processing Toolkit for Robust and High-Rate Mobile Sensing A...CSense: A Stream-Processing Toolkit for Robust and High-Rate Mobile Sensing A...
CSense: A Stream-Processing Toolkit for Robust and High-Rate Mobile Sensing A...
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine Learning
 
Linking Scientific Instruments and Computation
Linking Scientific Instruments and ComputationLinking Scientific Instruments and Computation
Linking Scientific Instruments and Computation
 
UCPath at UCOP
UCPath at UCOPUCPath at UCOP
UCPath at UCOP
 
Alliance 2017 3891-University of California | Office of The President People...
Alliance 2017  3891-University of California | Office of The President People...Alliance 2017  3891-University of California | Office of The President People...
Alliance 2017 3891-University of California | Office of The President People...
 
Studies of HPCC Systems from Machine Learning Perspectives
Studies of HPCC Systems from Machine Learning PerspectivesStudies of HPCC Systems from Machine Learning Perspectives
Studies of HPCC Systems from Machine Learning Perspectives
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
 
Building ML Pipelines with DCOS
Building ML Pipelines with DCOSBuilding ML Pipelines with DCOS
Building ML Pipelines with DCOS
 
ASMUG February 2015 Knowledge Event
ASMUG February 2015 Knowledge EventASMUG February 2015 Knowledge Event
ASMUG February 2015 Knowledge Event
 
1 introduction
1 introduction1 introduction
1 introduction
 
Scientific
Scientific Scientific
Scientific
 
CMK resume
CMK resumeCMK resume
CMK resume
 
Database Lecture Notes
Database Lecture NotesDatabase Lecture Notes
Database Lecture Notes
 
ELMSLN: Rethinking System Architecture
ELMSLN: Rethinking System ArchitectureELMSLN: Rethinking System Architecture
ELMSLN: Rethinking System Architecture
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run Time
 
Developing a Viable ASP Model for Distributed Learning
Developing a Viable ASP Model for Distributed LearningDeveloping a Viable ASP Model for Distributed Learning
Developing a Viable ASP Model for Distributed Learning
 
Developing a Viable ASP Model for Distributed Learning
Developing a Viable ASP Model for Distributed LearningDeveloping a Viable ASP Model for Distributed Learning
Developing a Viable ASP Model for Distributed Learning
 

More from HPCC Systems

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...HPCC Systems
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsHPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsHPCC Systems
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn HPCC Systems
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingHPCC Systems
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle ChangesHPCC Systems
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index HPCC Systems
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesHPCC Systems
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsHPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch HPCC Systems
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem HPCC Systems
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis ToolHPCC Systems
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony HPCC Systems
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterHPCC Systems
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...HPCC Systems
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...HPCC Systems
 
Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...
Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...
Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...HPCC Systems
 

More from HPCC Systems (20)

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex Systems
 
Welcome
WelcomeWelcome
Welcome
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon Cutting
 
Path to 8.0
Path to 8.0 Path to 8.0
Path to 8.0
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle Changes
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index
 
Docker Support
Docker Support Docker Support
Docker Support
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network Capabilities
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis Tool
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL Neater
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
 
Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...
Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...
Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...
 

Recently uploaded

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Recently uploaded (20)

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

HPCC Systems Engineering Summit - Applications of HPCC Systems at Clemson University

  • 1. Applications of HPCC Systems at Clemson University Amy Apon, PhD ● Linh Ngo, PhD ● Michael Payne Big Data Systems Laboratory Clemson University
  • 2. Applications of HPCC Systems at Clemson University Clemson Strengths and Opportunities PhD-level faculty & research staff Talented students Significant industry collaborators Palmetto – Top 5 in US Academic Supercomputers ~2000 nodes, 20K cores, 600 GPUs 100Gb Internet connectivity Facilities People
  • 3. Applications of HPCC Systems at Clemson University Big Data Systems Lab Overview Perform World Class Research on the Systems and Enabling Information Technology for Advanced Data Analytics Big Data Systems Lab Research Areas Systems and Architectures Tools and Operations Data Analytics and Applications Big Data Systems Lab Vision
  • 4. Applications of HPCC Systems at Clemson University Effect of High Performance Computing on Academic Research Productivity Motivation: There is a lot of pressure on federal funding We propose efficiency as a measure from which to gain insights on return on investment We show that locally- available HPC has a positive effect on the ability of a university to do research
  • 5. Motivation: Government and business need information about public sentiment. Research: We develop and apply methods to analyze large amounts of textual data to enable inquiry of social and business problems. Applications of HPCC Systems at Clemson University Text mining of news reports and social media for business intelligence
  • 6. Shared Execution Environment Temporary Local Storage User Privileges Only Applications of HPCC Systems at Clemson University Shared Computing Resources among Researchers
  • 7. Applications of HPCC Systems at Clemson University Linh Ngo, PhD HPCC Systems in a Shared Research Computing Environment
  • 8. Shared Execution Environment Temporary Local Storage User Privileges Only Applications of HPCC Systems at Clemson University Shared Computing Resources among Researchers How to provision and configure an HPCC cluster dynamically for research purposes? • Step 1: Configure, install, and deploy HPCC as a non-root user • Step 2: Dynamically provision HPCC cluster in a shared research environment
  • 9. binutils ICU XALAN APR … / /usr /lib64 /lib … /opt / /home $USER … /parallel_scratch /local_scratch Applications of HPCC Systems at Clemson University Installation and Configuration of Dependencies
  • 10. Administrative privileges Non-administrative privileges … / etc init.d HPCCSystems opt HPCCSystems var lib HPCCSystems log HPCCSystems configmgr mydafilesrv mydafilesrv … / home $USER hpcc local_scratch hpcc parallel_scratch lib HPCCSystems log $USER lock pid $USER Applications of HPCC Systems at Clemson University Resolving Non-default Installation Path Conflicts
  • 11. Remove/relax root-level settings: i.e.: is_root Reduce default configuration settings for resource requirements: depended on resource allocation requests Applications of HPCC Systems at Clemson University Non-root Deployment
  • 12. mydafilesrv mydafile myeclc mythor myroxie … PBS_NODEFILE environment.xml 1 2 3 4 5 user.palmetto.clemson.edu Applications of HPCC Systems at Clemson University Dynamic Provisioning Deploy to /local_scratch or /parallel_scratch?
  • 13. Applications of HPCC Systems at Clemson University Michael Payne Using HPCC Systems to Manage Academic Data LexisNexis Summer 2014 Internship
  • 14. • Research in Scholarly Data requires academic data from many different sources, which store data under various formats • Aggregating these sources into a useful and cohesive structure requires a data-intensive approach to preprocessing, integration and analysis • HPCC Systems is a platform to streamline this process Applications of HPCC Systems at Clemson University Using HPCC Systems to Manage Academic Data
  • 15. Higher Education Institutions Research High Performance Computing Capability Funding Support Applications of HPCC Systems at Clemson University Categories of Scholarly Data
  • 16. Higher Education Institutions Research High Performance Computing Capability Funding Support Detailed Award (XML) Federal Funding (tab- delimited) Expenditures (tab-delimited) Institution Patent (Excel) Degrees Conferred (Excel) NIH Award Data (CSV/Excel) Top500 Supercomputer affiliated with academic institutions (XML) Institutions from States with EPSCoR status (tab-delimited) Institutional Information with Carnegie’s Research Classifications (multi-sheet Excel) Enrollment (multi-sheet Excel) Financial (multi-sheet Excel) Faculty (multi-sheet Excel) Detailed list of articles by discipline with abstracts, references, and disambiguated authors. (XML) Applications of HPCC Systems at Clemson University Scholarly Data Description
  • 17. Institution name/email Name similarity Address PI/Author name Institution name Name similarity Match name with WoS ‘s Organization-Enhanced name Acknowledgment attributes (2008 on ward, automatic) Applications of HPCC Systems at Clemson University Examples of Scholarly Data Links
  • 18. • Porting data analytic processes to ECL • Applying Machine Learning techniques for article abstract classification Applications of HPCC Systems at Clemson University Ongoing Work
  • 19. LexisNexis Internship Machine Learning Manager Timothy Humphrey Mentor Arjuna Chala Applications of HPCC Systems at Clemson University Summer 2014 Internship - Logistic Regression for Dense Matrices
  • 20. • Prediction using continuous and discrete values • No distributional assumptions on the predictors • May not be normally distributed or linearly related • Relationship between the discrete variable and the predictor is non-linear Applications of HPCC Systems at Clemson University Logistic Regression
  • 21. Matrices can be partitioned Schemes must be compatible There are multiple choices! X = 4 x 4 4 x 1 4 x 1 2 x 3 X = 2 x 1 1 x 1 2 x 1 3 x 1 Applications of HPCC Systems at Clemson University Parallel Block Basic Linear Algebra Subprograms (PB-BLAS)
  • 22. • Logistic Runtimes • Hard Coded Mapping • Full Higgs Dataset 11,000,000 x 28 0 5 10 15 20 25 30 35 Higgs 1,000 Higgs 10,000 Higgs 100,000 Time in Minutes PB-BLAS Non PB-BLAS Applications of HPCC Systems at Clemson University Machine Learning in ECL
  • 23. • Logistic Runtimes • Auto Mapping • Full Elsevier Dataset 100,000 x 3,291 0 5 10 15 20 25 Elsevier 100 Time in Minutes PB-BLAS Non PB-BLAS Applications of HPCC Systems at Clemson University Machine Learning in ECL
  • 24. 0 50 100 150 200 250 300 350 400 450 Elsevier 1,000 Time in Minutes PB-BLAS Non PB-BLAS Applications of HPCC Systems at Clemson University Machine Learning in ECL • Logistic Runtimes • Auto Mapping • Full Elsevier Dataset 100,000 x 3,291
  • 25. • Logistic Regression code and supporting functions have been documented and merged to ECL-ML GitHub repository • Auto block vector mapping function for any user that wants to use PB-BLAS • Ready to use element wise multiplication in PB-BLAS • Updated debugging statements that a clear understanding of errors • Test functions for both block vector mapping function • Sample code for using logistic regression • Currently working on K-means implementation that utilizes PB-BLAS Applications of HPCC Systems at Clemson University Project Summary
  • 26. Linh Ngo, PhD ● Alex Herzog, PhD ● Michael Payne ● Amy Apon, PhD {lngo, aherzog, mpayne3, aapon}@clemson.edu Big Data Systems Laboratory Clemson University