Studies of HPCC Systems® from
Machine Learning Perspectives
Ying Xie, Pooja Chenna, Ken Hoganson
Department of Computer Science
Kennesaw State University
Outline
• Leverage and Enhance Deep Learning Capability of HPCC
Systems®
• Comparative Studies of HPCC Systems® and Hadoop Systems
Leverage and Enhance Deep Learning
Capability of HPCC Systems®
• Deep learning has emerged as a recent breakthrough in
the area of Machine Learning and has revitalized research
activities in Artificial Intelligence (AI).
• The development of deep learning techniques has been
primarily driven by big data analytics.
• Since HPCC Systems is a well-established big data platform, it
is important to leverage and enhance the deep learning
capability of the HPCC Systems platform.
Images are copied from neuralnetworksanddeeplearning.com
• From Neural Network to Deep Learning
A typical multilayer perceptron
A deep neural architecture
Stacked Autoencoder
Images are copied from ufldl.stanford.edu
• Stacked Autoencoder
This code was implemented by Maryam Mousaarab Najafabadi from F
Deep Belief Network
Restricted Boltzmann Machine
Images are copied from http://deeplearning4j.org/
Deep Belief Network
Visualize High Dimensional Data
• Iris Data
Use DBN to conduct dimension reduction
• Breast Cancer Wisconsin (Original) Data Set
Use DBN to conduct dimension reduction
• Glass Data
Use stacked auto-encoder to conduct dimension reduction
• Based on the visualization, we may even gain
some good ideas about which classification algorithm
may work well
Classification Algorithm   # Misclassified Instances
Logistic                   72
MLP                        66
Simple Logistic            70
7NN                        76
5NN                        70
3NN                        64
Mapping Data to Higher Dimensional Spaces
• Blood Transfusion Data
• Classification performance on higher dimensional space
Classifier   Original Space (4 Dim)        Higher Dim (10 Dim)
             (# Misclassified Instances)   (# Misclassified Instances)
Logistic     171                           157
MLP          160                           166
Mapped to higher dimensional space by Stacked Auto-Encoder
Mapped to higher dimensional space by DBN
• Wine Data
• Classification performance on higher dimensional space
Classifier   Original Space (13 Dim)       Higher Dim (15 Dim)
             (# Misclassified Instances)   (# Misclassified Instances)
Logistic     10                            6
MLP          4                             4

Classifier   Original Space (13 Dim)       Higher Dim (15 Dim)
             (# Misclassified Instances)   (# Misclassified Instances)
Logistic     10                            7
MLP          4                             2
Mapped to higher dimensional space by Stacked Auto-Encoder
• For a given data set, we can explore which
combination of deep learning mapping
technique, dimensional space, and
supervised learning model may yield the best
classification result
• For instance – Breast Cancer Data
(# Misclassified Instances)
Classifier   3 Dim. Space   6 Dim. Space   9 Dim. Space   12 Dim. Space
Logistic     22             21             24             25
MLP          21             22             30             21

(# Misclassified Instances)
Classifier   3 Dim. Space   6 Dim. Space   9 Dim. Space   12 Dim. Space
Logistic     22             23             24             25
MLP          20             21             30             22
Mapped to different dimensional spaces by Stacked Auto-Encoder
Mapped to different dimensional spaces by DBN
• Our next step: implement a meta
supervised learning algorithm on HPCC
– This algorithm will automatically map the given data
to different dimensional spaces by using both stacked
auto-encoder and DBN
– Then classification models will be trained on all
dimensional spaces in a distributed manner
– Cross-validation will be used to select the
best-performing model as the final output.
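
The selection loop described in these steps can be sketched in plain Python. This is only an illustration under stated assumptions, not the planned HPCC implementation: the random `tanh` projections are hypothetical stand-ins for trained stacked auto-encoder and DBN encoders, the two small classifiers stand in for Logistic and MLP, the toy data is synthetic, and everything runs sequentially on one machine rather than in a distributed manner.

```python
import math
import random
from itertools import product

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def random_projection(dim_out, dim_in, seed):
    """Hypothetical stand-in for a trained stacked auto-encoder or DBN encoder."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0, 1) for _ in range(dim_in)] for _ in range(dim_out)]
    def encode(x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]
    return encode

def nearest_centroid(train_x, train_y):
    """Fit a nearest-centroid classifier and return its predict function."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, xi in enumerate(x):
            acc[i] += xi
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
    return lambda x: min(centroids, key=lambda y: sq_dist(x, centroids[y]))

def one_nn(train_x, train_y):
    """Fit a 1-nearest-neighbour classifier and return its predict function."""
    def predict(x):
        nearest = min(range(len(train_x)), key=lambda j: sq_dist(x, train_x[j]))
        return train_y[nearest]
    return predict

def cv_errors(xs, ys, fit, folds=5):
    """Count misclassified instances over k-fold cross-validation."""
    errors = 0
    for f in range(folds):
        test_idx = set(range(f, len(xs), folds))
        tr_x = [x for i, x in enumerate(xs) if i not in test_idx]
        tr_y = [y for i, y in enumerate(ys) if i not in test_idx]
        predict = fit(tr_x, tr_y)
        errors += sum(predict(xs[i]) != ys[i] for i in test_idx)
    return errors

def select_best(xs, ys, encoders, classifiers, folds=5):
    """Try every (mapping, classifier) pair and keep the one with fewest errors."""
    results = {}
    for (enc_name, encode), (clf_name, fit) in product(encoders.items(),
                                                       classifiers.items()):
        mapped = [encode(x) for x in xs]
        results[(enc_name, clf_name)] = cv_errors(mapped, ys, fit, folds)
    return min(results, key=results.get), results

# Synthetic two-class data with 4 attributes (illustrative only).
data_rng = random.Random(0)
ys = [i % 2 for i in range(60)]
xs = [[data_rng.gauss(2.0 if y else -2.0, 1.0) for _ in range(4)] for y in ys]

encoders = {
    "original (4 dim)": lambda x: x,
    "mapped to 3 dim": random_projection(3, 4, seed=1),
    "mapped to 10 dim": random_projection(10, 4, seed=2),
}
classifiers = {"nearest centroid": nearest_centroid, "1NN": one_nn}

best, results = select_best(xs, ys, encoders, classifiers)
print("best combination:", best, "misclassified:", results[best])
```

In a fuller version the encoders would be trained inside each cross-validation fold, and each (mapping, dimension, classifier) combination could be evaluated on a separate node, since the combinations are independent of one another.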
Implementation of Deep Belief Network on
HPCC
Our Implementations of Deep Learning
on HPCC
• Restricted Boltzmann Machine (RBM) with
Contrastive Divergence learning algorithm
• Deep Belief Network by stacking RBMs
• Supervised Deep Belief Network
Machine Learning Routines
• Utility Module
• Matrix Library
• Dense Matrix Library
• PBblas
Restricted Boltzmann Machine - RBM
Diagram labels: v, h, v', h'
$w^{(t+1)} = w^{(t)} + \alpha \left( v h^{T} - v' h'^{T} \right)$
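
A minimal sketch of one Contrastive Divergence (CD-1) step for the weight update above, in plain Python rather than ECL. Several details are assumptions not stated on the slide: bias terms are omitted, units are binary with sigmoid activations, h and v' are sampled while h' uses probabilities, and the learning rate and toy patterns are arbitrary.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_given_visible(v, w, rng):
    """Return (probabilities, binary samples) for the hidden units given v."""
    probs = [sigmoid(sum(v[i] * w[i][j] for i in range(len(v))))
             for j in range(len(w[0]))]
    return probs, [1.0 if rng.random() < p else 0.0 for p in probs]

def visible_given_hidden(h, w, rng):
    """Return (probabilities, binary samples) for the visible units given h."""
    probs = [sigmoid(sum(w[i][j] * h[j] for j in range(len(h))))
             for i in range(len(w))]
    return probs, [1.0 if rng.random() < p else 0.0 for p in probs]

def cd1_update(w, v, alpha, rng):
    """One CD-1 step: w <- w + alpha * (v h^T - v' h'^T), biases omitted."""
    _, h = hidden_given_visible(v, w, rng)              # positive phase
    _, v_recon = visible_given_hidden(h, w, rng)        # reconstruction v'
    h_recon, _ = hidden_given_visible(v_recon, w, rng)  # h' (probabilities)
    return [[w[i][j] + alpha * (v[i] * h[j] - v_recon[i] * h_recon[j])
             for j in range(len(w[0]))]
            for i in range(len(w))]

# Toy run: 4 visible units, 3 hidden units, two binary patterns.
rng = random.Random(0)
n_visible, n_hidden = 4, 3
w = [[rng.gauss(0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
for epoch in range(100):
    for v in data:
        w = cd1_update(w, v, alpha=0.1, rng=rng)
```

Here `w` is stored as a visible-by-hidden matrix, so the outer product v hᵀ has the same shape as `w`.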
Stacking Boltzmann Machines -
Deep Belief Network
Final Output
Input Parameters
• Iris Data Sample
Supervised Boltzmann Machine –
Deep Belief Network
y – actual output
h – hidden samples
v – visible samples
Supervised Deep Belief Network
• Our ultimate goal is to implement a full stack
of deep learning techniques on HPCC and
conduct a wide range of experiments to show
how powerful the deep learning engine on
HPCC will be.
Comparative Studies of HPCC and Hadoop
• HPCC and Hadoop clusters on CSCloud
– HPCC cluster:
• 5 Thor nodes
• 5 Roxie nodes
• 2 middleware nodes
• 1 landing zone node for uploading files
– Hadoop cluster:
• 1 job-tracker / name-node
• 1 support system (Web UI, Hadoop ecosystem, etc.)
• 4 worker nodes: task-tracker / data-node
• Algorithms for comparison
– Text Processing Algorithms:
• Word Count
• Inverted Index
– Machine Learning Algorithms:
• Supervised Learning Algorithm - Random Forests
• Unsupervised Learning Algorithm - KMeans
– Graph Algorithm:
• Page Rank
Text Processing Algorithms
Data: Authorized Version of the Bible, downloaded from http://av1611.com/
HPCC Implementation of Inverted Index:
http://www.dabhand.org/ECL/construct_a_simple_bible_search.htm (implemented by David
Bayliss, Chief Data Scientist and VP of LexisNexis Risk Solutions)
Hadoop Implementation of Inverted Index: Victor Guana and Joshua Davidson. On Comparing
Inverted Index Parallel Implementations Using Map/Reduce. Technical Report, May 09, 2012.
Algorithm        HPCC             Hadoop
Word Count       1.003 seconds    23.466 seconds
Inverted Index   34.205 seconds   27.047 seconds
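
For readers unfamiliar with the two text processing tasks, the following single-machine Python sketch shows what each one computes. It is not the ECL or Map/Reduce code that was benchmarked; the tokenizer and the choice of one record per verse are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case the text and split it into words (assumed tokenization)."""
    return re.findall(r"[a-z']+", text.lower())

def word_count(lines):
    """Word Count: total occurrences of each word across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts

def inverted_index(lines):
    """Inverted Index: map each word to the sorted line numbers containing it."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines, start=1):
        for word in tokenize(line):
            index[word].add(line_no)
    return {word: sorted(ids) for word, ids in index.items()}

verses = [
    "In the beginning God created the heaven and the earth.",
    "And the earth was without form, and void;",
]
counts = word_count(verses)
index = inverted_index(verses)
print(counts["the"], index["earth"])
```

In a Map/Reduce setting the per-line loop corresponds to the map phase and the merging of counts or posting lists to the reduce phase.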
Machine Learning Algorithms
Data: KDD Network Intrusion Dataset
Reference: C. Blake and C. J. Merz. UCI Repository of machine learning databases. Irvine,
CA: University of California, Department of Information and Computer Science.
[http://www.ics.uci.edu/~mlearn/MLRepository.html]
Description:
– Total number of Instances : 4000000
– Used instances : 20394 randomly picked
– Number of Attributes : 42
Hadoop Libraries – Apache Mahout:
– Version : CDH-5.4.2-1 (Cloudera)
HPCC Machine Learning Library
Efficiency:
Algorithm        HPCC                  Hadoop
Random Forests   1 minute 50 seconds   18 seconds
KMeans           36.675 seconds        1 minute 45 seconds
Graph Algorithm
Data Set: Randomly generated graph with 25 nodes and maximum degree 5.
HPCC Implementation of PageRank – Our team’s implementation
Hadoop Implementation of PageRank – http://blog.xebia.com/2011/09/wiki-pagerank-with-hadoop/
Algorithm   HPCC             Hadoop
Page Rank   29.817 seconds   36 minutes
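
A small Python sketch of PageRank by power iteration on a graph of the same size. It is not the team's ECL implementation or the referenced Hadoop one. The graph generator, the reading of "maximum degree 5" as at most five outgoing links per node, the damping factor of 0.85, and the iteration count are all assumptions.

```python
import random

def random_graph(n_nodes=25, max_degree=5, seed=0):
    """Give every node between 1 and max_degree outgoing links (assumed model)."""
    rng = random.Random(seed)
    graph = {}
    for node in range(n_nodes):
        others = [m for m in range(n_nodes) if m != node]
        graph[node] = rng.sample(others, rng.randint(1, max_degree))
    return graph

def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration; every node has out-links, so no dangling-node handling."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in graph}
        for node, targets in graph.items():
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

graph = random_graph()
ranks = pagerank(graph)
top = max(ranks, key=ranks.get)
print("highest-ranked node:", top, "score:", round(ranks[top], 4))
```

Each iteration only needs the previous rank vector and the link lists, which is why the algorithm maps naturally onto iterative distributed jobs.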
THANK YOU

More Related Content

What's hot

Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
HPCC Systems
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
Geoffrey Fox
 
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
Geoffrey Fox
 
Performance Analysis of MapReduce Implementations on High Performance Homolog...
Performance Analysis of MapReduce Implementations on High Performance Homolog...Performance Analysis of MapReduce Implementations on High Performance Homolog...
Performance Analysis of MapReduce Implementations on High Performance Homolog...
Koichi Shirahata
 
Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014
Vijay Srinivas Agneeswaran, Ph.D
 
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirShare and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Spark Summit
 
Lightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaLightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and Scala
Andy Petrella
 
High Performance Processing of Streaming Data
High Performance Processing of Streaming DataHigh Performance Processing of Streaming Data
High Performance Processing of Streaming Data
Geoffrey Fox
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
Geoffrey Fox
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
fnothaft
 
Tutorial5
Tutorial5Tutorial5
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
rhatr
 
Shuffle phase as the bottleneck in Hadoop Terasort
Shuffle phase as the bottleneck in Hadoop TerasortShuffle phase as the bottleneck in Hadoop Terasort
Shuffle phase as the bottleneck in Hadoop Terasortpramodbiligiri
 
HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nod...
HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nod...HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nod...
HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nod...
Xiao Qin
 
Hadoop Network Performance profile
Hadoop Network Performance profileHadoop Network Performance profile
Hadoop Network Performance profilepramodbiligiri
 
Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview
Senthil Kumar
 

What's hot (20)

Giraph
GiraphGiraph
Giraph
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
 
Performance Analysis of MapReduce Implementations on High Performance Homolog...
Performance Analysis of MapReduce Implementations on High Performance Homolog...Performance Analysis of MapReduce Implementations on High Performance Homolog...
Performance Analysis of MapReduce Implementations on High Performance Homolog...
 
Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014
 
InternReport
InternReportInternReport
InternReport
 
Neo4j vs giraph
Neo4j vs giraphNeo4j vs giraph
Neo4j vs giraph
 
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirShare and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
 
Lightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaLightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and Scala
 
High Performance Processing of Streaming Data
High Performance Processing of Streaming DataHigh Performance Processing of Streaming Data
High Performance Processing of Streaming Data
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
 
Tutorial5
Tutorial5Tutorial5
Tutorial5
 
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
 
Shuffle phase as the bottleneck in Hadoop Terasort
Shuffle phase as the bottleneck in Hadoop TerasortShuffle phase as the bottleneck in Hadoop Terasort
Shuffle phase as the bottleneck in Hadoop Terasort
 
HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nod...
HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nod...HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nod...
HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nod...
 
Hadoop Network Performance profile
Hadoop Network Performance profileHadoop Network Performance profile
Hadoop Network Performance profile
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview
 

Viewers also liked

Large scale social recommender systems at LinkedIn
Large scale social recommender systems at LinkedInLarge scale social recommender systems at LinkedIn
Large scale social recommender systems at LinkedIn
Mitul Tiwari
 
Frontera-Open Source Large Scale Web Crawling Framework
Frontera-Open Source Large Scale Web Crawling FrameworkFrontera-Open Source Large Scale Web Crawling Framework
Frontera-Open Source Large Scale Web Crawling Frameworksixtyone
 
DataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
DataEngConf SF16 - Entity Resolution in Data Pipelines Using SparkDataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
DataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
Hakka Labs
 
Real Time Fuzzy Matching with Spark and Elastic Search-(Sonal Goyal, Nube)
Real Time Fuzzy Matching with Spark and Elastic Search-(Sonal Goyal, Nube)Real Time Fuzzy Matching with Spark and Elastic Search-(Sonal Goyal, Nube)
Real Time Fuzzy Matching with Spark and Elastic Search-(Sonal Goyal, Nube)
Spark Summit
 
Using a Graph Database for Next-Gen MDM
Using a Graph Database for Next-Gen MDMUsing a Graph Database for Next-Gen MDM
Using a Graph Database for Next-Gen MDM
Neo4j
 
A Primer on Entity Resolution
A Primer on Entity ResolutionA Primer on Entity Resolution
A Primer on Entity Resolution
Benjamin Bengfort
 
Encrypting json in mule
Encrypting json in muleEncrypting json in mule
Encrypting json in mule
Vijay Shan
 
12213217 pss7
12213217 pss712213217 pss7
12213217 pss7irenechau
 
Propuesta digital v2
Propuesta digital v2Propuesta digital v2
Propuesta digital v2
Alex Villaquiran Astaiza
 
What's your job?
What's your job?What's your job?
What's your job?
Freelancer
 
Lesson 03 - Native English PPT
Lesson 03 - Native English PPTLesson 03 - Native English PPT
Lesson 03 - Native English PPT
jdgieschen
 
LEVICK Weekly - Aug 3 2012
LEVICK Weekly - Aug 3 2012LEVICK Weekly - Aug 3 2012
LEVICK Weekly - Aug 3 2012
LEVICK
 
Bab vii spektek kejari
Bab vii spektek kejariBab vii spektek kejari
Bab vii spektek kejari
Mohammad Rovik
 
Presentation1
Presentation1Presentation1
Presentation1oky_put13
 
Welcome to business strategy
Welcome to business strategyWelcome to business strategy
Welcome to business strategyPrime University
 
Shapes of Bones
Shapes of BonesShapes of Bones
Shapes of BonesLen Lapore
 
2013 qld pga championship sponsorship invitation
2013 qld pga championship   sponsorship invitation2013 qld pga championship   sponsorship invitation
2013 qld pga championship sponsorship invitation
Andrew Allpass
 
Trabajo manejo de materiales
Trabajo manejo de materialesTrabajo manejo de materiales
Trabajo manejo de materiales
Maryelin Rubio
 

Viewers also liked (20)

Large scale social recommender systems at LinkedIn
Large scale social recommender systems at LinkedInLarge scale social recommender systems at LinkedIn
Large scale social recommender systems at LinkedIn
 
Frontera-Open Source Large Scale Web Crawling Framework
Frontera-Open Source Large Scale Web Crawling FrameworkFrontera-Open Source Large Scale Web Crawling Framework
Frontera-Open Source Large Scale Web Crawling Framework
 
DataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
DataEngConf SF16 - Entity Resolution in Data Pipelines Using SparkDataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
DataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
 
Real Time Fuzzy Matching with Spark and Elastic Search-(Sonal Goyal, Nube)
Real Time Fuzzy Matching with Spark and Elastic Search-(Sonal Goyal, Nube)Real Time Fuzzy Matching with Spark and Elastic Search-(Sonal Goyal, Nube)
Real Time Fuzzy Matching with Spark and Elastic Search-(Sonal Goyal, Nube)
 
Using a Graph Database for Next-Gen MDM
Using a Graph Database for Next-Gen MDMUsing a Graph Database for Next-Gen MDM
Using a Graph Database for Next-Gen MDM
 
A Primer on Entity Resolution
A Primer on Entity ResolutionA Primer on Entity Resolution
A Primer on Entity Resolution
 
Encrypting json in mule
Encrypting json in muleEncrypting json in mule
Encrypting json in mule
 
Isus
IsusIsus
Isus
 
12213217 pss7
12213217 pss712213217 pss7
12213217 pss7
 
Propuesta digital v2
Propuesta digital v2Propuesta digital v2
Propuesta digital v2
 
What's your job?
What's your job?What's your job?
What's your job?
 
Lesson 03 - Native English PPT
Lesson 03 - Native English PPTLesson 03 - Native English PPT
Lesson 03 - Native English PPT
 
LEVICK Weekly - Aug 3 2012
LEVICK Weekly - Aug 3 2012LEVICK Weekly - Aug 3 2012
LEVICK Weekly - Aug 3 2012
 
Bab vii spektek kejari
Bab vii spektek kejariBab vii spektek kejari
Bab vii spektek kejari
 
Presentation1
Presentation1Presentation1
Presentation1
 
Welcome to business strategy
Welcome to business strategyWelcome to business strategy
Welcome to business strategy
 
Shapes of Bones
Shapes of BonesShapes of Bones
Shapes of Bones
 
Animals
AnimalsAnimals
Animals
 
2013 qld pga championship sponsorship invitation
2013 qld pga championship   sponsorship invitation2013 qld pga championship   sponsorship invitation
2013 qld pga championship sponsorship invitation
 
Trabajo manejo de materiales
Trabajo manejo de materialesTrabajo manejo de materiales
Trabajo manejo de materiales
 

Similar to Studies of HPCC Systems from Machine Learning Perspectives

ASE2010
ASE2010ASE2010
ASE2010
swy351
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
Varad Meru
 
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
Apache MXNet
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-System
inside-BigData.com
 
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
Geoffrey Fox
 
Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...
Geoffrey Fox
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Herman Wu
 
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
MLconf
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
Ilkay Altintas, Ph.D.
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
eakasit_dpu
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Perficient, Inc.
 
Dynamic Provisioning of Data Intensive Computing Middleware Frameworks
Dynamic Provisioning of Data Intensive Computing Middleware FrameworksDynamic Provisioning of Data Intensive Computing Middleware Frameworks
Dynamic Provisioning of Data Intensive Computing Middleware Frameworks
Linh Ngo
 
Scientific Computing in the Cloud
Scientific Computing in the CloudScientific Computing in the Cloud
Scientific Computing in the Cloud
Adianto Wibisono
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Joaquin Delgado PhD.
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
S. Diana Hu
 
WorDS of Data Science in the Presence of Heterogenous Computing Architectures
WorDS of Data Science in the Presence of Heterogenous Computing ArchitecturesWorDS of Data Science in the Presence of Heterogenous Computing Architectures
WorDS of Data Science in the Presence of Heterogenous Computing Architectures
Ilkay Altintas, Ph.D.
 
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Databricks
 
PEARC17:A real-time machine learning and visualization framework for scientif...
PEARC17:A real-time machine learning and visualization framework for scientif...PEARC17:A real-time machine learning and visualization framework for scientif...
PEARC17:A real-time machine learning and visualization framework for scientif...
Feng Li
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Lucidworks
 

Similar to Studies of HPCC Systems from Machine Learning Perspectives (20)

ASE2010
ASE2010ASE2010
ASE2010
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-System
 
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
 
Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...
 
Spark
SparkSpark
Spark
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
Dynamic Provisioning of Data Intensive Computing Middleware Frameworks
Dynamic Provisioning of Data Intensive Computing Middleware FrameworksDynamic Provisioning of Data Intensive Computing Middleware Frameworks
Dynamic Provisioning of Data Intensive Computing Middleware Frameworks
 
Scientific Computing in the Cloud
Scientific Computing in the CloudScientific Computing in the Cloud
Scientific Computing in the Cloud
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
 
WorDS of Data Science in the Presence of Heterogenous Computing Architectures
WorDS of Data Science in the Presence of Heterogenous Computing ArchitecturesWorDS of Data Science in the Presence of Heterogenous Computing Architectures
WorDS of Data Science in the Presence of Heterogenous Computing Architectures
 
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
 
PEARC17:A real-time machine learning and visualization framework for scientif...
PEARC17:A real-time machine learning and visualization framework for scientif...PEARC17:A real-time machine learning and visualization framework for scientif...
PEARC17:A real-time machine learning and visualization framework for scientif...
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
 

More from HPCC Systems

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...
HPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex Systems
HPCC Systems
 
Welcome
WelcomeWelcome
Welcome
HPCC Systems
 
Closing / Adjourn
Studies of HPCC Systems from Machine Learning Perspectives

  • 1. Studies of HPCC Systems® from Machine Learning Perspectives Ying Xie, Pooja Chenna, Ken Hoganson Department of Computer Science Kennesaw State University
  • 2. Outline • Leverage and Enhance Deep Learning Capability of HPCC Systems® • Comparative Studies of HPCC Systems® and Hadoop Systems
  • 3. Leverage and Enhance Deep Learning Capability of HPCC Systems® • Deep learning has emerged as a recent breakthrough in the area of Machine Learning and has revitalized research activities in Artificial Intelligence (AI). • The development of deep learning techniques has been primarily driven by big data analytics. • Since HPCC Systems is a well-established big data platform, it is very important to leverage and enhance the deep learning capability of the HPCC Systems platform.
  • 4. Images are copied from neuralnetworksanddeeplearning.com • From Neural Network to Deep Learning A typical multilayer perceptron
  • 5. A deep neural architecture
  • 7. Images are copied from ufldl.stanford.edu • Stacked Autoencoder
  • 8. This code was implemented by Maryam Mousaarab Najafabadi from F
  • 10. Restricted Boltzmann Machine Images are copied from http://deeplearning4j.org/
  • 14. Use DBN to conduct dimension reduction
  • 17. • Breast Cancer Wisconsin (Original) Data Set
  • 18. Use DBN to conduct dimension reduction
  • 22. Use stacked auto-encoder to conduct dimension reduction
  • 27. • Based on the visualization, we may even gain a good idea of which classification algorithm may work well
  • 28.
        Classification Algorithm    #Misclassified Instances
        Logistic                    72
        MLP                         66
        Simple Logistic             70
        7NN                         76
        5NN                         70
        3NN                         64
  • 29. Mapping Data to Higher Dimensional Spaces
  • 32. • Classification performance on higher dimensional space
        Classifier    Original Space (4 Dim)        Higher Dim (10 Dim)
                      (#Misclassified Instances)    (#Misclassified Instances)
        Logistic      171                           157
        MLP           160                           166
        Mapped to higher dimensional space by Stacked Auto-Encoder
        Mapped to higher dimensional space by DBN
  • 34. • Classification performance on higher dimensional space
        Classifier    Original Space (13 Dim)       Higher Dim (15 Dim)
                      (#Misclassified Instances)    (#Misclassified Instances)
        Logistic      10                            6
        MLP           4                             4

        Classifier    Original Space (13 Dim)       Higher Dim (15 Dim)
                      (#Misclassified Instances)    (#Misclassified Instances)
        Logistic      10                            7
        MLP           4                             2
        Mapped to higher dimensional space by Stacked Auto-Encoder
  • 35. • For a given data set, we can explore which combination of deep learning mapping technique, dimensional space, and supervised learning model yields the best classification result
  • 36. • For instance – Breast Cancer Data (all values are #Misclassified Instances)
        Classifier    3 Dim. Space    6 Dim. Space    9 Dim. Space    12 Dim. Space
        Logistic      22              21              24              25
        MLP           21              22              30              21

        Classifier    3 Dim. Space    6 Dim. Space    9 Dim. Space    12 Dim. Space
        Logistic      22              23              24              25
        MLP           20              21              30              22
        Mapped to different dimensional spaces by Stacked Auto-Encoder
        Mapped to different dimensional spaces by DBN
  • 37. • Our next step: try to implement a meta supervised learning algorithm on HPCC
        – This algorithm will automatically map the given data to different dimensional spaces by using both stacked auto-encoder and DBN
        – Then classification models will be trained on all dimensional spaces in a distributed manner
        – Cross-validation will be used to select the best-performing model as the final output.
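The selection loop described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' HPCC implementation: `random_projection` is a hypothetical stand-in for a trained stacked auto-encoder or DBN encoder, a nearest-centroid classifier stands in for Logistic/MLP, and everything runs on a single machine rather than in a distributed manner.

```python
import random

def random_projection(dim_out, dim_in, seed):
    """Hypothetical stand-in for a trained encoder (stacked auto-encoder or DBN)."""
    rng = random.Random(seed)
    matrix = [[rng.gauss(0, 1) for _ in range(dim_in)] for _ in range(dim_out)]
    return lambda x: [sum(m * xi for m, xi in zip(row, x)) for row in matrix]

def fit_centroids(X, y):
    """Nearest-centroid classifier (stand-in for Logistic/MLP): one mean per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for k, xk in enumerate(x):
            acc[k] += xk
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))

def cv_errors(X, y, k):
    """Total number of misclassified instances over k cross-validation folds."""
    errors = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        errors += sum(predict(model, X[i]) != y[i] for i in test)
    return errors

def select_best(X, y, candidates, k=5):
    """candidates maps a name to a mapping function; fewest CV errors wins."""
    scored = {name: cv_errors([f(x) for x in X], y, k)
              for name, f in candidates.items()}
    return min(scored, key=scored.get), scored

# Usage: compare the original space with two hypothetical higher-dimensional mappings.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
y = [0, 0, 0, 1, 1, 1]
candidates = {
    "original (2 dim)": lambda x: x,
    "mapped (4 dim)": random_projection(4, 2, seed=1),
    "mapped (6 dim)": random_projection(6, 2, seed=2),
}
best_name, scores = select_best(X, y, candidates, k=3)
```

In a real pipeline each mapping would be a trained encoder, and the per-space training runs are independent of one another, which is what would make them natural to distribute.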
  • 38. Implementation of Deep Belief Network on HPCC
  • 39. Our Implementations of Deep Learning on HPCC • Restricted Boltzmann Machine (RBM) with Contrastive Divergence learning algorithm • Deep Belief Network by stacking RBMs • Supervised Deep Belief Network
  • 40. Machine Learning Routines • Utility Module • Matrix Library • Dense Matrix Library • PBblas
  • 41. Restricted Boltzmann Machine - RBM v h v’ h’
  • 42. $w^{(t+1)} = w^{(t)} + \alpha \left( v h^{T} - v' h'^{T} \right)$
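The update rule on this slide is one step of Contrastive Divergence (CD-1). A minimal Python sketch follows, under these assumptions: units are binary, bias terms are omitted because the slide shows only the weight update, and h, v' and h' are all sampled states. The function names are illustrative and do not come from the HPCC code.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, W):
    """p(h_j = 1 | v) for each hidden unit j; W has shape n_visible x n_hidden."""
    return [sigmoid(sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(W[0]))]

def visible_probs(h, W):
    """p(v_i = 1 | h) for each visible unit i."""
    return [sigmoid(sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(W))]

def sample(probs, rng):
    """Draw a binary state for each unit from its activation probability."""
    return [1 if rng.random() < p else 0 for p in probs]

def cd1_update(W, v, alpha, rng):
    """One CD-1 step: w(t+1) = w(t) + alpha * (v h^T - v' h'^T)."""
    h = sample(hidden_probs(v, W), rng)              # h  ~ p(h | v)
    v_prime = sample(visible_probs(h, W), rng)       # v' ~ p(v | h)
    h_prime = sample(hidden_probs(v_prime, W), rng)  # h' ~ p(h | v')
    return [[W[i][j] + alpha * (v[i] * h[j] - v_prime[i] * h_prime[j])
             for j in range(len(h))]
            for i in range(len(v))]

# Usage: one update for a 3-visible, 2-hidden RBM on a single training vector.
rng = random.Random(0)
W = [[0.0, 0.0] for _ in range(3)]
W_new = cd1_update(W, [1, 0, 1], alpha=0.1, rng=rng)
```

Each weight moves by at most alpha per step, since both outer products contain only 0s and 1s.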
  • 44. Stacking Boltzmann Machines - Deep Belief Network
  • 46. Supervised Boltzmann Machine – Deep Belief Network y – actual output h – hidden samples v – visible samples
  • 48. • Our ultimate goal is to implement a full stack of deep learning techniques on HPCC and conduct a wide range of experiments to show how powerful the deep learning engine on HPCC will be.
  • 49. Comparative Studies of HPCC and Hadoop
  • 50. • HPCC and Hadoop clusters on CSCloud
        – HPCC cluster:
          • 5 Thor nodes
          • 5 Roxie nodes
          • 2 middleware nodes
          • 1 landing zone node for uploading files
        – Hadoop cluster:
          • 1 job-tracker / name-node
          • 1 support system (Web UI, Hadoop ecosystem, etc.)
          • 4 worker nodes: task-tracker / data-node
  • 51. • Algorithms for comparison
        – Text Processing Algorithms:
          • Word Count
          • Inverted Index
        – Machine Learning Algorithms:
          • Supervised Learning Algorithm - Random Forests
          • Unsupervised Learning Algorithm - KMeans
        – Graph Algorithm:
          • Page Rank
  • 52. Text Processing Algorithms
        Data: Authorized Version of the Bible, downloaded from http://av1611.com/
        HPCC Implementation of Inverted Index: http://www.dabhand.org/ECL/construct_a_simple_bible_search.htm (implemented by David Bayliss, Chief Data Scientist and VP of LexisNexis Risk Solutions)
        Hadoop Implementation of Inverted Index: Victor Guana and Joshua Davidson. On Comparing Inverted Index Parallel Implementations Using Map/Reduce. Technical Report. May 09, 2012.
        Algorithm         HPCC              Hadoop
        Word Count        1.003 seconds     23.466 seconds
        Inverted Index    34.205 seconds    27.047 seconds
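For readers unfamiliar with the two text-processing benchmarks, here is a minimal single-machine Python sketch of what each one computes. It is neither the ECL nor the Map/Reduce implementation cited above. The tokenizer and the use of verse references as document ids are assumptions made for the illustration.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case the text and split it into words (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def word_count(docs):
    """Word Count: total occurrences of each word across all documents."""
    counts = Counter()
    for text in docs.values():
        counts.update(tokenize(text))
    return counts

def inverted_index(docs):
    """Inverted Index: for each word, the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

# Usage: two verses keyed by their reference.
docs = {
    "Gen 1:1": "In the beginning God created the heaven and the earth.",
    "Gen 1:3": "And God said, Let there be light: and there was light.",
}
counts = word_count(docs)
index = inverted_index(docs)
```

Word Count needs only one aggregation per word, while the inverted index must also carry document ids through the pipeline.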
  • 53. Machine Learning Algorithms
        Data: KDD Network Intrusion Dataset
        Reference: C. Blake and C. J. Merz. UCI Repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science. [http://www.ics.uci.edu/~mlearn/MLRepository.html]
        Description:
          – Total number of instances: 4,000,000
          – Used instances: 20,394, randomly picked
          – Number of attributes: 42
        Hadoop Libraries
          – Apache Mahout
          – Version: CDH-5.4.2-1 (Cloudera)
        HPCC Machine Learning Library
  • 54. Efficiency:
        Algorithm         HPCC                   Hadoop
        Random Forests    1 minute 50 seconds    18 seconds
        KMeans            36.675 seconds         1 minute 45 seconds
  • 55. Graph Algorithm
        Data Set: Randomly generated graph with 25 nodes and maximum degree 5.
        HPCC Implementation of PageRank: our team's implementation
        Hadoop Implementation of PageRank: http://blog.xebia.com/2011/09/wiki-pagerank-with-hadoop/
        Algorithm    HPCC              Hadoop
        Page Rank    29.817 seconds    36 minutes
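The graph benchmark can be reproduced in miniature with the Python sketch below, which is neither the team's ECL implementation nor the Hadoop one. Several details are assumptions because the slide does not give them: "maximum degree 5" is read as a maximum out-degree, every node gets at least one outgoing edge, the damping factor is 0.85, and the iteration count is fixed.

```python
import random

def random_graph(n_nodes=25, max_degree=5, seed=0):
    """Random directed graph: each node links to 1..max_degree other nodes."""
    rng = random.Random(seed)
    graph = {}
    for node in range(n_nodes):
        others = [m for m in range(n_nodes) if m != node]
        graph[node] = rng.sample(others, rng.randint(1, max_degree))
    return graph

def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; assumes every node has at least one out-link."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        # Every node receives the teleport share, then shares from its in-links.
        new_rank = {node: (1.0 - damping) / n for node in graph}
        for node, targets in graph.items():
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Usage: rank the 25-node graph and list the five highest-ranked nodes.
graph = random_graph()
ranks = pagerank(graph)
top_five = sorted(ranks, key=ranks.get, reverse=True)[:5]
```

Because no node is left without out-links, the ranks stay a probability distribution and sum to 1 after every iteration.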