Using MapReduce for Large-scale Medical Image Analysis
HISB 2012
Presented by: Roger Schaer, HES-SO Valais
Summary
Introduction
Methods
Results & Interpretation
Conclusions
Introduction
Exponential growth of imaging data (past 20 years)
[Figure: amount of images produced per day at the HUG (Geneva University Hospitals), by year]
Introduction (continued)
Mainly caused by:
Modern imaging techniques (3D, 4D): large files!
Large collections (available on the Internet)
Increasingly complex algorithms make processing this data more challenging
Requires a lot of computation power, storage and network bandwidth
Introduction (continued)
Flexible and scalable infrastructures are needed
Several approaches exist:
Single, powerful machine
Local cluster / grid
Alternative infrastructures (graphics cards)
Cloud computing solutions
The first two approaches were tested and compared
Introduction (continued)
Three large-scale medical image processing use cases:
Parameter optimization for Support Vector Machines
Content-based image feature extraction & indexing
3D texture feature extraction using the Riesz transform
Note: I mostly handled the infrastructure aspects!
Methods
MapReduce
Hadoop Cluster
Support Vector Machines
Image Indexing
Solid 3D Texture Analysis Using the Riesz Transform
MapReduce
MapReduce is a programming model for processing large data sets in parallel
Developed by Google
Map phase: takes key/value pairs as input and produces intermediate key/value pairs
Reduce phase: for each intermediate key, processes the list of associated values
Trivial example: the WordCount application
MapReduce: WordCount
INPUT
#1 hello world
#2 goodbye world
#3 hello hadoop
#4 bye hadoop
...
MAP
hello 1
world 1
goodbye 1
world 1
hello 1
hadoop 1
bye 1
hadoop 1
REDUCE
hello 2
world 2
goodbye 1
hadoop 2
bye 1
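The WordCount flow can be simulated on a single machine to make the three stages explicit. This is a minimal in-memory sketch, not Hadoop code: the function names are illustrative, and the `shuffle` step (grouping intermediate pairs by key) stands in for what the framework does between the map and reduce phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record_id: int, line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word of one input line.

    record_id plays the role of the input key; WordCount ignores it.
    """
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum the list of values associated with one key."""
    return word, sum(counts)


def word_count(lines: List[str]) -> Dict[str, int]:
    intermediate = [pair for i, line in enumerate(lines, start=1)
                    for pair in map_phase(i, line)]
    return dict(reduce_phase(w, c) for w, c in shuffle(intermediate).items())


lines = ["hello world", "goodbye world", "hello hadoop", "bye hadoop"]
print(word_count(lines))  # {'hello': 2, 'world': 2, 'goodbye': 1, 'hadoop': 2, 'bye': 1}
```

In a real cluster, many map calls run in parallel on different input splits, and the reduce calls for different keys are also distributed across workers.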
Hadoop
Apache’s open-source implementation of MapReduce
Consists of:
Distributed storage system: HDFS
Execution framework: Hadoop MapReduce
A master node, which orchestrates the distribution of tasks
Worker nodes, which perform the tasks
A typical worker node runs a DataNode (HDFS storage) and a TaskTracker (task execution)
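Because each worker runs both a DataNode and a TaskTracker, the master can often start a task on a node that already stores the task's input, instead of moving the data over the network (data locality). The sketch below is a deliberately simplified, hypothetical model of that idea; it is not Hadoop's actual scheduler, which also handles rack-local placement, retries and other concerns.

```python
from typing import Dict, Set


def assign_splits(replicas: Dict[str, Set[str]],
                  free_slots: Dict[str, int]) -> Dict[str, str]:
    """Toy greedy scheduler that prefers data-local placement.

    replicas:   input split -> worker nodes whose DataNode stores a copy
    free_slots: worker node -> number of free task slots on its TaskTracker
    Returns split -> chosen worker; splits that find no free slot are left out.
    """
    slots = dict(free_slots)  # do not mutate the caller's dictionary
    assignment: Dict[str, str] = {}
    for split in sorted(replicas):
        local = [n for n in sorted(replicas[split]) if slots.get(n, 0) > 0]
        # Fallback: any node with a free slot (the data must then travel over the network).
        remote = [n for n in sorted(slots) if slots[n] > 0]
        candidates = local or remote
        if not candidates:
            continue  # no free slot anywhere: the split waits for a later round
        node = candidates[0]
        slots[node] -= 1
        assignment[split] = node
    return assignment


replicas = {"split-1": {"node-a"}, "split-2": {"node-b"}, "split-3": {"node-a"}}
print(assign_splits(replicas, {"node-a": 1, "node-b": 1, "node-c": 1}))
# {'split-1': 'node-a', 'split-2': 'node-b', 'split-3': 'node-c'}
```

Here `split-3` cannot run next to its data because `node-a` is busy, so it is placed remotely on `node-c`; with two slots on `node-a`, all three splits would run data-locally.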
Support Vector Machines
Computes a decision boundary (hyperplane) that separates inputs of different classes, represented in a given feature space transformed by a given kernel
The values of two parameters need to be adapted to the data:
The cost C of errors
The width σ of the Gaussian kernel
[Figure: SVM illustration in a 2D feature space (both axes from 0 to 20), including a new sample marked "?" to be classified]
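For reference, the Gaussian kernel is commonly written as K(x, y) = exp(−‖x − y‖² / (2σ²)); some libraries instead use γ = 1/(2σ²). The slides do not state which convention was used, so this minimal sketch assumes the σ form. It shows why σ must be tuned: with a small σ the similarity decays quickly with distance, with a large σ it decays slowly.

```python
import math
from typing import Sequence


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)), assuming the sigma parameterization."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# Same pair of points, two kernel widths: the similarity depends strongly on sigma.
p, q = (0.0, 0.0), (3.0, 4.0)  # Euclidean distance 5
for sigma in (1.0, 5.0):
    print(sigma, gaussian_kernel(p, q, sigma))  # about 3.7e-06 for sigma=1, about 0.61 for sigma=5
```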
SVM (continued)
Goal: find the optimal (C, σ) pair for training an SVM
That is, the pair allowing the best classification performance on 5 lung texture patterns
Execution on 1 PC (without Hadoop) can take weeks
Due to the extensive leave-one-patient-out cross-validation with 86 patients
Parallelization: split the job by (C, σ) pairs
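The parallelization strategy can be sketched as follows: every (C, σ) pair becomes one line of the job's input, so each map task evaluates one pair independently by running the full leave-one-patient-out loop. The grid values, toy data and function names are hypothetical; the slides do not list the actual grid.

```python
from itertools import product
from typing import Dict, Iterator, List, Sequence, Tuple


def task_records(costs: Sequence[float], sigmas: Sequence[float]) -> List[str]:
    """One tab-separated input line per (C, sigma) pair, i.e. one unit of work per map task."""
    return [f"{c}\t{s}" for c, s in product(costs, sigmas)]


def leave_one_patient_out(
    samples_by_patient: Dict[str, List[int]],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held-out patient, training sample indices, test sample indices).

    All samples of the held-out patient go to the test set, so no patient
    contributes to both training and testing within a fold.
    """
    for held_out in sorted(samples_by_patient):
        test = list(samples_by_patient[held_out])
        train = [i for patient, indices in sorted(samples_by_patient.items())
                 if patient != held_out for i in indices]
        yield held_out, train, test


# Hypothetical grid: 4 values of C times 3 values of sigma = 12 independent tasks.
records = task_records([2.0 ** k for k in range(4)], [0.5, 1.0, 2.0])
print(len(records))  # 12

# Hypothetical toy data with 3 patients (the study used 86).
folds = list(leave_one_patient_out({"p1": [0, 1], "p2": [2], "p3": [3, 4]}))
print(folds[0])  # ('p1', [2, 3, 4], [0, 1])
```

With 86 patients, each task trains and tests 86 SVMs for its pair, which is why the full grid takes so long on a single PC and why splitting by pair parallelizes well.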
Image Indexing
[Figure: indexing pipeline (Image Files → Feature Extractor → Feature Vector Files → Bag of Visual Words Factory, which uses the Vocabulary File → Index File)]
Two phases:
Extract features from images
Construct bags of visual words by quantization
Component-based / monolithic approaches
Parallelization: each task processes N images
[Figure: same pipeline with the Feature Extractor and the Bag of Visual Words Factory combined into a single component (Feature Extractor + Bag of Visual Words Factory)]
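The quantization phase can be illustrated with a minimal sketch: each local feature vector is assigned to its nearest visual word in the vocabulary, and the image is represented by the histogram of those assignments. Assumptions not stated in the slides: hard assignment, Euclidean distance, and a vocabulary given as a list of vectors (typically obtained by clustering, e.g. k-means). The data below is hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(feature: Vector, vocabulary: Sequence[Vector]) -> int:
    """Index of the visual word closest (Euclidean distance) to the feature vector."""
    return min(range(len(vocabulary)), key=lambda k: math.dist(feature, vocabulary[k]))


def bag_of_visual_words(features: Sequence[Vector], vocabulary: Sequence[Vector]) -> List[int]:
    """Histogram of visual-word occurrences for one image (hard assignment)."""
    histogram = [0] * len(vocabulary)
    for feature in features:
        histogram[nearest_word(feature, vocabulary)] += 1
    return histogram


vocabulary = [(0.0, 0.0), (10.0, 10.0)]           # hypothetical 2-word vocabulary
features = [(1.0, 1.0), (9.0, 8.0), (0.0, 2.0)]   # hypothetical features of one image
print(bag_of_visual_words(features, vocabulary))  # [2, 1]
```

Since each image's histogram depends only on its own features and the shared vocabulary, a task can process its N images without communicating with other tasks.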
3D Texture Analysis (Riesz)
Features are extracted from 3D images
Parallelization: each task processes N images
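Splitting the image list so that each task processes N images can be sketched as below; each resulting chunk would become one task's input. The value of N and the file names are hypothetical.

```python
from typing import List, Sequence


def chunk_into_tasks(image_paths: Sequence[str], n: int) -> List[List[str]]:
    """Split the image list into consecutive chunks of at most n images (one chunk per task)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [list(image_paths[i:i + n]) for i in range(0, len(image_paths), n)]


paths = [f"volume_{k:03d}.nii" for k in range(5)]  # hypothetical file names
for task_id, chunk in enumerate(chunk_into_tasks(paths, 2)):
    print(task_id, chunk)
```

A smaller N yields more, shorter tasks, which spreads work more evenly across nodes but adds per-task startup overhead; the slides do not say which N was used.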
Results & Interpretation
Hadoop Cluster
Minimally invasive setup (at least 2 free cores per node)
Support Vector Machines
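Optimization: longer tasks correspond to poor classification performance
Because the optimization of the hyperplane is more difficult to compute (more iterations are needed)
After 2 patients (out of 86), check whether t_i ≥ F · t_ref
t_i: time taken by task i; t_ref: average time of the tasks; F: margin factor
If the time exceeds the average plus the margin, terminate the task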
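The termination rule can be written as a small check. This is a minimal sketch, assuming t_ref is the mean time of all tasks measured after the first 2 patients and F is a margin factor greater than 1; the value F = 1.5 and the timings below are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def tasks_to_terminate(elapsed: Dict[str, float], factor: float) -> List[str]:
    """Return the tasks whose time t_i satisfies t_i >= F * t_ref.

    elapsed: task id -> time t_i spent on the first 2 patients (seconds)
    factor:  margin factor F applied to the reference time t_ref (here: the mean)
    """
    t_ref = mean(elapsed.values())
    return sorted(task for task, t_i in elapsed.items() if t_i >= factor * t_ref)


# Hypothetical timings for three (C, sigma) tasks after 2 patients.
elapsed = {"C=1,sigma=0.5": 100.0, "C=1,sigma=1": 110.0, "C=8,sigma=0.5": 390.0}
print(tasks_to_terminate(elapsed, factor=1.5))  # ['C=8,sigma=0.5']
```

Here t_ref = 200 s, so only the task above 300 s is stopped. Killing such tasks early is acceptable because, as noted above, long tasks correspond to parameter pairs with poor accuracy.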
Support Vector Machines
[Figure: classification accuracy (%) as a function of σ (sigma) and C (cost)]
Black: tasks interrupted by the new algorithm
Optimized algorithm: ~50h → ~9h15min
None of the best tasks (highest accuracy) are killed
Image Indexing
[Figure: calculation time as a function of the number of tasks, for 1K, 10K and 100K images]
Both experiments were executed using Hadoop: once on a single computer, then on our cluster of PCs
Riesz 3D
Particularity: the code was a series of Matlab® scripts
Instead of rewriting the whole application:
We used the Hadoop streaming feature (data is exchanged through stdin/stdout)
To maximize scalability, GNU Octave was used
Octave is highly compatible with Matlab®
Results:
1 task (no Hadoop): 131h32m42s
42 tasks (idle): 6h29m51s
42 tasks (normal): 5h51m31s
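Hadoop streaming runs any executable as a mapper or reducer: input records arrive on stdin, and output key/value pairs are written to stdout, one per line, with a tab separating key and value by default. The sketch below shows what a Python mapper wrapping an Octave script could look like. It is hypothetical: the script name `riesz_features` and the one-path-per-line input format are assumptions, not the project's actual code.

```python
import subprocess
import sys
from typing import Callable, Iterable, Iterator, List


def octave_command(image_path: str, script: str = "riesz_features") -> List[str]:
    """Command line running a (hypothetical) Octave function on one 3D image."""
    # Note: paths containing a single quote would need escaping; not handled here.
    return ["octave", "--quiet", "--eval", f"{script}('{image_path}')"]


def run_mapper(lines: Iterable[str],
               run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> Iterator[str]:
    """Process one image path per input line; emit 'path<TAB>exit_code' for each."""
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines
        result = run(octave_command(path), capture_output=True, text=True)
        yield f"{path}\t{result.returncode}"


def main() -> None:
    """Streaming entry point: records come from stdin, key/value lines go to stdout."""
    for record in run_mapper(sys.stdin):
        print(record)
```

When used as the mapper of a streaming job, the script would call `main()` at module level; it is left uncalled here so the functions can be exercised without stdin. Injecting `run` also lets the mapper be tested without Octave installed.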
Conclusions
MapReduce is:
Flexible (worked with very different use cases)
Easy to use (the two-phase programming model is simple)
Efficient (at least 20x speedup for all use cases)
Hadoop is:
Easy to deploy & manage
User-friendly (nice web UIs)
Conclusions (continued)
Speedups for the different use cases
                      SVMs            Image Indexing   3D Feature Extraction
Single task           990h*           21h*             131h30
42 tasks on Hadoop    50h / 9h15**    1h               5h50
Speedup               20x / 107x**    21x              22.5x
* estimation   ** using the optimized algorithm
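Each speedup in the table is the ratio of the single-task time to the 42-task time. Recomputing them from the table's values is a quick consistency check; the helper names are illustrative.

```python
def hours(h: int, m: int = 0, s: int = 0) -> float:
    """Convert an h/m/s duration to hours."""
    return h + m / 60 + s / 3600


def speedup(single_task_hours: float, parallel_hours: float) -> float:
    """Speedup = single-task time / parallel time."""
    return single_task_hours / parallel_hours


print(round(speedup(990, 50), 1))                                # SVMs, original: 19.8 (shown as 20x)
print(round(speedup(990, hours(9, 15))))                         # SVMs, optimized: 107
print(round(speedup(21, 1)))                                     # image indexing: 21
print(round(speedup(hours(131, 30), hours(5, 50)), 1))           # 3D feature extraction: 22.5
print(round(speedup(hours(131, 32, 42), hours(5, 51, 31)), 1))   # same, exact Riesz run times: 22.5
```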
Lessons Learned
It is important to use physically distributed resources
Overloading a single machine hurts performance
Data locality notably speeds up jobs
Not every application is infinitely scalable
Performance improvements level off at some point
Future work
Take it to the next level: the Cloud
Amazon Elastic Compute Cloud (EC2, IaaS)
Amazon Elastic MapReduce (PaaS)
Cloudbursting
Use both local resources and the Cloud (for peak usage)
Thank you! Questions?

 
Medical image analysis and big data evaluation infrastructures
Medical image analysis and big data evaluation infrastructuresMedical image analysis and big data evaluation infrastructures
Medical image analysis and big data evaluation infrastructures
 
Medical image analysis, retrieval and evaluation infrastructures
Medical image analysis, retrieval and evaluation infrastructuresMedical image analysis, retrieval and evaluation infrastructures
Medical image analysis, retrieval and evaluation infrastructures
 
How to detect soft falls on devices
How to detect soft falls on devicesHow to detect soft falls on devices
How to detect soft falls on devices
 
FUNDAMENTALS OF TEXTURE PROCESSING FOR BIOMEDICAL IMAGE ANALYSIS
FUNDAMENTALS OF TEXTURE PROCESSING FOR BIOMEDICAL IMAGE ANALYSISFUNDAMENTALS OF TEXTURE PROCESSING FOR BIOMEDICAL IMAGE ANALYSIS
FUNDAMENTALS OF TEXTURE PROCESSING FOR BIOMEDICAL IMAGE ANALYSIS
 
MOBILE COLLECTION AND DISSEMINATION OF SENIORS’ SKILLS
MOBILE COLLECTION AND DISSEMINATION OF SENIORS’ SKILLSMOBILE COLLECTION AND DISSEMINATION OF SENIORS’ SKILLS
MOBILE COLLECTION AND DISSEMINATION OF SENIORS’ SKILLS
 
Enhanced Students Laboratory The GET project
Enhanced Students Laboratory The GET projectEnhanced Students Laboratory The GET project
Enhanced Students Laboratory The GET project
 

Recently uploaded

Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
TIPNGVN2
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
Pixlogix Infotech
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 

Recently uploaded (20)

Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 

Using MapReduce for Large-scale Medical Image Analysis

  • 1. Using MapReduce for Large-scale Medical Image Analysis. HISB 2012. Presented by: Roger Schaer, HES-SO Valais
  • 4. Introduction. Exponential growth of imaging data over the past 20 years. [Figure: amount of images produced per day at the HUG, by year]
  • 5. Introduction (continued). Mainly caused by: modern imaging techniques (3D, 4D), which produce large files, and large collections (available on the Internet). Increasingly complex algorithms make processing this data more challenging: it requires a lot of computation power, storage and network bandwidth.
  • 6. Introduction (continued). Flexible and scalable infrastructures are needed. Several approaches exist: a single, powerful machine; a local cluster or grid; alternative infrastructures (graphics cards); cloud computing solutions. The first two approaches were tested and compared.
  • 7. Introduction (continued). Three large-scale medical image processing use cases: parameter optimization for Support Vector Machines; content-based image feature extraction and indexing; 3D texture feature extraction using the Riesz transform. Note: I mostly handled the infrastructure aspects!
  • 9. Methods: MapReduce; Hadoop cluster; Support Vector Machines; image indexing; solid 3D texture analysis using the Riesz transform
  • 10. MapReduce. MapReduce is a programming model developed by Google. Map phase: takes key/value pairs as input and produces intermediate key/value pairs as output. Reduce phase: for each intermediate key, processes the list of associated values. Trivial example: the WordCount application.
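The two phases can be illustrated with a minimal single-process sketch. It is not Hadoop's API: `map_reduce` and its parameters are hypothetical names, and the in-memory grouping step stands in for the shuffle that Hadoop performs across machines.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Pair = Tuple[Any, Any]


def map_reduce(
    records: Iterable[Pair],
    mapper: Callable[[Any, Any], Iterable[Pair]],
    reducer: Callable[[Any, List[Any]], Any],
) -> Dict[Any, Any]:
    """Run a map phase, group intermediate values by key, then run a reduce phase."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    # Map phase: every input key/value pair yields zero or more intermediate pairs.
    for key, value in records:
        for inter_key, inter_value in mapper(key, value):
            # Shuffle: collect all values that share the same intermediate key.
            groups[inter_key].append(inter_value)
    # Reduce phase: one call per intermediate key, with the list of its values.
    return {k: reducer(k, values) for k, values in groups.items()}


# Example (deliberately not WordCount): group the distinct words of each line by length.
word_lengths = map_reduce(
    [(1, "hello world"), (2, "goodbye world")],
    mapper=lambda _line_no, line: ((len(w), w) for w in line.split()),
    reducer=lambda _length, words: sorted(set(words)),
)
# word_lengths == {5: ["hello", "world"], 7: ["goodbye"]}
```

The user only supplies `mapper` and `reducer`. Grouping, and in a real cluster distribution and fault tolerance, are the framework's job.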
  • 37. MapReduce: WordCount. INPUT: #1 hello world, #2 goodbye world, #3 hello hadoop, #4 bye hadoop, ... MAP: (hello, 1) (world, 1) (goodbye, 1) (world, 1) (hello, 1) (hadoop, 1) (bye, 1) (hadoop, 1). REDUCE: (hello, 2) (world, 2) (goodbye, 1) (hadoop, 2) (bye, 1).
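The same WordCount job can be written in the style of a Hadoop Streaming program. In that style the mapper and reducer exchange tab-separated key/value text lines, and the reducer receives its input sorted by key. The sketch below assumes that convention. The function names are illustrative, and `sorted()` stands in for Hadoop's shuffle/sort step.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word of every input line."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts of consecutive lines sharing the same key (input must be sorted)."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local simulation of the job: map, then sort (the shuffle), then reduce.
lines = ["hello world", "goodbye world", "hello hadoop", "bye hadoop"]
result = list(reducer(sorted(mapper(lines))))
# result == ["bye\t1", "goodbye\t1", "hadoop\t2", "hello\t2", "world\t2"]
```

Because the reducer only sees a sorted stream, it detects key boundaries itself (here with `groupby`) instead of receiving a ready-made list per key.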
  • 38. Hadoop. Apache's implementation of MapReduce. It consists of a distributed storage system (HDFS) and an execution framework (Hadoop MapReduce). A master node orchestrates the task distribution, and worker nodes perform the tasks. A typical node runs a DataNode and a TaskTracker.
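Running a DataNode and a TaskTracker on the same node lets the master send a task to a machine that already stores its input. The toy, greedy sketch below illustrates that idea only. It is not Hadoop's actual scheduler, which also handles racks, speculative execution and more. `assign_tasks`, the block names and the slot counts are hypothetical.

```python
from typing import Dict, Set


def assign_tasks(block_locations: Dict[str, Set[str]], free_slots: Dict[str, int]) -> Dict[str, str]:
    """Greedily map each input block to a node, preferring nodes that store the block."""
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment: Dict[str, str] = {}
    for block in sorted(block_locations):
        # Data-local candidates: nodes holding a replica of the block with a free slot.
        local = [n for n in sorted(block_locations[block]) if slots.get(n, 0) > 0]
        # Fallback: any node with a free slot (its data must then cross the network).
        remote = [n for n in sorted(slots) if slots[n] > 0]
        candidates = local or remote
        if not candidates:
            break  # cluster is full; remaining blocks would wait for a slot to free up
        node = candidates[0]
        slots[node] -= 1
        assignment[block] = node
    return assignment


placement = assign_tasks(
    block_locations={"block1": {"node1"}, "block2": {"node2"}, "block3": {"node1"}},
    free_slots={"node1": 1, "node2": 2},
)
```

Here `block3` ends up on `node2` because `node1` has no free slot left. Its input then has to be read remotely.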
  • 39. Support Vector Machines. An SVM computes a decision boundary (hyperplane) that separates inputs of different classes, represented in a feature space transformed by a given kernel. Two parameters must be adapted to the data: the cost C of errors and the width σ of the Gaussian kernel.
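The role of σ can be seen from the Gaussian kernel itself, K(x, y) = exp(-‖x - y‖² / (2σ²)). A small σ makes the similarity fall off quickly with distance, while a large σ makes it smoother. The minimal sketch below uses that standard form. Some libraries (e.g. LIBSVM) parameterize the same kernel with γ = 1/(2σ²) instead, and the slides do not say which form was used.

```python
import math
from typing import Sequence


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Gaussian (RBF) kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))
```

C does not appear in the kernel. It weights the penalty for misclassified training points when the hyperplane is fitted, which is why both values have to be searched jointly.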
  • 45. SVM (continued). Goal: find the optimal value couple (C, σ) to train an SVM, allowing the best classification performance on 5 lung texture patterns. Execution on 1 PC (without Hadoop) can take weeks, due to extensive leave-one-patient-out cross-validation with 86 patients. Parallelization: the job is split by parameter value couples.
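Splitting the job by parameter couples means each (C, σ) pair becomes an independent unit of work that runs the full cross-validation on its own. The sketch below makes that concrete under stated assumptions. The grid values are hypothetical (the slides do not list the searched ranges). Writing one couple per line is one possible way to build the job input, e.g. with one line per map task. The leave-one-patient-out generator yields one fold per patient.

```python
from itertools import product
from typing import Iterator, List, Sequence, Tuple


def parameter_couples(costs: Sequence[float], sigmas: Sequence[float]) -> List[Tuple[float, float]]:
    """Every (C, sigma) combination of the grid; each one becomes an independent task."""
    return list(product(costs, sigmas))


def task_input_lines(couples: Sequence[Tuple[float, float]]) -> List[str]:
    """One tab-separated 'C<TAB>sigma' line per couple, usable as job input."""
    return [f"{c}\t{s}" for c, s in couples]


def leave_one_patient_out(patient_ids: Sequence[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training patients, held-out patient): one fold per patient."""
    for held_out in patient_ids:
        yield [p for p in patient_ids if p != held_out], held_out


# Hypothetical grid; 86 patients gives 86 folds per (C, sigma) task.
couples = parameter_couples(costs=[1, 10, 100], sigmas=[0.5, 1.0, 2.0])
lines = task_input_lines(couples)
folds = list(leave_one_patient_out([f"patient{i:02d}" for i in range(86)]))
```

Each task trains and tests 86 models, one per fold, so the cost of a single couple is what makes the sequential run take weeks.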
  • 47. Image Indexing. [Diagram components: Vocabulary File, Image Files, Feature Extractor, Feature Vectors Files, Bag of Visual Words Factory, Index File; in the monolithic variant, Feature Extractor + Bag of Visual Words Factory form one component.] Two phases: extract features from images, then construct bags of visual words by quantization. Two approaches: component-based and monolithic. Parallelization: each task processes N images.
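The quantization phase can be sketched as follows. Each local feature vector of an image is assigned to its closest visual word in the vocabulary, and the image is then described by the histogram of word counts. This is a minimal sketch assuming hard assignment under squared Euclidean distance. The slides do not specify the distance or assignment scheme, and the function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(feature: Vector, vocabulary: Sequence[Vector]) -> int:
    """Index of the visual word closest to `feature` (squared Euclidean distance)."""
    if not vocabulary:
        raise ValueError("vocabulary must not be empty")
    distances = [sum((a - b) ** 2 for a, b in zip(feature, word)) for word in vocabulary]
    return min(range(len(vocabulary)), key=distances.__getitem__)


def bag_of_visual_words(features: Sequence[Vector], vocabulary: Sequence[Vector]) -> List[int]:
    """Histogram counting how many local features of one image fall on each visual word."""
    histogram = [0] * len(vocabulary)
    for feature in features:
        histogram[nearest_word(feature, vocabulary)] += 1
    return histogram
```

Each image's histogram depends only on that image and the shared vocabulary. Images can therefore be processed in independent batches.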
  • 48. 3D Texture Analysis (Riesz). Features are extracted from 3D images (see the figure on the slide). Parallelization: each task processes N images.
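"Each task processes N images" amounts to cutting the list of input images into consecutive batches. The minimal sketch below uses hypothetical names and file names and is not taken from the project code.

```python
from typing import List, Sequence


def split_into_tasks(image_paths: Sequence[str], images_per_task: int) -> List[List[str]]:
    """Group image paths into consecutive batches of at most `images_per_task` images."""
    if images_per_task < 1:
        raise ValueError("images_per_task must be >= 1")
    return [
        list(image_paths[start:start + images_per_task])
        for start in range(0, len(image_paths), images_per_task)
    ]


# Hypothetical file names; the last task receives the remainder.
tasks = split_into_tasks([f"volume{i}.nii" for i in range(5)], images_per_task=2)
```

Choosing N is typically a trade-off. Larger batches amortize per-task startup overhead, while smaller ones balance the load more evenly across nodes.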
  • 50. Hadoop Cluster. Minimally invasive setup (>=2 free cores per node).
  • 51. Support Vector Machines. Optimization: longer tasks = poor classification performance, because the optimization of the hyperplane is more difficult to compute (more iterations needed). After 2 patients (out of 86), the algorithm checks whether $t_i \geq F \cdot t_{ref}$, i.e. whether the time $t_i$ of task $i$ exceeds the average time $t_{ref}$ plus a margin (factor $F$). If so, the task is terminated.
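The termination rule can be sketched as a filter over per-task timings. This is a minimal sketch under assumptions the slides leave open. Here $t_{ref}$ is taken as the mean time over all (C, σ) tasks for the first two patients, `factor` plays the role of F, and the task names and timings are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def tasks_to_terminate(times_after_two_patients: Dict[str, float], factor: float) -> List[str]:
    """Return the tasks whose time t_i satisfies t_i >= F * t_ref.

    Assumption: t_ref is the mean time of all tasks for the first two
    patients, and `factor` (F > 1) is the safety margin.
    """
    if not times_after_two_patients:
        return []
    t_ref = mean(times_after_two_patients.values())
    return sorted(task for task, t_i in times_after_two_patients.items() if t_i >= factor * t_ref)


# Hypothetical timings in minutes, keyed by (C, sigma) couple.
timings = {"C=1,s=0.5": 10.0, "C=10,s=1.0": 12.0, "C=100,s=2.0": 44.0}
killed = tasks_to_terminate(timings, factor=1.5)
```

With these numbers, $t_{ref}$ = 22 and the threshold is 33, so only the 44-minute task is stopped. The remaining 84 patients are never computed for it.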
  • 52. Support Vector Machines. [Figure: accuracy (%) as a function of σ (sigma) and C (cost); black marks the tasks to be interrupted by the new algorithm.] Optimized algorithm: ~50h → ~9h15min. None of the best tasks (highest accuracy) are killed.
  • 55. Image Indexing. [Figure: calculation time as a function of the number of tasks, for 1K, 10K and 100K images.] Both experiments were executed using Hadoop: once on a single computer, then on our cluster of PCs.
  • 57. Riesz 3D. Particularity: the code was a series of Matlab® scripts. Instead of rewriting the whole application, we used Hadoop's streaming feature (which communicates via stdin/stdout). To maximize scalability, GNU Octave was used, as it is highly compatible with Matlab®. Results:
    - 1 task (no Hadoop): 131h32m42s
    - 42 tasks (idle): 6h29m51s
    - 42 tasks (normal): 5h51m31s
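A Hadoop Streaming mapper is any executable that reads records from stdin and writes tab-separated key/value lines to stdout, which is what allows existing scripts to be reused unchanged. The sketch below shows how such a wrapper might look. It assumes each input line is the path of one 3D image and that a hypothetical Octave function `extract_riesz(path)` prints the feature vector. Neither name comes from the slides.

```python
import subprocess
from typing import Callable, Iterable, Iterator


def run_octave(path: str) -> str:
    """Call the hypothetical Octave function `extract_riesz` on one image; return its output."""
    escaped = path.replace("'", "''")  # Octave escapes a single quote by doubling it
    result = subprocess.run(
        ["octave", "--quiet", "--no-window-system", "--eval", f"extract_riesz('{escaped}')"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Collapse whitespace so that each image produces exactly one output line.
    return " ".join(result.stdout.split())


def mapper(lines: Iterable[str], extract: Callable[[str], str] = run_octave) -> Iterator[str]:
    """Streaming-style mapper: one image path per input line, one tab-separated output line."""
    for line in lines:
        path = line.strip()
        if path:  # skip blank lines
            yield f"{path}\t{extract(path)}"
```

In an actual streaming job the script would end with `for out in mapper(sys.stdin): print(out)` and be passed to Hadoop as the mapper executable. The `extract` parameter lets the wrapper be tested without Octave installed.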
  • 59. Conclusions. MapReduce is flexible (it worked with very varied use cases), easy to use (the two-phase programming model is simple) and efficient (>=20x speedup for all use cases). Hadoop is easy to deploy and manage, and user-friendly (nice web UIs).
  • 60. Conclusions (continued). Speedups for the different use cases:
    - SVMs: single task 990h*; 42 tasks on Hadoop 50h / 9h15**; speedup 20x / 107x**
    - Image Indexing: single task 21h*; 42 tasks on Hadoop 1h; speedup 21x
    - 3D Feature Extraction: single task 131h30; 42 tasks on Hadoop 5h50; speedup 22.5x
    (* estimation; ** using the optimized algorithm)
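The speedup figures follow from dividing the single-task time by the time with 42 tasks on Hadoop. The short check below recomputes them from the reported durations. The helper names are illustrative.

```python
def hours(h: float, m: float = 0, s: float = 0) -> float:
    """Convert an h/m/s duration to decimal hours."""
    return h + m / 60 + s / 3600


def speedup(single_task_hours: float, parallel_hours: float) -> float:
    """Speedup = time with a single task / time with 42 tasks on Hadoop."""
    return single_task_hours / parallel_hours


svm_plain = speedup(990, 50)                    # 19.8, reported as 20x
svm_optimized = speedup(990, hours(9, 15))      # about 107.0
indexing = speedup(21, 1)                       # 21.0
riesz = speedup(hours(131, 30), hours(5, 50))   # about 22.5
```

All reported values are consistent with this computation after rounding. Two of the single-task times are estimates, so those speedups are estimates too.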
  • 61. Lessons Learned. It is important to use physically distributed resources: overloading a single machine hurts performance. Data locality notably speeds up jobs. Not every application is infinitely scalable: performance improvements level off at some point.
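One standard way to reason about why gains level off is Amdahl's law, S(n) = 1 / ((1 - p) + p / n). Here p is the fraction of the work that parallelizes and n is the number of workers. The slides do not mention it, and in practice overheads such as task startup and I/O also contribute. The sketch below uses a hypothetical p = 0.99 to show the diminishing returns.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: S(n) = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


# Hypothetical: 99% of the work parallelizes. Each extra worker helps less,
# and the speedup can never exceed 1 / (1 - p) = 100.
curve = {n: round(amdahl_speedup(0.99, n), 1) for n in (1, 10, 42, 100, 1000)}
```

The law assumes a fixed workload and no parallelization overhead, so it gives an upper bound rather than a prediction of measured times.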
  • 62. Future work. Take it to the next level: the cloud, with Amazon Elastic Compute Cloud (IaaS) and Amazon Elastic MapReduce (PaaS). Cloudbursting: use both local resources and the cloud (for peak usage).
  • 63. Thank you! Questions?