Optimal Chain Matrix Multiplication Big Data Perspective
Presented By
Pollab Kumar Roy
pollabroy.242@gmail.com
STUDY AND REPORT
Presentation Outline
 Introduction
 Big Data Overview
• Definition
• Three V presentation
• Application
 Introduction to Hadoop
• Architecture
• How it works
• Advantage
 MapReduce
• What is MapReduce?
• The Algorithm
• Example Scenario
 HDFS
 Matrix Multiplication
 Multi Way Join
 Proposed Work
 Conclusions
Dept. of ICT, MBSTU
Introduction
Matrix multiplication is widely used in many graph algorithms, such
as those that calculate the transitive closure. MapReduce is well suited to
implementing multi-way join operations for very large graphs and matrices.
 This presentation gives an overview of Big Data, shows how matrix
multiplication is represented in a database, and covers the parallel
multi-way matrix join with its benefits and limitations.
 It ends with a proposal for making chain multiplication more optimal by
using a raw join key.
Big Data Overview
Big data is a term that refers to data sets whose size, complexity, and
rate of growth make them difficult to capture, manage, and
process with conventional technologies.
 Big Data Source :
• Stock Exchange data
• Social Media data
• Black Box data
The three dimensions of Big Data
• Volume: About 5 billion GB of data had been created up to 2003. The same
amount was created every two days in 2011 and every ten minutes in 2013.
• Variety: Structured (relational data), semi-structured (XML data), and
unstructured (Word, PDF, text, media logs).
• Velocity: The pace at which data flows in from sources and from human
interaction.
Big Data Application Segments
Analytics
Predictive Modeling
Decision Processing
Behavior Analysis
Demographics
Data Warehouse
Hosting
Digitization/archive
Backup
Web 2.0
Engineering Collaborating
Design Optimization
Process Flow
Fluid Dynamics
3D Modeling
Introduction to Hadoop
 Hadoop: An Apache open-source framework written in Java that allows
distributed processing of large datasets across clusters of computers using
simple programming models.
 The name comes from a toy elephant that belonged to Doug Cutting's son.
 Hadoop Architecture :
Two major layers.
• Processing layer : MapReduce (distributed computation)
• Storage layer : Hadoop Distributed File System, HDFS (distributed storage)
Fig : Hadoop architecture (MapReduce, HDFS, YARN Framework, Common Utilities)
Introduction to Hadoop (cont.)
 How Hadoop works : Core tasks across a cluster of computers
• Data is divided into directories and files, and files are split into
uniform blocks of 128 MB or 64 MB.
• Files are then distributed across various cluster nodes.
• HDFS supervises the processing.
• Blocks are replicated.
• A sort is performed between the map and reduce stages.
• The sorted data is sent to a designated computer.
 Advantages :
• Low-cost alternative to building bigger servers.
• Fault tolerance and high availability.
• Dynamic clustering.
• Automatic data distribution, and open source.
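The block splitting and replication steps can be made concrete with a little arithmetic. The following is a minimal sketch, not Hadoop code: it assumes the 128 MB block size named in the slide and a replication factor of 3, which the slides do not state and is used here only as an assumption. The helper names `block_count` and `stored_blocks` are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128      # block size named in the slide (64 MB is the other option)
REPLICATION_FACTOR = 3   # assumption for illustration: not stated in the slides


def block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks needed to hold a file; the last block may be partial."""
    return math.ceil(file_size_mb / block_size_mb)


def stored_blocks(file_size_mb, replication=REPLICATION_FACTOR):
    """Total block replicas kept across the cluster for one file."""
    return block_count(file_size_mb) * replication


print(block_count(1000))    # blocks for a 1000 MB file
print(stored_blocks(1000))  # replicas stored for the same file
```

With these assumed values, a 1000 MB file occupies 8 blocks and the cluster stores 24 block replicas in total.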
MapReduce
 What is MapReduce : A processing technique and programming
model for distributed computing, based on Java.
• Mapper
• Shuffle
• Reducer
• Java based
• Key-value pairs
MapReduce (cont.)
 The algorithm: Mapper Reducer Key Value
MapReduce (cont.)
 Word Count Example :
Fig : Word count data flow
• Input file :
Apple Orange Mango
Orange Grapes Plum
Apple Plum Mango
Apple Apple Plum
• Splitting (each line goes to an individual mapper) : the four lines above
• Map (key value splitting) :
Apple,1 Orange,1 Mango,1
Orange,1 Grapes,1 Plum,1
Apple,1 Plum,1 Mango,1
Apple,1 Apple,1 Plum,1
• Sort, shuffle :
Apple,1 Apple,1 Apple,1 Apple,1
Grapes,1
Mango,1 Mango,1
Orange,1 Orange,1
Plum,1 Plum,1 Plum,1
• Reduce (produce key value pairs) : Apple,4 Grapes,1 Mango,2 Orange,2 Plum,3
• Final output : Apple,4 Grapes,1 Mango,2 Orange,2 Plum,3
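The word count flow can be imitated on a single machine to show what each stage contributes. This is a minimal sketch in plain Python, not Hadoop's Java API; the function names `map_phase`, `shuffle_phase`, and `reduce_phase` are hypothetical and chosen only to mirror the stage names in the figure.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs):
    """Sort and shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reducer: sum the grouped values to get one count per word."""
    return {key: sum(values) for key, values in groups.items()}


# The four input lines from the slide; each would go to its own mapper.
lines = [
    "Apple Orange Mango",
    "Orange Grapes Plum",
    "Apple Plum Mango",
    "Apple Apple Plum",
]

mapped = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```

The printed counts match the final output of the figure: Apple 4, Grapes 1, Mango 2, Orange 2, Plum 3.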
Hadoop Distributed File System(HDFS)
 HDFS is a distributed, scalable, and portable file system written
in Java for the Hadoop framework.
 Features :
• Distributed storage and processing
• Name Node
• Data Node
• Interface in Hadoop
• Streaming access
• Cluster status check
Hadoop Distributed File System(cont.)
 Architecture : Data Node, Name Node, Block
Fig : HDFS architecture. Labels: Name Node with metadata (name, replica…, e.g. /home/foo/data, 3…), Client, Read, Blocks, Replication, Datanodes in Rack 1 and Rack 2
Matrix Multiplication (Via multi-way join)
 Usage : Widely used in many graph algorithms
• Transitive closure
• N-hop neighbors
 Join Operation :
• Matrices A [p×q] and B [q×r]
• C [p×r] = A × B
• Each (i,k)-th element of C is $C_{ik} = \sum_{j=1}^{q} A_{ij} \times B_{jk}$
• A and B are represented in a database by relations R1 and R2 with attributes {row, col, val}
• A × B in terms of SQL :
SELECT R1.row, R2.col, SUM(R1.val * R2.val)
FROM R1, R2
WHERE R1.col = R2.row
GROUP BY R1.row, R2.col
Fig : Social Network (users User_1 to User_7)
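The join-based formulation can be tried out on a small case. The sketch below uses Python's built-in `sqlite3` module as a stand-in database, which is an assumption made for illustration and not the MapReduce setting of the slides. The helpers `to_relation` and `multiply_via_join` are hypothetical names, and the column names are quoted because `row` may be treated as a keyword by some SQL engines.

```python
import sqlite3


def to_relation(matrix):
    """Flatten a dense matrix into (row, col, val) tuples, skipping zeros."""
    return [(i, j, v)
            for i, line in enumerate(matrix)
            for j, v in enumerate(line) if v != 0]


def multiply_via_join(a, b):
    """Compute A x B with the join query from the slide, on an in-memory DB."""
    conn = sqlite3.connect(":memory:")
    for name, matrix in (("R1", a), ("R2", b)):
        conn.execute(
            f'CREATE TABLE {name} ("row" INTEGER, "col" INTEGER, val INTEGER)')
        conn.executemany(
            f"INSERT INTO {name} VALUES (?, ?, ?)", to_relation(matrix))
    rows = conn.execute(
        'SELECT R1."row", R2."col", SUM(R1.val * R2.val) '
        'FROM R1, R2 '
        'WHERE R1."col" = R2."row" '
        'GROUP BY R1."row", R2."col"'
    ).fetchall()
    conn.close()
    # Map each (i, k) position to its summed value.
    return {(i, k): v for i, k, v in rows}


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
product = multiply_via_join(A, B)
print(product)
```

For these two 2×2 matrices the query returns the entries of [[19, 22], [43, 50]], which is the ordinary matrix product. Positions whose sum has no contributing pairs simply do not appear in the result, which suits sparse graph matrices.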
Matrix Multiplication (cont.)
Fig : Database representation
Matrix Multiplication (cont.)
 Chain way join :
• Eq.(1): serial two-way join (S2), the typical method. Each two-way join
runs as a separate MR job, so only intra-operation parallelism is used.
• Eq.(2): parallel two-way join (P2). Uses inter-operation parallelism:
A × B and C × D are computed simultaneously.
• Eq.(3): parallel m-way join (PM).
$(((A \times B) \times C) \times D)$ (1)
$((A \times B) \times (C \times D))$ (2)
$(A \times B \times C \times D)$ (3)
Matrix Multiplication (cont.)
 Parallel M-way join :
Number of MR jobs for a chain of n = 5 matrices with m = 3 :
• S2 : $n - 1 = 4$
• P2 : $\lceil \log_2 n \rceil = 3$
• PM : $\lceil \log_m n \rceil = 2$
Input : Relations M1, M2, …, Mn representing matrices
LIST_Mnext <= M1, M2, …, Mn
while |LIST_Mnext| > 1 do
    for i = 1 to |LIST_Mnext| do
        if (i mod m) == 1 then
            add Mi to LIST_Mleft
            Mleft = Mi
        else
            add Mi to LIST_Mright(Mleft)
        end if
    end for
    LIST_Mnext = doMR-PM(LIST_Mleft, LIST_Mright)
end while
Fig : Algorithm for PM join
Fig : Example of parallel 3-way join (the 1st MR job joins M1, M2, M3 into M1 and M4, M5 into M4; the 2nd MR job joins M1 and M4 into the result)
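The grouping loop of the PM algorithm can be simulated locally to check the job counts. This is a minimal single-machine sketch, not the MapReduce implementation: each pass of the `while` loop stands in for one MR job, and the helpers `matmul` and `chain_multiply` are hypothetical names. Every index i with (i mod m) == 1 starts a new group, which is the same as cutting the list into consecutive groups of m.

```python
def matmul(a, b):
    """Plain dense matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def chain_multiply(matrices, m):
    """Multiply a chain with m-way grouping.

    Returns (product, rounds), where rounds counts the passes that
    would each correspond to one MR job.
    """
    current = list(matrices)
    rounds = 0
    while len(current) > 1:
        next_list = []
        # Each group holds one left matrix followed by up to m - 1 right ones.
        for start in range(0, len(current), m):
            group = current[start:start + m]
            product = group[0]
            for right in group[1:]:
                product = matmul(product, right)
            next_list.append(product)
        current = next_list
        rounds += 1
    return current[0], rounds


# Five copies of a small matrix stand in for M1..M5.
M = [[1, 1], [0, 1]]
chain = [M] * 5

result_p2, jobs_p2 = chain_multiply(chain, 2)   # parallel two-way join
result_pm, jobs_pm = chain_multiply(chain, 3)   # parallel 3-way join
print(jobs_p2, jobs_pm, result_pm)
```

For five matrices this gives 3 rounds with m = 2 and 2 rounds with m = 3, matching the P2 and PM counts above, and both groupings produce the same product because matrix multiplication is associative.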
Matrix Multiplication (cont.)
 Efficiency of m-way join :
• Fewer MR job iterations
• Shorter running time
 Limitation :
• Number of join keys
• Greater network and sorting overhead
Fig : PM Join key
Future study and Proposed Work
 Future study :
• Amazon EC2
• Apache Whirr tools
• Converting larger graph datasets to matrices
• Hadoop, more papers
 Proposed work :
• PM with the raw key.
• This improvement should reduce the number of duplications and
increase the diversity of the join key.
• MapReduce framework that does not perform sort operations in
mappers.
Conclusion
In this presentation, I explained how matrix multiplication can be
expressed as a multi-way join operation, and the implementation of three
types of algorithms: S2, P2, and PM.
The parallel m-way join operation can improve the performance of
the matrix chain multiplication process.
However, using the composite key introduces a number of
disadvantages, such as greater network and sorting overhead.
Finally, I propose a parallel m-way join operation with a raw key to
make it closer to optimal.

More Related Content

What's hot

Parallel sorting algorithm
Parallel sorting algorithmParallel sorting algorithm
Parallel sorting algorithmRicha Kumari
 
Matrix Multiplication(An example of concurrent programming)
Matrix Multiplication(An example of concurrent programming)Matrix Multiplication(An example of concurrent programming)
Matrix Multiplication(An example of concurrent programming)Pramit Kumar
 
Parallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance
Parallel Algorithms: Sort & Merge, Image Processing, Fault ToleranceParallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance
Parallel Algorithms: Sort & Merge, Image Processing, Fault ToleranceUniversity of Technology - Iraq
 
Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)swapnac12
 
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORMDESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORMsipij
 

What's hot (20)

Parallel sorting algorithm
Parallel sorting algorithmParallel sorting algorithm
Parallel sorting algorithm
 
Matrix Multiplication Report
Matrix Multiplication ReportMatrix Multiplication Report
Matrix Multiplication Report
 
Matrix Multiplication(An example of concurrent programming)
Matrix Multiplication(An example of concurrent programming)Matrix Multiplication(An example of concurrent programming)
Matrix Multiplication(An example of concurrent programming)
 
Parallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance
Parallel Algorithms: Sort & Merge, Image Processing, Fault ToleranceParallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance
Parallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance
 
Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
 
Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
 
Chap12 slides
Chap12 slidesChap12 slides
Chap12 slides
 
Signal Processing Assignment Help
Signal Processing Assignment HelpSignal Processing Assignment Help
Signal Processing Assignment Help
 
Chap10 slides
Chap10 slidesChap10 slides
Chap10 slides
 
Chap9 slides
Chap9 slidesChap9 slides
Chap9 slides
 
Digital Signal Processing Assignment Help
Digital Signal Processing Assignment HelpDigital Signal Processing Assignment Help
Digital Signal Processing Assignment Help
 
Computer Science Assignment Help
Computer Science Assignment Help Computer Science Assignment Help
Computer Science Assignment Help
 
Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)
 
Chap11 slides
Chap11 slidesChap11 slides
Chap11 slides
 
Ijetr042170
Ijetr042170Ijetr042170
Ijetr042170
 
Digital Signal Processing Homework Help
Digital Signal Processing Homework HelpDigital Signal Processing Homework Help
Digital Signal Processing Homework Help
 
Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
 
Environmental Engineering Assignment Help
Environmental Engineering Assignment HelpEnvironmental Engineering Assignment Help
Environmental Engineering Assignment Help
 
MATLAB
MATLABMATLAB
MATLAB
 
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORMDESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
 

Similar to Optimal Chain Matrix Multiplication Big Data Perspective

Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataeSAT Publishing House
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfTSANKARARAO
 
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...cscpconf
 
MAD skills for analysis and big data Machine Learning
MAD skills for analysis and big data Machine LearningMAD skills for analysis and big data Machine Learning
MAD skills for analysis and big data Machine LearningGianvito Siciliano
 
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...miyurud
 
Parallel Machine Learning
Parallel Machine LearningParallel Machine Learning
Parallel Machine LearningJanani C
 
Ling liu part 01:big graph processing
Ling liu part 01:big graph processingLing liu part 01:big graph processing
Ling liu part 01:big graph processingjins0618
 
Shortest path estimation for graph
Shortest path estimation for graphShortest path estimation for graph
Shortest path estimation for graphijdms
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTijwscjournal
 
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Derryck Lamptey, MPhil, CISSP
 
Jovian DATA: A multidimensional database for the cloud
Jovian DATA: A multidimensional database for the cloudJovian DATA: A multidimensional database for the cloud
Jovian DATA: A multidimensional database for the cloudBharat Rane
 
Paper id 25201467
Paper id 25201467Paper id 25201467
Paper id 25201467IJRAT
 
High Performance Computing for Satellite Image Processing and Analyzing – A ...
High Performance Computing for Satellite Image  Processing and Analyzing – A ...High Performance Computing for Satellite Image  Processing and Analyzing – A ...
High Performance Computing for Satellite Image Processing and Analyzing – A ...Editor IJCATR
 
Finalprojectpresentation
FinalprojectpresentationFinalprojectpresentation
FinalprojectpresentationSANTOSH WAYAL
 
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET Journal
 

Similar to Optimal Chain Matrix Multiplication Big Data Perspective (20)

Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big data
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
 
Big Data.pptx
Big Data.pptxBig Data.pptx
Big Data.pptx
 
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
 
MAD skills for analysis and big data Machine Learning
MAD skills for analysis and big data Machine LearningMAD skills for analysis and big data Machine Learning
MAD skills for analysis and big data Machine Learning
 
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
 
Parallel Machine Learning
Parallel Machine LearningParallel Machine Learning
Parallel Machine Learning
 
Ling liu part 01:big graph processing
Ling liu part 01:big graph processingLing liu part 01:big graph processing
Ling liu part 01:big graph processing
 
Pregel
PregelPregel
Pregel
 
Shortest path estimation for graph
Shortest path estimation for graphShortest path estimation for graph
Shortest path estimation for graph
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
 
Cross cloud map reduce for big data
Cross cloud map reduce for big dataCross cloud map reduce for big data
Cross cloud map reduce for big data
 
Eg4301808811
Eg4301808811Eg4301808811
Eg4301808811
 
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
 
Jovian DATA: A multidimensional database for the cloud
Jovian DATA: A multidimensional database for the cloudJovian DATA: A multidimensional database for the cloud
Jovian DATA: A multidimensional database for the cloud
 
Paper id 25201467
Paper id 25201467Paper id 25201467
Paper id 25201467
 
High Performance Computing for Satellite Image Processing and Analyzing – A ...
High Performance Computing for Satellite Image  Processing and Analyzing – A ...High Performance Computing for Satellite Image  Processing and Analyzing – A ...
High Performance Computing for Satellite Image Processing and Analyzing – A ...
 
Finalprojectpresentation
FinalprojectpresentationFinalprojectpresentation
Finalprojectpresentation
 
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
 
E031201032036
E031201032036E031201032036
E031201032036
 

Recently uploaded

Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxAleenaJamil4
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxellehsormae
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 

Recently uploaded (20)

Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptx
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptx
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 

Optimal Chain Matrix Multiplication Big Data Perspective

  • 1. Optimal Chain Matrix Multiplication Big Data Perspective Presented By Pollab Kumar Roy pollabroy.242@gmail.com STUDY AND REPORT
  • 2. Presentation Outline  Introduction  Big Data Overview • Definition • Three V presentation • Application  Introduction to Hadoop • Architecture • How it works • Advantage  MapReduce • What is MapReduce? • The Algorithm • Example Scenario  HDFS  Matrix Multiplication  Multi Way Join  Proposed Work  Conclusions Dept. of ICT, MBSTU 2
  • 3. Introduction Matrix multiplication is widely used for many graph algorithms, such as those that calculate the transitive closure. MapReduce is good to implement multi way join operation for very large graphs and metrices.  In this presentation we will see Big Data overview. Matrix multiplication representation in database. Parallel multi way matrix join in database with benefit and limitation.  And a proposal for making chain multiplication more optimal with raw join key. Dept. of ICT, MBSTU 3
  • 4. Big Data Overview Big data is a term that refers to data sets whose size , complexity, and rate of growth make them difficult to be captured, managed, processed by conventional technologies.  Big Data Source : Dept. of ICT, MBSTU 4 Stock Exchange data Social Media data Black Box data
  • 5. Volume Till 2003 was 5 billion GB. Two days in 2011. Every ten minutes in 2013 Variety Structured: Relational data. Semi Structured: XML data. Unstructured: Word, PDF, Text, Media Logs. Velocity Big Data Velocity deals with the pace at which data flows in from sources and human interaction. The three dimensions of Big Data Dept. of ICT, MBSTU 5
  • 6. Big Data Application Segments Analytics Predictive Modeling Decision Processing Behavior Analysis Demographics Data Warehouse Hosting Digitization/archive Backup Web 2.0 Engineering Collaborating Design Optimization Process Flow Fluid Dynamics 3D Modeling Analytics Predictive Modeling Decision Processing Behavior Analysis Demographics Dept. of ICT, MBSTU 6
  • 7. Introduction to Hadoop  Hadoop: Apache open source framework written in java that allows distributed processing of large datasets across clusters of computers using simple programming models.  Doug Cutting son’s toy.  Hadoop Architecture : Two major layers. • Processing layer : MapReduce • Storage layer : Hadoop Distributed File System Dept. of ICT, MBSTU 7 MapReduce (Distributed Computation) HDFS (Distributed Storage) YARN Framework Common Utilities
  • 8. Introduction to Hadoop (cont.)  How Hadoop works : Core tasks across a cluster of computers • Data dividing into directories and files(128M/64M). • Files are then distributed across various cluster nodes. • HDFS, supervises the processing. • Blocks are replicated. • Performing sort between the map and reduce stages. • Sending the sorted data to a certain computer.  Advantage : • Low-cost alternative to build bigger servers. • Fault-tolerance and high availability. • Dynamic clustering. • Automatic data distribution and open source Dept. of ICT, MBSTU 8
  • 9. MapReduce  What is MapReduce : A processing technique and a program model for distributed computing based on java. • Mapper • Shuffle • Reducer • Java based • Key Value Dept. of ICT, MBSTU 9
  • 10. MapReduce (cont.)  The algorithm: Mapper Reducer Key Value Dept. of ICT, MBSTU 10
  • 11. MapReduce (cont.)  Word Count Example : Dept. of ICT, MBSTU 11 Apple Orange Mango Orange Grapes Plum Apple Orange Mango Orange Grapes Plum Apple Plum Mango Apple Apple Plum Apple Plum Mango Apple Apple Plum Apple,1 Orange ,1 Mango,1 Orange,1 Grapes ,1 Plum,1 Apple,1 Plum ,1 Mango,1 Apple,1 Apple ,1 Plum,1 Apple,1 Apple,1 Apple,1 Apple,1 Grapes ,1 Mango,1 Mango,1 Orange,1 Orange,1 Plum,1 Plum,1 Plum,1 Apple,4 Grapes,1 Mango,2 Orange,2 Plum,3 Apple,4 Grapes,1 Mango,2 Orange,2 Plum,3 input Files each line to individual mapper map key value splitting sort, shuffle Produce key value pairs Final output
  • 12. Hadoop Distributed File System (HDFS). HDFS is a distributed, scalable, and portable file system written in Java for the Hadoop framework. Features: distributed storage and processing; name node; data node; interface in Hadoop; streaming access; cluster status check.
  • 13. Hadoop Distributed File System (cont.) Architecture: data node, name node, block. Fig: HDFS architecture: a name node holding metadata (name, replica…; /home/foo/data, 3…), a client that reads blocks, and data nodes in Rack 1 and Rack 2 with block replication between them.
  • 14. Matrix Multiplication (via multi-way join). Usage: widely used in many graph algorithms, such as transitive closure and n-hop neighbors. Join operation: given matrices A [p×q] and B [q×r], the product is C [p×r] = A × B, and each (i,k)-th element of C is $C_{ik} = \sum_{j=1}^{q} A_{ij} \times B_{jk}$. A and B are represented in the database by relations R1 and R2 with attributes {row, col, val}. A × B in terms of SQL: SELECT R1.row, R2.col, SUM(R1.val * R2.val) FROM R1, R2 WHERE R1.col = R2.row GROUP BY R1.row, R2.col. Fig: Social network (User_1 to User_7).
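To see that the join on R1.col = R2.row followed by GROUP BY and SUM really computes A × B, here is a small Python sketch that mirrors the SQL on lists of (row, col, val) tuples. The helper names `to_relation` and `join_multiply` are hypothetical, zero entries are assumed to be left out of the relation, and indices start at 0 rather than 1.

```python
from collections import defaultdict

def to_relation(matrix):
    """Store a dense matrix as (row, col, val) tuples, skipping zero entries."""
    return [(i, j, v)
            for i, row in enumerate(matrix)
            for j, v in enumerate(row) if v != 0]

def join_multiply(r1, r2):
    """Join on R1.col = R2.row, group by (R1.row, R2.col), sum the products."""
    # Index R2 by its row attribute so matching tuples are found quickly.
    r2_by_row = defaultdict(list)
    for row, col, val in r2:
        r2_by_row[row].append((col, val))

    sums = defaultdict(int)
    for i, j, a in r1:
        for k, b in r2_by_row.get(j, []):   # join condition: R1.col = R2.row
            sums[(i, k)] += a * b           # SUM(R1.val * R2.val) per group
    return sorted((i, k, v) for (i, k), v in sums.items())

A = [[1, 2],
     [0, 3]]
B = [[4, 0],
     [5, 6]]
C = join_multiply(to_relation(A), to_relation(B))
print(C)
```

In MapReduce terms, the join attribute plays the role of the key that brings matching tuples of R1 and R2 to the same reducer.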
  • 15. Matrix Multiplication (cont.) Fig: Database representation.
  • 16. Matrix Multiplication (cont.) Chain-way join for the product A × B × C × D. Eq. (1), (((A × B) × C) × D): the typical method, serial two-way join (S2); each two-way join is a separate MR job with intra-operation parallelism. Eq. (2), ((A × B) × (C × D)): parallel two-way join (P2) with inter-operation parallelism; A × B and C × D are computed simultaneously. Eq. (3), (A × B × C × D): parallel m-way join (PM).
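The difference between Eq. (1) and Eq. (2) can be sketched locally in Python. The code below assumes small dense matrices stored as nested lists and counts sequential steps as a stand-in for MR job stages; `serial_two_way` and `parallel_two_way` are illustrative names, and nothing here runs in parallel for real.

```python
def matmul(a, b):
    """Plain two-way matrix product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def serial_two_way(chain):
    """S2, Eq. (1): fold left to right, one product per sequential step."""
    result, steps = chain[0], 0
    for matrix in chain[1:]:
        result = matmul(result, matrix)
        steps += 1
    return result, steps

def parallel_two_way(chain):
    """P2, Eq. (2): multiply adjacent pairs; the pairs of one round are independent."""
    chain, rounds = list(chain), 0
    while len(chain) > 1:
        nxt = [matmul(chain[i], chain[i + 1])
               for i in range(0, len(chain) - 1, 2)]
        if len(chain) % 2:          # an unpaired last matrix moves to the next round
            nxt.append(chain[-1])
        chain = nxt
        rounds += 1
    return chain[0], rounds

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 2]]
D = [[1, 1], [0, 1]]
s2_result, s2_steps = serial_two_way([A, B, C, D])
p2_result, p2_rounds = parallel_two_way([A, B, C, D])
print(s2_steps, p2_rounds, s2_result == p2_result)
```

Both orders give the same product because matrix multiplication is associative; only the number of sequential stages changes.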
  • 17. Matrix Multiplication (cont.) Parallel m-way join. Number of MR jobs in the example (n = 5 matrices, m = 3): S2: $n - 1 = 4$; P2: $\lceil \log_2 n \rceil = 3$; PM: $\lceil \log_m n \rceil = 2$. Fig: Example of parallel 3-way join: M1 to M5 enter the 1st MR job, its outputs M1 and M4 enter the 2nd MR job, which produces the result. Fig: Algorithm for PM join:
      Input: relations M1, M2, …, Mn representing matrices
      LIST_Mnext <= M1, M2, …, Mn
      while |LIST_Mnext| > 1 do
        for i = 1 to |LIST_Mnext| do
          if (i mod m) == 1 then
            add Mi to LIST_Mleft
            Mleft = Mi
          else
            add Mi to LIST_Mright(Mleft)
          end if
        end for
        LIST_Mnext = doMR-PM(LIST_Mleft, LIST_Mright)
      end while
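The grouping loop of the PM algorithm can be tried out locally as follows. This is a sketch under assumptions: `do_mr_pm` is a hypothetical stand-in that multiplies each group in ordinary Python instead of launching an MR job, each group is kept as one list (its first element plays the role of Mleft, the rest are its right-hand matrices), and m is assumed to be at least 2.

```python
from functools import reduce

def matmul(a, b):
    """Plain two-way matrix product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def do_mr_pm(groups):
    """Stand-in for doMR-PM: one 'job' multiplies every group of up to m matrices."""
    return [reduce(matmul, group) for group in groups]

def pm_chain(matrices, m):
    """Repeat m-way grouping until one matrix is left; return (result, job count)."""
    if m < 2:
        raise ValueError("m must be at least 2")
    m_next, jobs = list(matrices), 0
    while len(m_next) > 1:
        groups = []
        for i, matrix in enumerate(m_next, start=1):
            if i % m == 1:
                groups.append([matrix])      # Mi becomes the new Mleft
            else:
                groups[-1].append(matrix)    # Mi is attached to the current Mleft
        m_next = do_mr_pm(groups)
        jobs += 1
    return m_next[0], jobs

chain = [[[1, 2], [3, 4]], [[0, 1], [1, 0]], [[2, 0], [0, 2]],
         [[1, 1], [0, 1]], [[1, 0], [1, 1]]]
result, jobs = pm_chain(chain, m=3)
print(jobs)
```

With five matrices and m = 3 the loop runs twice, which agrees with the PM count on the slide; calling `pm_chain(chain, m=2)` takes three rounds, the same as the P2 count.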
  • 18. Matrix Multiplication (cont.) Efficiency of m-way join: fewer MR job iterations; less time. Limitations: the number of join keys; greater network and sorting overhead. Fig: PM join key.
  • 19. Future Study and Proposed Work. Future study: Amazon EC2; Apache Whirr tools; converting larger graph datasets to matrices; Hadoop and more papers. Proposed work: PM with the raw key. This improvement should reduce the number of duplications and increase the diversity of the join key. A MapReduce framework that does not perform sort operations in mappers.
  • 20. Conclusion. In this presentation, I explained how matrix multiplication can be expressed as a multi-way join operation and described the implementation of three types of algorithms: S2, P2, and PM. The parallel m-way join operation can improve the performance of the matrix chain multiplication process. However, using the composite key introduces a number of disadvantages, such as greater network and sorting overhead. Finally, I propose a parallel m-way join operation with the raw key to make chain multiplication more efficient.
  • 21. References. Apache Hadoop. Website. http://hadoop.apache.org; SAS. http://www.sas.com/en_us/insights/big-data/hadoop.html; Zikopoulos, P. C., Eaton, C., DeRoos, D., Deutsch, T., & Lapis, G. (2012). Understanding big data. New York et al.: McGraw-Hill; Myung, J., & Lee, S. G. (2012, February). Matrix chain multiplication via multi-way join algorithms in MapReduce. In Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication (p. 53). ACM; Dean, J., & Ghemawat, S. MapReduce: Simplified data processing on large clusters.