Big Data – Project Presentation
By:
Yonas Gidey -985054
Submitted to
Professor Prem Nair
April 25, 2017
Relative Frequency Project
1. Pseudo code for Pair Approach Algorithm
2. Java code for Pair Approach Algorithm
3. Result of Pair Approach Algorithm
4. Pseudo code for Stripe Approach Algorithm
5. Java code for Stripe Approach
6. Result of Stripe Approach Algorithm
7. Pseudo code for Hybrid Approach Algorithm
8. Java code for Hybrid Approach Algorithm
9. Result of Hybrid Approach Algorithm
10. Comparison
11. Spark Project
Steps for implementing the Pairs approach
I. For each line passed to the map function, split on spaces to create a String array.
II. Construct two nested loops.
III. The outer loop iterates over each word in the array, and the inner loop iterates over the "neighbors" of the current word.
IV. The number of iterations of the inner loop is dictated by the size of the "window" used to capture neighbors of the current word.
V. At the bottom of each inner-loop iteration, emit a WordPair object (the current word on the left, the neighbor word on the right) as the key, and a count of one as the value.
VI. The Reducer for the Pairs implementation sums all of the counts for the given WordPair key; dividing by the (word, *) marginal count then yields the relative frequency.
1. Pseudo code for PAIR Approach
Class Mapper
  method Map(inKey, text)
    for each word w in text
      for each neighbour u of word w
        Emit(pair (w, u), 1)
        Emit(pair (w, *), 1)      // marginal: total neighbours of w

// A partitioner on the left word w sends all (w, .) pairs to the same
// reducer, and sorting delivers (w, *) before every (w, u).
Class Reducer
  s = 0                           // marginal count for the current word
  method Reduce(pair p = (w, u), counts [c1, c2, ...])
    sum = 0
    for all count c in counts do
      sum = sum + c
    if u == * then
      s = sum                     // remember the marginal for w
    else
      Emit(pair p, sum / s)       // relative frequency f(u | w)
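The pair logic above can be sketched as a single-process Java simulation (a hypothetical `PairsSketch` class, not the project's actual Hadoop job, whose code appeared on the following slides): the map phase emits a count for each (w, u) pair and for the (w, *) marginal, and the reduce phase divides each pair count by its marginal.

```java
import java.util.*;

public class PairsSketch {
    // Map phase: for each word, emit 1 for every neighbour within the
    // window, plus 1 for the marginal key (word, *). Keys are encoded
    // as "w\tu" strings for simplicity.
    static Map<String, Integer> mapPhase(String text, int window) {
        Map<String, Integer> counts = new HashMap<>();
        String[] words = text.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(words.length - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (j == i) continue;
                counts.merge(words[i] + "\t" + words[j], 1, Integer::sum);
                counts.merge(words[i] + "\t*", 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reduce phase: divide each (w, u) count by the (w, *) marginal
    // to obtain the relative frequency f(u | w).
    static Map<String, Double> reducePhase(Map<String, Integer> counts) {
        Map<String, Double> rf = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String[] key = e.getKey().split("\t");
            if (key[1].equals("*")) continue;
            int marginal = counts.get(key[0] + "\t*");
            rf.put(e.getKey(), e.getValue() / (double) marginal);
        }
        return rf;
    }

    public static void main(String[] args) {
        // For "a b a b c" with window 1: f(a|b) = 3/4, f(c|b) = 1/4.
        System.out.println(reducePhase(mapPhase("a b a b c", 1)));
    }
}
```

In the real job the shuffle performs the grouping that the `HashMap` simulates here, and a custom partitioner on the left word keeps each word's marginal and pairs on one reducer.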
2. Java code for PAIR approach
Hadoop Commands
• #!/bin/sh
• hadoop fs -mkdir /user/cloudera/relative-frequency /user/cloudera/relative-frequency/pair /user/cloudera/relative-frequency/pair/input
• hadoop fs -put files/input.txt /user/cloudera/relative-frequency/pair/input
• hadoop fs -rm -r /user/cloudera/relative-frequency/pair/output
• hadoop jar files/pairsrf.jar project.crystalBall.pairsApproachAlgorithm.PairRelativeFrequencyDriver /user/cloudera/relative-frequency/pair/input /user/cloudera/relative-frequency/pair/output
• hadoop fs -cat /user/cloudera/relative-frequency/pair/output/*
3. Result of PAIR approach
Steps for the Stripes implementation
I. The approach is similar to Pairs, but all of the "neighbor" words are collected in a HashMap with the neighbor word as the key and an integer count as the value.
II. When all of the values have been collected for a given word (at the bottom of the outer loop), the word and the HashMap are emitted.
III. The Reducer for the Stripes approach iterates over a collection of maps and, for each map, iterates over all of its values, summing the maps element-wise before normalizing.
4. Pseudo code for STRIPE approach
Class Mapper
  method Map(docid a, doc d)
    for all term w in doc d do
      H = new AssociativeArray
      for all term u in Neighbors(w) do
        H{u} = H{u} + 1
      Emit(term w, stripe H)

Class Reducer
  method Reduce(term w, stripes [H1, H2, H3, ...])
    Hf = new AssociativeArray
    for all stripe H in stripes [H1, H2, H3, ...] do
      Sum(Hf, H)                  // element-wise sum
    // calculate relative frequencies
    count = 0
    for all term u in Hf do
      count = count + Hf{u}
    Hnew = new AssociativeArray
    for all term u in Hf do
      Hnew{u} = Hf{u} / count
    Emit(term w, stripe Hnew)
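The stripe logic can likewise be sketched in plain Java (a hypothetical `StripesSketch` class, not the submitted Hadoop code): the map phase builds one neighbour-count map per word, and the reduce phase normalizes each merged stripe by its total.

```java
import java.util.*;

public class StripesSketch {
    // Map phase: for each word occurrence, tally its neighbours within the
    // window into that word's stripe. Merging stripes for the same word is
    // what the shuffle (and a combiner) would do in the real job.
    static Map<String, Map<String, Integer>> mapPhase(String text, int window) {
        Map<String, Map<String, Integer>> stripes = new HashMap<>();
        String[] words = text.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            Map<String, Integer> h =
                stripes.computeIfAbsent(words[i], k -> new HashMap<>());
            int lo = Math.max(0, i - window);
            int hi = Math.min(words.length - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (j != i) h.merge(words[j], 1, Integer::sum);
            }
        }
        return stripes;
    }

    // Reduce phase: element-wise division of each stripe by its total,
    // giving the relative-frequency stripe for each word.
    static Map<String, Map<String, Double>> reducePhase(
            Map<String, Map<String, Integer>> stripes) {
        Map<String, Map<String, Double>> rf = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : stripes.entrySet()) {
            int total = 0;
            for (int c : e.getValue().values()) total += c;
            Map<String, Double> hNew = new TreeMap<>();
            for (Map.Entry<String, Integer> n : e.getValue().entrySet())
                hNew.put(n.getKey(), n.getValue() / (double) total);
            rf.put(e.getKey(), hNew);
        }
        return rf;
    }

    public static void main(String[] args) {
        System.out.println(reducePhase(mapPhase("a b a b c", 1)));
    }
}
```

Compared with Pairs, each reduce call here already holds everything needed for one word, so no special ordering of keys or (w, *) marginal is required.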
5. Java code for Stripe Approach
Hadoop Commands
• hadoop fs -mkdir /user/cloudera/relative-frequency /user/cloudera/relative-frequency/stripe /user/cloudera/relative-frequency/stripe/input
• hadoop fs -put files/input.txt /user/cloudera/relative-frequency/stripe/input
• hadoop fs -rm -r /user/cloudera/relative-frequency/stripe/output
• hadoop jar files/stripesrf.jar project.crystalBall.stripesApproachAlgorithm.StripeRelativeFrequencyDriver /user/cloudera/relative-frequency/stripe/input /user/cloudera/relative-frequency/stripe/output
• hadoop fs -cat /user/cloudera/relative-frequency/stripe/output/*
6. Result of Stripe approach
7. Pseudo Code for HYBRID approach
Class Mapper
  method Map(inKey, text)
    for each word w in text
      for each neighbour u of word w
        Emit(pair (w, u), 1)

// A partitioner on the left word w sends all (w, .) pairs to the same
// reducer, and keys arrive sorted, so all pairs for one word are contiguous.
Class Reducer
  Hf = new AssociativeArray
  last = empty
  method Reduce(pair p = (w, u), counts [c1, c2, ...])
    if last != empty and last != w then
      EmitStripe(last)            // previous word is complete
      Hf = new AssociativeArray
    last = w
    for all count c in counts do
      Hf{u} = Hf{u} + c           // build the stripe for term w
  method EmitStripe(term w)
    count = 0
    for all u in Hf do
      count = count + Hf{u}       // total occurrences for term w
    for all u in Hf do
      Hf{u} = Hf{u} / count       // element-wise division
    Emit(term w, stripe Hf)
  method Close()
    EmitStripe(last)              // flush the final word
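The hybrid reducer's change-of-word logic can be sketched in plain Java (a hypothetical `HybridSketch` class; the sorted map stands in for the sorted, partitioned pair keys a real reducer would see): a stripe is accumulated until the left word changes, then normalized and emitted, with a final flush at the end.

```java
import java.util.*;

public class HybridSketch {
    // Reduce phase over pair keys "w\tu" in sorted order: accumulate a
    // stripe for the current left word, emit the normalized stripe when
    // the left word changes, and flush the last word at the end (close()).
    static Map<String, Map<String, Double>> reducePhase(
            SortedMap<String, Integer> pairCounts) {
        Map<String, Map<String, Double>> out = new TreeMap<>();
        Map<String, Integer> hf = new TreeMap<>();
        String last = null;
        for (Map.Entry<String, Integer> e : pairCounts.entrySet()) {
            String[] key = e.getKey().split("\t"); // key[0] = w, key[1] = u
            if (last != null && !last.equals(key[0])) {
                out.put(last, normalize(hf));      // previous word complete
                hf = new TreeMap<>();
            }
            last = key[0];
            hf.merge(key[1], e.getValue(), Integer::sum);
        }
        if (last != null) out.put(last, normalize(hf)); // close(): final word
        return out;
    }

    // Element-wise division of a stripe by its total count.
    static Map<String, Double> normalize(Map<String, Integer> hf) {
        int total = 0;
        for (int c : hf.values()) total += c;
        Map<String, Double> hNew = new TreeMap<>();
        for (Map.Entry<String, Integer> e : hf.entrySet())
            hNew.put(e.getKey(), e.getValue() / (double) total);
        return hNew;
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> pairs = new TreeMap<>();
        pairs.put("a\tb", 3);
        pairs.put("b\ta", 3);
        pairs.put("b\tc", 1);
        pairs.put("c\tb", 1);
        System.out.println(reducePhase(pairs));
    }
}
```

This keeps the mapper as cheap as the Pairs approach while the reducer only ever holds one word's stripe in memory, which is the point of the hybrid.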
8. Java code for HYBRID approach
Hadoop Commands
• hadoop fs -mkdir /user/cloudera/relative-frequency /user/cloudera/relative-frequency/pair-stripe /user/cloudera/relative-frequency/pair-stripe/input
• hadoop fs -put files/input.txt /user/cloudera/relative-frequency/pair-stripe/input
• hadoop fs -rm -r /user/cloudera/relative-frequency/pair-stripe/output
• hadoop jar files/pairsStriperf.jar project.crystalBall.pairsStrpesHybridAlgorithm.PairStripeRelativeFrequencyDriver /user/cloudera/relative-frequency/pair-stripe/input /user/cloudera/relative-frequency/pair-stripe/output
• hadoop fs -cat /user/cloudera/relative-frequency/pair-stripe/output/*
9. Result of HYBRID approach
10. Comparison
Spark Project
Statement of the problem
In this project I analyze Apache access log files using the Spark framework and the Scala programming language.
1. Analyze logs collected from a website by examining the requests coming from users.
2. Analyze the response codes and count how many of them are "Not Found", "OK", "Unauthorized", and so on:
• 200 OK
• 301 Moved Permanently
• 401 Unauthorized
• 403 Forbidden
• 404 Not Found
• 500 Internal Server Error
• 503 Service Unavailable
I processed the log files using Spark and produced the outputs shown below; much more analysis can be done on demand.
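The status-code tally can be illustrated with a small single-machine sketch (in Java for consistency with the MapReduce sections; the project's actual implementation was Scala on Spark, and the class name and log format below are assumptions, using the standard Apache Common Log Format):

```java
import java.util.*;
import java.util.regex.*;

public class StatusCodeTally {
    // The status code in Common Log Format is the first 3-digit field
    // immediately after the closing quote of the request string.
    static final Pattern STATUS = Pattern.compile("\" (\\d{3}) ");

    // Count occurrences of each HTTP status code across the log lines.
    static Map<String, Integer> countStatuses(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = STATUS.matcher(line);
            if (m.find()) counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "127.0.0.1 - - [25/Apr/2017:10:00:00 +0000] \"GET /index.html HTTP/1.1\" 200 1043",
            "127.0.0.1 - - [25/Apr/2017:10:00:01 +0000] \"GET /missing HTTP/1.1\" 404 209",
            "127.0.0.1 - - [25/Apr/2017:10:00:02 +0000] \"GET /admin HTTP/1.1\" 401 0");
        System.out.println(countStatuses(sample));
    }
}
```

On Spark the same idea becomes a map from each line to its status code followed by a count-by-key over the full log files, which is what the Scala code on the next slides performs at scale.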
Scala Code
Details
• Execute the Spark job by handing over the jar file, main class name, input location, and output location via the following terminal commands.
• hdfs dfs -mkdir spark/input
• hdfs dfs -put input spark
• spark-submit --class sparkPackage --master local SparkProject.jar spark/input spark/output
Spark Output
Thank you!