Big Data Analysis for Page
Ranking using Map/Reduce
R.Renuka,
R.Vidhya Priya,
III B.Sc., IT,
The S.F.R. College for Women,
Sivakasi.
Overview
Introduction
What is Big Data!
Why Big Data?
4 V’s of Big Data
Big Data Analytics Technologies
Map/Reduce
Applications
Case Study
Conclusion
Introduction
Data have outgrown the storage and processing capabilities of
a single host.
Two fundamental challenges:
– how to store and work with voluminous data sizes, and
– how to understand data and turn it into a competitive
advantage.
What is Big Data!
‘Big-data’ is similar to ‘Small-data’, but bigger
But having bigger data requires different approaches:
techniques, tools & architectures
To solve:
New problems, and old problems in a better way.
The Blind Men and the Elephant
Why Big Data?
Key enablers for the growth of “Big Data” are:
Increase of Processing Power
Increase of Storage Capacities
Availability of Data
4 V’s of Big Data
Big Data Analytics Technologies
Hadoop
PLATFORA
WibiData
PIG
Hive
MapReduce
NoSQL databases
Column-oriented databases
Hadoop
Hadoop is a distributed file system and data
processing engine
Hadoop has two components:
– The Hadoop Distributed File System (HDFS)
– The MapReduce programming model.
Map / Reduce
A high-level, abstracted framework for distributed processing of large
datasets
Fault tolerance, parallelization
Computation consists of two phases
Map
Reduce
A Master-Slave architecture
Computation occurs on multiple slave nodes
And it tries to provide data locality as much as possible.
MR model
Map
– Process a key/value pair to generate intermediate key/value
pairs
Reduce
– Merge all intermediate values associated with the same key
Users implement an interface of two primary methods:
1. Map: (key1, val1) → [(key2, val2)]
2. Reduce: (key2, [val2]) → [val3]
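The two methods above can be illustrated with a small in-memory sketch. It is not Hadoop code: `run_map_reduce`, `map_fn`, `reduce_fn`, and the two sample documents are hypothetical names and data chosen for illustration, and the grouping that a real cluster performs across slave nodes is simulated here with a single dictionary. The example counts words, a common first Map/Reduce exercise.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: (doc_id, text) -> intermediate (word, 1) pairs."""
    for word in value.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: (word, [1, 1, ...]) -> [total count]."""
    return [sum(values)]


def run_map_reduce(records, mapper, reducer):
    """Run mapper over all records, group by intermediate key, then reduce."""
    groups = defaultdict(list)
    for key1, val1 in records:
        for key2, val2 in mapper(key1, val1):
            groups[key2].append(val2)  # the "shuffle" step: collect by key2
    return {key2: reducer(key2, vals) for key2, vals in groups.items()}


# Hypothetical sample input: (document id, document text) pairs.
docs = [("d1", "big data is big"), ("d2", "data locality")]
counts = run_map_reduce(docs, map_fn, reduce_fn)
print(counts)
```

In a real deployment the framework, not the user, handles the grouping step, the distribution of work, and the recovery from failed nodes; the user supplies only the two functions.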
Applications
Homeland Security
Finance
Smarter Healthcare
Multi-channel sales
Telecom
Manufacturing
Traffic Control
Trading Analytics
Fraud and Risk
Log Analysis
Search Quality
Retail
Case Study
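Going by the title of the deck, the case study concerns page ranking with Map/Reduce. As a rough illustration only, and not the authors' own implementation, one PageRank iteration can be sketched as a map step in which each page sends an equal share of its rank to every page it links to, and a reduce step that sums the shares received by each page. The damping factor of 0.85, the three-page graph, and all function names are assumptions made for this sketch; pages without outgoing links are not given special treatment.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor; a commonly used value


def pagerank_map(page, state):
    """Map: (page, (rank, links)) -> link structure plus rank shares."""
    rank, links = state
    yield page, ("links", links)  # pass the graph structure through
    for target in links:
        yield target, ("share", rank / len(links))


def pagerank_reduce(page, values, num_pages):
    """Reduce: sum the shares received by a page and apply damping."""
    links, total = [], 0.0
    for kind, payload in values:
        if kind == "links":
            links = payload
        else:
            total += payload
    rank = (1 - DAMPING) / num_pages + DAMPING * total
    return rank, links


def pagerank_iteration(graph):
    """One Map/Reduce pass over the whole graph (simulated in memory)."""
    groups = defaultdict(list)
    for page, state in graph.items():
        for key, value in pagerank_map(page, state):
            groups[key].append(value)
    n = len(graph)
    return {page: pagerank_reduce(page, vals, n) for page, vals in groups.items()}


# Hypothetical three-page web: A links to B and C, B to C, C to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
graph = {page: (1 / len(links), out) for page, out in links.items()}
for _ in range(20):
    graph = pagerank_iteration(graph)
ranks = {page: rank for page, (rank, _) in graph.items()}
print(ranks)
```

Each iteration corresponds to one Map/Reduce job; on a cluster the output of one job would be written to the distributed file system and read back as the input of the next.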
Conclusion
Real-time big data isn’t just a process for storing
petabytes or exabytes of data in a data warehouse. It’s
about the ability to make better decisions and take
meaningful actions at the right time.
Queries ??
Big data analysis using map/reduce
