Vu Pham
Introduction to Big Data
Dr. Rajiv Misra
Dept. of Computer Science & Engg.
Indian Institute of Technology Patna
rajivm@iitp.ac.in
Big Data Computing Introduction to Big Data
Preface
Content of this Lecture:
In this lecture, we will discuss a brief introduction to
Big Data: Why Big Data, Where did it come from?,
Challenges and applications of Big Data, Characteristics
of Big Data i.e. Volume, Velocity, Variety and more V’s.
What’s Big Data?
Big data is the term for a collection of data sets so large and complex that it
becomes difficult to process using on-hand database management tools or
traditional data processing applications.
The challenges include capture, curation, storage, search, sharing, transfer,
analysis, and visualization.
The trend to larger data sets is due to the additional information derivable
from analysis of a single large set of related data, as compared to separate
smaller sets with the same total amount of data, allowing correlations to be
found to "spot business trends, determine quality of research, prevent
diseases, link legal citations, combat crime, and determine real-time
roadway traffic conditions.”
Facts and Figures
Walmart handles 1 million customer transactions/hour.
Facebook handles 40 billion photos from its user base!
Facebook inserts 500 terabytes of new data every day.
Facebook stores, accesses, and analyzes 30+ petabytes of user-generated data.
A flight generates 240 terabytes of flight data in 6-8 hours of flight.
More than 5 billion people are calling, texting, tweeting and browsing on mobile phones worldwide.
Decoding the human genome originally took 10 years to process; now it can be achieved in one week.
The largest AT&T database boasts titles including the largest volume of data in one unique database (312 terabytes) and the second largest number of rows in a unique database (1.9 trillion), which comprises AT&T’s extensive calling records.
An Insight
Byte: one grain of rice
KB (10^3): one cup of rice
MB (10^6): 8 bags of rice (Desktop)
GB (10^9): 3 semi trucks of rice
TB (10^12): 2 container ships of rice (Internet)
PB (10^15): blankets ½ of Jaipur
Exabyte (10^18): blankets the West coast, or 1/4th of India (Big Data)
Zettabyte (10^21): fills the Pacific Ocean (Future)
Yottabyte (10^24): an earth-sized rice bowl
Brontobyte (10^27): astronomical size
Big Data Computing Introduction to Big Data
Vu Pham
What’s making so much data?
Sources: people, machines, organizations (ubiquitous computing)
More people carrying data-generating devices
(mobile phones with Facebook, GPS, cameras, etc.)
Data on the Internet:
Internet live stats
http://www.internetlivestats.com/
Source of Data Generation
2+ billion people on the Web by end 2011
30 billion RFID tags today (1.3B in 2005)
4.6 billion camera phones worldwide
100s of millions of GPS-enabled devices sold annually
76 million smart meters in 2009… 200M by 2014
12+ TBs of tweet data every day
25+ TBs of log data every day
? TBs of data every day
Crowdsourcing
An Example of Big Data at Work
Where is the problem?
Traditional RDBMS queries aren't sufficient to get useful information out of the huge volume of data.
Searching it with traditional tools to find out whether a particular topic was trending would take so long that the result would be meaningless by the time it was computed.
Big Data offers a solution: store this data in novel ways that make it more accessible, and come up with methods of performing analysis on it.
Challenges
Capturing
Storing
Searching
Sharing
Analysing
Visualization
IBM considers Big Data (3V’s):
The 3V’s: Volume, Velocity and Variety.
Volume (Scale)
Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes, even petabytes, of information.
Turn 12 terabytes of Tweets created each day into
improved product sentiment analysis
Convert 350 billion annual meter readings to
better predict power consumption
Volume (Scale)
Data Volume
44x increase from 2009 to 2020
From 0.8 zettabytes to 35 zettabytes
Data volume is increasing exponentially
Exponential increase in collected/generated data
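As a rough check on the figures above, the following minimal sketch derives the overall multiple and the compound annual growth rate they imply. The only inputs are the two volumes and two years quoted on the slide; treating the growth as steady year over year is an assumption made for illustration.

```python
# Figures taken from the slide: 0.8 ZB in 2009, 35 ZB projected for 2020.
start_zb, end_zb = 0.8, 35.0
years = 2020 - 2009

growth_factor = end_zb / start_zb                 # overall multiple over the period
annual_rate = growth_factor ** (1 / years) - 1    # compound annual growth rate

print(f"Overall growth: {growth_factor:.1f}x")
print(f"Implied annual growth: {annual_rate:.1%}")
```

Under that steady-growth assumption, the 44x figure corresponds to data volume growing by roughly two fifths every year.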
Example 1: CERN’s Large Hadron Collider (LHC)
CERN’s Large Hadron Collider (LHC) generates 15 PB a year
Example 2: The Earthscope
• The Earthscope is the world's largest
science project. Designed to track
North America's geological evolution,
this observatory records data over
3.8 million square miles, amassing
67 terabytes of data. It analyzes
seismic slips in the San Andreas fault,
sure, but also the plume of magma
underneath Yellowstone and much,
much more.
(http://www.msnbc.msn.com/id/44363598/ns/technology_and_science-future_of_technology/#.TmetOdQ--uI)
Velocity (Speed)
Velocity: Sometimes 2 minutes is too late. For time-
sensitive processes such as catching fraud, big data
must be used as it streams into your enterprise in
order to maximize its value.
Scrutinize 5 million trade events created each day
to identify potential fraud
Analyze 500 million daily call detail records in real-
time to predict customer churn faster
Examples: Velocity (Speed)
Data is being generated fast and needs to be processed fast
Online Data Analytics
Late decisions ➔ missing opportunities
Examples
E-Promotions: Based on your current location, your purchase history,
what you like ➔ send promotions right now for store next to you
Healthcare monitoring: sensors monitoring your activities and body ➔
any abnormal measurements require immediate reaction
Real-time/Fast Data
Social media and networks
(all of us are generating data)
Scientific instruments
(collecting all sorts of data)
Mobile devices
(tracking all objects all the time)
Sensor technology and networks
(measuring all kinds of data)
Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion.
Real-Time Analytics/Decision Requirement
Customer
Influence Behavior
Product Recommendations that are Relevant & Compelling
Friend Invitations to join a Game or Activity that expands business
Preventing Fraud as it is Occurring & preventing more proactively
Learning why Customers Switch to competitors and their offers; in time to Counter
Improving the Marketing Effectiveness of a Promotion while it is still in Play
Variety (Complexity)
Variety: Big data is any type of data:
Structured data (example: tabular data)
Unstructured data: text, sensor data, audio, video
Semi-structured data: web data, log files
Examples: Variety (Complexity)
Relational Data (Tables/Transaction/Legacy
Data)
Text Data (Web)
Semi-structured Data (XML)
Graph Data
Social Network, Semantic Web (RDF), …
Streaming Data
You can only scan the data once
A single application can be
generating/collecting many types of data
Big Public Data (online, weather, finance, etc)
To extract knowledge ➔ all these types of data need to be linked together
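The "you can only scan the data once" constraint on streaming data can be illustrated with a one-pass computation. This is a minimal sketch, not a streaming framework: the function name `running_stats` and the sample readings are made up for illustration, and the stream is modeled as a plain Python iterator.

```python
def running_stats(stream):
    """Consume an iterable once, keeping only constant-size state."""
    count, mean, maximum = 0, 0.0, None
    for x in stream:
        count += 1
        mean += (x - mean) / count          # incremental mean update
        maximum = x if maximum is None or x > maximum else maximum
    return count, mean, maximum

readings = iter([3.0, 5.0, 4.0, 8.0])   # an iterator can only be traversed once
result = running_stats(readings)
print(result)
```

The state kept between elements is three numbers, regardless of how long the stream runs, which is what makes this style suitable when the data cannot be stored and rescanned.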
The 3 Big V’s (+1)
Big 3V’s
Volume
Velocity
Variety
Plus 1
Value
The 3 Big V’s (+1) (+ N more)
Plus many more
Veracity
Validity
Variability
Viscosity & Volatility
Viability,
Venue,
Vocabulary, Vagueness,
…
Value
Integrating Data
Reducing data complexity
Increase data availability
Unify your data systems
All 3 above will lead to increased data collaboration
-> add value to your big data
Veracity
Veracity refers to the biases, noise, and abnormality in data, and to the trustworthiness of data.
1 in 3 business leaders don’t trust the information they use to make decisions.
How can you act upon information if you don’t trust it?
Establishing trust in big data presents a huge challenge as the variety and number of sources grow.
Valence
Valence refers to the connectedness of big data, such as in the form of graph networks.
Validity
Accuracy and correctness of the data relative to a
particular use
Example: Gauging storm intensity
satellite imagery vs social media posts
prediction quality vs human impact
Variability
How the meaning of the data changes over time
Language evolution
Data availability
Sampling processes
Changes in characteristics of the data source
Viscosity & Volatility
Both related to velocity
Viscosity: data velocity relative to timescale of
event being studied
Volatility: rate of data loss and stable lifetime
of data
Scientific data often has practically unlimited
lifespan, but social / business data may evaporate
in finite time
More V’s
Viability
Which data has meaningful relations to questions of
interest?
Venue
Where does the data live and how do you get it?
Vocabulary
Metadata describing structure, content, & provenance
Schemas, semantics, ontologies, taxonomies, vocabularies
Vagueness
Confusion about what “Big Data” means
Dealing with Volume
Distill big data down to small information
Parallel and automated analysis
Automation requires standardization
Standardize by reducing Variety:
Format
Standards
Structure
Harnessing Big Data
OLTP: Online Transaction Processing (DBMSs)
OLAP: Online Analytical Processing (Data Warehousing)
RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)
The Model Has Changed…
The Model of Generating/Consuming Data has Changed
Old Model: Few companies are generating data, all others are consuming data
New Model: all of us are generating data, and all of us are consuming data
What’s driving Big Data
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More of a real-time
Big Data Analytics
Big data is more real-time in nature than traditional data warehouse (DW) applications
Traditional DW architectures
(e.g. Exadata, Teradata) are
not well-suited for big data
apps
Shared nothing, massively
parallel processing, scale out
architectures are well-suited
for big data apps
Big Data Technology
Conclusion
In this lecture, we have defined Big Data and discussed
the challenges and applications of Big Data.
We have also described characteristics of Big Data i.e.
Volume, Velocity, Variety and more V’s, Big Data Analytics,
Big Data Landscape and Big Data Technology.
Big Data Enabling Technologies
Dr. Rajiv Misra
Dept. of Computer Science & Engg.
Indian Institute of Technology Patna
rajivm@iitp.ac.in
Big Data Computing Big Data Enabling Technologies
Preface
Content of this Lecture:
In this lecture, we will discuss a brief introduction to
Big Data Enabling Technologies.
Introduction
Big Data is used for a collection of data sets so large
and complex that it is difficult to process using
traditional tools.
A recent survey says that 80% of the data created in the world is unstructured.
One challenge is how to store and process this large amount of data. In this lecture, we will discuss the top technologies used to store and analyse Big Data.
Apache Hadoop
Apache Hadoop is an open source software framework for
big data.
It has two basic parts:
Hadoop Distributed File System (HDFS) is the storage system of Hadoop, which splits big data and distributes it across many nodes in a cluster.
a. Scaling out of H/W resources
b. Fault Tolerant
MapReduce: Programming model that simplifies parallel
programming.
a. Map-> apply ()
b. Reduce-> summarize ()
c. Google used MapReduce for Indexing websites.
Map Reduce
MapReduce is a programming model and an
associated implementation for processing and
generating large data sets.
Users specify a map function that processes a
key/value pair to generate a set of intermediate
key/value pairs, and a reduce function that merges all
intermediate values associated with the same
intermediate key
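The map and reduce roles described above can be sketched with a word count that runs in a single process. This is an illustration of the programming model only, not Hadoop's API: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and the grouping step stands in for the shuffle a real cluster performs across machines.

```python
from collections import defaultdict

def map_fn(_key, line):
    """Map: emit an intermediate (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: merge all values that share the same intermediate key."""
    return word, sum(counts)

def run_mapreduce(records, mapper, reducer):
    groups = defaultdict(list)
    for key, value in records:                 # map phase
        for k, v in mapper(key, value):
            groups[k].append(v)                # shuffle: group values by key
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))

lines = [(0, "big data is big"), (1, "data moves fast")]
word_counts = run_mapreduce(lines, map_fn, reduce_fn)
print(word_counts)
```

Because each call to `map_fn` looks at one record and each call to `reduce_fn` looks at one key, both phases can be spread over many machines without the user writing any parallel code.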
Map Reduce
Hadoop Ecosystem
Hadoop Ecosystem
HDFS Architecture
YARN
YARN – Yet Another Resource Negotiator.
Apache Hadoop YARN is the resource management and
job scheduling technology in the open source Hadoop
distributed processing framework.
YARN is responsible for allocating system resources to
the various applications running in a Hadoop cluster
and scheduling tasks to be executed on different
cluster nodes.
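To make the idea of allocating cluster resources to applications concrete, here is a minimal first-fit sketch. It is not YARN's scheduler: the function `allocate`, the node names, and the memory figures are all hypothetical, and real YARN schedulers weigh queues, capacity, and fairness rather than taking the first node that fits.

```python
def allocate(nodes, requests):
    """Place each (app, memory_mb) request on the first node with enough free memory."""
    free = dict(nodes)                      # node name -> free memory in MB
    placement = {}
    for app, memory_mb in requests:
        for node, available in free.items():
            if available >= memory_mb:
                free[node] = available - memory_mb
                placement[app] = node
                break
        else:
            placement[app] = None           # no capacity: the request stays pending
    return placement, free

nodes = {"node1": 4096, "node2": 2048}
requests = [("etl-job", 3072), ("ml-job", 2048), ("report", 2048)]
placement, free = allocate(nodes, requests)
print(placement)
print(free)
```

The third request finds no node with enough memory left, so it is left unplaced until resources are released.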
YARN Architecture
Hive
Hive is a distributed data management system for Hadoop.
It supports a SQL-like query language, HiveQL (HQL), to access big data.
It is primarily used for data mining purposes.
It runs on top of Hadoop.
Apache Spark
Apache Spark is a big data analytics framework that was originally developed at the University of California, Berkeley's AMPLab, starting in 2009. Since then, it has gained a lot of traction both in academia and in industry.
Apache Spark is a lightning-fast cluster computing technology, designed for fast computation.
ZooKeeper
ZooKeeper is a highly reliable distributed coordination kernel, which can be used for distributed locking, configuration management, leadership election, work queues, and more.
ZooKeeper is a replicated service that holds the metadata of distributed applications.
Key attributes of such data:
Small size
Performance sensitive
Dynamic
Critical
In very simple words, it is a central key-value store that distributed systems can use to coordinate. Since it needs to be able to handle the load, ZooKeeper itself runs on many machines.
https://zookeeper.apache.org/
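The way a central key-value store supports coordination such as distributed locking can be sketched as follows. This is a toy, single-process stand-in, not the ZooKeeper client API: the class `CoordinationStore`, its methods, and the path `/locks/report` are hypothetical, and the real service is replicated across machines.

```python
class CoordinationStore:
    """Toy in-memory key-value store; a real ZooKeeper ensemble is replicated."""

    def __init__(self):
        self._data = {}

    def create(self, path, value):
        """Create a key only if it does not exist yet; return True on success."""
        if path in self._data:
            return False
        self._data[path] = value
        return True

    def get(self, path):
        return self._data.get(path)

    def delete(self, path):
        self._data.pop(path, None)

store = CoordinationStore()
first = store.create("/locks/report", "worker-1")    # worker-1 acquires the lock
second = store.create("/locks/report", "worker-2")   # worker-2 is refused
holder = store.get("/locks/report")
print(first, second, holder)
```

The lock works because creation is all-or-nothing: whichever worker creates the key first holds the lock, and deleting the key releases it.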
NoSQL
While traditional SQL can be used effectively to handle large amounts of structured data, we need NoSQL (Not Only SQL) to handle unstructured data.
NoSQL databases store unstructured data with no particular schema.
Each row can have its own set of column values. NoSQL gives better performance in storing massive amounts of data.
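The statement that each row can have its own set of columns can be shown with plain dictionaries. This is a sketch of the data model only, not any particular NoSQL product; the table `users`, its keys, and the helper `column_values` are made up for illustration.

```python
# Each "row" is a dict, so rows in one table may carry different columns.
users = {}
users["u1"] = {"name": "Asha", "email": "asha@example.com"}
users["u2"] = {"name": "Ravi", "phone": "555-0100", "tags": ["admin"]}

def column_values(table, column):
    """Return {row_key: value} for the rows that actually have the column."""
    return {key: row[column] for key, row in table.items() if column in row}

emails = column_values(users, "email")
all_columns = sorted({col for row in users.values() for col in row})
print(emails)
print(all_columns)
```

No schema had to be declared before storing the second row with its extra columns; readers simply skip rows that lack the column they ask for.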
NoSQL
Cassandra
Apache Cassandra is a highly scalable, distributed, high-performance NoSQL database designed to handle huge amounts of data.
Cassandra handles this data with its distributed architecture.
Data is placed on different machines with a replication factor greater than one, which provides high availability and no single point of failure.
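Placing each piece of data on several machines can be sketched with a hash ring. This is a simplified illustration, not Cassandra's implementation: the functions `token` and `replicas` are hypothetical, MD5 is an arbitrary hash chosen for the sketch, and Cassandra's own partitioner and replication strategies differ.

```python
import hashlib
from bisect import bisect_right

def token(value):
    """Hash a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def replicas(key, nodes, replication_factor):
    """Walk clockwise from the key's token and take the next N distinct nodes."""
    ring = sorted((token(n), n) for n in nodes)
    start = bisect_right([t for t, _ in ring], token(key))
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

nodes = ["node-a", "node-b", "node-c", "node-d"]
owners = replicas("user:42", nodes, replication_factor=3)
print(owners)
```

With a replication factor of 3, any single node can fail and two copies of the key remain reachable, which is the sense in which there is no single point of failure.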
Cassandra
In the image above, the circles are Cassandra nodes and the lines between the circles show the distributed architecture, while the client is sending data to a node.
HBase
HBase is an open source, distributed database developed by the Apache Software Foundation.
It is modeled on Google's Bigtable and is primarily written in Java.
HBase can store massive amounts of data, from terabytes to petabytes.
HBase
Spark Streaming
Spark Streaming is an extension of the core Spark API
that enables scalable, high-throughput, fault-tolerant
stream processing of live data streams.
Streaming data can be ingested from HDFS, Kafka, Flume, TCP sockets, Kinesis, etc.
Spark ML (machine learning) functions and GraphX graph processing algorithms are fully applicable to streaming data.
Spark Streaming
Kafka, Streaming Ecosystem
Apache Kafka is an open-source stream-processing software platform developed by the Apache Software Foundation, written in Scala and Java.
It is a distributed streaming platform capable of handling trillions of events a day.
Kafka is based on the abstraction of a distributed commit log.
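The commit log abstraction can be sketched as an append-only list in which every record gets an offset. This is a single-process illustration, not the Kafka client API: the class `CommitLog`, its methods, and the sample events are hypothetical, and a real Kafka log is partitioned and replicated across brokers.

```python
class CommitLog:
    """Append-only log: records get increasing offsets and are never modified."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1          # offset of the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]

log = CommitLog()
offsets = [log.append(event) for event in ("login", "click", "purchase")]

# Each consumer tracks its own position, so readers do not affect each other.
consumer_position = 1
batch = log.read(consumer_position)
print(offsets, batch)
```

Because reading does not remove anything, a second consumer could start from offset 0 and replay the full history independently of the first.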
Kafka
Spark MLlib
Spark MLlib is a distributed machine-learning
framework on top of Spark Core.
MLlib is Spark's scalable machine learning library
consisting of common learning algorithms and utilities,
including classification, regression, clustering,
collaborative filtering, dimensionality reduction.
Spark MLlib Component
Spark GraphX
GraphX is a new component in Spark for graphs and
graph-parallel computation. At a high level, GraphX
extends the Spark RDD by introducing a new graph
abstraction.
GraphX reuses the Spark RDD concept, simplifies graph analytics tasks, and provides the ability to perform operations on a directed multigraph with properties attached to each vertex and edge.
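The data structure named above, a directed multigraph with properties on vertices and edges, can be sketched in a few lines. This is not the GraphX API (which is exposed through Spark in Scala): the class `PropertyGraph` and its methods are hypothetical and hold everything in local memory.

```python
from collections import defaultdict

class PropertyGraph:
    """Directed multigraph: parallel edges allowed, properties on vertices and edges."""

    def __init__(self):
        self.vertices = {}                    # vertex id -> property dict
        self.edges = []                       # (src, dst, property dict)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, dst, **props):
        self.edges.append((src, dst, props))

    def out_degrees(self):
        degrees = defaultdict(int)
        for src, _dst, _props in self.edges:
            degrees[src] += 1
        return dict(degrees)

g = PropertyGraph()
g.add_vertex(1, name="alice")
g.add_vertex(2, name="bob")
g.add_edge(1, 2, kind="follows")
g.add_edge(1, 2, kind="messaged")      # a second, parallel edge
degrees = g.out_degrees()
print(degrees)
```

Storing edges as a list rather than a set of vertex pairs is what permits several edges between the same two vertices, each with its own properties.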
Spark GraphX
GraphX is a thin layer on top of the Spark
general-purpose dataflow framework (lines of code).
Conclusion
In this lecture, we have given a brief overview of the following Big Data Enabling Technologies:
Apache Hadoop
Hadoop Ecosystem
HDFS Architecture
YARN
NoSQL
Hive
Map Reduce
Apache Spark
Zookeeper
Cassandra
Hbase
Spark Streaming
Kafka
Spark MLlib
GraphX
Hadoop Stack for Big Data
Dr. Rajiv Misra
Dept. of Computer Science & Engg.
Indian Institute of Technology Patna
rajivm@iitp.ac.in
Big Data Computing Big Data Hadoop Stack
Preface
Content of this Lecture:
In this lecture, we will provide insight into Hadoop technologies and the opportunities and challenges for Big Data.
We will also look into the Hadoop stack and the applications and technologies associated with Big Data solutions.
Hadoop Beginnings
What is Hadoop ?
Apache Hadoop is an open source software
framework for storage and large scale
processing of the data-sets on clusters of
commodity hardware.
Hadoop Beginnings
Hadoop was created by Doug Cutting and Mike Cafarella in
2005
It was originally developed to support distribution of the
Nutch Search Engine Project.
Doug, who was working at Yahoo at the time and is now a chief architect at Cloudera, named the project after his son’s toy elephant, Hadoop.
Moving Computation to Data
Hadoop started out as a simple batch processing framework.
The idea behind Hadoop is that instead of moving data to
computation, we move computation to data.
Scalability
Scalability is at the core of a Hadoop system.
We have cheap computing and storage.
We can distribute and scale out very easily in a very cost-effective manner.
Reliability
Handles hardware failures automatically!
If we think about an individual machine, a rack of machines, a large cluster, or a supercomputer, they all fail at some point in time, or some of their components will fail. These failures are so common that we have to account for them ahead of time.
All of these failures are handled within the Hadoop framework. Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and the Google File System (GFS). Another very interesting thing that Hadoop brings is a new approach to data.
New Approach to Data: Keep all data
A new approach is that we can keep all the data that we have, and analyze it in new and interesting ways. We can do something called schema-on-read.
This allows new kinds of analysis. We can bring more data into simple algorithms; it has been shown that with more data at finer granularity, simple algorithms can often achieve better results than really complex analytics on a small amount of data.
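The schema-on-read idea, keeping raw data as-is and imposing structure only when it is read, can be sketched as follows. This is a minimal illustration in plain Python, not a Hadoop API: the function `read_with_schema`, the sample lines, and the field names are all made up.

```python
raw_lines = [
    "2024-01-05,store-7,19.99",
    "2024-01-05,store-2,5.00",
    "2024-01-06,store-7,42.50",
]

def read_with_schema(lines, schema):
    """Apply (name, converter) pairs to each raw line at read time."""
    for line in lines:
        fields = line.split(",")
        yield {name: convert(value) for (name, convert), value in zip(schema, fields)}

# Two analyses project different structure onto the same stored data.
sales_schema = [("day", str), ("store", str), ("amount", float)]
total = sum(row["amount"] for row in read_with_schema(raw_lines, sales_schema))

store_schema = [("day", str), ("store", str)]
stores = {row["store"] for row in read_with_schema(raw_lines, store_schema)}
print(round(total, 2), sorted(stores))
```

The stored lines never change; a new question only needs a new schema, which is why keeping all the raw data leaves later analyses open.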
Apache Hadoop Framework
& its Basic Modules
Apache Framework Basic Modules
Hadoop Common: contains libraries and utilities needed by other Hadoop modules.
Hadoop Distributed File System (HDFS): a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the entire cluster.
Hadoop YARN: a resource management platform responsible for managing compute resources in the cluster and using them to schedule users' applications.
Hadoop MapReduce: a programming model that scales data processing across many different processes.
Apache Framework Basic Modules
High Level Architecture of Hadoop
The two major pieces of Hadoop are the Hadoop Distributed File System (HDFS) and MapReduce, a parallel processing framework that maps and reduces data. Both are open source and inspired by technologies developed at Google.
At this high level of the infrastructure, we talk about things like TaskTrackers and JobTrackers, and NameNodes and DataNodes.
HDFS
Hadoop distributed file system
HDFS: Hadoop distributed file system
Distributed, scalable, and portable file system written in Java for the Hadoop framework.
Each Hadoop instance typically has a single NameNode and a cluster of DataNodes that together form the HDFS cluster.
HDFS stores large files, typically in the range of gigabytes to terabytes, and now petabytes, across multiple machines. It achieves reliability by replicating the data across multiple hosts, and therefore does not require RAID storage on hosts.
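Storing a large file across multiple machines with replication can be sketched as two steps: cut the file into blocks, then assign each block to several nodes. This is an illustration, not HDFS code: the function names are hypothetical, the 128 MB block size and replication of 3 are assumed values, and round-robin placement stands in for the rack-aware placement a real cluster uses.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

def place_replicas(num_blocks, data_nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    copies = min(replication, len(data_nodes))
    return {
        block: [data_nodes[(block + i) % len(data_nodes)] for i in range(copies)]
        for block in range(num_blocks)
    }

MB = 1024 * 1024
blocks = split_into_blocks(300 * MB, 128 * MB)      # 128 MB is an assumed block size
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print([size // MB for size in blocks])
print(placement)
```

Since every block lives on three different nodes, losing one host loses no data, which is why the hosts themselves do not need redundant storage.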
HDFS
HDFS
MapReduce Engine
The typical MapReduce engine consists of a job tracker, to which client applications can submit MapReduce jobs. The job tracker typically pushes work out to all the available task trackers in the cluster, striving to keep the work as close to the data as possible and as balanced as possible.
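Keeping work close to the data while balancing load can be sketched as a greedy assignment. This is a simplified illustration, not the job tracker's actual algorithm: the function `assign_tasks`, the tracker names, and the slot counts are hypothetical.

```python
def assign_tasks(block_locations, slots):
    """Prefer a tracker that stores the block; otherwise use the least-loaded one."""
    load = {tracker: 0 for tracker in slots}
    assignment = {}
    for block, holders in block_locations.items():
        local = [t for t in holders if load[t] < slots[t]]
        candidates = local or [t for t in slots if load[t] < slots[t]]
        if not candidates:
            assignment[block] = None             # no free slot: the task must wait
            continue
        chosen = min(candidates, key=lambda t: load[t])
        load[chosen] += 1
        assignment[block] = (chosen, chosen in holders)
    return assignment

block_locations = {"b0": ["tt1", "tt2"], "b1": ["tt1"], "b2": ["tt1"]}
slots = {"tt1": 1, "tt2": 1, "tt3": 1}
assignment = assign_tasks(block_locations, slots)
print(assignment)
```

Each result pairs the chosen tracker with a flag saying whether the task runs where its block is stored; once the local tracker is full, the remaining tasks fall back to other trackers and must read their block over the network.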
Apache Hadoop NextGen MapReduce (YARN)
YARN enhances the power of the Hadoop compute cluster without being limited by the MapReduce kind of framework.
Its scalability is great. The processing power in data centers continues to grow quickly. Because the YARN resource manager focuses exclusively on scheduling, it can manage those very large clusters quickly and easily.
YARN is completely compatible with MapReduce. Existing MapReduce applications and users can run on top of YARN without disrupting any of their existing processes.
Hadoop 1.0 vs. Hadoop 2.0
Hadoop 2.0 provides a more general processing platform that is not constrained to map-and-reduce kinds of processes.
The fundamental idea behind MapReduce 2.0 is to split the two major functions of the job tracker, resource management and job scheduling/monitoring, into two separate units. The idea is to have a global resource manager and a per-application application master.
What is YARN?
Scalability, MapReduce Compatibility, Improved Cluster Utilization
YARN enhances the power of the Hadoop compute cluster without being limited by the MapReduce kind of framework.
Its scalability is great. The processing power in data centers continues to grow quickly. Because the YARN resource manager focuses exclusively on scheduling, it can manage those very large clusters quickly and easily.
YARN is completely compatible with MapReduce. Existing MapReduce applications and users can run on top of YARN without disrupting any of their existing processes.
It also improves cluster utilization. The resource manager is a pure scheduler that optimizes cluster utilization according to criteria such as capacity guarantees, fairness, and SLAs (service level agreements).
What is YARN?
Fairness, Supports Other Workloads, Iterative Modeling, Machine Learning, Multiple Access Engines
It supports workflows other than just MapReduce. We can now bring in additional programming models, such as graph processing or iterative modeling, and process data with them. This is especially useful for machine learning applications.
YARN allows multiple access engines, either open source or proprietary, to use Hadoop as a common standard for batch or interactive processing, and even real-time engines that can simultaneously access a lot of different data. So you can put streaming kinds of applications on top of YARN inside a Hadoop architecture, and seamlessly work and communicate between these environments.
The Hadoop “Zoo”
Apache Hadoop Ecosystem
Original Google Stack
Google had their original MapReduce, and they were storing and processing large amounts of data.
They wanted to be able to access that data in a SQL-like language, so they built a SQL gateway to ingest data into the MapReduce cluster and to query some of that data as well.
Original Google Stack
Then they realized they needed a high-level, specific language to access MapReduce in the cluster and submit some of those jobs, so Sawzall came along.
Then Evenflow came along and allowed them to chain together complex workloads and coordinate events and services across this kind of framework, or the specific cluster they had at the time.
Original Google Stack
Then Dremel came along. Dremel was a columnar storage and metadata manager that allows managing the data and is able to process a very large amount of unstructured data.
Then Chubby came along as a coordination system that would manage all of the products in this one unit, or one ecosystem, that could process all these large amounts of structured data seamlessly.
Facebook’s Version of the Stack
Yahoo Version of the Stack
LinkedIn’s Version of the Stack
Cloudera’s Version of the Stack
Hadoop Ecosystem
Major Components
Apache Sqoop
Tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases
HBase
HBase is a key component of the Hadoop stack, as its design caters to applications that require really fast random access to significant data sets.
Column-oriented database management system
Key-value store
Based on Google Bigtable
Can hold extremely large data
Dynamic data model
Not a relational DBMS
PIG
High-level programming on top of Hadoop MapReduce
The language: Pig Latin
Expresses data analysis problems as data flows
Originally developed at Yahoo in 2006
PIG for ETL
A good example of a PIG application is the ETL (extract, transform, load) model, which describes how a process extracts data from a source, transforms it according to a rule set that we specify, and then loads it into a data store.
PIG can ingest data from files, streams, or other sources using UDFs: user-defined functions that we can write ourselves.
Once it has all the data, it can perform select, iterate, and other kinds of transformations.
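The extract, transform, and load steps can be sketched as a small pipeline. This is written in plain Python rather than Pig Latin, so it shows the shape of an ETL flow and not PIG itself: the function names, the CSV sample, the `sales` table, and the use of an in-memory SQLite database as the data store are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

def extract(text):
    """Extract: parse CSV text into dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def to_upper_city(row):
    """A user-defined function applied to every row during the transform step."""
    return {**row, "city": row["city"].upper()}

def transform(rows, udf, min_amount):
    """Transform: apply the UDF, then keep rows that pass the rule."""
    return [r for r in map(udf, rows) if float(r["amount"]) >= min_amount]

def load(rows, conn):
    """Load: write the transformed rows into a data store."""
    conn.execute("CREATE TABLE sales (city TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [(r["city"], float(r["amount"])) for r in rows])

source = "city,amount\npatna,120.5\ndelhi,15\njaipur,300\n"
conn = sqlite3.connect(":memory:")
load(transform(extract(source), to_upper_city, min_amount=100), conn)
loaded = conn.execute("SELECT city, amount FROM sales ORDER BY city").fetchall()
print(loaded)
```

Each stage only depends on the output of the previous one, which is the data-flow style of thinking that PIG is built around.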
Apache Hive
Data warehouse software that facilitates querying and managing large datasets residing in distributed storage
SQL-like language!
Facilitates querying and managing large datasets in HDFS
Mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL
Oozie Workflow
Workflow scheduler system to manage Apache Hadoop jobs
Oozie Coordinator jobs!
Supports MapReduce, Pig, Apache Hive, Sqoop, etc.
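What a workflow scheduler has to work out, namely an order in which dependent jobs can run, can be sketched with a dependency graph. This is not Oozie's workflow definition format or API: the job names and their dependencies are hypothetical, and the ordering uses Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action lists the actions it depends on.
workflow = {
    "sqoop_import": set(),
    "pig_clean": {"sqoop_import"},
    "hive_report": {"pig_clean"},
    "mapreduce_index": {"pig_clean"},
    "notify": {"hive_report", "mapreduce_index"},
}

order = list(TopologicalSorter(workflow).static_order())
print(order)
```

Any order produced this way runs the import first and the notification last; the two middle jobs have no dependency on each other, so a scheduler is free to run them in parallel.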
ZooKeeper
Provides operational services for a Hadoop cluster
Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services
Flume
Distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data
It has a simple and very flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms for failover, recovery, and the other mechanisms that keep the cluster safe and reliable.
It uses a simple, extensible data model that allows for all kinds of online analytic applications.
Additional Cloudera Hadoop Components Impala
Impala
Impala was designed at Cloudera as a query engine that runs on top of
Apache Hadoop. The project was officially announced at the end of 2012
and became a publicly available, open source distribution.
Impala brings scalable parallel database technology to Hadoop and allows
users to submit low-latency queries against data stored in HDFS or HBase
without requiring a lot of data movement and manipulation.
Impala is integrated with Hadoop: it works within the same ecosystem and
uses the same file formats, metadata, security, reliability, and resource
management workflows.
It lets us submit SQL-like queries at much faster speeds and with much
lower latency.
Additional Cloudera Hadoop Components Spark
The New Paradigm
Spark
Apache Spark™ is a fast and general engine for large-scale data
processing.
Spark is a scalable data analytics platform that incorporates primitives
for in-memory computing, which gives it performance advantages over
Hadoop's traditional cluster storage approach. It is implemented in
Scala, supports the Scala language, and provides a unique environment
for data processing.
Spark is well suited to more complex kinds of analytics, and it is good
at supporting machine learning libraries.
It is another open source computing framework. It was originally
developed at the AMPLab at the University of California, Berkeley, and
was later donated to the Apache Software Foundation, where it remains
today.
Spark Benefits
In contrast to Hadoop's two-stage, disk-based MapReduce paradigm,
Spark's multi-stage in-memory primitives provide performance up to 100
times faster for certain applications.
Allows user programs to load data into a cluster's memory and query
it repeatedly.
Spark is well suited to machine learning applications, which often
involve iterative, in-memory computation.
Spark requires a cluster manager and a distributed storage system. For
cluster management, Spark supports standalone (native Spark) clusters,
or you can run Spark on Hadoop YARN or Apache Mesos.
For distributed storage, Spark can interface with a variety of storage
systems, including HDFS and Amazon S3.
Conclusion
In this lecture, we have discussed the specific components
and basic processes of the Hadoop architecture, software
stack, and execution environment.
Hadoop Distributed File System
(HDFS)
Dr. Rajiv Misra
Dept. of Computer Science & Engg.
Indian Institute of Technology Patna
rajivm@iitp.ac.in
Big Data Computing Hadoop Distributed File System (HDFS)
Vu Pham
Preface
Content of this Lecture:
In this lecture, we will discuss design goals of HDFS, the
read/write process to HDFS, the main configuration
tuning parameters to control HDFS performance and
robustness.
Introduction
Hadoop provides a distributed file system and a framework for
the analysis and transformation of very large data sets using
the MapReduce paradigm.
An important characteristic of Hadoop is the partitioning of
data and computation across many (thousands) of hosts, and
executing application computations in parallel close to their
data.
A Hadoop cluster scales computation capacity, storage capacity
and IO bandwidth by simply adding commodity servers.
Hadoop clusters at Yahoo! span 25,000 servers, and store 25
petabytes of application data, with the largest cluster being
3500 servers. One hundred other organizations worldwide
report using Hadoop.
Introduction
Hadoop is an Apache project; all components are available via
the Apache open source license.
Yahoo! has developed and contributed to 80% of the core of
Hadoop (HDFS and MapReduce).
HBase was originally developed at Powerset, now a department
at Microsoft.
Hive was originated and developed at Facebook.
Pig, ZooKeeper, and Chukwa were originated and developed at
Yahoo!
Avro was originated at Yahoo! and is being co-developed with
Cloudera.
Hadoop Project Components
Component    Description
HDFS         Distributed file system
MapReduce    Distributed computation framework
HBase        Column-oriented table service
Pig          Dataflow language and parallel execution framework
Hive         Data warehouse infrastructure
ZooKeeper    Distributed coordination service
Chukwa       System for collecting management data
Avro         Data serialization system
HDFS Design Concepts
Scalable distributed filesystem: As you add disks, you get scalable
performance; adding more nodes and disks scales out the performance.
Distributed data on local disks on several nodes.
Low-cost commodity hardware: You get a lot of performance out of it
because you are aggregating the performance of many machines.
[Figure: data blocks B1, B2, …, Bn stored on Node 1, Node 2, …, Node n]
HDFS Design Goals
Hundreds/Thousands of nodes and disks:
It means there's a higher probability of hardware failure. So the design
needs to handle node/disk failures.
Portability across heterogeneous hardware/software:
Implementation across lots of different kinds of hardware and software.
Handle large data sets:
Need to handle terabytes to petabytes.
Enable processing with high throughput
Techniques to meet HDFS design goals
Simplified coherency model:
Write once, read many times. This reduces the number of operations
required to commit a write.
Data replication:
Helps to handle hardware failures.
Copies of the same piece of data are spread across different nodes.
Move computation close to the data:
Data does not have to be moved around, which improves performance and
throughput.
Relax POSIX requirements to increase the throughput.
Basic architecture of HDFS
HDFS Architecture: Key Components
Single NameNode: A master server that manages the file system
namespace and regulates access to files by clients. It also keeps
track of where the data is on the DataNodes and how the blocks are
distributed.
Multiple DataNodes: Typically one per node in a cluster, so the
storage being used is local to each node.
Basic Functions:
Manage the storage on the DataNode.
Serve read and write requests from clients.
Perform block creation, deletion, and replication based on instructions
from the NameNode.
Original HDFS Design
Single NameNode
Multiple DataNodes
Manage storage- blocks of data
Serving read/write requests from clients
Block creation, deletion, replication
HDFS in Hadoop 2
HDFS Federation: The idea is to have multiple NameNodes as well as
multiple DataNodes, so that the namespace can scale. Recall that in the
original design a single NameNode handles all the namespace
responsibilities. With thousands of nodes and billions of files, that
single NameNode does not scale. Federation was brought in to address
this, and it also brings performance improvements.
Benefits:
Increase namespace scalability
Performance
Isolation
HDFS in Hadoop 2
How it's done:
Multiple NameNode servers
Multiple namespaces
Data is now stored in block pools
There is a block pool associated with each NameNode (namespace).
These pools are spread out over all the DataNodes.
HDFS in Hadoop 2
High Availability: redundant NameNodes
Heterogeneous Storage and Archival Storage: ARCHIVE, DISK, SSD, RAM_DISK
Federation: Block Pools
Recall that the original design has one namespace and a bunch of
DataNodes. The federated structure looks similar.
Instead of one NameNode, you have a bunch of NameNodes. Each of those
NameNodes writes into its own block pool, but the pools are spread out
over the DataNodes just like before, and that is where the data lives.
The block pool is essentially the main thing that's different.
HDFS Performance Measures
Determine the number of blocks for a given file size
Key HDFS and system components that are affected
by the block size
Impact of using a lot of small files on HDFS and
the system
Recall: HDFS Architecture
Distributed data on local disks on several nodes
[Figure: data blocks B1, B2, …, Bn stored on Node 1, Node 2, …, Node n]
HDFS Block Size
Default block size is 64 megabytes.
Good for large files!
So a 10 GB file will be broken into 10 × 1024 / 64 = 160 blocks.
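The block-count arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the helper name `num_blocks` is made up for illustration, and it assumes the last block of a file may be only partially filled.

```python
MB = 1024 * 1024
GB = 1024 * MB

def num_blocks(file_size_bytes, block_size_bytes=64 * MB):
    """Number of blocks needed to hold a file (ceiling division)."""
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes

print(num_blocks(10 * GB))            # with 64 MB blocks
print(num_blocks(10 * GB, 128 * MB))  # with 128 MB blocks
```

Doubling the block size halves the number of blocks, which matters for the NameNode memory and map task counts discussed on the following slides.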
[Figure: data blocks B1, B2, …, Bn stored on Node 1, Node 2, …, Node n]
Importance of No. of Blocks in a file
NameNode memory usage: Every file can consist of a lot of blocks, as we
saw in the previous case (160 blocks). With millions of files, that is
millions of objects, and each object uses a bit of memory on the
NameNode, so memory usage is a direct effect of the number of blocks.
With replication, the NameNode also has to track 3 times as many
block replicas.
Number of map tasks: The number of maps typically depends on the
number of blocks being processed.
Large No. of small files: Impact on Name node
Memory usage: Typically, the usage is around 150 bytes per
object. A billion small files means roughly two billion objects (one
file object plus at least one block object per file), which is
about 300 GB of memory.
Network load: The number of checks with DataNodes is proportional
to the number of blocks.
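One way to arrive at a figure of about 300 GB is to assume that every small file costs the NameNode two objects: one for the file and one for its single block. The sketch below works through that estimate. The two-objects-per-file accounting is an assumption made here, directories are ignored, and `namenode_memory_bytes` is a hypothetical helper, not a Hadoop API.

```python
BYTES_PER_OBJECT = 150  # rule-of-thumb figure quoted on the slide

def namenode_memory_bytes(num_files, blocks_per_file=1):
    """Estimate NameNode memory: one object per file plus one per block."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

estimate = namenode_memory_bytes(1_000_000_000)
print(estimate / 10**9, "GB")
```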
Large No. of small files: Performance Impact
Number of map tasks: Suppose we have 10 GB of data to process, stored
as lots of 32 KB files. We will end up with 10 × 1024 × 1024 / 32 =
327,680 map tasks.
This creates a huge list of queued tasks.
Each map task also incurs latency when it spins up and spins down,
because Java processes are being started and stopped.
Disk I/O is inefficient with small file sizes.
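The map task count above follows from assuming one map task per small file. A short sketch of that arithmetic (the function name is hypothetical, and the one-task-per-file rule is the assumption being illustrated):

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

def map_tasks(total_bytes, bytes_per_task):
    """Number of map tasks when each task processes one file or one block."""
    return total_bytes // bytes_per_task

small_files = map_tasks(10 * GB, 32 * KB)   # one task per 32 KB file
large_blocks = map_tasks(10 * GB, 64 * MB)  # one task per 64 MB block
print(small_files, large_blocks)
```

The same 10 GB needs roughly two thousand times more tasks when stored as 32 KB files than when stored as a single file split into 64 MB blocks.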
HDFS optimized for large files
Lots of small files is bad!
Solution:
Merge/Concatenate files
Sequence files
HBase, HIVE configuration
CombineFileInputFormat
Read/Write Processes in HDFS
Read Process in HDFS
Write Process in HDFS
HDFS Tuning Parameters
Overview
Tuning parameters
Specifically DFS Block size
NameNode, DataNode system/dfs parameters.
HDFS XML configuration files
The tuning environment is typically set in the HDFS XML configuration
files, for example hdfs-site.xml.
This is more for system administrators of Hadoop clusters, but
it's good to know which changes affect performance, and
especially if you're trying things out on your own there are some
important parameters to keep in mind.
Commercial vendors have GUI-based management consoles.
HDFS Block Size
Recall: the block size impacts how much NameNode memory is used and
the number of map tasks, and it therefore also affects performance.
Default is 64 megabytes in Hadoop 1.x (128 megabytes in Hadoop 2.x and
later); it is typically bumped up to 128 megabytes and can be changed
based on workloads.
The parameter that controls this is dfs.blocksize (dfs.block.size is
the older, deprecated name).
HDFS Replication
Default replication is 3.
Parameter: dfs.replication
Tradeoffs:
Lower it to reduce replication cost
Less robust
Higher replication can make data local to more workers
Lower replication ➔ More space
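To make the two parameters above concrete, the sketch below builds the kind of property entries that go into hdfs-site.xml, using Python's standard XML library. The values (128 MB block size, replication 3) are examples rather than recommendations, and the `hdfs_site` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

def hdfs_site(properties):
    """Build an hdfs-site.xml style document from a {name: value} mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return root

config = hdfs_site({
    "dfs.blocksize": 128 * 1024 * 1024,  # block size in bytes (128 MB)
    "dfs.replication": 3,                # number of replicas per block
})
print(ET.tostring(config, encoding="unicode"))
```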
Lots of other parameters
Various tunables for DataNode, NameNode.
Examples:
dfs.datanode.handler.count (10): Sets the number of server
threads on each DataNode
dfs.namenode.fs-limits.max-blocks-per-file: Maximum number
of blocks per file.
Full List:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
HDFS Performance and Robustness
Common Failures
DataNode Failures: A server can fail, a disk can crash, or data can
become corrupted, sometimes because of network or disk issues. Any of
these can lead to a failure on the DataNode side of HDFS.
Network Failures: A network link can go down between part of the
cluster and the NameNode, which can affect a lot of DataNodes
at the same time.
NameNode Failures: A disk on the NameNode can fail, or the NameNode
process itself can become corrupted.
HDFS Robustness
NameNode receives heartbeat and block reports from
DataNodes
Mitigation of common failures
Periodic heartbeat: from DataNode to NameNode.
DataNodes without recent heartbeat:
The NameNode marks the DataNode as dead, and no new I/O is sent to
that DataNode. The NameNode also holds the replication information
for all files on the file system, so when a DataNode fails it knows
which blocks fall below their replication factor.
The replication factor is set for the entire system, and it can also be
set for a particular file when the file is written. Either way, the
NameNode knows which blocks fall below the replication factor, and it
starts the process of re-replicating them.
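The heartbeat logic described above can be illustrated with a toy model. This is only a sketch of the idea, not how the NameNode is implemented: the timeout value, node names, and data structures are all assumptions chosen for the example.

```python
HEARTBEAT_TIMEOUT = 30.0  # seconds; illustrative value, not an HDFS default
REPLICATION_FACTOR = 3

def live_nodes(last_heartbeat, now):
    """DataNodes whose most recent heartbeat is within the timeout."""
    return {n for n, t in last_heartbeat.items() if now - t <= HEARTBEAT_TIMEOUT}

def under_replicated(block_locations, live, factor=REPLICATION_FACTOR):
    """Map each block below the factor to the number of replicas it is missing."""
    missing = {}
    for block, nodes in block_locations.items():
        alive = nodes & live
        if len(alive) < factor:
            missing[block] = factor - len(alive)
    return missing

# Hypothetical cluster state: dn3 has not sent a heartbeat for 61 seconds.
last_heartbeat = {"dn1": 100.0, "dn2": 98.0, "dn3": 40.0, "dn4": 99.0}
block_locations = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn1", "dn2", "dn4"},
}
live = live_nodes(last_heartbeat, now=101.0)
to_replicate = under_replicated(block_locations, live)
print(sorted(live), to_replicate)
```

Here `blk_1` loses one replica when `dn3` is considered dead, so it is the only block scheduled for re-replication.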
Mitigation of common failures
Checksum computed on file creation.
Checksums stored in HDFS namespace.
Used to check retrieved data.
Re-read from alternate replica
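The checksum mechanism listed above can be sketched as follows. CRC-32 from Python's `zlib` module is used here purely as a stand-in checksum, and the replica list and `read_block` helper are hypothetical.

```python
import zlib

def checksum(data: bytes) -> int:
    """Checksum of a block's contents (CRC-32 as a stand-in)."""
    return zlib.crc32(data)

def read_block(replicas, expected):
    """Return the first replica whose checksum matches; skip corrupt copies."""
    for node, data in replicas:
        if checksum(data) == expected:
            return node, data
    raise IOError("all replicas failed checksum verification")

original = b"hello hdfs block"
expected = checksum(original)        # computed when the file is created
replicas = [
    ("dn1", b"hellX hdfs block"),    # corrupted copy
    ("dn2", original),               # healthy copy
]
node, data = read_block(replicas, expected)
print(node)
```

With no healthy replica left, the read fails, which is the robustness cost of running without replication.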
Mitigation of common failures
Multiple copies of central metadata structures.
Failover to standby NameNode: manual by default.
Performance
Changing blocksize and replication factor can improve
performance.
Example: Distributed copy
Hadoop distcp allows parallel transfer of files.
Replication trade off with respect to robustness
One performance tradeoff: when you run MapReduce jobs, having replicas
gives additional data locality possibilities. But the big tradeoff is
robustness. Consider the case with no replicas:
You might lose a node or a local disk and can't recover, because
there is no replication.
Similarly, with data corruption, if you get a bad checksum you can't
recover because you don't have a replica.
Changes to other parameters can have similar effects.
Conclusion
In this lecture, we have discussed design goals of HDFS,
the read/write process to HDFS, the main configuration
tuning parameters to control HDFS performance and
robustness.
Hadoop MapReduce 1.0
Dr. Rajiv Misra
Dept. of Computer Science & Engg.
Indian Institute of Technology Patna
rajivm@iitp.ac.in
Big Data Computing Hadoop MapReduce 1.0
Vu Pham
What is Map Reduce
MapReduce is the execution engine of Hadoop.
Map Reduce Components
The Job Tracker
Task Tracker
The Job Tracker
The Job Tracker is hosted on the master node and receives job
execution requests from the client.
Its main duties are to break down the received job (a big computation)
into small parts, allocate these partial computations (tasks) to the
slave nodes, and monitor the progress and reports of task execution
from the slaves.
The unit of execution is the job.
The Task Tracker
The Task Tracker is the MapReduce component on the slave machine.
Because there are multiple slave machines, many Task Trackers are
available in a cluster. Its duty is to perform the computation given
by the Job Tracker on the data available on the slave machine.
The Task Tracker communicates progress and reports results to the
Job Tracker.
The master node contains the Job Tracker and NameNode, whereas all
slaves contain a Task Tracker and a DataNode.
Execution Steps
Step 1: The client submits the job to the Job Tracker.
Step 2: The Job Tracker asks the NameNode for the location of the data.
Step 3: As per the reply from the NameNode, the Job Tracker asks the
respective Task Trackers to execute the task on their data.
Step 4: All the results are stored on some DataNode, and the NameNode
is informed about the same.
Step 5: The Task Trackers report job progress and completion to the
Job Tracker.
Step 6: The Job Tracker informs the client of the completion.
Step 7: The client contacts the NameNode and retrieves the results.
Hadoop MapReduce 2.0
Dr. Rajiv Misra
Dept. of Computer Science & Engg.
Indian Institute of Technology Patna
rajivm@iitp.ac.in
Big Data Computing Hadoop MapReduce 2.0
Vu Pham
Preface
Content of this Lecture:
In this lecture, we will discuss the ‘MapReduce
paradigm’ and its internal working and
implementation overview.
We will also see many examples and different
applications of MapReduce being used, and look into
how the ‘scheduling and fault tolerance’ works inside
MapReduce.
Introduction
MapReduce is a programming model and an associated
implementation for processing and generating large data
sets.
Users specify a map function that processes a key/value
pair to generate a set of intermediate key/value pairs, and
a reduce function that merges all intermediate values
associated with the same intermediate key.
Many real world tasks are expressible in this model.
Contd…
Programs written in this functional style are automatically
parallelized and executed on a large cluster of commodity
machines.
The run-time system takes care of the details of partitioning the
input data, scheduling the program's execution across a set of
machines, handling machine failures, and managing the required
inter-machine communication.
This allows programmers without any experience with parallel and
distributed systems to easily utilize the resources of a large
distributed system.
A typical MapReduce computation processes many terabytes of
data on thousands of machines. Hundreds of MapReduce
programs have been implemented and upwards of one thousand
MapReduce jobs are executed on Google's clusters every day.
Distributed File System
Chunk Servers
File is split into contiguous chunks
Typically each chunk is 16-64MB
Each chunk replicated (usually 2x or 3x)
Try to keep replicas in different racks
Master node
Also known as the NameNode in HDFS
Stores metadata
Might be replicated
Client library for file access
Talks to master to find chunk servers
Connects directly to chunkservers to access data
Motivation for Map Reduce (Why)
Large-Scale Data Processing
Want to use 1000s of CPUs
But don’t want hassle of managing things
MapReduce Architecture provides
Automatic parallelization & distribution
Fault tolerance
I/O scheduling
Monitoring & status updates
MapReduce Paradigm
What is MapReduce?
Terms are borrowed from functional languages (e.g., Lisp)
Sum of squares:
(map square '(1 2 3 4))
Output: (1 4 9 16)
[processes each record sequentially and independently]
(reduce + '(1 4 9 16))
(+ 16 (+ 9 (+ 4 1)))
Output: 30
[processes set of all records in batches]
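The same sum-of-squares computation can be written with Python's built-in `map` and `functools.reduce`, which follow the functional style the terms are borrowed from. This is a plain single-machine illustration, not MapReduce itself.

```python
from functools import reduce
from operator import add

numbers = [1, 2, 3, 4]

# map: apply the function to each record independently
squares = list(map(lambda x: x * x, numbers))

# reduce: combine all records into a single result
total = reduce(add, squares)

print(squares, total)
```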
Let’s consider a sample application: Wordcount
You are given a huge dataset (e.g., Wikipedia dump or all of
Shakespeare’s works) and asked to list the count for each of the
words in each of the documents therein
Map
Process individual records to generate intermediate
key/value pairs.
Input <filename, file text>:
Welcome Everyone
Hello Everyone
Output (Key Value):
Welcome 1
Everyone 1
Hello 1
Everyone 1
Map
Process individual records in parallel to generate
intermediate key/value pairs.
Input <filename, file text>:
Welcome Everyone
Hello Everyone
Output (produced by MAP TASK 1 and MAP TASK 2):
Welcome 1
Everyone 1
Hello 1
Everyone 1
Map
Process a large number of individual records in parallel
to generate intermediate key/value pairs.
Input <filename, file text>:
Welcome Everyone
Hello Everyone
Why are you here
I am also here
They are also here
Yes, it’s THEM!
The same people we were thinking of
…….
Output (produced by the MAP TASKS):
Welcome 1
Everyone 1
Hello 1
Everyone 1
Why 1
Are 1
You 1
Here 1
…….
Reduce
Reduce processes and merges all intermediate values
associated per key
Input:
Welcome 1
Everyone 1
Hello 1
Everyone 1
Output (Key Value):
Everyone 2
Hello 1
Welcome 1
Reduce
• Each key is assigned to one Reduce task
• Processes and merges all intermediate values in parallel
by partitioning keys
• Popular: Hash partitioning, i.e., key is assigned to
– reduce # = hash(key) % number of reduce tasks
Input:
Welcome 1
Everyone 1
Hello 1
Everyone 1
Output (from REDUCE TASK 1 and REDUCE TASK 2):
Everyone 2
Hello 1
Welcome 1
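Hash partitioning, as described above, can be sketched in a few lines. CRC-32 is used as the hash function here because Python's built-in `hash()` for strings is randomized per process; that choice, and the `partition` helper name, are assumptions made for this example.

```python
import zlib

def partition(key: str, num_reduce_tasks: int) -> int:
    """Assign a key to a reduce task: hash(key) % number of reduce tasks."""
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks

pairs = [("Welcome", 1), ("Everyone", 1), ("Hello", 1), ("Everyone", 1)]

# Route every intermediate pair to the bucket of its reduce task.
buckets = {}
for key, value in pairs:
    buckets.setdefault(partition(key, 2), []).append((key, value))

for task, items in sorted(buckets.items()):
    print(task, items)
```

Because the partition depends only on the key, both `("Everyone", 1)` pairs always reach the same reduce task, which is what lets that task compute the complete count.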
Programming Model
The computation takes a set of input key/value pairs, and
produces a set of output key/value pairs.
The user of the MapReduce library expresses the
computation as two functions:
(i) The Map
(ii) The Reduce
(i) Map Abstraction
Map, written by the user, takes an input pair and produces
a set of intermediate key/value pairs.
The MapReduce library groups together all intermediate
values associated with the same intermediate key ‘I’ and
passes them to the Reduce function.
(ii) Reduce Abstraction
The Reduce function, also written by the user, accepts an
intermediate key ‘I’ and a set of values for that key.
It merges together these values to form a possibly smaller
set of values.
Typically just zero or one output value is produced per
Reduce invocation. The intermediate values are supplied to
the user's reduce function via an iterator.
This allows us to handle lists of values that are too large to
fit in memory.
Map-Reduce Functions for Word Count
map(key, value):
    // key: document name; value: text of document
    for each word w in value:
        emit(w, 1)
reduce(key, values):
    // key: a word; values: an iterator over counts
    result = 0
    for each count v in values:
        result += v
    emit(key, result)
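The pseudocode above can be turned into a small runnable Python sketch. The `run_mapreduce` function is a single-process stand-in for the framework (map, group by key, reduce), written here only for illustration; it is not a Hadoop API.

```python
from collections import defaultdict

def map_fn(key, value):
    """key: document name; value: text of the document."""
    for word in value.split():
        yield word, 1

def reduce_fn(key, values):
    """key: a word; values: an iterator over counts."""
    yield key, sum(values)

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy runner: apply map, group intermediate values by key, apply reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for k, v in map_fn(key, value):
            groups[k].append(v)
    output = []
    for k in sorted(groups):
        output.extend(reduce_fn(k, iter(groups[k])))
    return output

docs = [("doc1", "Welcome Everyone"), ("doc2", "Hello Everyone")]
result = run_mapreduce(docs, map_fn, reduce_fn)
print(result)
```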
Map-Reduce Functions
Input: a set of key/value pairs
User supplies two functions:
map(k,v) → list(k1,v1)
reduce(k1, list(v1)) → v2
(k1,v1) is an intermediate key/value pair
Output is the set of (k1,v2) pairs
MapReduce Applications
Applications
Here are a few simple examples of interesting programs that
can be easily expressed as MapReduce computations.
Distributed Grep: The map function emits a line if it matches a
supplied pattern. The reduce function is an identity function that
just copies the supplied intermediate data to the output.
Count of URL Access Frequency: The map function processes
logs of web page requests and outputs (URL; 1). The reduce
function adds together all values for the same URL and emits a
(URL; total count) pair.
Reverse Web-Link Graph: The map function outputs (target;
source) pairs for each link to a target URL found in a page named
source. The reduce function concatenates the list of all source
URLs associated with a given target URL and emits the pair:
(target; list(source))
Contd…
Term-Vector per Host: A term vector summarizes the
most important words that occur in a document or a set
of documents as a list of (word; frequency) pairs.
The map function emits a (hostname; term vector) pair
for each input document (where the hostname is
extracted from the URL of the document).
The reduce function is passed all per-document term
vectors for a given host. It adds these term vectors
together, throwing away infrequent terms, and then emits
a final (hostname; term vector) pair
Contd…
Inverted Index: The map function parses each document,
and emits a sequence of (word; document ID) pairs. The
reduce function accepts all pairs for a given word, sorts
the corresponding document IDs and emits a (word;
list(document ID)) pair. The set of all output pairs forms a
simple inverted index. It is easy to augment this
computation to keep track of word positions.
Distributed Sort: The map function extracts the key from
each record, and emits a (key; record) pair. The reduce
function emits all pairs unchanged.
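The inverted index described above can be sketched in Python as follows. The grouping step stands in for the framework's shuffle, the sample documents are made up, and dropping duplicate document IDs in the reducer is an assumption added for this example.

```python
from collections import defaultdict

def map_fn(doc_id, text):
    """Emit a (word, document ID) pair for every word in the document."""
    for word in text.split():
        yield word, doc_id

def reduce_fn(word, doc_ids):
    """Sort the document IDs for one word (duplicates dropped: an assumption)."""
    return word, sorted(set(doc_ids))

def build_index(docs):
    groups = defaultdict(list)  # shuffle: group document IDs by word
    for doc_id, text in docs:
        for word, d in map_fn(doc_id, text):
            groups[word].append(d)
    return dict(reduce_fn(w, ids) for w, ids in groups.items())

docs = [(2, "see spot throw"), (1, "see bob run see")]
index = build_index(docs)
print(index["see"])
```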
Applications of MapReduce
(1) Distributed Grep:
Input: large set of files
Output: lines that match pattern
Map – Emits a line if it matches the supplied
pattern
Reduce – Copies the intermediate data to output
Applications of MapReduce
(2) Reverse Web-Link Graph:
Input: Web graph: tuples (a, b)
where (page a → page b)
Output: For each page, list of pages that link to it
Map – processes the web graph and, for each input <source,
target>, outputs <target, source>
Reduce – emits <target, list(source)>
Applications of MapReduce
(3) Count of URL access frequency:
Input: Log of accessed URLs, e.g., from proxy server
Output: For each URL, % of total accesses for that URL
Map – Process web log and outputs <URL, 1>
Multiple Reducers - Emits <URL, URL_count>
(So far, like Wordcount. But still need %)
Chain another MapReduce job after above one
Map – Processes <URL, URL_count> and outputs
<1, (<URL, URL_count> )>
1 Reducer – Does two passes. In first pass, sums up all
URL_count’s to calculate overall_count. In second pass
calculates %’s
Emits multiple <URL, URL_count/overall_count>
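The two chained jobs above can be sketched in Python. The `group` helper stands in for the framework's shuffle, and the sample log is made up for illustration.

```python
from collections import defaultdict

def group(pairs):
    """Stand-in for the shuffle: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def job1(log):
    """Job 1: count accesses per URL (like Wordcount)."""
    mapped = [(url, 1) for url in log]
    return [(url, sum(ones)) for url, ones in group(mapped).items()]

def job2(url_counts):
    """Job 2: send everything to one reducer, which makes two passes."""
    mapped = [(1, (url, count)) for url, count in url_counts]
    (values,) = group(mapped).values()        # single key, so a single reducer
    overall = sum(count for _, count in values)                 # first pass
    return [(url, count / overall) for url, count in values]    # second pass

log = ["/a", "/b", "/a", "/c"]
fractions = dict(job2(job1(log)))
print(fractions)
```

Using the constant key 1 in the second job forces all `<URL, URL_count>` pairs onto the same reducer, which is what makes the overall total computable there.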
Applications of MapReduce
(4) Sort
Map task’s output is sorted (e.g., quicksort)
Reduce task’s input is sorted (e.g., mergesort)
Input: Series of (key, value) pairs
Output: Sorted <value>s
Map – <key, value> → <value, _> (identity)
Reducer – <key, value> → <key, value> (identity)
Partitioning function – partition keys across reducers
based on ranges (can’t use hashing!)
• Take data distribution into account to balance
reducer tasks
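Range partitioning for sort can be illustrated with a short sketch. The boundary values and function names are assumptions chosen for the example; in practice the boundaries would be picked from the data distribution so that reducers receive similar amounts of work.

```python
import bisect

def range_partition(key, boundaries):
    """Index of the reducer whose key range contains `key`."""
    return bisect.bisect_right(boundaries, key)

def distributed_sort(values, boundaries):
    reducers = [[] for _ in range(len(boundaries) + 1)]
    for v in values:                      # map + partition by range
        reducers[range_partition(v, boundaries)].append(v)
    for bucket in reducers:               # each reducer sorts its own input
        bucket.sort()
    # Reducer i only holds keys smaller than reducer i+1, so concatenating
    # the outputs in reducer order gives a globally sorted result.
    return [v for bucket in reducers for v in bucket]

values = [42, 7, 93, 15, 61, 28, 88, 3]
result = distributed_sort(values, boundaries=[30, 60])
print(result)
```

Hash partitioning would scatter neighbouring keys across reducers, so the concatenated output would not be globally sorted; that is why ranges are used here.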
The YARN Scheduler
• Used underneath Hadoop 2.x +
• YARN = Yet Another Resource Negotiator
• Treats each server as a collection of containers
– Container = fixed CPU + fixed memory
• Has 3 main components
– Global Resource Manager (RM)
• Scheduling
– Per-server Node Manager (NM)
• Daemon and server-specific functions
– Per-application (job) Application Master (AM)
• Container negotiation with RM and NMs
• Detecting task failures of that job
YARN: How a job gets a container
[Figure: a Resource Manager (Capacity Scheduler) and 2 servers. Node A runs Node Manager A and Application Master 1; Node B runs Node Manager B, Application Master 2, and a Task (App2).]
In this figure
• 2 servers (A, B)
• 2 jobs (1, 2)
Messages shown in the figure:
1. Need container
2. Container Completed
3. Container on Node B
4. Start task, please!
MapReduce Examples
Example 1: Word Count using MapReduce
map(key, value):
    // key: document name; value: text of document
    for each word w in value:
        emit(w, 1)
reduce(key, values):
    // key: a word; values: an iterator over counts
    result = 0
    for each count v in values:
        result += v
    emit(key, result)
Count Illustrated
map(key=url, val=contents):
    For each word w in contents, emit (w, “1”)
reduce(key=word, values=uniq_counts):
    Sum all “1”s in values list
    Emit result “(word, sum)”
Input:
see bob run
see spot throw
Map output:
see 1
bob 1
run 1
see 1
spot 1
throw 1
Reduce output:
bob 1
run 1
see 2
spot 1
throw 1
Example 2: Counting words of different lengths
The map function takes a value and outputs key:value
pairs.
For instance, if we define a map function that takes a
string and outputs the length of the word as the key and
the word itself as the value then
map(steve) would return 5:steve and
map(savannah) would return 8:savannah.
This allows us to run the map function against values in
parallel and provides a huge advantage.
Example 2: Counting words of different lengths
Before we get to the reduce function, the mapreduce
framework groups all of the values together by key, so if the
map functions output the following key:value pairs:
3 : the
3 : and
3 : you
4 : then
4 : what
4 : when
5 : steve
5 : where
8 : savannah
8 : research
They get grouped as:
3 : [the, and, you]
4 : [then, what, when]
5 : [steve, where]
8 : [savannah, research]
Example 2: Counting words of different lengths
Each of these lines would then be passed as an argument
to the reduce function, which accepts a key and a list of
values.
In this instance, we might be trying to figure out how many
words of certain lengths exist, so our reduce function will
just count the number of items in the list and output the
key with the size of the list, like:
3 : 3
4 : 3
5 : 2
8 : 2
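The word-length example can be reproduced with a short Python sketch that uses the same words as the slides. The grouping dictionary stands in for the framework's group-by-key step.

```python
from collections import defaultdict

words = ["the", "and", "you", "then", "what", "when",
         "steve", "where", "savannah", "research"]

def map_fn(word):
    """Emit the word's length as the key and the word itself as the value."""
    return len(word), word

def reduce_fn(key, values):
    """Count how many words share this length."""
    return key, len(values)

# Group the map output by key, as the framework would before reduce.
groups = defaultdict(list)
for word in words:
    key, value = map_fn(word)
    groups[key].append(value)

counts = dict(reduce_fn(k, v) for k, v in groups.items())
print(counts)
```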
Example 2: Counting words of different lengths
The reductions can also be done in parallel, because each
key's list is reduced independently of the others. Looking at
the final results, we can see that there were only two words
of length 5 in the corpus, and so on.
The most common example of mapreduce is counting the
number of times each word occurs in a corpus.
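The map, group and reduce steps of Example 2 can be sketched in plain Python. This is a single-machine illustration, not Hadoop code; the word list is the one from the slides, and the names map_fn, reduce_fn and grouped are chosen here for illustration.

```python
from collections import defaultdict

def map_fn(word):
    """Emit the word's length as the key and the word itself as the value."""
    return len(word), word

def reduce_fn(length, words):
    """Count how many words share this length."""
    return length, len(words)

words = ["the", "and", "you", "then", "what", "when",
         "steve", "where", "savannah", "research"]

# Group step: collect all values that share a key.
grouped = defaultdict(list)
for key, value in map(map_fn, words):
    grouped[key].append(value)

# Reduce step: one call per key.
length_counts = dict(reduce_fn(k, v) for k, v in grouped.items())
print(dict(grouped))
print(length_counts)
```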
Example 3: Word Length Histogram
Abridged Declaration of Independence
A Declaration By the Representatives of the United States of America, in General Congress Assembled. When in the
course of human events it becomes necessary for a people to advance from that subordination in which they have
hitherto remained, and to assume among powers of the earth the equal and independent station to which the laws
of nature and of nature's god entitle them, a decent respect to the opinions of mankind requires that they should
declare the causes which impel them to the change. We hold these truths to be self-evident; that all men are
created equal and independent; that from that equal creation they derive rights inherent and inalienable, among
which are the preservation of life, and liberty, and the pursuit of happiness; that to secure these ends, governments
are instituted among men, deriving their just power from the consent of the governed; that whenever any form of
government shall become destructive of these ends, it is the right of the people to alter or to abolish it, and to
institute new government, laying it’s foundation on such principles and organizing it's power in such form, as to
them shall seem most likely to effect their safety and happiness. Prudence indeed will dictate that governments long
established should not be changed for light and transient causes: and accordingly all experience hath shewn that
mankind are more disposed to suffer while evils are sufferable, than to right themselves by abolishing the forms to
which they are accustomed. But when a long train of abuses and usurpations, begun at a distinguished period, and
pursuing invariably the same object, evinces a design to reduce them to arbitrary power, it is their right, it is their
duty, to throw off such government and to provide new guards for future security. Such has been the patient
sufferings of the colonies; and such is now the necessity which constrains them to expunge their former systems of
government. the history of his present majesty is a history of unremitting injuries and usurpations, among which no
one fact stands single or solitary to contradict the uniform tenor of the rest, all of which have in direct object the
establishment of an absolute tyranny over these states. To prove this, let facts be submitted to a candid world, for
the truth of which we pledge a faith yet unsullied by falsehood.
How many “big”, “medium” and “small” words are used?
Big = Yellow = 10+ letters
Medium = Red = 5..9 letters
Small = Blue = 2..4 letters
Tiny = Pink = 1 letter
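The size classes above can be counted with a map step that emits (class, 1) per word and a reduce step that sums per class. The sketch below does both with collections.Counter in plain Python. The tokenizer (runs of ASCII letters) and the short sample sentence are assumptions made for illustration; the slides do not say how words were split.

```python
import re
from collections import Counter

def size_class(word):
    """Map a word to the size class defined on the slide."""
    n = len(word)
    if n >= 10:
        return "big"
    if n >= 5:
        return "medium"
    if n >= 2:
        return "small"
    return "tiny"

def word_size_histogram(text):
    # Assumed tokenizer: every run of ASCII letters counts as one word.
    words = re.findall(r"[A-Za-z]+", text)
    # Counter plays the role of "emit (class, 1)" followed by a per-key sum.
    return Counter(size_class(w) for w in words)

sample = ("a decent respect to the opinions of mankind requires "
          "that they should declare the causes")
histogram = word_size_histogram(sample)
print(histogram)
```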
Example 4: Build an Inverted Index
Input:
tweet1, (“I love pancakes for breakfast”)
tweet2, (“I dislike pancakes”)
tweet3, (“What should I eat for breakfast?”)
tweet4, (“I love to eat”)
Desired output:
“pancakes”, (tweet1, tweet2)
“breakfast”, (tweet1, tweet3)
“eat”, (tweet3, tweet4)
“love”, (tweet1, tweet4)
…
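One way to obtain the desired output is to have the map step emit (word, tweet id) pairs and the reduce step collect the ids per word. The sketch below is plain Python on a single machine; lower-casing the text and stripping punctuation are assumptions added here, not something the slide specifies.

```python
import re
from collections import defaultdict

def map_fn(tweet_id, text):
    """Emit (word, tweet_id) once for each distinct word in the tweet."""
    for word in set(re.findall(r"[a-z]+", text.lower())):
        yield word, tweet_id

def reduce_fn(word, tweet_ids):
    """Collect the ids of all tweets containing the word."""
    return word, sorted(tweet_ids)

tweets = {
    "tweet1": "I love pancakes for breakfast",
    "tweet2": "I dislike pancakes",
    "tweet3": "What should I eat for breakfast?",
    "tweet4": "I love to eat",
}

# Group step: gather the tweet ids emitted for each word.
postings = defaultdict(list)
for tweet_id, text in tweets.items():
    for word, tid in map_fn(tweet_id, text):
        postings[word].append(tid)

inverted_index = dict(reduce_fn(w, ids) for w, ids in postings.items())
print(inverted_index["pancakes"])
```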
Example 5: Relational Join
[Figures not recovered: Before Map Phase; Map Phase; Reduce Phase; Relational Join in MapReduce, again]
Example 6: Finding Friends
Facebook has a list of friends (note that friendship is bi-directional
on Facebook: if I'm your friend, you're mine).
They also have lots of disk space and they serve hundreds of millions
of requests every day. They've decided to pre-compute results
when they can, to reduce the processing time of requests. One
common request is the "You and Joe have 230 friends in
common" feature.
When you visit someone's profile, you see a list of friends that you
have in common. This list doesn't change frequently, so it would be
wasteful to recalculate it every time you visit the profile (sure, you
could use a decent caching strategy, but then we wouldn't be able to
continue writing about mapreduce for this problem).
We're going to use mapreduce so that we can calculate everyone's
common friends once a day and store those results. Later on it's just
a quick lookup. We've got lots of disk, and it's cheap.
Assume the friends are stored as Person -> [List of Friends]. Our
friends list is then:
A -> B C D
B -> A C D E
C -> A B D E
D -> A B C E
E -> B C D
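For every friend in a person's list, the mapper outputs one key-value
pair: the key is the person paired with that friend (in sorted order, so
both sides of a friendship produce the same key) and the value is the
person's full list of friends.
For map(A -> B C D) :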
(A B) -> B C D
(A C) -> B C D
(A D) -> B C D
For map(B -> A C D E) : (Note that A comes before B in the key)
(A B) -> A C D E
(B C) -> A C D E
(B D) -> A C D E
(B E) -> A C D E
For map(C -> A B D E) :
(A C) -> A B D E
(B C) -> A B D E
(C D) -> A B D E
(C E) -> A B D E
For map(D -> A B C E) :
(A D) -> A B C E
(B D) -> A B C E
(C D) -> A B C E
(D E) -> A B C E
And finally for map(E -> B C D):
(B E) -> B C D
(C E) -> B C D
(D E) -> B C D
Before we send these key-value pairs to the reducers, we
group them by their keys and get:
(A B) -> (A C D E) (B C D)
(A C) -> (A B D E) (B C D)
(A D) -> (A B C E) (B C D)
(B C) -> (A B D E) (A C D E)
(B D) -> (A B C E) (A C D E)
(B E) -> (A C D E) (B C D)
(C D) -> (A B C E) (A B D E)
(C E) -> (A B D E) (B C D)
(D E) -> (A B C E) (B C D)
Each line will be passed as an argument to a reducer.
The reduce function will simply intersect the lists of values
and output the same key with the result of the intersection.
For example, reduce((A B) -> (A C D E) (B C D))
will output (A B) : (C D)
and means that friends A and B have C and D as common
friends.
The result after reduction is:
(A B) -> (C D)
(A C) -> (B D)
(A D) -> (B C)
(B C) -> (A D E)
(B D) -> (A C E)
(B E) -> (C D)
(C D) -> (A B E)
(C E) -> (B D)
(D E) -> (B C)
Now when D visits B's profile, we can quickly look up (B D) and
see that they have three friends in common, (A C E).
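The whole Finding Friends pipeline (map, group by key, intersect) fits in a few lines of plain Python. This is a single-machine sketch of the steps described above, using the same five-person friends list; the function and variable names are chosen here for illustration.

```python
from collections import defaultdict

friends = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D", "E"],
    "C": ["A", "B", "D", "E"],
    "D": ["A", "B", "C", "E"],
    "E": ["B", "C", "D"],
}

def map_fn(person, friend_list):
    """For each friend, emit (sorted pair, the person's full friend list)."""
    for friend in friend_list:
        key = tuple(sorted((person, friend)))
        yield key, friend_list

def reduce_fn(pair, friend_lists):
    """Intersect the friend lists that were grouped under one pair."""
    common = set(friend_lists[0]).intersection(*friend_lists[1:])
    return pair, sorted(common)

# Group step: each pair receives one list from each of its two members.
grouped = defaultdict(list)
for person, friend_list in friends.items():
    for key, value in map_fn(person, friend_list):
        grouped[key].append(value)

common_friends = dict(reduce_fn(pair, lists) for pair, lists in grouped.items())
print(common_friends[("B", "D")])
```

Sorting the pair is what makes (A B) and (B A) land on the same reducer.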
Reading
Jeffrey Dean and Sanjay Ghemawat,
“MapReduce: Simplified Data Processing on Large
Clusters”
http://labs.google.com/papers/mapreduce.html
Parallel Programming with Spark
Dr. Rajiv Misra
Dept. of Computer Science & Engg.
Indian Institute of Technology Patna
rajivm@iitp.ac.in
Big Data Computing Parallel Programming with Spark
Vu Pham
Preface
Content of this Lecture:
In this lecture, we will discuss:
Overview of Spark
Fundamentals of Scala & functional programming
Spark concepts
Spark operations
Job execution
Introduction to Spark
What is Spark?
Fast, expressive cluster computing system compatible with
Apache Hadoop
  Works with any Hadoop-supported storage system
  (HDFS, S3, SequenceFile, Avro, …)
Improves efficiency through:
  In-memory computing primitives
  General computation graphs
  (Up to 100× faster)
Improves usability through:
  Rich APIs in Java, Scala, Python
  Interactive shell
  (Often 2-10× less code)
How to Run It
Local multicore: just a library in your program
EC2: scripts for launching a Spark cluster
Private cluster: Mesos, YARN, Standalone Mode
Scala vs Java APIs
Spark originally written in Scala, which allows concise
function syntax and interactive use
APIs in Java, Scala and Python
Interactive shells in Scala and Python
Introduction to Scala &
functional programming
About Scala
High-level language for the Java VM
  Object-oriented + functional programming
Statically typed
  Comparable in speed to Java
  But often no need to write types due to type inference
Interoperates with Java
  Can use any Java class, inherit from it, etc.; can also call Scala code
  from Java
Best Way to Learn Scala
Interactive shell: just type scala
Supports importing libraries, tab completion and all
constructs in the language.
Quick Tour
Declaring variables:
  var x: Int = 7
  var x = 7        // type inferred
  val y = "hi"     // read-only
Java equivalent:
  int x = 7;
  final String y = "hi";
Functions:
  def square(x: Int): Int = x*x
  def square(x: Int): Int = {
    x*x            // last expression in block is returned
  }
  def announce(text: String): Unit = {
    println(text)
  }
Java equivalent:
  int square(int x) {
    return x*x;
  }
  void announce(String text) {
    System.out.println(text);
  }
Generic types:
  var arr = new Array[Int](8)
  var lst = List(1, 2, 3)     // factory method; type of lst is List[Int]
Java equivalent:
  int[] arr = new int[8];
  List<Integer> lst =
      new ArrayList<Integer>();   // can't hold primitive types
  lst.add(...)
Indexing:
  arr(5) = 7
  println(lst(1))             // index must be within the 3-element list
Java equivalent:
  arr[5] = 7;
  System.out.println(lst.get(1));
Processing collections with functional programming:
  val list = List(1, 2, 3)
  list.foreach(x => println(x))   // prints 1, 2, 3; x => println(x) is a function expression (closure)
  list.foreach(println)           // same
  list.map(x => x + 2)            // => List(3, 4, 5)
  list.map(_ + 2)                 // same, with placeholder notation
  list.filter(x => x % 2 == 1)    // => List(1, 3)
  list.filter(_ % 2 == 1)         // => List(1, 3)
  list.reduce((x, y) => x + y)    // => 6
  list.reduce(_ + _)              // => 6
All of these leave the list unchanged (List is immutable)
Scala Closure Syntax
(x: Int) => x + 2 // full version
x => x + 2 // type inferred
_ + 2 // when each argument is used exactly once
x => { // when body is a block of code
  val numberToAdd = 2
  x + numberToAdd
}
// If closure is too long, can always pass a function
// (Scala allows defining a "local function" inside another function)
def addTwo(x: Int): Int = x + 2
list.map(addTwo)
Other Collection Methods
Scala collections provide many other functional
methods; for example, search for "Scala Seq"

Method on Seq[T]                      Explanation
map(f: T => U): Seq[U]                Pass each element through f
flatMap(f: T => Seq[U]): Seq[U]       One-to-many map
filter(f: T => Boolean): Seq[T]       Keep elements passing f
exists(f: T => Boolean): Boolean      True if one element passes
forall(f: T => Boolean): Boolean      True if all elements pass
reduce(f: (T, T) => T): T             Merge elements using f
groupBy(f: T => K): Map[K,List[T]]    Group elements by f(element)
sortBy(f: T => K): Seq[T]             Sort elements by f(element)
. . .
Spark Concepts
Spark Overview
Goal: Work with distributed collections as you would
with local ones
Concept: resilient distributed datasets (RDDs)
Immutable collections of objects spread across a
cluster
Built through parallel transformations (map, filter, etc)
Automatically rebuilt on failure
Controllable persistence (e.g. caching in RAM)
Main Primitives
Resilient distributed datasets (RDDs)
Immutable, partitioned collections of objects
Transformations (e.g. map, filter, groupBy, join)
Lazy operations to build RDDs from other RDDs
Actions (e.g. count, collect, save)
Return a result or write it to storage
Example: Mining Console Logs
Load error messages from a log into memory, then
interactively search for patterns

lines = spark.textFile("hdfs://...")                       # Base RDD
errors = lines.filter(lambda s: s.startswith("ERROR"))     # Transformed RDD
messages = errors.map(lambda s: s.split("\t")[2])
messages.cache()

messages.filter(lambda s: "foo" in s).count()              # Action
messages.filter(lambda s: "bar" in s).count()
. . .

[Diagram: a Driver exchanging tasks and results with three Workers, each
shown with one input block (Block 1-3) and one cache (Cache 1-3)]

Result: full-text search of Wikipedia in <1 sec
(vs 20 sec for on-disk data)
Result: scaled to 1 TB data in 5-7 sec
(vs 170 sec for on-disk data)
RDD Fault Tolerance
RDDs track the transformations used to build them
(their lineage) to recompute lost data
E.g.:
messages = (textFile(...)
            .filter(lambda s: "ERROR" in s)
            .map(lambda s: s.split("\t")[2]))
Lineage:
HadoopRDD (path = hdfs://…) -> FilteredRDD (func = contains(...)) -> MappedRDD (func = split(…))
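The idea of rebuilding data from lineage can be illustrated without Spark. The ToyRDD class below is a hypothetical, single-machine stand-in written for this sketch (it is not Spark's API, and real Spark tracks lineage per partition): each dataset remembers how to recompute itself from its parent, transformations are lazy, and nothing runs until an action such as collect() is called. The log lines are made up.

```python
class ToyRDD:
    """Toy stand-in for an RDD: stores a recipe, not the data itself."""

    def __init__(self, compute, lineage):
        self._compute = compute   # zero-argument function returning an iterator
        self.lineage = lineage    # names of the steps used to build this dataset

    @classmethod
    def from_lines(cls, lines):
        return cls(lambda: iter(lines), ["source"])

    def filter(self, pred):
        # Lazy transformation: only records how to derive the new dataset.
        return ToyRDD(lambda: (x for x in self._compute() if pred(x)),
                      self.lineage + ["filter"])

    def map(self, fn):
        return ToyRDD(lambda: (fn(x) for x in self._compute()),
                      self.lineage + ["map"])

    def collect(self):
        # Action: only now is the chain of transformations executed.
        return list(self._compute())

# Made-up log lines with tab-separated fields: level, component, message.
log = ["INFO\tweb\tstarted",
       "ERROR\tdb\tconnection lost",
       "ERROR\tweb\ttimeout"]

messages = (ToyRDD.from_lines(log)
            .filter(lambda s: "ERROR" in s)
            .map(lambda s: s.split("\t")[2]))

first = messages.collect()
second = messages.collect()   # results are rebuilt by replaying the lineage
print(messages.lineage, first)
```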
Fault Recovery Test
[Bar chart: iteration time (s) for iterations 1-10: 119, 57, 56, 58, 58, 81, 57, 59, 57, 59;
annotation "Failure happens" on the chart]
Behavior with Less RAM
Which Language Should I Use?
Standalone programs can be written in any of Java, Scala or
Python, but the console supports only Python & Scala
Python developers: can stay with Python for both
Java developers: consider using Scala for console (to
learn the API)
Performance: Java / Scala will be faster (statically
typed), but Python can do well for numerical work
with NumPy
Tour of Spark operations
Learning Spark
Easiest way: Spark interpreter (spark-shell or
pyspark)
Special Scala and Python consoles for cluster use
Runs in local mode on 1 thread by default, but can control
with MASTER environment var:
MASTER=local ./spark-shell # local, 1 thread
MASTER=local[2] ./spark-shell # local, 2 threads
MASTER=spark://host:port ./spark-shell # Spark standalone cluster
First Stop: SparkContext
Main entry point to Spark functionality
Created for you in Spark shells as variable sc
In standalone programs, you'd make your own (see
later for details)
Creating RDDs
# Turn a local collection into an RDD
sc.parallelize([1, 2, 3])
# Load text file from local FS, HDFS, or S3
sc.textFile("file.txt")
sc.textFile("directory/*.txt")
sc.textFile("hdfs://namenode:9000/path/file")
# Use any existing Hadoop InputFormat
sc.hadoopFile(keyClass, valClass, inputFmt, conf)
Basic Transformations
nums = sc.parallelize([1, 2, 3])
# Pass each element through a function
squares = nums.map(lambda x: x*x)             # => {1, 4, 9}
# Keep elements passing a predicate
even = squares.filter(lambda x: x % 2 == 0)   # => {4}
# Map each element to zero or more others
nums.flatMap(lambda x: range(0, x))           # => {0, 0, 1, 0, 1, 2}
# range(0, x) is a range object (the sequence of numbers 0, 1, …, x-1)
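To check what each transformation produces without a Spark installation, the same three operations can be mimicked on a local Python list. This is only an analogue for illustration: it uses list comprehensions and itertools.chain rather than RDDs, so there is no laziness or distribution involved.

```python
from itertools import chain

nums = [1, 2, 3]

# map: pass each element through a function
squares = [x * x for x in nums]

# filter: keep elements passing a predicate
even = [x for x in squares if x % 2 == 0]

# flatMap: map each element to zero or more others, then flatten
flat = list(chain.from_iterable(range(0, x) for x in nums))

print(squares, even, flat)
```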
Basic Actions
nums = sc.parallelize([1, 2, 3])
# Retrieve RDD contents as a local collection
nums.collect() # => [1, 2, 3]
# Return first K elements
nums.take(2) # => [1, 2]
# Count number of elements
nums.count() # => 3
# Merge elements with an associative function
nums.reduce(lambda x, y: x + y) # => 6
# Write elements to a text file
nums.saveAsTextFile("hdfs://file.txt")
Working with Key-Value Pairs
Spark's "distributed reduce" transformations act on RDDs
of key-value pairs
Python: pair = (a, b)
        pair[0] # => a
        pair[1] # => b
Scala:  val pair = (a, b)
        pair._1 // => a
        pair._2 // => b
Java:   Tuple2 pair = new Tuple2(a, b); // class scala.Tuple2
        pair._1() // => a
        pair._2() // => b
Some Key-Value Operations
pets = sc.parallelize([("cat", 1), ("dog", 1), ("cat", 2)])
pets.reduceByKey(lambda x, y: x + y)
# => {(cat, 3), (dog, 1)}
pets.groupByKey()
# => {(cat, Seq(1, 2)), (dog, Seq(1))}
pets.sortByKey()
# => {(cat, 1), (cat, 2), (dog, 1)}
reduceByKey also automatically implements combiners on the
map side
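What a map-side combiner buys can be shown with a small simulation in plain Python. The two partitions below are hypothetical (the first holds the pets data from this slide, the second is made up), and reduce_by_key is an illustrative helper, not Spark's implementation: values are merged per key inside each partition first, so fewer records need to cross the network in the shuffle.

```python
from collections import defaultdict
from functools import reduce

def combine_locally(partition, fn):
    """Map-side combine: merge values per key within one partition."""
    combined = {}
    for key, value in partition:
        combined[key] = fn(combined[key], value) if key in combined else value
    return combined

def reduce_by_key(partitions, fn):
    shuffled = defaultdict(list)
    records_shuffled = 0
    for partition in partitions:
        # Only the locally combined records are sent to the reducers.
        for key, value in combine_locally(partition, fn).items():
            shuffled[key].append(value)
            records_shuffled += 1
    # Reduce side: merge the partial results for each key.
    totals = {key: reduce(fn, values) for key, values in shuffled.items()}
    return totals, records_shuffled

partitions = [
    [("cat", 1), ("dog", 1), ("cat", 2)],   # pets data from the slide
    [("cat", 5), ("cat", 1)],               # hypothetical second partition
]
totals, records_shuffled = reduce_by_key(partitions, lambda x, y: x + y)
print(totals, records_shuffled)
```

Here five input records shrink to three shuffled records; this only works because the merge function is associative.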
Example: Word Count
lines = sc.textFile("hamlet.txt")
counts = lines.flatMap(lambda line: line.split(" ")) \
              .map(lambda word: (word, 1)) \
              .reduceByKey(lambda x, y: x + y)

"to be or", "not to be"
  -> flatMap:      "to" "be" "or" "not" "to" "be"
  -> map:          (to, 1) (be, 1) (or, 1) (not, 1) (to, 1) (be, 1)
  -> reduceByKey:  (be, 2) (not, 1) (or, 1) (to, 2)
Other Key-Value Operations
val visits = sc.parallelize(List(
  ("index.html", "1.2.3.4"),
  ("about.html", "3.4.5.6"),
  ("index.html", "1.3.3.1")))
val pageNames = sc.parallelize(List(
  ("index.html", "Home"), ("about.html", "About")))
visits.join(pageNames)
// ("index.html", ("1.2.3.4", "Home"))
// ("index.html", ("1.3.3.1", "Home"))
// ("about.html", ("3.4.5.6", "About"))
visits.cogroup(pageNames)
// ("index.html", (Seq("1.2.3.4", "1.3.3.1"), Seq("Home")))
// ("about.html", (Seq("3.4.5.6"), Seq("About")))
Multiple Datasets
visits = sc.parallelize([("index.html", "1.2.3.4"),
                         ("about.html", "3.4.5.6"),
                         ("index.html", "1.3.3.1")])
pageNames = sc.parallelize([("index.html", "Home"),
                            ("about.html", "About")])
visits.join(pageNames)
# ("index.html", ("1.2.3.4", "Home"))
# ("index.html", ("1.3.3.1", "Home"))
# ("about.html", ("3.4.5.6", "About"))
visits.cogroup(pageNames)
# ("index.html", (Seq("1.2.3.4", "1.3.3.1"), Seq("Home")))
# ("about.html", (Seq("3.4.5.6"), Seq("About")))
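The difference between join and cogroup is easier to see when both are written out over local lists. The functions below are plain-Python illustrations written for this sketch, not Spark's implementation: cogroup collects the values from each dataset per key, and join then pairs every left value with every right value for that key.

```python
from collections import defaultdict

def cogroup(left, right):
    """Per key, gather the values from each dataset into two lists."""
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    return dict(groups)

def join(left, right):
    """Inner join: pair every left value with every right value per key."""
    return [(key, (lv, rv))
            for key, (lvals, rvals) in cogroup(left, right).items()
            for lv in lvals
            for rv in rvals]

visits = [("index.html", "1.2.3.4"),
          ("about.html", "3.4.5.6"),
          ("index.html", "1.3.3.1")]
page_names = [("index.html", "Home"), ("about.html", "About")]

grouped = cogroup(visits, page_names)
joined = join(visits, page_names)
print(grouped)
print(joined)
```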
Controlling the Level of Parallelism
All the pair RDD operations take an optional second
parameter for number of tasks
words.reduceByKey(lambda x, y: x + y, 5)
words.groupByKey(5)
visits.join(pageViews, 5)
Can also set spark.default.parallelism
property
Using Local Variables
External variables you use in a closure will automatically
be shipped to the cluster:
  query = input("Enter a query:")   # raw_input(...) in Python 2
  pages.filter(lambda x: x.startswith(query)).count()
Some caveats:
  Each task gets a new copy (updates aren't sent back)
  Variable must be Serializable (Java/Scala) or Pickle-able
  (Python)
  Don't use fields of an outer object (ships all of it!)
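The first two caveats can be demonstrated with the standard pickle module alone. The sketch below only simulates shipping: run_task, the hits counter and the sample records are hypothetical names invented here, and Spark's real closure serialization is more involved. Each simulated task unpickles its own copy of the variables, so changes it makes never reach the driver's copy.

```python
import pickle

def run_task(serialized_vars, records):
    """Simulate a worker: unpickle a private copy of the variables, then use it."""
    local_vars = pickle.loads(serialized_vars)
    matches = [r for r in records if r.startswith(local_vars["query"])]
    local_vars["hits"] += len(matches)   # updates only the worker's copy
    return len(matches)

driver_vars = {"query": "spark", "hits": 0}
shipped = pickle.dumps(driver_vars)      # fails if the value is not picklable

# Two made-up partitions of page titles.
partitions = [["spark core", "hadoop"], ["spark sql", "sparkling", "hive"]]
counts = [run_task(shipped, part) for part in partitions]

print(counts, driver_vars["hits"])
```

To get a total back on the driver, return it from the task (as counts does here) rather than mutating a shipped variable.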
Closure Mishap Example
class MyCoolRddApp {
  val param = 3.14
  val log = new Log(...)
  ...
  def work(rdd: RDD[Int]) {
    rdd.map(x => x + param)
       .reduce(...)
  }
}
NotSerializableException: MyCoolRddApp (or Log)

How to get around it:
class MyCoolRddApp {
  ...
  def work(rdd: RDD[Int]) {
    val param_ = param
    rdd.map(x => x + param_)
       .reduce(...)
  }
}
References only local variable instead of this.param
Other RDD Operations
sample(): deterministically sample a subset
union(): merge two RDDs
cartesian(): cross product
pipe(): pass through external program
See Programming Guide for more:
www.spark-project.org/documentation.html
More Details
Spark supports lots of other operations!
Full programming guide: spark-project.org/documentation
Job execution
Software Components
Spark runs as a library in your program
(one instance per app)
Runs tasks locally or on a cluster
  Standalone deploy cluster, Mesos or YARN
Accesses storage via Hadoop InputFormat API
  Can use HBase, HDFS, S3, …
[Diagram labels: Your application, SparkContext, Local threads, Cluster manager,
two Workers each with a Spark executor, HDFS or other storage]
Task Scheduler
Supports general task graphs
Pipelines functions where possible
Cache-aware data reuse & locality
Partitioning-aware to avoid shuffles
[Diagram labels: RDDs A-F; operations groupBy, map, join, filter;
Stage 1, Stage 2, Stage 3; legend entries: RDD, cached partition]
More Information
Scala resources:
www.artima.com/scalazine/articles/steps.html
(First Steps to Scala)
www.artima.com/pins1ed (free book)
Spark documentation: www.spark-project.org/documentation.html
Hadoop Compatibility
Spark can read/write to any storage system / format that
has a plugin for Hadoop!
  Examples: HDFS, S3, HBase, Cassandra, Avro,
  SequenceFile
  Reuses Hadoop's InputFormat and OutputFormat APIs
APIs like SparkContext.textFile support filesystems, while
SparkContext.hadoopRDD allows passing any Hadoop
JobConf to configure an input source
Create a SparkContext
Scala:
  import spark.SparkContext
  import spark.SparkContext._
  val sc = new SparkContext("masterUrl", "name", "sparkHome", Seq("app.jar"))
Java:
  import spark.api.java.JavaSparkContext;
  JavaSparkContext sc = new JavaSparkContext(
      "masterUrl", "name", "sparkHome", new String[] {"app.jar"});
Python:
  from pyspark import SparkContext
  sc = SparkContext("masterUrl", "name", "sparkHome", ["library.py"])
Arguments, in order:
  Cluster URL, or local / local[N]
  App name
  Spark install path on cluster
  List of JARs with app code (to ship)
Complete App: Scala
import spark.SparkContext
import spark.SparkContext._
object WordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "WordCount", args(0), Seq(args(1)))
    val lines = sc.textFile(args(2))
    lines.flatMap(_.split(" "))
         .map(word => (word, 1))
         .reduceByKey(_ + _)
         .saveAsTextFile(args(3))
  }
}
Complete App: Python
import sys
from pyspark import SparkContext
if __name__ == "__main__":
    sc = SparkContext("local", "WordCount", sys.argv[0], None)
    lines = sc.textFile(sys.argv[1])
    lines.flatMap(lambda s: s.split(" ")) \
         .map(lambda word: (word, 1)) \
         .reduceByKey(lambda x, y: x + y) \
         .saveAsTextFile(sys.argv[2])
Example: PageRank
Why PageRank?
Good example of a more complex algorithm
Multiple stages of map & reduce
Benefits from Spark’s in-memory caching
Multiple iterations over the same data
Basic Idea
Give pages ranks (scores) based on links to
them
Links from many pages ➔ high rank
Link from a high-rank page ➔ high rank
Algorithm
1. Start each page at a rank of 1
2. On each iteration, have page p contribute
rank_p / |neighbors_p| to its neighbors
3. Set each page's rank to 0.15 + 0.85 × contribs
[Figure sequence on a four-page link graph (the graph itself is not recovered); ranks shown per step:
Initial ranks:        1.0, 1.0, 1.0, 1.0
After one iteration:  0.58, 1.0, 1.85, 0.58
After two iterations: 0.39, 1.72, 1.31, 0.58
. . .
Final state:          0.46, 1.37, 1.44, 0.73]
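The three steps can be run in plain Python on a small graph. The slides' link graph did not survive extraction, so the four-page graph below is an assumption: it was chosen because it reproduces the rank values shown on the slides (0.58, 1.0, 1.85, 0.58 after one iteration). The page names p1 to p4 are hypothetical, and "contribs" is taken to mean the sum of contributions a page receives.

```python
# Assumed link graph: page -> pages it links to (its neighbors).
links = {
    "p1": ["p3"],
    "p2": ["p3", "p4"],
    "p3": ["p2"],
    "p4": ["p1", "p3"],
}

def pagerank_step(links, ranks):
    """One iteration: spread each page's rank over its neighbors, then damp."""
    contribs = {page: 0.0 for page in links}
    for page, neighbors in links.items():
        share = ranks[page] / len(neighbors)      # step 2
        for neighbor in neighbors:
            contribs[neighbor] += share
    # Step 3: new rank from the summed contributions.
    return {page: 0.15 + 0.85 * total for page, total in contribs.items()}

def pagerank(links, iterations):
    ranks = {page: 1.0 for page in links}         # step 1
    for _ in range(iterations):
        ranks = pagerank_step(links, ranks)
    return ranks

after_one = pagerank(links, 1)
after_two = pagerank(links, 2)
final = pagerank(links, 100)
print({page: round(rank, 2) for page, rank in final.items()})
```

Every iteration reads the same links table, which is why keeping it cached in memory pays off in Spark.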
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021
NPTEL BIG DATA FULL PPT  BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021


NPTEL BIG DATA FULL PPT BOOK WITH ASSIGNMENT SOLUTION RAJIV MISHRA IIT PATNA 2021

  • 1. Vu Pham Introduction to Big Data Dr. Rajiv Misra Dept. of Computer Science & Engg. Indian Institute of Technology Patna rajivm@iitp.ac.in Big Data Computing Introduction to Big Data
  • 2. Vu Pham Preface Content of this Lecture: In this lecture, we will discuss a brief introduction to Big Data: Why Big Data, Where did it come from?, Challenges and applications of Big Data, Characteristics of Big Data i.e. Volume, Velocity, Variety and more V’s. Big Data Computing Introduction to Big Data
  • 3. Vu Pham What’s Big Data? Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions." Big Data Computing Introduction to Big Data
  • 4. Vu Pham Facts and Figures Walmart handles 1 million customer transactions/hour. Facebook handles 40 billion photos from its user base! Facebook inserts 500 terabytes of new data every day. Facebook stores, accesses, and analyzes 30+ petabytes of user-generated data. A flight generates 240 terabytes of flight data in 6-8 hours of flight. More than 5 billion people are calling, texting, tweeting and browsing on mobile phones worldwide. Decoding the human genome originally took 10 years to process; now it can be achieved in one week. The largest AT&T database boasts titles including the largest volume of data in one unique database (312 terabytes) and the second largest number of rows in a unique database (1.9 trillion), which comprises AT&T’s extensive calling records. Big Data Computing Introduction to Big Data
  • 5. Vu Pham An Insight Byte: one grain of rice. KB (10^3): one cup of rice. MB (10^6): 8 bags of rice. GB (10^9): 3 semi trucks of rice. TB (10^12): 2 container ships of rice. PB (10^15): blankets ½ of Jaipur. Exabyte (10^18): blankets the West coast, or 1/4th of India. Zettabyte (10^21): fills the Pacific Ocean. Yottabyte (10^24): an earth-sized rice bowl. Brontobyte (10^27): astronomical size. Scale labels: Desktop (MB, GB), Internet (TB, PB), Big Data (Exabyte, Zettabyte), Future (Yottabyte, Brontobyte). Big Data Computing Introduction to Big Data
  • 6. Vu Pham What’s making so much data? Sources: people, machines, organizations. Ubiquitous computing: more people carrying data-generating devices (mobile phones with Facebook, GPS, cameras, etc.). Data on the Internet: Internet Live Stats, http://www.internetlivestats.com/ Big Data Computing Introduction to Big Data
  • 7. Vu Pham Source of Data Generation 2+ billion people on the Web by end of 2011; 30 billion RFID tags today (1.3B in 2005); 4.6 billion camera phones worldwide; 100s of millions of GPS-enabled devices sold annually; 76 million smart meters in 2009, 200M by 2014; 12+ TBs of tweet data every day; 25+ TBs of log data every day; ? TBs of data every day. Big Data Computing Introduction to Big Data
  • 8. Vu Pham Crowdsourcing An Example of Big Data at Work Big Data Computing Introduction to Big Data
  • 9. Vu Pham Where is the problem? Traditional RDBMS queries aren't sufficient to get useful information out of the huge volume of data. Searching it with traditional tools to find out whether a particular topic was trending would take so long that the result would be meaningless by the time it was computed. Big Data offers a solution: store this data in novel ways that make it more accessible, and develop methods of performing analysis on it. Big Data Computing Introduction to Big Data
  • 11. Vu Pham IBM considers Big Data (3V’s): The 3V’s: Volume, Velocity and Variety. Big Data Computing Introduction to Big Data
  • 12. Vu Pham Volume (Scale) Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes, even petabytes, of information. Turn 12 terabytes of Tweets created each day into improved product sentiment analysis. Convert 350 billion annual meter readings to better predict power consumption. Big Data Computing Introduction to Big Data
  • 13. Vu Pham Volume (Scale) Data volume: 44x increase from 2009 to 2020, from 0.8 zettabytes to 35 zettabytes. Data volume is increasing exponentially: exponential increase in collected/generated data. Big Data Computing Introduction to Big Data
  • 14. Vu Pham Example 1: CERN’s Large Hadron Collider (LHC) CERN’s Large Hadron Collider (LHC) generates 15 PB a year. Big Data Computing Introduction to Big Data
  • 15. Vu Pham Example 2: The Earthscope The Earthscope is the world's largest science project. Designed to track North America's geological evolution, this observatory records data over 3.8 million square miles, amassing 67 terabytes of data. It analyzes seismic slips in the San Andreas fault, sure, but also the plume of magma underneath Yellowstone and much, much more. (http://www.msnbc.msn.com/id/44363598/ns/technology_and_science-future_of_technology/#.TmetOdQ--uI) Big Data Computing Introduction to Big Data
  • 16. Vu Pham Velocity (Speed) Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value. Scrutinize 5 million trade events created each day to identify potential fraud. Analyze 500 million daily call detail records in real time to predict customer churn faster. Big Data Computing Introduction to Big Data
  • 17. Vu Pham Examples: Velocity (Speed) Data is being generated fast and needs to be processed fast. Online data analytics: late decisions ➔ missed opportunities. Examples: E-promotions: based on your current location, your purchase history, and what you like ➔ send promotions right now for the store next to you. Healthcare monitoring: sensors monitoring your activities and body ➔ any abnormal measurements require immediate reaction. Big Data Computing Introduction to Big Data
  • 18. Vu Pham Real-time/Fast Data Social media and networks (all of us are generating data). Scientific instruments (collecting all sorts of data). Mobile devices (tracking all objects all the time). Sensor technology and networks (measuring all kinds of data). Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion. Big Data Computing Introduction to Big Data
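As a rough check of the growth figures on this slide, the overall multiple and the compound annual growth rate they imply can be worked out directly. This is a minimal sketch that assumes the two endpoints quoted above (0.8 ZB in 2009, 35 ZB in 2020) and steady year-over-year compounding, which the slide does not state; the variable names are illustrative.

```python
# Endpoints quoted on the slide: 0.8 ZB in 2009, 35 ZB projected for 2020.
start_zb, end_zb = 0.8, 35.0
years = 2020 - 2009  # 11 years between the two figures

# Overall multiple between the two endpoints (the slide rounds this to 44x).
growth_factor = end_zb / start_zb

# Implied compound annual growth rate, ASSUMING steady compounding.
annual_rate = growth_factor ** (1 / years) - 1

print(f"overall growth: {growth_factor:.2f}x")
print(f"implied annual growth: {annual_rate:.1%}")
```

Under that assumption the data volume grows by roughly two fifths every year, which is what "increasing exponentially" means in practice here.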
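To get a feel for what these daily volumes mean for a system that must react as data streams in, they can be converted to average per-second rates. The sketch below uses only the two figures from the slide; it assumes events are spread evenly over the day, which real workloads are not (peaks will be considerably higher).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes quoted on the slide.
daily_volumes = {
    "trade events": 5_000_000,
    "call detail records": 500_000_000,
}

# Average arrival rate, ASSUMING a uniform spread over 24 hours.
rates = {name: count / SECONDS_PER_DAY for name, count in daily_volumes.items()}

for name, per_second in rates.items():
    print(f"{name}: about {per_second:,.0f} per second on average")
```

Even as averages, these work out to dozens of trade events and thousands of call records every second, so any per-event analysis has only a small time budget.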
  • 19. Vu Pham Real-Time Analytics/Decision Requirement Customer Influence Behavior: product recommendations that are relevant and compelling; friend invitations to join a game or activity that expands business; preventing fraud as it is occurring and preventing more proactively; learning why customers switch to competitors and their offers, in time to counter; improving the marketing effectiveness of a promotion while it is still in play. Big Data Computing Introduction to Big Data
  • 20. Vu Pham Variety (Complexity) Variety: Big data is any type of data: structured data (example: tabular data); unstructured data (text, sensor data, audio, video); semi-structured data (web data, log files). Big Data Computing Introduction to Big Data
  • 21. Vu Pham Examples: Variety (Complexity) Relational data (tables/transactions/legacy data). Text data (Web). Semi-structured data (XML). Graph data: social networks, Semantic Web (RDF), … Streaming data: you can only scan the data once. A single application can be generating/collecting many types of data. Big public data (online, weather, finance, etc.). To extract knowledge ➔ all these types of data need to be linked together. Big Data Computing Introduction to Big Data
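The "you can only scan the data once" constraint on streaming data can be illustrated with a single-pass computation. The sketch below uses Welford's online algorithm to track mean and variance without storing the stream; the technique and the name `running_stats` are chosen here for illustration and do not come from the lecture.

```python
def running_stats(stream):
    """Consume an iterable exactly once; return (count, mean, variance).

    Uses Welford's online update, so memory use is constant no matter
    how long the stream is. Variance is the population variance.
    """
    count, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count          # update the running mean
        m2 += delta * (x - mean)       # accumulate squared deviations
    variance = m2 / count if count else 0.0
    return count, mean, variance


# A one-shot iterator stands in for a stream: it cannot be rewound.
readings = iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
count, mean, variance = running_stats(readings)
print(count, mean, variance)
```

Because each value is folded into the running state and then discarded, the same function works whether the source is a short list or an unbounded sensor feed.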
  • 22. Vu Pham The 3 Big V’s (+1) Big 3V’s Volume Velocity Variety Plus 1 Value Big Data Computing Introduction to Big Data
  • 23. Vu Pham The 3 Big V’s (+1) (+ N more) Plus many more Veracity Validity Variability Viscosity & Volatility Viability, Venue, Vocabulary, Vagueness, … Big Data Computing Introduction to Big Data
  • 24. Vu Pham Big Data Computing Introduction to Big Data
  • 25. Vu Pham Value Integrating Data Reducing data complexity Increase data availability Unify your data systems All 3 above will lead to increased data collaboration -> add value to your big data Big Data Computing Introduction to Big Data
  • 26. Vu Pham Veracity Veracity refers to the biases, noise and abnormality in data, and to the trustworthiness of data. 1 in 3 business leaders don’t trust the information they use to make decisions. How can you act upon information if you don’t trust it? Establishing trust in big data presents a huge challenge as the variety and number of sources grows. Big Data Computing Introduction to Big Data
  • 27. Vu Pham Valence Valence refers to the connectedness of big data. Such as in the form of graph networks Big Data Computing Introduction to Big Data
  • 28. Vu Pham Validity Accuracy and correctness of the data relative to a particular use. Example: gauging storm intensity: satellite imagery vs. social media posts; prediction quality vs. human impact. Big Data Computing Introduction to Big Data
  • 29. Vu Pham Variability How the meaning of the data changes over time Language evolution Data availability Sampling processes Changes in characteristics of the data source Big Data Computing Introduction to Big Data
  • 30. Vu Pham Viscosity & Volatility Both related to velocity Viscosity: data velocity relative to timescale of event being studied Volatility: rate of data loss and stable lifetime of data Scientific data often has practically unlimited lifespan, but social / business data may evaporate in finite time Big Data Computing Introduction to Big Data
  • 31. Vu Pham More V’s Viability Which data has meaningful relations to questions of interest? Venue Where does the data live and how do you get it? Vocabulary Metadata describing structure, content, & provenance Schemas, semantics, ontologies, taxonomies, vocabularies Vagueness Confusion about what “Big Data” means Big Data Computing Introduction to Big Data
  • 32. Vu Pham Dealing with Volume Distill big data down to small information Parallel and automated analysis Automation requires standardization Standardize by reducing Variety: Format Standards Structure Big Data Computing Introduction to Big Data
  • 33. Vu Pham Harnessing Big Data OLTP: Online Transaction Processing (DBMSs) OLAP: Online Analytical Processing (Data Warehousing) RTAP: Real-Time Analytics Processing (Big Data Architecture & technology) Big Data Computing Introduction to Big Data
  • 34. Vu Pham The Model Has Changed… The Model of Generating/Consuming Data has Changed Old Model: Few companies are generating data, all others are consuming data New Model: all of us are generating data, and all of us are consuming data Big Data Computing Introduction to Big Data
  • 35. Vu Pham What’s driving Big Data - Ad-hoc querying and reporting - Data mining techniques - Structured data, typical sources - Small to mid-size datasets - Optimizations and predictive analytics - Complex statistical analysis - All types of data, and many sources - Very large datasets - More of a real-time Big Data Computing Introduction to Big Data
  • 36. Vu Pham Big Data Analytics Big data is more real-time in nature than traditional data warehouse (DW) applications. Traditional DW architectures (e.g. Exadata, Teradata) are not well-suited for big data apps. Shared-nothing, massively parallel processing, scale-out architectures are well-suited for big data apps. Big Data Computing Introduction to Big Data
  • 37. Vu Pham Big Data Technology Big Data Computing Introduction to Big Data
  • 38. Vu Pham Conclusion In this lecture, we have defined Big Data and discussed the challenges and applications of Big Data. We have also described characteristics of Big Data i.e. Volume, Velocity, Variety and more V’s, Big Data Analytics, Big Data Landscape and Big Data Technology. Big Data Computing Introduction to Big Data
  • 39. Vu Pham Big Data Enabling Technologies Dr. Rajiv Misra Dept. of Computer Science & Engg. Indian Institute of Technology Patna rajivm@iitp.ac.in Big Data Computing Big Data Enabling Technologies
  • 40. Vu Pham Preface Content of this Lecture: In this lecture, we will discuss a brief introduction to Big Data Enabling Technologies. Big Data Computing Big Data Enabling Technologies
  • 41. Vu Pham Introduction Big Data is used for a collection of data sets so large and complex that it is difficult to process using traditional tools. A recent survey says that 80% of the data created in the world are unstructured. One challenge is how we can store and process this big amount of data. In this lecture, we will discuss the top technologies used to store and analyse Big Data. Big Data Computing Big Data Enabling Technologies
  • 42. Vu Pham Apache Hadoop Apache Hadoop is an open source software framework for big data. It has two basic parts: Hadoop Distributed File System (HDFS) is the storage system of Hadoop, which splits big data and distributes it across many nodes in a cluster. a. Scaling out of H/W resources b. Fault tolerant MapReduce: Programming model that simplifies parallel programming. a. Map -> apply() b. Reduce -> summarize() c. Google used MapReduce for indexing websites. Big Data Computing Big Data Enabling Technologies
  • 43. Vu Pham Big Data Computing Big Data Enabling Technologies
  • 44. Vu Pham Big Data Computing Big Data Enabling Technologies
  • 45. Vu Pham Map Reduce MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key Big Data Computing Big Data Enabling Technologies
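The map/reduce contract described on this slide can be sketched in a few lines of plain Python. This is only a single-process illustration of the programming model, not Hadoop's API: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the in-memory grouping stands in for the distributed shuffle a real cluster would perform.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: emit an intermediate (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_mapreduce(records):
    """Run map, group by intermediate key (the shuffle), then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, v) for k, v in groups.items())


# Input records are (key, value) pairs; here the key is a line number.
lines = [(0, "big data needs big clusters"), (1, "data moves to clusters")]
word_counts = run_mapreduce(lines)
print(word_counts)
```

Because each call to `map_fn` looks at one record and each call to `reduce_fn` looks at one key, both phases can in principle be spread across many machines, which is the property the model is designed around.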
  • 46. Vu Pham Map Reduce Big Data Computing Big Data Enabling Technologies
  • 47. Vu Pham Hadoop Ecosystem Big Data Computing Big Data Enabling Technologies
  • 48. Vu Pham Hadoop Ecosystem Big Data Computing Big Data Enabling Technologies
  • 49. Vu Pham HDFS Architecture Big Data Computing Big Data Enabling Technologies
  • 50. Vu Pham YARN YARN: Yet Another Resource Negotiator. Apache Hadoop YARN is the resource management and job scheduling technology in the open source Hadoop distributed processing framework. YARN is responsible for allocating system resources to the various applications running in a Hadoop cluster and scheduling tasks to be executed on different cluster nodes. Big Data Computing Big Data Enabling Technologies
  • 51. Vu Pham YARN Architecture Big Data Computing Big Data Enabling Technologies
  • 52. Vu Pham Hive Hive is a distributed data management system for Hadoop. It supports a SQL-like query language, HiveQL (HQL), to access big data. It is used primarily for data mining purposes. It runs on top of Hadoop. Big Data Computing Big Data Enabling Technologies
  • 53. Vu Pham Apache Spark Apache Spark is a big data analytics framework that was originally developed at the University of California, Berkeley's AMPLab, starting in 2009. Since then, it has gained a lot of traction both in academia and in industry. Apache Spark is a lightning-fast cluster computing technology, designed for fast computation. Big Data Computing Big Data Enabling Technologies
  • 54. Vu Pham ZooKeeper is a highly reliable distributed coordination kernel, which can be used for distributed locking, configuration management, leader election, work queues, and more. ZooKeeper is a replicated service that holds the metadata of distributed applications. Key attributes of such data: small size, performance sensitive, dynamic, critical. In very simple words, it is a central key-value store through which distributed systems can coordinate. Since it needs to be able to handle the load, ZooKeeper itself runs on many machines. ZooKeeper https://zookeeper.apache.org/ Big Data Computing Big Data Enabling Technologies
  • 55. Vu Pham NoSQL While traditional SQL can be used effectively to handle large amounts of structured data, we need NoSQL (Not Only SQL) to handle unstructured data. NoSQL databases store unstructured data with no particular schema. Each row can have its own set of column values. NoSQL gives better performance in storing massive amounts of data. Big Data Computing Big Data Enabling Technologies
  • 56. Vu Pham NoSQL Big Data Computing Big Data Enabling Technologies
  • 57. Vu Pham Cassandra Apache Cassandra is a highly scalable, distributed, high-performance NoSQL database. Cassandra is designed to handle huge amounts of data, which it does through its distributed architecture. Data is placed on different machines with a replication factor greater than one, which provides high availability and no single point of failure. Big Data Computing Big Data Enabling Technologies
  • 58. Vu Pham Cassandra In the image above, circles are Cassandra nodes and the lines between the circles show the distributed architecture, while the client is sending data to a node. Big Data Computing Big Data Enabling Technologies
  • 59. Vu Pham HBase HBase is an open source, distributed database developed by the Apache Software Foundation. It is modeled after Google's Bigtable and is primarily written in Java. HBase can store massive amounts of data, from terabytes to petabytes. Big Data Computing Big Data Enabling Technologies
  • 60. Vu Pham HBase Big Data Computing Big Data Enabling Technologies
  • 61. Vu Pham Spark Streaming Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Streaming data can be ingested from HDFS, Kafka, Flume, TCP sockets, Kinesis, etc. Spark ML (machine learning) functions and GraphX graph processing algorithms are fully applicable to streaming data. Big Data Computing Big Data Enabling Technologies
  • 62. Vu Pham Spark Streaming Big Data Computing Big Data Enabling Technologies
  • 63. Vu Pham Kafka, Streaming Ecosystem Apache Kafka is an open-source stream-processing software platform developed by the Apache Software Foundation and written in Scala and Java. It is a distributed streaming platform capable of handling trillions of events a day. Kafka is based on the abstraction of a distributed commit log. Big Data Computing Big Data Enabling Technologies
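The "commit log" abstraction mentioned on this slide can be illustrated with a toy, single-process sketch. This is not Kafka's client API: the class `CommitLog` and its methods are hypothetical, and a real deployment partitions and replicates the log across brokers. The sketch only shows the core idea that records are appended in order, identified by an offset, and read by consumers that track their own position.

```python
class CommitLog:
    """Toy append-only log: records are never modified, only appended."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


log = CommitLog()
first = log.append(b"page_view user=1")
second = log.append(b"page_view user=2")

# A consumer keeps its own offset and advances it after each batch,
# so several independent consumers can read the same log at their own pace.
consumer_offset = 0
batch = log.read(consumer_offset)
consumer_offset += len(batch)
```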
  • 64. Vu Pham Kafka Big Data Computing Big Data Enabling Technologies
  • 65. Vu Pham Spark MLlib Spark MLlib is a distributed machine-learning framework on top of Spark Core. MLlib is Spark's scalable machine learning library consisting of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction. Big Data Computing Big Data Enabling Technologies
  • 66. Vu Pham Spark MLlib Component Big Data Computing Big Data Enabling Technologies
  • 67. Vu Pham Spark GraphX GraphX is a new component in Spark for graphs and graph-parallel computation. At a high level, GraphX extends the Spark RDD by introducing a new graph abstraction. GraphX reuses Spark RDD concept, simplifies graph analytics tasks, provides the ability to make operations on a directed multigraph with properties attached to each vertex and edge. Big Data Computing Big Data Enabling Technologies
  • 68. Vu Pham Spark GraphX GraphX is a thin layer on top of the Spark general-purpose dataflow framework (lines of code). Big Data Computing Big Data Enabling Technologies
  • 69. Vu Pham Conclusion In this lecture, we have given a brief overview of the following Big Data Enabling Technologies: Apache Hadoop, Hadoop Ecosystem, HDFS Architecture, YARN, NoSQL, Hive, MapReduce, Apache Spark, ZooKeeper, Cassandra, HBase, Spark Streaming, Kafka, Spark MLlib, GraphX. Big Data Computing Big Data Enabling Technologies
  • 70. Vu Pham Hadoop Stack for Big Data Dr. Rajiv Misra Dept. of Computer Science & Engg. Indian Institute of Technology Patna rajivm@iitp.ac.in Big Data Computing Big Data Hadoop Stack
  • 71. Vu Pham Preface Content of this Lecture: In this lecture, we will provide insight into Hadoop technologies opportunities and challenges for Big Data. We will also look into the Hadoop stack and applications and technologies associated with Big Data solutions. Big Data Computing Big Data Hadoop Stack
  • 72. Vu Pham Hadoop Beginnings Big Data Computing Big Data Hadoop Stack
  • 73. Vu Pham What is Hadoop ? Apache Hadoop is an open source software framework for storage and large scale processing of the data-sets on clusters of commodity hardware. Big Data Computing Big Data Hadoop Stack
  • 74. Vu Pham Hadoop Beginnings Hadoop was created by Doug Cutting and Mike Cafarella in 2005. It was originally developed to support distribution of the Nutch search engine project. Doug, who was working at Yahoo at the time and is now a chief architect at Cloudera, named the project after his son's toy elephant, Hadoop. Big Data Computing Big Data Hadoop Stack
  • 75. Vu Pham Moving Computation to Data Big Data Computing Big Data Hadoop Stack Hadoop started out as a simple batch processing framework. The idea behind Hadoop is that instead of moving data to computation, we move computation to data.
  • 76. Vu Pham Scalability Big Data Computing Big Data Hadoop Stack Scalability is at the core of a Hadoop system. We have cheap computing and storage, so we can distribute and scale out very easily in a very cost-effective manner.
  • 77. Vu Pham Reliability Hardware Failures Handled Automatically! Big Data Computing Big Data Hadoop Stack Whether we think about an individual machine, a rack of machines, a large cluster, or a supercomputer, they all fail at some point, or some of their components will fail. These failures are so common that we have to account for them ahead of time, and all of them are handled within the Hadoop framework. Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and the Google File System. Another very interesting thing that Hadoop brings is a new approach to data.
  • 78. Vu Pham New Approach to Data: Keep all data Big Data Computing Big Data Hadoop Stack The new approach is that we can keep all the data we have, and we can take that data and analyze it in new and interesting ways. We can work in what is called a schema-on-read style, which allows new kinds of analysis. We can bring more data into simple algorithms; it has been shown that, with more granularity, this often achieves better results than taking a small amount of data and running some really complex analytics on it.
  • 79. Vu Pham Apache Hadoop Framework & its Basic Modules Big Data Computing Big Data Hadoop Stack
  • 80. Vu Pham Hadoop Common: It contains libraries and utilities needed by other Hadoop modules. Hadoop Distributed File System (HDFS): It is a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the entire cluster. Hadoop YARN: It is a resource management platform responsible for managing compute resources in the cluster and using them to schedule users' applications. Hadoop MapReduce: It is a programming model that scales data processing across a lot of different processes. Apache Framework Basic Modules Big Data Computing Big Data Hadoop Stack
  • 81. Vu Pham Apache Framework Basic Modules Big Data Computing Big Data Hadoop Stack
  • 82. Vu Pham High Level Architecture of Hadoop Big Data Computing Big Data Hadoop Stack The two major pieces of Hadoop are the Hadoop Distributed File System (HDFS) and MapReduce, a parallel processing framework that maps and reduces data. Both are open source and inspired by technologies developed at Google. When we talk about this high-level infrastructure, we talk about things like TaskTrackers and JobTrackers, NameNodes and DataNodes.
  • 83. Vu Pham HDFS Hadoop distributed file system Big Data Computing Big Data Hadoop Stack
  • 84. Vu Pham Distributed, scalable, and portable file system written in Java for the Hadoop framework. Each Hadoop instance typically has a single NameNode and a cluster of DataNodes that together form the HDFS cluster. HDFS stores large files, typically in the range of gigabytes to terabytes, and now petabytes, across multiple machines. It achieves reliability by replicating the data across multiple hosts, and therefore does not require RAID storage on hosts. HDFS: Hadoop distributed file system Big Data Computing Big Data Hadoop Stack
  • 85. Vu Pham HDFS Big Data Computing Big Data Hadoop Stack
  • 86. Vu Pham HDFS Big Data Computing Big Data Hadoop Stack
  • 87. Vu Pham MapReduce Engine Big Data Computing Big Data Hadoop Stack The typical MapReduce engine consists of a JobTracker, to which client applications submit MapReduce jobs. The JobTracker pushes work out to the available TaskTrackers in the cluster, striving to keep the work as close to the data as possible and as balanced as possible.
  • 88. Vu Pham Apache Hadoop NextGen MapReduce (YARN) Big Data Computing Big Data Hadoop Stack YARN enhances the power of the Hadoop compute cluster without being limited by the MapReduce framework. Its scalability is great: as the processing power of data centers continues to grow quickly, the YARN ResourceManager, which focuses exclusively on scheduling, can manage those very large clusters quickly and easily. YARN is completely compatible with MapReduce. Existing MapReduce applications and end users can run on top of YARN without disrupting any of their existing processes.
  • 89. Vu Pham Hadoop 1.0 vs. Hadoop 2.0 Big Data Computing Big Data Hadoop Stack Hadoop 2.0 provides a more general processing platform that is not constrained to map and reduce kinds of processes. The fundamental idea behind MapReduce 2.0 is to split the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into two separate units. The idea is to have a global ResourceManager and a per-application ApplicationMaster.
  • 90. Vu Pham YARN enhances the power of the Hadoop compute cluster without being limited by the MapReduce framework. Its scalability is great: as the processing power of data centers continues to grow quickly, the YARN ResourceManager, which focuses exclusively on scheduling, can manage those very large clusters quickly and easily. YARN is completely compatible with MapReduce. Existing MapReduce applications and end users can run on top of YARN without disrupting any of their existing processes. It also improves cluster utilization. The ResourceManager is a pure scheduler that optimizes cluster utilization according to criteria such as capacity guarantees, fairness, and different SLAs (service level agreements). What is YARN? Big Data Computing Big Data Hadoop Stack Scalability MapReduce Compatibility Improved cluster utilization
  • 91. Vu Pham It supports workflows other than just MapReduce. Now we can bring in additional programming models, such as graph processing or iterative modeling, and it is now possible to process the data in HBase. This is especially useful when we talk about machine learning applications. YARN allows multiple access engines, either open source or proprietary, to use Hadoop as a common standard for batch or interactive processing, and even real-time engines that can simultaneously access a lot of different data. So you can put streaming applications on top of YARN inside a Hadoop architecture, and work and communicate seamlessly between these environments. What is YARN? Big Data Computing Big Data Hadoop Stack Fairness Supports Other Workloads Iterative Modeling Machine Learning Multiple Access Engines
  • 92. Vu Pham The Hadoop “Zoo” Big Data Computing Big Data Hadoop Stack
  • 93. Vu Pham Apache Hadoop Ecosystem Big Data Computing Big Data Hadoop Stack
  • 94. Vu Pham Original Google Stack Big Data Computing Big Data Hadoop Stack Google had their original MapReduce, and they were storing and processing large amounts of data. They wanted to be able to access that data in a SQL-like language, so they built the SQL gateway to ingest data into the MapReduce cluster and to query some of that data as well.
  • 95. Vu Pham Original Google Stack Big Data Computing Big Data Hadoop Stack Then they realized they needed a high-level, domain-specific language to access MapReduce in the cluster and submit some of those jobs, so Sawzall came along. Then Evenflow came along, which allowed them to chain together complex workflows and coordinate events and services across this kind of framework, or the specific cluster they had at the time.
  • 96. Vu Pham Original Google Stack Big Data Computing Big Data Hadoop Stack Then Dremel came along. Dremel was a columnar storage and metadata manager that allows us to manage the data and process a very large amount of unstructured data. Then Chubby came along as a coordination system that would manage all of the products in this one unit, or one ecosystem, that could process all these large amounts of structured data seamlessly.
  • 97. Vu Pham Facebook’s Version of the Stack Big Data Computing Big Data Hadoop Stack
  • 98. Vu Pham Yahoo Version of the Stack Big Data Computing Big Data Hadoop Stack
  • 99. Vu Pham LinkedIn’s Version of the Stack Big Data Computing Big Data Hadoop Stack
  • 100. Vu Pham Cloudera’s Version of the Stack Big Data Computing Big Data Hadoop Stack
  • 101. Vu Pham Hadoop Ecosystem Major Components Big Data Computing Big Data Hadoop Stack
  • 102. Vu Pham Big Data Computing Big Data Hadoop Stack
  • 103. Vu Pham Tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases Apache Sqoop Big Data Computing Big Data Hadoop Stack
  • 104. Vu Pham Big Data Computing Big Data Hadoop Stack
  • 105. Vu Pham HBase is a key component of the Hadoop stack, as its design caters to applications that require really fast random access to significant data sets. Column-oriented database management system. Key-value store. Based on Google Bigtable. Can hold extremely large data. Dynamic data model. Not a relational DBMS. HBASE Big Data Computing Big Data Hadoop Stack
  • 106. Vu Pham Big Data Computing Big Data Hadoop Stack
  • 107. Vu Pham High-level programming on top of Hadoop MapReduce. The language: Pig Latin. Expresses data analysis problems as data flows. Originally developed at Yahoo in 2006. PIG Big Data Computing Big Data Hadoop Stack
  • 108. Vu Pham PIG for ETL Big Data Computing Big Data Hadoop Stack A good example of a PIG application is the ETL (extract, transform, load) model, which describes how a process extracts data from a source, transforms it according to a rule set that we specify, and then loads it into a data store. PIG can ingest data from files, streams, or other sources using UDFs: user-defined functions that we can write ourselves. Once it has all the data, it can perform select, iterate, and other kinds of transformations.
  • 109. Vu Pham Big Data Computing Big Data Hadoop Stack
  • 110. Vu Pham Data warehouse software facilitates querying and managing large datasets residing in distributed storage SQL-like language! Facilitates querying and managing large datasets in HDFS Mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL Apache Hive Big Data Computing Big Data Hadoop Stack
  • 111. Vu Pham Big Data Computing Big Data Hadoop Stack
  • 112. Vu Pham Workflow scheduler system to manage Apache Hadoop jobs Oozie Coordinator jobs! Supports MapReduce, Pig, Apache Hive, and Sqoop, etc. Oozie Workflow Big Data Computing Big Data Hadoop Stack
  • 113. Vu Pham Big Data Computing Big Data Hadoop Stack
  • 114. Vu Pham Provides operational services for a Hadoop cluster group services Centralized service for: maintaining configuration information naming services Providing distributed synchronization and providing group services Zookeeper Big Data Computing Big Data Hadoop Stack
  • 115. Vu Pham Flume Distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and very flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability, failover, and recovery mechanisms that keep the cluster safe and reliable. It uses a simple, extensible data model that allows us to apply all kinds of online analytic applications. Big Data Computing Big Data Hadoop Stack
  • 116. Vu Pham Additional Cloudera Hadoop Components Impala Big Data Computing Big Data Hadoop Stack
  • 117. Vu Pham Impala was designed at Cloudera, and it is a query engine that runs on top of Apache Hadoop. The project was officially announced at the end of 2012 and became a publicly available, open source distribution. Impala brings scalable parallel database technology to Hadoop and allows users to submit low-latency queries against data stored in HDFS or HBase without requiring a ton of data movement and manipulation. Impala is integrated with Hadoop: it works within the same ecosystem, with the same formats, metadata, security and reliability resources, and management workflows. It allows us to submit SQL-like queries at much faster speeds with a lot less latency. Impala Big Data Computing Big Data Hadoop Stack
  • 118. Vu Pham Additional Cloudera Hadoop Components Spark The New Paradigm Big Data Computing Big Data Hadoop Stack
  • 119. Vu Pham Apache Spark™ is a fast and general engine for large-scale data processing. Spark is a scalable data analytics platform that incorporates primitives for in-memory computing, and it therefore offers performance advantages over the traditional Hadoop cluster storage approach. It is implemented in, and supports, the Scala language, and provides a unique environment for data processing. Spark is really great for more complex kinds of analytics, and it is great at supporting machine learning libraries. It is yet another open source computing framework; it was originally developed at the AMPLab at the University of California, Berkeley, and was later donated to the Apache Software Foundation, where it remains today. Spark Big Data Computing Big Data Hadoop Stack
  • 120. Vu Pham In contrast to Hadoop's two-stage, disk-based MapReduce paradigm, Spark's multi-stage in-memory primitives provide performance up to 100 times faster for certain applications. It allows user programs to load data into a cluster's memory and query it repeatedly. Spark is well suited for machine learning applications, which often involve iterative, in-memory computation. Spark requires a cluster manager and a distributed storage system. For cluster management, Spark supports standalone native Spark clusters, or you can run Spark on top of Hadoop YARN or Apache Mesos. For distributed storage, Spark can interface with a variety of storage systems, including HDFS and Amazon S3. Spark Benefits Big Data Computing Big Data Hadoop Stack
  • 121. Vu Pham Conclusion In this lecture, we have discussed the specific components and basic processes of the Hadoop architecture, software stack, and execution environment. Big Data Computing Big Data Hadoop Stack
  • 122. Vu Pham Hadoop Distributed File System (HDFS) Dr. Rajiv Misra Dept. of Computer Science & Engg. Indian Institute of Technology Patna rajivm@iitp.ac.in Big Data Computing Hadoop Distributed File System (HDFS)
  • 123. Vu Pham Preface Content of this Lecture: In this lecture, we will discuss design goals of HDFS, the read/write process to HDFS, the main configuration tuning parameters to control HDFS performance and robustness. Big Data Computing Hadoop Distributed File System (HDFS)
  • 124. Vu Pham Introduction Hadoop provides a distributed file system and a framework for the analysis and transformation of very large data sets using the MapReduce paradigm. An important characteristic of Hadoop is the partitioning of data and computation across many (thousands of) hosts, and the execution of application computations in parallel, close to their data. A Hadoop cluster scales computation capacity, storage capacity, and I/O bandwidth by simply adding commodity servers. Hadoop clusters at Yahoo! span 25,000 servers and store 25 petabytes of application data, with the largest cluster being 3500 servers. One hundred other organizations worldwide report using Hadoop. Big Data Computing Hadoop Distributed File System (HDFS)
  • 125. Vu Pham Introduction Hadoop is an Apache project; all components are available via the Apache open source license. Yahoo! has developed and contributed to 80% of the core of Hadoop (HDFS and MapReduce). HBase was originally developed at Powerset, now a department at Microsoft. Hive was originated and developed at Facebook. Pig, ZooKeeper, and Chukwa were originated and developed at Yahoo! Avro was originated at Yahoo! and is being co-developed with Cloudera. Big Data Computing Hadoop Distributed File System (HDFS)
  • 126. Vu Pham Hadoop Project Components Big Data Computing HDFS Distributed file system MapReduce Distributed computation framework HBase Column-oriented table service Pig Dataflow language and parallel execution framework Hive Data warehouse infrastructure ZooKeeper Distributed coordination service Chukwa System for collecting management data Avro Data serialization system Hadoop Distributed File System (HDFS)
  • 127. Vu Pham HDFS Design Concepts Scalable distributed filesystem: As you add disks, you get scalable performance; adding more nodes adds many disks, and that scales out the performance. Distributed data on local disks on several nodes. Low-cost commodity hardware: You get a lot of performance out of it because you are aggregating performance. Big Data Computing Node 1 B1 Node 2 B2 Node n Bn … Hadoop Distributed File System (HDFS)
  • 128. Vu Pham HDFS Design Goals Hundreds/Thousands of nodes and disks: It means there's a higher probability of hardware failure. So the design needs to handle node/disk failures. Portability across heterogeneous hardware/software: Implementation across lots of different kinds of hardware and software. Handle large data sets: Need to handle terabytes to petabytes. Enable processing with high throughput Big Data Computing Hadoop Distributed File System (HDFS)
  • 129. Vu Pham Techniques to meet HDFS design goals Simplified coherency model: The idea is to write once and then read many times. And that simplifies the number of operations required to commit the write. Data replication: Helps to handle hardware failures. Try to spread the data, same piece of data on different nodes. Move computation close to the data: So you're not moving data around. That improves your performance and throughput. Relax POSIX requirements to increase the throughput. Big Data Computing Hadoop Distributed File System (HDFS)
  • 130. Vu Pham Basic architecture of HDFS Big Data Computing Hadoop Distributed File System (HDFS)
  • 131. Vu Pham HDFS Architecture: Key Components Single NameNode: A master server that manages the file system namespace and regulates access to files from clients. It also keeps track of where the data is on the DataNodes and how the blocks are distributed. Multiple DataNodes: Typically one per node in a cluster, so you are using storage that is local. Basic functions: Manage the storage on the DataNode. Serve read and write requests from clients. Block creation, deletion, and replication are all based on instructions from the NameNode. Big Data Computing Hadoop Distributed File System (HDFS)
  • 132. Vu Pham Original HDFS Design Single NameNode Multiple DataNodes Manage storage- blocks of data Serving read/write requests from clients Block creation, deletion, replication Big Data Computing Big Data Enabling Technologies
  • 133. Vu Pham HDFS in Hadoop 2 HDFS Federation: The idea is to have multiple NameNodes as well as multiple DataNodes, so that we can scale the namespace. Recall that in the first design a single node handles all the namespace responsibilities. You can imagine that with thousands of nodes this will not scale, and with billions of files you will have scalability issues. To address that, federation was brought in. It also brings performance improvements. Benefits: Increased namespace scalability, Performance, Isolation. Big Data Computing Big Data Enabling Technologies
  • 134. Vu Pham HDFS in Hadoop 2 How it's done: Multiple NameNode servers. Multiple namespaces. Data is now stored in block pools, so there is a pool associated with each NameNode or namespace, and these pools are spread out over all the DataNodes. Big Data Computing Big Data Enabling Technologies
  • 135. Vu Pham HDFS in Hadoop 2 High Availability- Redundant NameNodes Heterogeneous Storage and Archival Storage ARCHIVE, DISK, SSD, RAM_DISK Big Data Computing Big Data Enabling Technologies
  • 136. Vu Pham Federation: Block Pools Big Data Computing Big Data Enabling Technologies If you remember the original design, you have one namespace and a bunch of DataNodes. Here the structure looks similar, but you have a bunch of NameNodes instead of one. Each of those NameNodes writes into its own block pool, and the pools are spread out over the DataNodes just like before. This is where the data is spread out across the different DataNodes. So the block pool is essentially the main thing that is different.
  • 137. Vu Pham HDFS Performance Measures Determine the number of blocks for a given file size, Key HDFS and system components that are affected by the block size. An impact of using a lot of small files on HDFS and system Big Data Computing Hadoop Distributed File System (HDFS)
  • 138. Vu Pham Recall: HDFS Architecture Distributed data on local disks on several nodes Big Data Computing Node 1 B1 Node 2 B2 Node n Bn … Hadoop Distributed File System (HDFS)
  • 139. Vu Pham HDFS Block Size Default block size is 64 megabytes. Good for large files! So a 10GB file will be broken into: 10 x 1024/64=160 blocks Big Data Computing Node 1 B1 Node 2 B2 Node n Bn … Hadoop Distributed File System (HDFS)
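The block arithmetic on this slide can be checked with a short sketch. The helper name `num_blocks` is hypothetical; the 64 MB default is the value quoted in the lecture, and the only assumption added here is that a file whose size is not a multiple of the block size needs one extra, partially filled block.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def num_blocks(file_size_bytes, block_size_bytes=64 * MB):
    """Number of HDFS blocks needed for a file (ceiling division)."""
    return -(-file_size_bytes // block_size_bytes)


print(num_blocks(10 * GB))            # the slide's example: 10 x 1024 / 64
print(num_blocks(10 * GB, 128 * MB))  # same file with a 128 MB block size
```

Doubling the block size halves the number of blocks, which is one reason larger block sizes are attractive for large files.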
  • 140. Vu Pham Importance of No. of Blocks in a file NameNode memory usage: Every file can consist of many blocks, as we saw in the previous case (160 blocks). If you have millions of files, that is millions of objects. Each object uses a bit of memory on the NameNode, so memory usage is a direct effect of the number of blocks. And with replication, you have 3 times the number of blocks. Number of map tasks: The number of maps typically depends on the number of blocks being processed. Big Data Computing Hadoop Distributed File System (HDFS)
  • 141. Vu Pham Large No. of small files: Impact on NameNode Memory usage: Typically, the usage is around 150 bytes per object. If you have a billion small files, each contributing a file object and a block object, that is going to be about 300 GB of memory. Network load: The number of checks with DataNodes is proportional to the number of blocks. Big Data Computing Hadoop Distributed File System (HDFS)
  • 142. Vu Pham Large No. of small files: Performance Impact Number of map tasks: Suppose we have 10 GB of data to process and it is all stored as 32 KB files. Then we will end up with 327680 map tasks, which is a huge list of queued tasks. The other impact is on the map tasks themselves: each time they spin up and spin down there is latency involved, because you are starting and stopping Java processes. Inefficient disk I/O with small sizes. Big Data Computing Hadoop Distributed File System (HDFS)
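A rough estimate of NameNode memory can be sketched as follows. The 150 bytes per object comes from the lecture; the function name `namenode_memory_bytes` is hypothetical, and counting one file object plus one object per block is an assumption made here as one way to arrive at the quoted 300 GB figure.

```python
BYTES_PER_OBJECT = 150  # rough per-object cost quoted in the lecture


def namenode_memory_bytes(num_files, blocks_per_file=1,
                          bytes_per_object=BYTES_PER_OBJECT):
    """Estimate NameNode heap used for file and block metadata.

    Assumption: each file costs one object, plus one object per block.
    """
    objects = num_files * (1 + blocks_per_file)
    return objects * bytes_per_object


# One billion small files, each fitting in a single block.
estimate = namenode_memory_bytes(10 ** 9)
print(estimate / 10 ** 9, "GB")
```

The estimate is deliberately coarse: it ignores directories and per-replica bookkeeping, so it should be read as an order-of-magnitude check rather than a sizing formula.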
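The map-task count on this slide can be reproduced with a small calculation. The function name is hypothetical, and the sketch assumes one map task per input file, which is how the slide's figure is derived when every file is smaller than a block.

```python
KB = 1024
GB = 1024 ** 3


def map_tasks_for_small_files(total_bytes, file_size_bytes):
    """One map task per file, assuming each file is smaller than one block."""
    return -(-total_bytes // file_size_bytes)


small_file_tasks = map_tasks_for_small_files(10 * GB, 32 * KB)
print(small_file_tasks)
```

For comparison, the same 10 GB stored as one large file with 64 MB blocks would need only 160 map tasks, which is roughly a 2000-fold difference in task start-up overhead.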
  • 143. Vu Pham HDFS optimized for large files Lots of small files is bad! Solution: Merge/Concatenate files Sequence files HBase, HIVE configuration CombineFileInputFormat Big Data Computing Hadoop Distributed File System (HDFS)
  • 144. Vu Pham Big Data Computing Read/Write Processes in HDFS Hadoop Distributed File System (HDFS)
  • 145. Vu Pham Read Process in HDFS Big Data Computing Hadoop Distributed File System (HDFS)
  • 146. Vu Pham Write Process in HDFS Big Data Computing Hadoop Distributed File System (HDFS)
  • 147. Vu Pham Big Data Computing HDFS Tuning Parameters Hadoop Distributed File System (HDFS)
  • 148. Vu Pham Overview Tuning parameters Specifically DFS Block size NameNode, DataNode system/dfs parameters. Big Data Computing Hadoop Distributed File System (HDFS)
  • 149. Vu Pham HDFS XML configuration files Tuning is typically done in the HDFS XML configuration files, for example in hdfs-site.xml. This is mostly a task for system administrators of Hadoop clusters, but it is good to know which changes affect performance; if you are trying things out on your own, there are some important parameters to keep in mind. Commercial vendors provide GUI-based management consoles. Big Data Computing Hadoop Distributed File System (HDFS)
  • 150. Vu Pham HDFS Block Size Recall: the block size affects how much NameNode memory is used and how many map tasks are created, and therefore also affects performance. Default 64 megabytes: typically bumped up to 128 megabytes (the default from Hadoop 2.x onward), and can be changed based on workloads. The parameter that controls it is dfs.blocksize (dfs.block.size in older releases). Big Data Computing Hadoop Distributed File System (HDFS)
  • 151. Vu Pham HDFS Replication Default replication is 3. Parameter: dfs.replication. Tradeoffs: lowering it reduces replication cost and frees storage space, but is less robust; higher replication can make data local to more workers. Big Data Computing Hadoop Distributed File System (HDFS)
  • 152. Vu Pham Lot of other parameters Various tunables for DataNode and NameNode. Examples: dfs.datanode.handler.count (10): sets the number of server threads on each DataNode. dfs.namenode.fs-limits.max-blocks-per-file: maximum number of blocks per file. Full list: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml Big Data Computing Hadoop Distributed File System (HDFS)
  • 153. Vu Pham Big Data Computing HDFS Performance and Robustness Hadoop Distributed File System (HDFS)
  • 154. Vu Pham Common Failures DataNode Failures: A server can fail, a disk can crash, or data can be corrupted, sometimes because of network or disk issues; any of these can lead to a failure on the DataNode side of HDFS. Network Failures: The network can go down between part of the cluster and the NameNode, which can affect a lot of DataNodes at the same time. NameNode Failures: A disk can fail on the NameNode itself, or the NameNode process itself can become corrupted. Big Data Computing Hadoop Distributed File System (HDFS)
  • 155. Vu Pham HDFS Robustness NameNode receives heartbeat and block reports from DataNodes Big Data Computing Hadoop Distributed File System (HDFS)
  • 156. Vu Pham Mitigation of common failures Periodic heartbeat: from DataNode to NameNode. DataNodes without a recent heartbeat are marked dead, and no new I/O is sent to them. Also remember that the NameNode holds the replication information for all files on the file system, so when a DataNode fails it knows which blocks fall below the replication factor. The replication factor is set for the entire system, and can also be set for a particular file when writing it. Either way, the NameNode knows which blocks fall below the replication factor, and it starts the process to re-replicate them. Big Data Computing Hadoop Distributed File System (HDFS)
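The heartbeat logic described here can be sketched as a toy simulation. Everything in it is an assumption made for illustration (the timeout value, the dictionaries, the function names); it is not how the NameNode is actually implemented.

```python
HEARTBEAT_TIMEOUT = 30.0  # seconds; illustrative value, not an HDFS default

def dead_datanodes(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT):
    """DataNodes whose most recent heartbeat is older than the timeout."""
    return {node for node, ts in last_heartbeat.items() if now - ts > timeout}

def under_replicated(block_locations, dead, replication=3):
    """Map each block with too few live replicas to the number of copies missing."""
    missing = {}
    for block, nodes in block_locations.items():
        live = [n for n in nodes if n not in dead]
        if len(live) < replication:
            missing[block] = replication - len(live)
    return missing

# Hypothetical cluster state: dn3 stopped sending heartbeats a minute ago.
last_heartbeat = {"dn1": 100.0, "dn2": 98.0, "dn3": 40.0, "dn4": 99.0}
block_locations = {"B1": ["dn1", "dn2", "dn3"], "B2": ["dn1", "dn2", "dn4"]}

dead = dead_datanodes(last_heartbeat, now=100.0)
to_replicate = under_replicated(block_locations, dead)
print(dead)          # {'dn3'}
print(to_replicate)  # {'B1': 1}
```

Block B1 has lost one replica, so it is the only block queued for re-replication.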
  • 157. Vu Pham Mitigation of common failures Checksum computed on file creation. Checksums stored in HDFS namespace. Used to check retrieved data. Re-read from alternate replica Big Data Computing Hadoop Distributed File System (HDFS)
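A minimal sketch of checksum verification with fallback to another replica is shown below. It uses `zlib.crc32` from the Python standard library as a stand-in checksum; the replica list and the function names are hypothetical, and the real HDFS client works on chunks within a block rather than on whole blocks.

```python
import zlib

def checksum(data: bytes) -> int:
    """Checksum computed when the data is first written."""
    return zlib.crc32(data)

def read_verified(replicas, expected_checksum):
    """Return the first replica whose data matches the stored checksum."""
    for node, data in replicas:
        if checksum(data) == expected_checksum:
            return node, data
        # Mismatch: treat this replica as corrupt and try the next one.
    raise IOError("all replicas failed checksum verification")

original = b"hello hdfs block"
stored = checksum(original)

# Hypothetical replicas: the copy on dn1 has one corrupted byte.
replicas = [("dn1", b"hellX hdfs block"), ("dn2", original)]
node, data = read_verified(replicas, stored)
print(node)  # dn2
```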
  • 158. Vu Pham Mitigation of common failures Multiple copies of central metadata structures. Failover to standby NameNode: manual by default. Big Data Computing Hadoop Distributed File System (HDFS)
  • 159. Vu Pham Performance Changing blocksize and replication factor can improve performance. Example: Distributed copy Hadoop distcp allows parallel transfer of files. Big Data Computing Hadoop Distributed File System (HDFS)
  • 160. Vu Pham Replication trade off with respect to robustness One performance tradeoff is that, when running MapReduce jobs, having replicas gives additional locality possibilities; but the big tradeoff is robustness. Consider the case with no replicas: if you lose a node or a local disk, you can't recover because there is no replication. Similarly, with data corruption, if you get a bad checksum you can't recover because you don't have a replica. Changes to other parameters can have similar effects. Big Data Computing Hadoop Distributed File System (HDFS)
  • 161. Vu Pham Conclusion In this lecture, we have discussed design goals of HDFS, the read/write process to HDFS, the main configuration tuning parameters to control HDFS performance and robustness. Big Data Computing Hadoop Distributed File System (HDFS)
  • 162. Vu Pham Hadoop MapReduce 1.0 Dr. Rajiv Misra Dept. of Computer Science & Engg. Indian Institute of Technology Patna rajivm@iitp.ac.in Big Data Computing Hadoop MapReduce 1.0
  • 163. Vu Pham What is Map Reduce MapReduce is the execution engine of Hadoop. Big Data Computing Hadoop MapReduce 1.0
  • 164. Vu Pham Map Reduce Components The Job Tracker Task Tracker Big Data Computing Hadoop MapReduce 1.0
  • 165. Vu Pham The Job Tracker The Job Tracker is hosted on the master and receives job execution requests from the client. Its main duties are to break the received job (a big computation) into small parts, to allocate these partial computations (tasks) to the slave nodes, and to monitor the progress and reports of task execution from the slaves. The unit of execution is the job. Big Data Computing Hadoop MapReduce 1.0
  • 166. Vu Pham The Task Tracker The Task Tracker is the MapReduce component on a slave machine. Because there are multiple slave machines, many Task Trackers are available in a cluster. Its duty is to perform the computation given by the Job Tracker on the data available on its slave machine. The Task Tracker communicates progress and reports results to the Job Tracker. The master node contains the Job Tracker and NameNode, whereas every slave contains a Task Tracker and a DataNode. Big Data Computing Hadoop MapReduce 1.0
  • 167. Vu Pham Execution Steps Step 1: The client submits the job to the Job Tracker. Step 2: The Job Tracker asks the NameNode for the location of the data. Step 3: As per the reply from the NameNode, the Job Tracker asks the respective Task Trackers to execute the task on their data. Step 4: All the results are stored on some DataNode, and the NameNode is informed about this. Step 5: The Task Trackers report progress and job completion to the Job Tracker. Step 6: The Job Tracker informs the client of the completion. Step 7: The client contacts the NameNode and retrieves the results. Big Data Computing Hadoop MapReduce 1.0
  • 168. Vu Pham Hadoop MapReduce 2.0 Dr. Rajiv Misra Dept. of Computer Science & Engg. Indian Institute of Technology Patna rajivm@iitp.ac.in Big Data Computing Hadoop MapReduce 2.0
  • 169. Vu Pham Preface Content of this Lecture: In this lecture, we will discuss the ‘MapReduce paradigm’ and its internal working and implementation overview. We will also see many examples and different applications of MapReduce being used, and look into how the ‘scheduling and fault tolerance’ works inside MapReduce. Big Data Computing Hadoop MapReduce 2.0
  • 170. Vu Pham Introduction MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model. Big Data Computing Hadoop MapReduce 2.0
  • 171. Vu Pham Contd… Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. A typical MapReduce computation processes many terabytes of data on thousands of machines. Hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day. Big Data Computing Hadoop MapReduce 2.0
  • 172. Vu Pham Distributed File System Chunk servers: file is split into contiguous chunks; typically each chunk is 16-64 MB; each chunk is replicated (usually 2x or 3x); try to keep replicas in different racks. Master node (known as the NameNode in HDFS): stores metadata; might be replicated. Client library for file access: talks to the master to find chunk servers; connects directly to chunk servers to access data. Big Data Computing Hadoop MapReduce 2.0
  • 173. Vu Pham Motivation for Map Reduce (Why) Large-scale data processing: want to use 1000s of CPUs, but don’t want the hassle of managing things. The MapReduce architecture provides: automatic parallelization & distribution; fault tolerance; I/O scheduling; monitoring & status updates. Big Data Computing Hadoop MapReduce 2.0
  • 174. Vu Pham MapReduce Paradigm Big Data Computing Hadoop MapReduce 2.0
  • 175. Vu Pham What is MapReduce? Terms are borrowed from functional languages (e.g., Lisp). Sum of squares: (map square '(1 2 3 4)) gives (1 4 9 16) [processes each record sequentially and independently]; (reduce + '(1 4 9 16)), i.e. (+ 16 (+ 9 (+ 4 1))), gives 30 [processes set of all records in batches]. Let’s consider a sample application: Wordcount. You are given a huge dataset (e.g., Wikipedia dump or all of Shakespeare’s works) and asked to list the count for each of the words in each of the documents therein. Big Data Computing Hadoop MapReduce 2.0
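The Lisp sum-of-squares example translates directly to Python's built-in `map` and `functools.reduce`. This only illustrates the functional origin of the two terms, not the distributed system.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# map: apply a function to each record independently.
squares = list(map(lambda x: x * x, numbers))

# reduce: combine all records into a single result.
total = reduce(lambda acc, x: acc + x, squares)

print(squares)  # [1, 4, 9, 16]
print(total)    # 30
```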
  • 176. Vu Pham Map Process individual records to generate intermediate key/value pairs. Input <filename, file text>: “Welcome Everyone”, “Hello Everyone”. Output (key, value): (Welcome, 1), (Everyone, 1), (Hello, 1), (Everyone, 1). Big Data Computing Hadoop MapReduce 2.0
  • 177. Vu Pham Map Process individual records in parallel to generate intermediate key/value pairs. Input <filename, file text>, split across MAP TASK 1 and MAP TASK 2: “Welcome Everyone”, “Hello Everyone”. Output: (Welcome, 1), (Everyone, 1), (Hello, 1), (Everyone, 1). Big Data Computing Hadoop MapReduce 2.0
  • 178. Vu Pham Map Process a large number of individual records in parallel to generate intermediate key/value pairs. Input <filename, file text>, split across many MAP TASKS: “Welcome Everyone”, “Hello Everyone”, “Why are you here”, “I am also here”, “They are also here”, “Yes, it’s THEM!”, “The same people we were thinking of”, … Output: (Welcome, 1), (Everyone, 1), (Hello, 1), (Everyone, 1), (Why, 1), (Are, 1), (You, 1), (Here, 1), … Big Data Computing Hadoop MapReduce 2.0
  • 179. Vu Pham Reduce Reduce processes and merges all intermediate values associated with each key. Input: (Welcome, 1), (Everyone, 1), (Hello, 1), (Everyone, 1). Output (key, value): (Everyone, 2), (Hello, 1), (Welcome, 1). Big Data Computing Hadoop MapReduce 2.0
  • 180. Vu Pham Reduce • Each key is assigned to one Reduce task • Processes and merges all intermediate values in parallel by partitioning keys • Popular: hash partitioning, i.e., a key is assigned to reduce # = hash(key) % number of reduce tasks. Input: (Welcome, 1), (Everyone, 1), (Hello, 1), (Everyone, 1). Output, split across REDUCE TASK 1 and REDUCE TASK 2: (Everyone, 2), (Hello, 1), (Welcome, 1). Big Data Computing Hadoop MapReduce 2.0
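Hash partitioning can be sketched as follows. The sketch uses `zlib.crc32` as the hash so that results are stable across runs (Python's built-in `hash` for strings is randomized per process); that choice, and the function name, are assumptions for illustration and not what Hadoop uses internally.

```python
import zlib

def partition(key, num_reduce_tasks):
    """reduce # = hash(key) % number of reduce tasks."""
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks

pairs = [("Welcome", 1), ("Everyone", 1), ("Hello", 1), ("Everyone", 1)]
num_reduce_tasks = 2

# Route every intermediate pair to the reduce task chosen by its key.
buckets = {r: [] for r in range(num_reduce_tasks)}
for key, value in pairs:
    buckets[partition(key, num_reduce_tasks)].append((key, value))

for reduce_id, assigned in buckets.items():
    print(reduce_id, assigned)
```

Because the reduce task depends only on the key, both ("Everyone", 1) pairs always land in the same bucket, which is what lets a single reducer produce the total for that word.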
  • 181. Vu Pham Programming Model The computation takes a set of input key/value pairs, and produces a set of output key/value pairs. The user of the MapReduce library expresses the computation as two functions: (i) The Map (ii) The Reduce Big Data Computing Hadoop MapReduce 2.0
  • 182. Vu Pham (i) Map Abstraction Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key ‘I’ and passes them to the Reduce function. Big Data Computing Hadoop MapReduce 2.0
  • 183. Vu Pham (ii) Reduce Abstraction The Reduce function, also written by the user, accepts an intermediate key ‘I’ and a set of values for that key. It merges together these values to form a possibly smaller set of values. Typically just zero or one output value is produced per Reduce invocation. The intermediate values are supplied to the user's reduce function via an iterator. This allows us to handle lists of values that are too large to fit in memory. Big Data Computing Hadoop MapReduce 2.0
  • 184. Vu Pham Map-Reduce Functions for Word Count
      map(key, value):
        // key: document name; value: text of document
        for each word w in value:
          emit(w, 1)
      reduce(key, values):
        // key: a word; values: an iterator over counts
        result = 0
        for each count v in values:
          result += v
        emit(key, result)
    Big Data Computing Hadoop MapReduce 2.0
  • 185. Vu Pham Map-Reduce Functions Input: a set of key/value pairs. User supplies two functions: map(k, v) → list(k1, v1) and reduce(k1, list(v1)) → v2, where (k1, v1) is an intermediate key/value pair. Output is the set of (k1, v2) pairs. Big Data Computing Hadoop MapReduce 2.0
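The word-count pseudocode and the map/reduce signatures can be tried out with a small in-memory runner. This is a single-process sketch; `run_mapreduce` is a hypothetical helper that stands in for the framework's shuffle and grouping step, not an actual Hadoop API.

```python
from collections import defaultdict

def map_fn(doc_name, text):
    """map(k, v) -> list(k1, v1): emit (word, 1) for every word."""
    for word in text.split():
        yield word, 1

def reduce_fn(word, counts):
    """reduce(k1, list(v1)) -> v2: sum the counts for one word."""
    yield word, sum(counts)

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Shuffle step: group all intermediate values by intermediate key.
    groups = defaultdict(list)
    for key, value in inputs:
        for k1, v1 in map_fn(key, value):
            groups[k1].append(v1)
    # Reduce step: one call per key, values supplied as an iterator.
    output = []
    for k1 in sorted(groups):
        output.extend(reduce_fn(k1, iter(groups[k1])))
    return output

docs = [("doc1", "Welcome Everyone"), ("doc2", "Hello Everyone")]
result = run_mapreduce(docs, map_fn, reduce_fn)
print(result)  # [('Everyone', 2), ('Hello', 1), ('Welcome', 1)]
```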
  • 186. Vu Pham MapReduce Applications Big Data Computing Hadoop MapReduce 2.0
  • 187. Vu Pham Applications Here are a few simple examples of interesting programs that can be easily expressed as MapReduce computations. Distributed Grep: The map function emits a line if it matches a supplied pattern. The reduce function is an identity function that just copies the supplied intermediate data to the output. Count of URL Access Frequency: The map function processes logs of web page requests and outputs (URL; 1). The reduce function adds together all values for the same URL and emits a (URL; total count) pair. Reverse Web-Link Graph: The map function outputs (target; source) pairs for each link to a target URL found in a page named source. The reduce function concatenates the list of all source URLs associated with a given target URL and emits the pair: (target; list(source)) Big Data Computing Hadoop MapReduce 2.0
  • 188. Vu Pham Contd… Term-Vector per Host: A term vector summarizes the most important words that occur in a document or a set of documents as a list of (word; frequency) pairs. The map function emits a (hostname; term vector) pair for each input document (where the hostname is extracted from the URL of the document). The reduce function is passed all per-document term vectors for a given host. It adds these term vectors together, throwing away infrequent terms, and then emits a final (hostname; term vector) pair Big Data Computing Hadoop MapReduce 2.0
  • 189. Vu Pham Contd… Inverted Index: The map function parses each document, and emits a sequence of (word; document ID) pairs. The reduce function accepts all pairs for a given word, sorts the corresponding document IDs and emits a (word; list(document ID)) pair. The set of all output pairs forms a simple inverted index. It is easy to augment this computation to keep track of word positions. Distributed Sort: The map function extracts the key from each record, and emits a (key; record) pair. The reduce function emits all pairs unchanged. Big Data Computing Hadoop MapReduce 2.0
  • 190. Vu Pham Applications of MapReduce (1) Distributed Grep: Input: large set of files. Output: lines that match pattern. Map: emits a line if it matches the supplied pattern. Reduce: copies the intermediate data to output. Big Data Computing Hadoop MapReduce 2.0
  • 191. Vu Pham Applications of MapReduce (2) Reverse Web-Link Graph: Input: web graph as tuples (a, b), where page a → page b. Output: for each page, the list of pages that link to it. Map: for each input <source, target>, outputs <target, source>. Reduce: emits <target, list(source)>. Big Data Computing Hadoop MapReduce 2.0
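A small in-memory sketch of the reverse web-link graph follows. The three links are made-up sample data, and the grouping loop stands in for the framework's shuffle.

```python
from collections import defaultdict

def map_fn(source, target):
    """For each link source -> target, emit (target, source)."""
    yield target, source

def reduce_fn(target, sources):
    """Collect every page that links to the target."""
    return target, sorted(sources)

# Hypothetical web graph: a -> b, a -> c, b -> c.
links = [("a", "b"), ("a", "c"), ("b", "c")]

groups = defaultdict(list)
for source, target in links:
    for key, value in map_fn(source, target):
        groups[key].append(value)

reverse_graph = dict(reduce_fn(t, srcs) for t, srcs in groups.items())
print(reverse_graph)  # {'b': ['a'], 'c': ['a', 'b']}
```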
  • 192. Vu Pham Applications of MapReduce (3) Count of URL access frequency: Input: log of accessed URLs, e.g., from a proxy server. Output: for each URL, % of total accesses for that URL. Job 1: Map processes the web log and outputs <URL, 1>; multiple Reducers emit <URL, URL_count>. (So far, like Wordcount, but we still need the %.) Job 2, chained after the first: Map processes <URL, URL_count> and outputs <1, (<URL, URL_count>)>; a single Reducer does two passes, first summing all URL_counts to calculate overall_count, then calculating the %s, and emits multiple <URL, URL_count/overall_count>. Big Data Computing Hadoop MapReduce 2.0
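The two chained jobs can be sketched in memory as below. `run_job` is a hypothetical stand-in for one MapReduce job, and the four-line access log is invented sample data.

```python
from collections import defaultdict

def run_job(records, map_fn, reduce_fn):
    """Tiny single-process stand-in for one MapReduce job."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reduce_fn(key, values))
    return output

# Job 1: count accesses per URL (same shape as word count).
def count_map(url):
    yield url, 1

def count_reduce(url, ones):
    yield url, sum(ones)

# Job 2: send every (URL, count) pair to the single key 1.
def single_key_map(pair):
    yield 1, pair

def fraction_reduce(key, pairs):
    overall = sum(count for _, count in pairs)  # first pass
    for url, count in pairs:                    # second pass
        yield url, count / overall

log = ["/a", "/b", "/a", "/a"]  # hypothetical access log
counts = run_job(log, count_map, count_reduce)
fractions = run_job(counts, single_key_map, fraction_reduce)
print(counts)     # [('/a', 3), ('/b', 1)]
print(fractions)  # [('/a', 0.75), ('/b', 0.25)]
```

Using one key in the second job forces all pairs onto one reducer, which is what makes the overall total available in one place.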
  • 193. Vu Pham Applications of MapReduce (4) Sort: Input: series of (key, value) pairs. Output: sorted <value>s. Map: <key, value> → <value, _> (identity). Reducer: <key, value> → <key, value> (identity). This works because a map task’s output is sorted (e.g., quicksort) and a reduce task’s input is sorted (e.g., mergesort). Partitioning function: partition keys across reducers based on ranges (can’t use hashing!) • Take data distribution into account to balance reducer tasks Big Data Computing Hadoop MapReduce 2.0
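Why range partitioning gives a globally sorted result can be seen in a short sketch. The boundaries are picked by hand here, which is an assumption for illustration; a real job would derive them from the data distribution so that reducers stay balanced.

```python
import bisect

def range_partition(key, boundaries):
    """Index of the reducer whose key range contains key."""
    return bisect.bisect_right(boundaries, key)

values = [42, 7, 93, 15, 61, 28, 88, 3]
boundaries = [30, 60]  # reducer 0: key < 30, reducer 1: 30 <= key < 60, reducer 2: key >= 60

# Map + partition: each value becomes a key and is routed by range.
partitions = [[] for _ in range(len(boundaries) + 1)]
for v in values:
    partitions[range_partition(v, boundaries)].append(v)

# Each reducer sees its own keys in sorted order (the framework sorts reduce input).
reducer_outputs = [sorted(p) for p in partitions]

# Concatenating reducer outputs in reducer order yields a globally sorted list.
globally_sorted = [v for out in reducer_outputs for v in out]
print(reducer_outputs)  # [[3, 7, 15, 28], [42], [61, 88, 93]]
print(globally_sorted)  # [3, 7, 15, 28, 42, 61, 88, 93]
```

With hash partitioning, each reducer's output would still be sorted, but the concatenation would not be.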
  • 194. Vu Pham The YARN Scheduler • Used underneath Hadoop 2.x+ • YARN = Yet Another Resource Negotiator • Treats each server as a collection of containers (container = fixed CPU + fixed memory) • Has 3 main components: (1) global Resource Manager (RM): scheduling; (2) per-server Node Manager (NM): daemon and server-specific functions; (3) per-application (job) Application Master (AM): container negotiation with RM and NMs, and detecting task failures of that job Big Data Computing Hadoop MapReduce 2.0
  • 195. Vu Pham YARN: How a job gets a container [Figure: 2 servers (A, B) and 2 jobs (1, 2). The Resource Manager runs the Capacity Scheduler; Node A hosts Node Manager A and Application Master 1; Node B hosts Node Manager B, Application Master 2 and a task of App 2. Message labels: 1. Need container; 2. Container completed; 3. Container on Node B; 4. Start task, please!] Big Data Computing Hadoop MapReduce 2.0
  • 196. Vu Pham MapReduce Examples Big Data Computing Hadoop MapReduce 2.0
  • 197. Vu Pham Example 1: Word Count using MapReduce
      map(key, value):
        // key: document name; value: text of document
        for each word w in value:
          emit(w, 1)
      reduce(key, values):
        // key: a word; values: an iterator over counts
        result = 0
        for each count v in values:
          result += v
        emit(key, result)
    Big Data Computing Hadoop MapReduce 2.0
  • 198. Vu Pham Count Illustrated map(key=url, val=contents): for each word w in contents, emit (w, “1”). reduce(key=word, values=uniq_counts): sum all “1”s in the values list and emit the result “(word, sum)”. Input: “see bob run”, “see spot throw”. Map output: (see, 1), (bob, 1), (run, 1), (see, 1), (spot, 1), (throw, 1). Reduce output: (bob, 1), (run, 1), (see, 2), (spot, 1), (throw, 1). Big Data Computing Hadoop MapReduce 2.0
  • 199. Vu Pham Example 2: Counting words of different lengths The map function takes a value and outputs key:value pairs. For instance, if we define a map function that takes a string and outputs the length of the word as the key and the word itself as the value then map(steve) would return 5:steve and map(savannah) would return 8:savannah. This allows us to run the map function against values in parallel and provides a huge advantage. Big Data Computing Hadoop MapReduce 2.0
  • 200. Vu Pham Example 2: Counting words of different lengths Before we get to the reduce function, the mapreduce framework groups all of the values together by key, so if the map functions output the following key:value pairs: 3 : the; 3 : and; 3 : you; 4 : then; 4 : what; 4 : when; 5 : steve; 5 : where; 8 : savannah; 8 : research. They get grouped as: 3 : [the, and, you]; 4 : [then, what, when]; 5 : [steve, where]; 8 : [savannah, research]. Big Data Computing Hadoop MapReduce 2.0
  • 201. Vu Pham Example 2: Counting words of different lengths Each of these lines would then be passed as an argument to the reduce function, which accepts a key and a list of values. In this instance, we might be trying to figure out how many words of certain lengths exist, so our reduce function will just count the number of items in the list and output the key with the size of the list, like: 3 : 3; 4 : 3; 5 : 2; 8 : 2. Big Data Computing Hadoop MapReduce 2.0
  • 202. Vu Pham Example 2: Counting words of different lengths The reductions can also be done in parallel, again providing a huge advantage. We can then look at these final results and see that there were only two words of length 5 in the corpus, etc... The most common example of mapreduce is for counting the number of times words occur in a corpus. Big Data Computing Hadoop MapReduce 2.0
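Example 2 can be reproduced end to end with the ten words from the slides. The grouping loop below plays the role of the framework; the function names are illustrative.

```python
from collections import defaultdict

words = ["the", "and", "you", "then", "what", "when",
         "steve", "where", "savannah", "research"]

def map_fn(word):
    """Emit the word length as key and the word itself as value."""
    return len(word), word

def reduce_fn(length, grouped_words):
    """Count how many words share this length."""
    return length, len(grouped_words)

# Group step performed by the framework between map and reduce.
groups = defaultdict(list)
for w in words:
    key, value = map_fn(w)
    groups[key].append(value)

counts = dict(reduce_fn(k, ws) for k, ws in groups.items())
print(dict(groups))  # {3: ['the', 'and', 'you'], 4: ['then', 'what', 'when'], 5: ['steve', 'where'], 8: ['savannah', 'research']}
print(counts)        # {3: 3, 4: 3, 5: 2, 8: 2}
```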
  • 203. Vu Pham Example 3: Word Length Histogram Abridged Declaration of Independence A Declaration By the Representatives of the United States of America, in General Congress Assembled. When in the course of human events it becomes necessary for a people to advance from that subordination in which they have hitherto remained, and to assume among powers of the earth the equal and independent station to which the laws of nature and of nature's god entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the change. We hold these truths to be self-evident; that all men are created equal and independent; that from that equal creation they derive rights inherent and inalienable, among which are the preservation of life, and liberty, and the pursuit of happiness; that to secure these ends, governments are instituted among men, deriving their just power from the consent of the governed; that whenever any form of government shall become destructive of these ends, it is the right of the people to alter or to abolish it, and to institute new government, laying it’s foundation on such principles and organizing it's power in such form, as to them shall seem most likely to effect their safety and happiness. Prudence indeed will dictate that governments long established should not be changed for light and transient causes: and accordingly all experience hath shewn that mankind are more disposed to suffer while evils are sufferable, than to right themselves by abolishing the forms to which they are accustomed. But when a long train of abuses and usurpations, begun at a distinguished period, and pursuing invariably the same object, evinces a design to reduce them to arbitrary power, it is their right, it is their duty, to throw off such government and to provide new guards for future security. 
Such has been the patient sufferings of the colonies; and such is now the necessity which constrains them to expunge their former systems of government. the history of his present majesty is a history of unremitting injuries and usurpations, among which no one fact stands single or solitary to contradict the uniform tenor of the rest, all of which have in direct object the establishment of an absolute tyranny over these states. To prove this, let facts be submitted to a candid world, for the truth of which we pledge a faith yet unsullied by falsehood. Big Data Computing Hadoop MapReduce 2.0
  • 204. Vu Pham Example 3: Word Length Histogram Abridged Declaration of Independence (same text as on the previous slide) How many “big”, “medium” and “small” words are used? Big Data Computing Hadoop MapReduce 2.0
  • 205. Vu Pham Example 3: Word Length Histogram Big = Yellow = 10+ letters; Medium = Red = 5..9 letters; Small = Blue = 2..4 letters; Tiny = Pink = 1 letter. Big Data Computing Hadoop MapReduce 2.0
  • 206. Vu Pham Example 3: Word Length Histogram Big Data Computing Hadoop MapReduce 2.0
  • 207. Vu Pham Example 3: Word Length Histogram Big Data Computing Hadoop MapReduce 2.0
  • 208. Vu Pham Example 3: Word Length Histogram Big Data Computing Hadoop MapReduce 2.0
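The word-length histogram can be sketched on the first sentence of the abridged text, using the size classes from the slide (big 10+, medium 5..9, small 2..4, tiny 1). The tokenization rule (runs of letters) is an assumption, and the map and reduce steps run in a single process here.

```python
import re
from collections import Counter

def category(word):
    """Size class of a word, using the thresholds from the slide."""
    n = len(word)
    if n >= 10:
        return "big"
    if n >= 5:
        return "medium"
    if n >= 2:
        return "small"
    return "tiny"

def map_fn(text):
    """Emit (size class, 1) for every word; words are runs of letters."""
    for word in re.findall(r"[A-Za-z]+", text):
        yield category(word), 1

def reduce_all(pairs):
    """Sum the ones per size class."""
    totals = Counter()
    for key, one in pairs:
        totals[key] += one
    return dict(totals)

excerpt = "A Declaration By the Representatives of the United States of America"
histogram = reduce_all(map_fn(excerpt))
print(histogram)  # tiny: 1, big: 2, small: 5, medium: 3
```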
  • 209. Vu Pham Example 4: Build an Inverted Index Input: tweet1, (“I love pancakes for breakfast”); tweet2, (“I dislike pancakes”); tweet3, (“What should I eat for breakfast?”); tweet4, (“I love to eat”). Desired output: “pancakes”, (tweet1, tweet2); “breakfast”, (tweet1, tweet3); “eat”, (tweet3, tweet4); “love”, (tweet1, tweet4); … Big Data Computing Hadoop MapReduce 2.0
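The inverted index for the four tweets can be built with a short in-memory sketch. Lowercasing and stripping punctuation are assumptions about tokenization that the slide does not spell out.

```python
import re
from collections import defaultdict

def map_fn(tweet_id, text):
    """Emit (word, tweet_id) once per distinct word in the tweet."""
    for word in set(re.findall(r"[a-z']+", text.lower())):
        yield word, tweet_id

def reduce_fn(word, tweet_ids):
    """Sort the tweet IDs collected for one word."""
    return word, sorted(tweet_ids)

tweets = [
    ("tweet1", "I love pancakes for breakfast"),
    ("tweet2", "I dislike pancakes"),
    ("tweet3", "What should I eat for breakfast?"),
    ("tweet4", "I love to eat"),
]

groups = defaultdict(list)
for tweet_id, text in tweets:
    for word, tid in map_fn(tweet_id, text):
        groups[word].append(tid)

index = dict(reduce_fn(word, ids) for word, ids in groups.items())
print(index["pancakes"])   # ['tweet1', 'tweet2']
print(index["breakfast"])  # ['tweet1', 'tweet3']
print(index["eat"])        # ['tweet3', 'tweet4']
print(index["love"])       # ['tweet1', 'tweet4']
```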
  • 210. Vu Pham Example 5: Relational Join Big Data Computing Hadoop MapReduce 2.0
  • 211. Vu Pham Example 5: Relational Join: Before Map Phase Big Data Computing Hadoop MapReduce 2.0
  • 212. Vu Pham Example 5: Relational Join: Map Phase Big Data Computing Hadoop MapReduce 2.0
  • 213. Vu Pham Example 5: Relational Join: Reduce Phase Big Data Computing Hadoop MapReduce 2.0
  • 214. Vu Pham Example 5: Relational Join in MapReduce, again Big Data Computing Hadoop MapReduce 2.0
  • 215. Vu Pham Example 6: Finding Friends Facebook has a list of friends (note that friends are a bi-directional thing on Facebook. If I'm your friend, you're mine). They also have lots of disk space and they serve hundreds of millions of requests everyday. They've decided to pre-compute calculations when they can to reduce the processing time of requests. One common processing request is the "You and Joe have 230 friends in common" feature. When you visit someone's profile, you see a list of friends that you have in common. This list doesn't change frequently so it'd be wasteful to recalculate it every time you visited the profile (sure you could use a decent caching strategy, but then we wouldn't be able to continue writing about mapreduce for this problem). We're going to use mapreduce so that we can calculate everyone's common friends once a day and store those results. Later on it's just a quick lookup. We've got lots of disk, it's cheap. Big Data Computing Hadoop MapReduce 2.0
  • 216. Vu Pham Example 6: Finding Friends Assume the friends are stored as Person->[List of Friends], our friends list is then: A -> B C D; B -> A C D E; C -> A B D E; D -> A B C E; E -> B C D. Big Data Computing Hadoop MapReduce 2.0
  • 217. Vu Pham Example 6: Finding Friends For map(A -> B C D): (A B) -> B C D; (A C) -> B C D; (A D) -> B C D. For map(B -> A C D E) (note that A comes before B in the key): (A B) -> A C D E; (B C) -> A C D E; (B D) -> A C D E; (B E) -> A C D E. Big Data Computing Hadoop MapReduce 2.0
  • 218. Vu Pham Example 6: Finding Friends For map(C -> A B D E): (A C) -> A B D E; (B C) -> A B D E; (C D) -> A B D E; (C E) -> A B D E. For map(D -> A B C E): (A D) -> A B C E; (B D) -> A B C E; (C D) -> A B C E; (D E) -> A B C E. And finally for map(E -> B C D): (B E) -> B C D; (C E) -> B C D; (D E) -> B C D. Big Data Computing Hadoop MapReduce 2.0
  • 219. Vu Pham Example 6: Finding Friends Before we send these key-value pairs to the reducers, we group them by their keys and get: (A B) -> (A C D E) (B C D); (A C) -> (A B D E) (B C D); (A D) -> (A B C E) (B C D); (B C) -> (A B D E) (A C D E); (B D) -> (A B C E) (A C D E); (B E) -> (A C D E) (B C D); (C D) -> (A B C E) (A B D E); (C E) -> (A B D E) (B C D); (D E) -> (A B C E) (B C D). Big Data Computing Hadoop MapReduce 2.0
  • 220. Vu Pham Example 6: Finding Friends Each line will be passed as an argument to a reducer. The reduce function will simply intersect the lists of values and output the same key with the result of the intersection. For example, reduce((A B) -> (A C D E) (B C D)) will output (A B) : (C D) and means that friends A and B have C and D as common friends. Big Data Computing Hadoop MapReduce 2.0
  • 221. Vu Pham Example 6: Finding Friends The result after reduction is: (A B) -> (C D); (A C) -> (B D); (A D) -> (B C); (B C) -> (A D E); (B D) -> (A C E); (B E) -> (C D); (C D) -> (A B E); (C E) -> (B D); (D E) -> (B C). Now when D visits B's profile, we can quickly look up (B D) and see that they have three friends in common, (A C E). Big Data Computing Hadoop MapReduce 2.0
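The whole common-friends computation fits in a short in-memory sketch. The friends lists are the ones from the example; the grouping loop stands in for the framework, and representing each list as a string of single-letter names is just a convenience.

```python
from collections import defaultdict

# Person -> friends, as given in the example (one letter per person).
friends = {"A": "BCD", "B": "ACDE", "C": "ABDE", "D": "ABCE", "E": "BCD"}

def map_fn(person, friend_list):
    """For each friend, emit the sorted pair as key and the full friend list as value."""
    for friend in friend_list:
        yield tuple(sorted((person, friend))), set(friend_list)

def reduce_fn(pair, friend_sets):
    """Intersect the two friend lists received for this pair."""
    return pair, sorted(set.intersection(*friend_sets))

groups = defaultdict(list)
for person, friend_list in friends.items():
    for pair, friend_set in map_fn(person, friend_list):
        groups[pair].append(friend_set)

common = dict(reduce_fn(pair, sets) for pair, sets in groups.items())
print(common[("A", "B")])  # ['C', 'D']
print(common[("B", "D")])  # ['A', 'C', 'E']
```

Sorting the pair in the key is what makes (A B) and (B A) land on the same reducer.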
  • 222. Vu Pham Reading Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters” http://labs.google.com/papers/mapreduce.html Big Data Computing Hadoop MapReduce 2.0
  • 223. Vu Pham Parallel Programming with Spark Dr. Rajiv Misra Dept. of Computer Science & Engg. Indian Institute of Technology Patna rajivm@iitp.ac.in Big Data Computing Parallel Programming with Spark
  • 224. Vu Pham Preface Content of this Lecture: In this lecture, we will discuss: Overview of Spark Fundamentals of Scala & functional programming Spark concepts Spark operations Job execution Big Data Computing Parallel Programming with Spark
  • 225. Vu Pham Introduction to Spark Big Data Computing Parallel Programming with Spark
  • 226. Vu Pham What is Spark? Fast, expressive cluster computing system compatible with Apache Hadoop. Works with any Hadoop-supported storage system (HDFS, S3, SequenceFile, Avro, …). Improves efficiency through in-memory computing primitives and general computation graphs (up to 100× faster). Improves usability through rich APIs in Java, Scala, Python and an interactive shell (often 2-10× less code). Big Data Computing Parallel Programming with Spark
  • 227. Vu Pham How to Run It Local multicore: just a library in your program. EC2: scripts for launching a Spark cluster. Private cluster: Mesos, YARN, Standalone Mode. Big Data Computing Parallel Programming with Spark
  • 228. Vu Pham Scala vs Java APIs Spark originally written in Scala, which allows concise function syntax and interactive use APIs in Java, Scala and Python Interactive shells in Scala and Python Big Data Computing Parallel Programming with Spark
  • 229. Vu Pham Introduction to Scala & functional programming Big Data Computing Parallel Programming with Spark
  • 230. Vu Pham High-level language for the Java VM Object-oriented + functional programming Statically typed Comparable in speed to Java But often no need to write types due to type inference Interoperates with Java Can use any Java class, inherit from it, etc; can also call Scala code from Java About Scala Big Data Computing Parallel Programming with Spark
  • 231. Best Way to Learn Scala. Interactive shell: just type scala. Supports importing libraries, tab completion, and all constructs in the language.
  • 232. Quick Tour
      Declaring variables:
        var x: Int = 7
        var x = 7      // type inferred
        val y = "hi"   // read-only
      Java equivalent:
        int x = 7;
        final String y = "hi";
      Functions:
        def square(x: Int): Int = x*x
        def square(x: Int): Int = { x*x }   // last expression in block returned
        def announce(text: String) { println(text) }
      Java equivalent:
        int square(int x) { return x*x; }
        void announce(String text) { System.out.println(text); }
  • 233. Quick Tour
      Generic types:
        var arr = new Array[Int](8)
        var lst = List(1, 2, 3)   // factory method; type of lst is List[Int]
      Java equivalent:
        int[] arr = new int[8];
        List<Integer> lst = new ArrayList<Integer>();   // can't hold primitive types
        lst.add(...)
      Indexing:
        arr(5) = 7
        println(lst(5))
      Java equivalent:
        arr[5] = 7;
        System.out.println(lst.get(5));
  • 234. Quick Tour
      Processing collections with functional programming:
        val list = List(1, 2, 3)
        list.foreach(x => println(x))   // prints 1, 2, 3
        list.foreach(println)           // same
        list.map(x => x + 2)            // => List(3, 4, 5)
        list.map(_ + 2)                 // same, with placeholder notation
        list.filter(x => x % 2 == 1)    // => List(1, 3)
        list.filter(_ % 2 == 1)         // => List(1, 3)
        list.reduce((x, y) => x + y)    // => 6
        list.reduce(_ + _)              // => 6
      An argument such as x => println(x) is a function expression (closure).
      All of these leave the list unchanged (List is immutable).
  • 235. Scala Closure Syntax
        (x: Int) => x + 2   // full version
        x => x + 2          // type inferred
        _ + 2               // when each argument is used exactly once
        x => {              // when body is a block of code
          val numberToAdd = 2
          x + numberToAdd
        }
        // If closure is too long, can always pass a function
        def addTwo(x: Int): Int = x + 2
        list.map(addTwo)
      Scala allows defining a "local function" inside another function.
  • 236. Other Collection Methods
      Scala collections provide many other functional methods; for example, Google for "Scala Seq".
        Method on Seq[T]                      Explanation
        map(f: T => U): Seq[U]                Pass each element through f
        flatMap(f: T => Seq[U]): Seq[U]       One-to-many map
        filter(f: T => Boolean): Seq[T]       Keep elements passing f
        exists(f: T => Boolean): Boolean      True if at least one element passes
        forall(f: T => Boolean): Boolean      True if all elements pass
        reduce(f: (T, T) => T): T             Merge elements using f
        groupBy(f: T => K): Map[K,List[T]]    Group elements by f(element)
        sortBy(f: T => K): Seq[T]             Sort elements by f(element)
        . . .
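Since the Spark examples later in this lecture are written in Python, it may help to see rough plain-Python counterparts of the Seq methods listed above. This is only an illustrative sketch on a three-element list; the variable names are made up for the example and nothing here is Spark or Scala API.

```python
from functools import reduce
from itertools import chain

nums = [1, 2, 3]

mapped = [x + 2 for x in nums]                                    # map
flat_mapped = list(chain.from_iterable(range(x) for x in nums))   # flatMap
odds = [x for x in nums if x % 2 == 1]                            # filter
has_big = any(x > 2 for x in nums)                                # exists
all_positive = all(x > 0 for x in nums)                           # forall
total = reduce(lambda x, y: x + y, nums)                          # reduce

# groupBy: collect elements under the key computed from each element
by_parity = {}
for x in nums:
    by_parity.setdefault(x % 2, []).append(x)

descending = sorted(nums, key=lambda x: -x)                       # sortBy
```

As in Scala, none of these expressions modifies nums; each one builds a new value.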
  • 237. Spark Concepts
  • 238. Spark Overview. Goal: work with distributed collections as you would with local ones. Concept: resilient distributed datasets (RDDs): immutable collections of objects spread across a cluster; built through parallel transformations (map, filter, etc.); automatically rebuilt on failure; controllable persistence (e.g. caching in RAM).
  • 239. Main Primitives. Resilient distributed datasets (RDDs): immutable, partitioned collections of objects. Transformations (e.g. map, filter, groupBy, join): lazy operations to build RDDs from other RDDs. Actions (e.g. count, collect, save): return a result or write it to storage.
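The split between lazy transformations and actions can be imitated in a few lines of plain Python. The class below, LazyDataset, is a hypothetical single-machine stand-in invented for this illustration; it is not how Spark implements RDDs. It only shows that map and filter record work to be done, while collect and count actually do it.

```python
class LazyDataset:
    """Hypothetical stand-in for an RDD: records steps, computes only on an action."""

    def __init__(self, source, steps=()):
        self._source = list(source)
        self._steps = tuple(steps)

    # Transformations: return a new dataset, run nothing
    def map(self, func):
        return LazyDataset(self._source, self._steps + (("map", func),))

    def filter(self, func):
        return LazyDataset(self._source, self._steps + (("filter", func),))

    def _run(self):
        items = iter(self._source)
        for kind, func in self._steps:
            # The built-in map/filter are themselves lazy iterators
            items = map(func, items) if kind == "map" else filter(func, items)
        return items

    # Actions: trigger the recorded steps and return a result
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []

def traced_square(x):
    calls.append(x)          # remember that the function really ran
    return x * x

squares = LazyDataset([1, 2, 3]).map(traced_square)
even = squares.filter(lambda x: x % 2 == 0)
calls_before_action = list(calls)   # still empty: nothing has been computed
result = even.collect()             # the action runs map, then filter
```

Because every transformation returns a new object and leaves the old one alone, squares can still be collected on its own after even has been built from it.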
  • 240. Example: Mining Console Logs
      Load error messages from a log into memory, then interactively search for patterns:
        lines = spark.textFile("hdfs://...")                      # base RDD
        errors = lines.filter(lambda s: s.startswith("ERROR"))    # transformed RDD
        messages = errors.map(lambda s: s.split('\t')[2])
        messages.cache()
        messages.filter(lambda s: "foo" in s).count()             # action
        messages.filter(lambda s: "bar" in s).count()
        . . .
      [Diagram: a driver sends tasks to three workers and collects results; each worker holds one block of the file (Block 1-3) and one cache (Cache 1-3).]
      Result: full-text search of Wikipedia in <1 sec (vs 20 sec for on-disk data).
      Result: scaled to 1 TB data in 5-7 sec (vs 170 sec for on-disk data).
  • 241. RDD Fault Tolerance
      RDDs track the transformations used to build them (their lineage) to recompute lost data. E.g.:
        messages = textFile(...).filter(lambda s: "ERROR" in s).map(lambda s: s.split('\t')[2])
      Lineage: HadoopRDD (path = hdfs://…) → FilteredRDD (func = contains(...)) → MappedRDD (func = split(…))
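The lineage idea can be sketched without Spark: keep the base data and the list of transformations, and rebuild only the partition that was lost. Everything below (the sample log lines, base_partitions, lineage, get_partition) is made up for illustration and assumes the base data is still readable after the failure, as it would be on HDFS.

```python
# Two partitions of tab-separated log lines (hypothetical sample data)
base_partitions = [
    ["ERROR\tdisk\tfull", "INFO\tboot\tok"],
    ["ERROR\tnet\ttimeout"],
]

# Lineage: the ordered transformations that define the derived dataset
lineage = [
    ("filter", lambda s: s.startswith("ERROR")),
    ("map", lambda s: s.split("\t")[2]),
]

def compute_partition(index):
    """Rebuild one partition by replaying the lineage over its base data."""
    items = base_partitions[index]
    for kind, func in lineage:
        if kind == "map":
            items = [func(x) for x in items]
        else:
            items = [x for x in items if func(x)]
    return items

# Cached results, one entry per partition
cache = {i: compute_partition(i) for i in range(len(base_partitions))}

del cache[1]   # simulate losing the worker that held partition 1

def get_partition(index):
    """Serve from cache when possible; otherwise recompute from lineage."""
    if index not in cache:
        cache[index] = compute_partition(index)
    return cache[index]

recovered = get_partition(1)
```

Only partition 1 is recomputed; partition 0 is still served from the cache, which is why recovery costs far less than rerunning the whole job.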
  • 242. Fault Recovery Test. [Chart: iteration time (s) for iterations 1-10: 119, 57, 56, 58, 58, 81, 57, 59, 57, 59. Annotation: "Failure happens"; the only spike after the first iteration is iteration 6 (81 s).]
  • 243. Behavior with Less RAM
  • 244. Which Language Should I Use? Standalone programs can be written in any of the three languages, but the console is available only for Python and Scala. Python developers: can stay with Python for both. Java developers: consider using Scala for the console (to learn the API). Performance: Java / Scala will be faster (statically typed), but Python can do well for numerical work with NumPy.
  • 245. Tour of Spark operations
  • 246. Learning Spark
      Easiest way: the Spark interpreter (spark-shell or pyspark), special Scala and Python consoles for cluster use.
      Runs in local mode on 1 thread by default, but this can be controlled with the MASTER environment variable:
        MASTER=local ./spark-shell               # local, 1 thread
        MASTER=local[2] ./spark-shell            # local, 2 threads
        MASTER=spark://host:port ./spark-shell   # Spark standalone cluster
  • 247. First Stop: SparkContext. Main entry point to Spark functionality. Created for you in Spark shells as the variable sc. In standalone programs, you’d make your own (see later for details).
  • 248. Creating RDDs
        # Turn a local collection into an RDD
        sc.parallelize([1, 2, 3])
        # Load text file from local FS, HDFS, or S3
        sc.textFile("file.txt")
        sc.textFile("directory/*.txt")
        sc.textFile("hdfs://namenode:9000/path/file")
        # Use any existing Hadoop InputFormat
        sc.hadoopFile(keyClass, valClass, inputFmt, conf)
  • 249. Basic Transformations
        nums = sc.parallelize([1, 2, 3])
        # Pass each element through a function
        squares = nums.map(lambda x: x*x)             # => {1, 4, 9}
        # Keep elements passing a predicate
        even = squares.filter(lambda x: x % 2 == 0)   # => {4}
        # Map each element to zero or more others
        nums.flatMap(lambda x: range(0, x))           # => {0, 0, 1, 0, 1, 2}
      Here range(0, x) is a range object (the sequence of numbers 0, 1, …, x-1).
  • 250. Basic Actions
        nums = sc.parallelize([1, 2, 3])
        # Retrieve RDD contents as a local collection
        nums.collect()   # => [1, 2, 3]
        # Return first K elements
        nums.take(2)     # => [1, 2]
        # Count number of elements
        nums.count()     # => 3
        # Merge elements with an associative function
        nums.reduce(lambda x, y: x + y)   # => 6
        # Write elements to a text file
        nums.saveAsTextFile("hdfs://file.txt")
  • 251. Working with Key-Value Pairs
      Spark’s "distributed reduce" transformations act on RDDs of key-value pairs.
      Python:
        pair = (a, b)
        pair[0]   # => a
        pair[1]   # => b
      Scala:
        val pair = (a, b)
        pair._1   // => a
        pair._2   // => b
      Java:
        Tuple2 pair = new Tuple2(a, b);   // class scala.Tuple2
        pair._1   // => a
        pair._2   // => b
  • 252. Some Key-Value Operations
        pets = sc.parallelize([("cat", 1), ("dog", 1), ("cat", 2)])
        pets.reduceByKey(lambda x, y: x + y)   # => {(cat, 3), (dog, 1)}
        pets.groupByKey()                      # => {(cat, Seq(1, 2)), (dog, Seq(1))}
        pets.sortByKey()                       # => {(cat, 1), (cat, 2), (dog, 1)}
      reduceByKey also automatically implements combiners on the map side.
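The remark about map-side combiners can be pictured as a two-phase reduce: each partition first merges its own values per key, and only those partial results are merged across partitions. The sketch below is a plain-Python illustration under that assumption; combine_locally and reduce_by_key are hypothetical helper names, and the way the pets data is split into two partitions is invented for the example.

```python
def combine_locally(partition, func):
    """Map-side combine: merge values per key inside a single partition."""
    partial = {}
    for key, value in partition:
        partial[key] = func(partial[key], value) if key in partial else value
    return partial

def reduce_by_key(partitions, func):
    """Merge the per-partition partial results into one result per key."""
    merged = {}
    for partition in partitions:
        for key, value in combine_locally(partition, func).items():
            merged[key] = func(merged[key], value) if key in merged else value
    return merged

# The pets data from the slide, plus one extra cat record, split into two partitions
partitions = [
    [("cat", 1), ("dog", 1), ("cat", 2)],
    [("cat", 4)],
]

partials = [combine_locally(p, lambda x, y: x + y) for p in partitions]
totals = reduce_by_key(partitions, lambda x, y: x + y)
```

The first partition holds three records but contributes only two partial results, one per key, so less data has to move between machines. This only gives the right answer when the function is associative, which is the same requirement the slides state for reduce.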
  • 253. Example: Word Count
        lines = sc.textFile("hamlet.txt")
        counts = lines.flatMap(lambda line: line.split(" ")) \
                      .map(lambda word: (word, 1)) \
                      .reduceByKey(lambda x, y: x + y)
      Data flow: "to be or", "not to be" → "to", "be", "or", "not", "to", "be" → (to, 1), (be, 1), (or, 1), (not, 1), (to, 1), (be, 1) → (be, 2), (not, 1), (or, 1), (to, 2)
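The same three steps can be traced on the two sample lines from the slide using ordinary Python lists, which makes each intermediate value easy to inspect. The helpers flat_map and reduce_by_key are hypothetical local functions written for this sketch; they mimic the semantics of the Spark operations on a single machine and are not Spark APIs.

```python
from itertools import chain

def flat_map(func, items):
    """Apply func to each item and concatenate the resulting sequences."""
    return list(chain.from_iterable(func(item) for item in items))

def reduce_by_key(func, pairs):
    """Merge the values of pairs that share a key using func."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged

lines = ["to be or", "not to be"]

words = flat_map(lambda line: line.split(" "), lines)
pairs = [(word, 1) for word in words]
counts = reduce_by_key(lambda x, y: x + y, pairs)
```

Printing words, pairs, and counts in turn reproduces the three stages of the data flow shown on the slide.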
  • 254. Other Key-Value Operations
        val visits = sc.parallelize(List(
          ("index.html", "1.2.3.4"),
          ("about.html", "3.4.5.6"),
          ("index.html", "1.3.3.1")))
        val pageNames = sc.parallelize(List(
          ("index.html", "Home"),
          ("about.html", "About")))
        visits.join(pageNames)
        // ("index.html", ("1.2.3.4", "Home"))
        // ("index.html", ("1.3.3.1", "Home"))
        // ("about.html", ("3.4.5.6", "About"))
        visits.cogroup(pageNames)
        // ("index.html", (Seq("1.2.3.4", "1.3.3.1"), Seq("Home")))
        // ("about.html", (Seq("3.4.5.6"), Seq("About")))
  • 255. Multiple Datasets
        visits = sc.parallelize([("index.html", "1.2.3.4"),
                                 ("about.html", "3.4.5.6"),
                                 ("index.html", "1.3.3.1")])
        pageNames = sc.parallelize([("index.html", "Home"),
                                    ("about.html", "About")])
        visits.join(pageNames)
        # ("index.html", ("1.2.3.4", "Home"))
        # ("index.html", ("1.3.3.1", "Home"))
        # ("about.html", ("3.4.5.6", "About"))
        visits.cogroup(pageNames)
        # ("index.html", (Seq("1.2.3.4", "1.3.3.1"), Seq("Home")))
        # ("about.html", (Seq("3.4.5.6"), Seq("About")))
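To make the difference between join and cogroup concrete, here is a small single-machine sketch of their semantics on the same visits and page-name data. The functions cogroup and join below are plain-Python illustrations written for this note, not the Spark implementations; in particular they ignore partitioning and return ordinary dicts and lists.

```python
from collections import defaultdict

def cogroup(left, right):
    """For each key, gather all left values and all right values side by side."""
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    return dict(groups)

def join(left, right):
    """Inner join: one output pair per matching (left value, right value)."""
    return [
        (key, (left_value, right_value))
        for key, (left_values, right_values) in cogroup(left, right).items()
        for left_value in left_values
        for right_value in right_values
    ]

visits = [("index.html", "1.2.3.4"), ("about.html", "3.4.5.6"), ("index.html", "1.3.3.1")]
page_names = [("index.html", "Home"), ("about.html", "About")]

grouped = cogroup(visits, page_names)
joined = join(visits, page_names)
```

Writing join in terms of cogroup shows the relationship between the two: cogroup keeps the per-key groups intact, while join expands each group into every matching pair.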
  • 256. Controlling the Level of Parallelism
      All the pair RDD operations take an optional second parameter for the number of tasks:
        words.reduceByKey(lambda x, y: x + y, 5)
        words.groupByKey(5)
        visits.join(pageViews, 5)
      Can also set the spark.default.parallelism property.
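One way to picture what the task-count parameter controls: every key is assigned to one of N partitions, and each partition is processed by one task. The sketch below assumes hash partitioning and uses zlib.crc32 purely as a reproducible stand-in hash; partition_for and partition_pairs are hypothetical names, and the actual hash function Spark uses is not shown here.

```python
import zlib

def partition_for(key, num_partitions):
    """Pick a partition index for a string key (stand-in hash, see lead-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def partition_pairs(pairs, num_partitions):
    """Distribute key-value pairs so that equal keys land in the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions

words = [("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1)]
partitions = partition_pairs(words, 5)
```

Because all records with the same key end up in the same partition, each task can reduce its keys independently of the others; a larger N means more, smaller tasks.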
  • 257. Using Local Variables
      External variables you use in a closure will automatically be shipped to the cluster:
        query = raw_input("Enter a query:")
        pages.filter(lambda x: x.startswith(query)).count()
      Some caveats:
        Each task gets a new copy (updates aren’t sent back)
        Variable must be Serializable (Java/Scala) or Pickle-able (Python)
        Don’t use fields of an outer object (ships all of it!)
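The first caveat, that each task works on its own copy, follows from the variables being serialized and shipped. The sketch below imitates this on one machine with the standard pickle module; run_task and the dictionary of shipped variables are hypothetical stand-ins for what a cluster would do, not Spark code.

```python
import pickle

def run_task(serialized_vars, record):
    """Simulate one task: deserialize a private copy, then update it."""
    local_vars = pickle.loads(serialized_vars)
    if record.startswith(local_vars["query"]):
        local_vars["matches"] += 1      # changes only this task's copy
    return local_vars["matches"]

driver_vars = {"query": "spark", "matches": 0}
shipped = pickle.dumps(driver_vars)     # what would travel to the workers

records = ["spark-shell", "sparkle", "hadoop"]
task_results = [run_task(shipped, record) for record in records]
```

Each task sees matches start at 0, and driver_vars["matches"] is still 0 afterwards. To get a total back, the tasks have to return their results (as the count() action does) rather than mutate a shared variable.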
  • 258. Closure Mishap Example
        class MyCoolRddApp {
          val param = 3.14
          val log = new Log(...)
          ...
          def work(rdd: RDD[Int]) {
            rdd.map(x => x + param)
               .reduce(...)
          }
        }
      Result: NotSerializableException: MyCoolRddApp (or Log)
      How to get around it:
        class MyCoolRddApp {
          ...
          def work(rdd: RDD[Int]) {
            val param_ = param
            rdd.map(x => x + param_)   // references only local variable instead of this.param
               .reduce(...)
          }
        }
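A rough Python analogue of this mishap can be reproduced with the standard pickle module. In the sketch below, MyCoolApp and try_ship are hypothetical names, and a threading.Lock stands in for a resource such as the Log object that cannot be serialized; the serializer a real cluster uses may differ, so treat this as an illustration of the principle only.

```python
import pickle
import threading

class MyCoolApp:
    def __init__(self):
        self.param = 3.14
        self.log = threading.Lock()   # stand-in for a resource that cannot be serialized

def try_ship(value):
    """Return True if value can be serialized for shipping, False otherwise."""
    try:
        pickle.dumps(value)
        return True
    except (TypeError, pickle.PicklingError):
        return False

app = MyCoolApp()

# A closure that reads app.param needs the whole object, including app.log
ships_whole_object = try_ship(app)

# Copying the field into a local variable means only a float has to travel
param_ = app.param
ships_local_copy = try_ship(param_)
```

Here ships_whole_object is False and ships_local_copy is True, which mirrors the fix on the slide: reference a local copy of the value instead of a field of the enclosing object.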
  • 259. Other RDD Operations. sample(): deterministically sample a subset. union(): merge two RDDs. cartesian(): cross product. pipe(): pass through external program. See the Programming Guide for more: www.spark-project.org/documentation.html
  • 260. More Details. Spark supports lots of other operations! Full programming guide: spark-project.org/documentation
  • 261. Job execution
  • 262. Software Components. Spark runs as a library in your program (one instance per app). Runs tasks locally or on a cluster: Standalone deploy cluster, Mesos or YARN. Accesses storage via the Hadoop InputFormat API: can use HBase, HDFS, S3, … [Diagram labels: your application with its SparkContext; local threads; cluster manager; two workers, each running a Spark executor; HDFS or other storage.]
  • 263. Task Scheduler. Supports general task graphs. Pipelines functions where possible. Cache-aware data reuse & locality. Partitioning-aware to avoid shuffles. [Diagram: RDDs A-F connected by map, filter, groupBy, and join operations and divided into Stages 1-3; legend: RDD, cached partition.]
  • 264. More Information. Scala resources: www.artima.com/scalazine/articles/steps.html (First Steps to Scala); www.artima.com/pins1ed (free book). Spark documentation: www.spark-project.org/documentation.html
  • 265. Hadoop Compatibility. Spark can read/write to any storage system / format that has a plugin for Hadoop! Examples: HDFS, S3, HBase, Cassandra, Avro, SequenceFile. Reuses Hadoop’s InputFormat and OutputFormat APIs. APIs like SparkContext.textFile support filesystems, while SparkContext.hadoopRDD allows passing any Hadoop JobConf to configure an input source.
  • 266. Create a SparkContext
      Scala:
        import spark.SparkContext
        import spark.SparkContext._
        val sc = new SparkContext("masterUrl", "name", "sparkHome", Seq("app.jar"))
      Java:
        import spark.api.java.JavaSparkContext;
        JavaSparkContext sc = new JavaSparkContext(
            "masterUrl", "name", "sparkHome", new String[] {"app.jar"});
      Python:
        from pyspark import SparkContext
        sc = SparkContext("masterUrl", "name", "sparkHome", ["library.py"])
      Arguments, in order: cluster URL, or local / local[N]; app name; Spark install path on cluster; list of JARs with app code (to ship).
  • 267. Complete App: Scala
        import spark.SparkContext
        import spark.SparkContext._
        object WordCount {
          def main(args: Array[String]) {
            val sc = new SparkContext("local", "WordCount", args(0), Seq(args(1)))
            val lines = sc.textFile(args(2))
            lines.flatMap(_.split(" "))
                 .map(word => (word, 1))
                 .reduceByKey(_ + _)
                 .saveAsTextFile(args(3))
          }
        }
  • 268. Complete App: Python
        import sys
        from pyspark import SparkContext
        if __name__ == "__main__":
            sc = SparkContext("local", "WordCount", sys.argv[0], None)
            lines = sc.textFile(sys.argv[1])
            lines.flatMap(lambda s: s.split(" ")) \
                 .map(lambda word: (word, 1)) \
                 .reduceByKey(lambda x, y: x + y) \
                 .saveAsTextFile(sys.argv[2])
  • 269. Example: PageRank
  • 270. Why PageRank? Good example of a more complex algorithm: multiple stages of map & reduce. Benefits from Spark’s in-memory caching: multiple iterations over the same data.
  • 271. Basic Idea. Give pages ranks (scores) based on links to them. Links from many pages ➔ high rank. Link from a high-rank page ➔ high rank.
  • 272. Algorithm. 1. Start each page at a rank of 1. 2. On each iteration, have page p contribute rank_p / |neighbors_p| to its neighbors. 3. Set each page’s rank to 0.15 + 0.85 × contribs. [Diagram: four linked pages, each with rank 1.0.]
  • 273. Algorithm (same three steps). [Diagram: page ranks 1.0, 1.0, 1.0, 1.0; contributions along the links: 1, 0.5, 0.5, 0.5, 1, 0.5.]
  • 274. Algorithm (same three steps). [Diagram: updated page ranks 0.58, 1.0, 1.85, 0.58.]
  • 275. Algorithm (same three steps). [Diagram: page ranks 0.58, 1.0, 1.85, 0.58; contributions along the links: 0.58, 0.29, 0.29, 0.5, 1.85, 0.5.]
  • 276. Algorithm (same three steps). [Diagram: updated page ranks 0.39, 1.72, 1.31, 0.58; further iterations follow.]
  • 277. Algorithm (same three steps). Final state: page ranks 0.46, 1.37, 1.44, 0.73.
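The three steps translate almost line for line into plain Python. The sketch below runs on a single machine and is not the Spark version of the algorithm. The slides do not spell out the link structure of the four-page example, so the graph used here (pages named A to D) is an assumption, chosen because it reproduces the numbers shown in the diagrams. The function also assumes every page has at least one outgoing link and that every link target is itself a key of the dictionary.

```python
def pagerank(links, iterations, damping=0.85):
    """Apply the three-step update from the slides a fixed number of times."""
    ranks = {page: 1.0 for page in links}                 # step 1
    for _ in range(iterations):
        contribs = {page: 0.0 for page in links}
        for page, neighbors in links.items():             # step 2
            share = ranks[page] / len(neighbors)
            for neighbor in neighbors:
                contribs[neighbor] += share
        ranks = {                                          # step 3
            page: (1 - damping) + damping * contribs[page]
            for page in links
        }
    return ranks

# Assumed link structure: page -> pages it links to
links = {
    "A": ["C", "D"],
    "B": ["C"],
    "C": ["A"],
    "D": ["B", "C"],
}

after_one = pagerank(links, 1)
final = pagerank(links, 100)
```

With this graph, one iteration gives ranks of 1.0, 0.575, 1.85, and 0.575 for A, B, C, and D, and after many iterations the ranks settle near 1.37, 0.46, 1.44, and 0.73, in line with the values on the slides. The ranks dictionary is rebuilt from the same links on every pass, which is the repeated reuse of one dataset that makes in-memory caching worthwhile in Spark.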