2. WHAT IS BIG DATA?
Big data is a field that treats ways to analyze,
systematically extract information from, or
otherwise deal with data sets that are too
large or complex to be dealt with by
traditional data-processing application
software.
3. BIG DATA CHARACTERISTICS (5V)
Volume: the quantity of data points
Variety: the type and nature of the data (text, image, video, audio)
Velocity: how fast the data is generated and processed
Veracity: how trustworthy the sources are
Value: how actionable the data is
4. BIG DATA CONCEPTS & TERMINOLOGY
Clustered Computing: the pooling of the resources of multiple machines to complete jobs.
Parallel Computing: a type of computation in which many calculations are carried out simultaneously.
Distributed Computing: a collection of nodes (networked computers) that run in parallel.
5. BIG DATA CONCEPTS & TERMINOLOGY
Batch Processing: breaking the jobs into small pieces and running them on individual machines.
Real-Time Processing: information is processed and made ready immediately.
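To make the contrast between the two modes concrete, here is a minimal sketch in Python. It assumes a toy workload (averaging a list of numbers); the names `batches`, `batch_average`, and `RunningAverage` are hypothetical and chosen only for illustration. The batch version works through stored records piece by piece, while the real-time version has an up-to-date answer after every record arrives. Everything runs on one machine here; on a cluster, each piece could go to a different machine.

```python
def batches(records, batch_size):
    """Yield the stored records in small pieces of at most batch_size items."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def batch_average(records, batch_size=3):
    """Batch style: the data already exists and is processed piece by piece."""
    total = 0
    count = 0
    for batch in batches(records, batch_size):
        total += sum(batch)
        count += len(batch)
    return total / count


class RunningAverage:
    """Real-time style: the result is ready right after each new value."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.total += value
        self.count += 1
        return self.total / self.count
```

For example, `batch_average([1, 2, 3, 4, 5, 6, 7])` processes three pieces and returns one answer at the end, whereas calling `update` on a `RunningAverage` returns the current average after each individual value.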
9. PARALLEL COMPUTING
Why is it necessary?
Mainly because of memory: the data may be too large for one machine
Also for processing power
How does it work?
Split tasks up into several smaller subtasks
Distribute these subtasks over several computers
Advantages?
Extra processing power
Reduced memory footprint
Disadvantages?
Moving data incurs a cost
Communication time when gathering all results together
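The split, distribute, and gather pattern on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production setup: a thread pool on one machine stands in for "several computers", the task is a simple sum, and the helper names `split`, `partial_sum`, and `parallel_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_parts):
    """Split a list into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks = []
    start = 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum(chunk):
    """Subtask: a worker only needs its own chunk, not the whole data set."""
    return sum(chunk)


def parallel_sum(data, n_workers=4):
    """Split the task, distribute the subtasks, then gather the results."""
    chunks = split(data, n_workers)
    # Assumption: threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Gathering step: this is where the communication cost appears.
    return sum(partials)
```

Each worker holds only its chunk, which mirrors the reduced memory footprint, and the final `sum(partials)` is the gathering step where communication time is paid. `ProcessPoolExecutor` from the same module exposes the same `map` interface if separate processes are wanted instead of threads.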
10. CLOUD COMPUTING FOR DATA PROCESSING
Top 3 Cloud Providers
In the cloud, we rent servers, and the cost is often lower than
buying our own servers for data processing. We don't need to
worry about space, electricity, or maintenance.