7. It's difficult to edit a 10 TB file in limited time in a traditional system.
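A rough back-of-the-envelope sketch of why this is slow. The 100 MB/s disk throughput, the 100-machine cluster, and the perfectly even split of work are illustrative assumptions, not figures from the slides.

```python
TB = 10 ** 12  # bytes in a terabyte (decimal units)
MB = 10 ** 6   # bytes in a megabyte


def scan_time_seconds(size_bytes: int, throughput_bytes_per_s: int, machines: int = 1) -> float:
    """Time to read the data once, assuming the work splits evenly across machines."""
    return size_bytes / (throughput_bytes_per_s * machines)


# Assumed numbers: one disk reading at 100 MB/s versus 100 such machines in parallel.
single_machine = scan_time_seconds(10 * TB, 100 * MB)
cluster = scan_time_seconds(10 * TB, 100 * MB, machines=100)

print(f"One machine: about {single_machine / 3600:.1f} hours")
print(f"100 machines: about {cluster / 60:.1f} minutes")
```

Under these assumptions a single pass over the file takes more than a day on one machine and well under an hour on the cluster, which is the motivation for distributed tools.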
8. Classification of Big Data
1. Structured Data:
● It refers to data that has a defined length and format.
● Ex. numbers, dates, and groups of words and numbers called strings.
● It’s usually stored in a database.
9. 2. Unstructured Data
● No fields
● Massive data, ex. Internet data
● Music (audio), applications, movies (video), X-rays, pictures
10. 3. Semi-Structured Data
Data that does not have a fixed format attached to it, though it still carries some organizing markers such as tags or named fields. Ex.
– Data within an email
– Data in a Doc file
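The email case can be sketched with Python's standard `email` module: the headers behave like named fields, while the body is free text. The message below is made up for illustration, and `split_email` is a hypothetical helper name.

```python
from email import message_from_string

# Hypothetical raw email, invented for this example.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Quarterly numbers
Date: Mon, 06 Jan 2020 09:30:00 +0000

Hi Bob, sales were up about 12% this quarter. Details to follow.
"""


def split_email(raw: str) -> dict:
    """Separate the tagged header fields from the free-text body."""
    msg = message_from_string(raw)
    return {
        "fields": {name: value for name, value in msg.items()},  # structured part
        "body": msg.get_payload().strip(),                       # unstructured part
    }


parsed = split_email(RAW_EMAIL)
print(parsed["fields"]["Subject"])
print(parsed["body"])
```

The header fields can be queried like columns, but the body still needs text processing before it can be analysed.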
11. Why do we need this?
● Better understand and target customers
● In election exit polls
● Improving healthcare
● Improving and optimizing cities and countries
● Applications of big data are endless
14. Examples of Velocity
● Almost 3.5 billion queries are performed on Google each day.
● 80 million photos are shared on Instagram on an average day.
● Every minute we upload 300 hours of video to YouTube.
● Every day over 205 billion emails are sent.
● 500 million tweets are sent per day.
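To get a feel for these rates, the figures quoted above can be converted to per-second values. This is only a unit conversion of the slide's own numbers; the dictionary and function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily counts as quoted on the slide.
DAILY_COUNTS = {
    "Google queries": 3_500_000_000,
    "Instagram photos": 80_000_000,
    "Emails": 205_000_000_000,
    "Tweets": 500_000_000,
}

# YouTube is quoted per minute rather than per day.
YOUTUBE_HOURS_UPLOADED_PER_MINUTE = 300


def per_second(daily_count: int) -> float:
    """Average events per second, assuming an even spread over the day."""
    return daily_count / SECONDS_PER_DAY


rates = {name: per_second(count) for name, count in DAILY_COUNTS.items()}
youtube_hours_per_second = YOUTUBE_HOURS_UPLOADED_PER_MINUTE / 60

for name, rate in rates.items():
    print(f"{name}: about {rate:,.0f} per second")
print(f"YouTube: about {youtube_hours_per_second:.0f} hours of video uploaded per second")
```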
15. Volume
● It refers to the vast amount of data generated every second.
● Here we are talking about zettabytes or more.
● Data is generated by machines, networks, and human interaction on systems like social media.
● The volume of data to be analyzed is massive.
17. Example of Volume...2
• Self-driving cars will generate 2 petabytes of data every year.
• From now on, the amount of data in the world will double every two years.
• By 2020, we will have 50 times the amount of data that we had in 2011.
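The two growth figures above can be related with simple arithmetic. The sketch below assumes steady exponential growth, which is an assumption made here for illustration and not something the slide states; the function names are hypothetical.

```python
import math


def growth_factor(years: float, doubling_time_years: float) -> float:
    """How many times larger the data becomes after `years` of steady doubling."""
    return 2 ** (years / doubling_time_years)


def implied_doubling_time(years: float, factor: float) -> float:
    """Doubling time needed to grow by `factor` over `years`."""
    return years / math.log2(factor)


YEARS = 2020 - 2011  # nine years

factor_if_doubling_every_two_years = growth_factor(YEARS, 2.0)
doubling_time_for_50x = implied_doubling_time(YEARS, 50)

print(f"Doubling every 2 years for {YEARS} years: about {factor_if_doubling_every_two_years:.1f}x")
print(f"Doubling time needed for 50x in {YEARS} years: about {doubling_time_for_50x:.2f} years")
```

Under this assumption, doubling every two years gives roughly 22.6 times growth over nine years, while a 50-fold increase needs a doubling time of about 1.6 years, so the two figures are best read as rough, independent estimates.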
18. Variety
● Refers to the different types of data we can now use.
● In the past, data was structured and fitted into columns and rows.
– Stored in databases
– Spreadsheets
● Now much of the data is unstructured, which makes it difficult to store, analyse, and mine.
– Email, photo, audio
– Monitoring devices, PDFs
19. Veracity
● Are the results meaningful for the given problem space?
● It’s about data quality and understandability.
● Especially in automated decision-making, where no human is involved anymore, you need to be sure that both the data and the analyses are correct.
20. Data Generating Points
Smart Phones
● There are 5 billion camera phones in the world.
● Most of them have location awareness (GPS).
● By 2020 we will have 6.1 billion smartphone users globally.
21. Internet
● 2 billion people use the internet.
● By the end of 2020, internet traffic will reach 4.8 ZB per year, according to Cisco.
Emails:
● 205 billion emails are sent every day.
Blogs:
● There are 200 million blog entries on the web.
22. Social Media
Facebook:
● 34K likes every minute
● It deals with 3-4 PB of data each day
● There are 1 billion active users
Twitter:
● It generates 12 TB of data daily
● 300 million users generate
23. Google:
● It performs 2 million searches every minute
● It deals with 20 PB of data each day
YouTube:
● 2.9 billion hours of video are watched per month
24. Tools for handling big data
● Traditional systems, ex. RDBMS
● Big data tools, ex. Hadoop (created to handle big data)
25. Limitations of Traditional Data Warehouse
● Cost
● Fixed schema of RDBMS
● Saving huge files and accessing them
● Performing analysis
● Time to do all of this
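The fixed-schema limitation can be illustrated with Python's built-in `sqlite3` module. The `customers` table, its columns, and the sample records are hypothetical; SQLite stands in for an RDBMS here purely for convenience.

```python
import json
import sqlite3

# A table with a fixed schema: only `id` and `name` are allowed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (1, "Asha"))


def try_insert_with_extra_field(connection: sqlite3.Connection) -> str:
    """Attempt to store a record carrying a field the schema does not know about."""
    try:
        connection.execute(
            "INSERT INTO customers (id, name, twitter_handle) VALUES (?, ?, ?)",
            (2, "Ben", "@ben"),
        )
        return "accepted"
    except sqlite3.OperationalError as exc:
        return f"rejected: {exc}"


result = try_insert_with_extra_field(conn)
print(result)

# Schema-less storage (one JSON document per line) accepts both record shapes.
records = [
    {"id": 1, "name": "Asha"},
    {"id": 2, "name": "Ben", "twitter_handle": "@ben"},
]
json_lines = [json.dumps(record) for record in records]
print("\n".join(json_lines))
```

In the relational case the table has to be altered before the new field can be stored, whereas the JSON lines simply carry whatever fields each record has.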
26. What is Hadoop?
• The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
• It is developed by the Apache Software Foundation; it was first released in 2006, and version 1.0 followed in December 2011.
• Written in Java.
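One such programming model is MapReduce. The sketch below imitates its map, shuffle, and reduce steps in plain Python on a single machine; it does not use any Hadoop API, and the function names and sample text are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks: List[List[str]]) -> Dict[str, int]:
    """Count words across chunks; on a real cluster each chunk would go to a different node."""
    mapped: List[Tuple[str, int]] = []
    for chunk in chunks:
        for line in chunk:
            mapped.extend(map_phase(line))
    return reduce_phase(shuffle_phase(mapped))


# Two small "chunks" standing in for blocks of a large distributed file.
chunks = [["big data is big"], ["data is everywhere"]]
counts = word_count(chunks)
print(counts)
```

The point of the model is that the map and reduce functions stay this simple while the framework handles splitting the data, moving it between machines, and recovering from failures.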