Big Data: Concept and Applications
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process them with on-hand database management tools or traditional data processing applications.
When the amount of data is beyond the storage and processing capabilities of a single physical machine, it is called big data.
What is big data?
- Large volume of data
- Existing tools were not designed to handle such huge data
Gigabyte → Terabyte → Petabyte → Exabyte → Zettabyte
Amazon → collects social data, log data, and data of many different flavors.
Walmart → handles more than 1 million customer transactions every hour.
Twitter → 300,000 tweets per minute
Instagram → 250,000 new pictures uploaded per minute
Email → 5 million messages (Gmail)
WhatsApp → 400,000 pictures per minute
Google → 5 million search requests per minute
Facebook → 2.5 million content items per minute (about 500 TB per day)
Handling bigger data requires different approaches: techniques, tools, and architecture, with the aim of solving new problems, or old problems in a better way.
Big data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.
Big data: the 3 Vs
• Variety: data coming from various sources
• Velocity: real-time, live streaming data
• Volume: on the order of terabytes and petabytes
Big data is everywhere: network analysis, social networks, web graphs.
Big data: Volume
- The volume of data is increasing every second; data is now measured in TB up to ZB.
- The amount of data doubles roughly every two years.
- About 100 terabytes of data are uploaded to Facebook daily.
- 100 hours of video are uploaded every minute.
- Research estimates 65% annual growth in digital content, mainly unstructured data.
Gigabyte → Terabyte → Petabyte → Exabyte → Zettabyte
Big data: Velocity
Data is created in real time. The Internet of Things (IoT) and social media are major contributors to the speed at which data is generated.
In every minute:
- 25 million queries on Google
- 20 million photos are viewed on Flickr
- over 200 million emails are sent
Big data: Variety
Data comes in all shapes: structured, semi-structured, unstructured, and even complex structures.
About 90% of the data generated is unstructured, ranging from text to audio, image, and video data.
Big Data Life Cycle
[Figure: storage capacity has grown from MB-scale around 2000 to PB-scale by 2018 and toward 2025, while processing speed has grown far more slowly.]
Solution: Big Data
Hadoop
Apache Hadoop is a framework for storing, processing, and analyzing big data. It is:
• Distributed
• Scalable
• Open source
Why Hadoop?
CASE 1: 1 TB of data is processed by a single computer. The computer has 4 I/O channels of 100 MB/s each. Total time required: about 44 minutes.
CASE 2: The same 1 TB is processed by 10 computers (same configuration) in parallel. Total time required: about 4.4 minutes.
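To make the comparison concrete, here is a back-of-the-envelope calculation of the two cases; it assumes 1 TB = 1,048,576 MB and 4 I/O channels of 100 MB/s per machine, matching the figures above.

```python
# Rough check of the CASE 1 / CASE 2 timings quoted above.
# Assumptions: 1 TB = 1024 * 1024 MB, and each machine reads through
# 4 I/O channels of 100 MB/s each (400 MB/s aggregate per machine).

DATA_MB = 1024 * 1024        # 1 TB expressed in MB
CHANNELS = 4                 # I/O channels per machine
CHANNEL_MB_PER_S = 100       # throughput of a single channel

def read_time_minutes(machines: int) -> float:
    """Time to scan the data when it is spread evenly over `machines`."""
    per_machine_mb = DATA_MB / machines
    throughput = CHANNELS * CHANNEL_MB_PER_S   # MB/s available per machine
    return per_machine_mb / throughput / 60

print(f"CASE 1 (1 machine):   {read_time_minutes(1):.1f} min")    # ~43.7, i.e. ~44 minutes
print(f"CASE 2 (10 machines): {read_time_minutes(10):.1f} min")   # ~4.4 minutes
```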
HDFS (Hadoop Distributed File System)
- Stores data on the cluster
- HDFS is a file system written in Java
- Provides storage for massive amounts of data
- Scalable
- Fault tolerant
- Supports efficient processing with MapReduce
Hadoop: How are files stored?
- Data files are split into blocks, which are distributed across the data nodes
- Each block is replicated on multiple nodes (default replication factor: 3)
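The sketch below is a simplified illustration (not Hadoop source code) of that idea: a file is cut into fixed-size blocks, and each block is assigned to several data nodes. The 128 MB block size and 3x replication are the usual HDFS defaults assumed here; real HDFS placement is also rack-aware.

```python
import itertools

BLOCK_SIZE_MB = 128   # typical HDFS default block size (assumption)
REPLICATION = 3       # default replication factor

def num_blocks(file_size_mb: int) -> int:
    """How many blocks a file of the given size occupies (ceiling division)."""
    return -(-file_size_mb // BLOCK_SIZE_MB)

def place_blocks(blocks: int, data_nodes: list[str]) -> dict[int, list[str]]:
    """Assign each block to REPLICATION data nodes in round-robin fashion."""
    cycle = itertools.cycle(data_nodes)
    return {b: [next(cycle) for _ in range(REPLICATION)] for b in range(blocks)}

# A hypothetical 500 MB file on a 5-node cluster: 4 blocks, 3 copies each.
print(place_blocks(num_blocks(500), ["dn1", "dn2", "dn3", "dn4", "dn5"]))
```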
HDFS (master/slave architecture)
- The master machine is the NameNode
- The slave machines are DataNodes
MAPREDUCE
MapReduce is a framework for executing highly parallelizable and distributable algorithms across huge datasets.
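For illustration, the classic word-count example below expresses the idea as a map function and a reduce function. This is a minimal, purely local Python sketch of the MapReduce model, not the Hadoop Java API; in a real cluster, mappers and reducers of exactly this shape would run in parallel on different nodes (for example via Hadoop Streaming).

```python
from collections import defaultdict

def map_phase(line: str):
    """Mapper: emit a (word, 1) pair for every word in an input line."""
    for word in line.split():
        yield word.lower(), 1

def reduce_phase(word: str, counts: list[int]) -> tuple[str, int]:
    """Reducer: sum all counts emitted for one word."""
    return word, sum(counts)

def word_count(lines):
    # Shuffle/sort step: group intermediate (word, 1) pairs by key.
    grouped = defaultdict(list)
    for line in lines:                       # mappers could run in parallel
        for word, one in map_phase(line):
            grouped[word].append(one)
    # Reducers could also run in parallel, one per key group.
    return dict(reduce_phase(w, c) for w, c in grouped.items())

print(word_count(["big data is big", "hadoop processes big data"]))
# {'big': 3, 'data': 2, 'is': 1, 'hadoop': 1, 'processes': 1}
```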
MapReduce: mappers run in parallel
MapReduce: analyzing data
Basic Cluster Configuration
HADOOP ECOSYSTEM
HADOOP
Hadoop = HDFS + MapReduce
- Hadoop HDFS commands are similar to Unix commands (see the sketch below)
- MapReduce is the programming model
- Hive → data manipulation (SQL-like)
- Pig → data manipulation using scripts
- Sqoop → import and export of data to/from HDFS
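To illustrate how HDFS shell commands mirror familiar Unix commands, the sketch below calls a few of them from Python via subprocess. It assumes a working Hadoop installation with `hdfs` on the PATH; the file and directory names are hypothetical.

```python
import subprocess

def hdfs(*args: str) -> None:
    """Run an HDFS shell command, e.g. hdfs('-ls', '/')."""
    subprocess.run(["hdfs", "dfs", *args], check=True)

# Unix-like commands, but operating on the distributed file system:
hdfs("-mkdir", "-p", "/user/demo")             # like: mkdir -p
hdfs("-put", "local_data.csv", "/user/demo")   # copy from the local FS into HDFS
hdfs("-ls", "/user/demo")                      # like: ls
hdfs("-cat", "/user/demo/local_data.csv")      # like: cat
```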
Import/Export using Sqoop and Flume
- Sqoop: transfers data between an RDBMS and HDFS
- Flume: a service to move large amounts of data in real time
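As a rough sketch of a Sqoop transfer, the snippet below launches a `sqoop import` from Python; the JDBC connection string, credentials, table name, and target directory are placeholders, and the exact options may vary with your Sqoop version.

```python
import subprocess

# Hypothetical example: copy the `orders` table from MySQL into HDFS.
subprocess.run([
    "sqoop", "import",
    "--connect", "jdbc:mysql://db.example.com/shop",   # placeholder JDBC URL
    "--username", "etl_user",                          # placeholder credentials
    "--password", "secret",
    "--table", "orders",                               # source RDBMS table
    "--target-dir", "/user/demo/orders",               # destination in HDFS
    "--num-mappers", "4",                              # parallel map tasks
], check=True)
```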
Applications
E-commerce (Amazon)
- Recommendation engines
- User buying patterns
- Digital marketing analysis
Telecommunications
- Call-drop analysis
- Network problem optimization
Entertainment
- Content analytics (Netflix)
Sports
- Fitness management (Fitbit)
Health care
- Early disease detection (Pfizer)
Applications
Technology: Websites such as eBay, Amazon, Facebook, and Google make heavy use of big data.
Private sector: Applications of big data in the private sector include retail, retail banking, and real estate.
Government: Big data is also utilized by the Indian government.
International development: Advances in big data analysis provide cost-effective opportunities to improve decision-making in critical development areas such as health care, employment, crime, security, and natural disasters. In this way, big data supports international development.
Questions?
Thank you.
