Faculty: Dr. Rakhi Tripathi
Presented by: Ankita Sharma (222005)
What is BIG DATA?
• Big Data is data that is too large, complex & dynamic for conventional data
tools to capture, store, manage & analyze.
• This explosion in data volume, variety and velocity is called Big Data, and if you
can harness it, it will revolutionize the way you do business. Big Data platforms,
applications, analytics and services can help you dive into that ocean of
information and extract real business value in real time.
• It can be structured as well as unstructured.
WHY BIG DATA IS BECOMING IMPORTANT NOW?
• Rise of smartphones with GPS and Internet connectivity: there are 4.6 billion
mobile-phone subscriptions worldwide, and between 1 and 2 billion people
access the Internet.
• Aerial sensors and sensor networks: the NASA Center for Climate Simulation stores
32 petabytes of climate observations and simulations on the Discover supercomputing cluster.
• Social network adoption: Facebook has 1.06 billion monthly active users, with 30
billion pieces of content shared on Facebook every month. There are roughly 175
million tweets every day, from more than 465 million accounts.
BIG DATA GENERATORS
• Social media and networks (all of us are generating data)
• Sources tracking all objects all the time
• Sources collecting all sorts of data
• Sensor technology and networks (measuring all kinds of data)
Big Data Characteristics
“Big Data” refers to high-volume, high-velocity, high-variety and complex information assets that demand cost-effective,
innovative forms of information processing for enhanced insight and decision making.
Characteristics of Big Data:
1. VOLUME (Scale)
• There has been a considerable increase in the volume of data: around a
40X increase from 2009 to 2013.
• In other words, data volume is increasing exponentially.
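Taken at face value, a 40X increase over the four years from 2009 to 2013 implies a constant yearly multiplier g with g^4 = 40, so g = 40^(1/4) ≈ 2.51, roughly 151% growth per year. The sketch below makes that arithmetic explicit; the function names are illustrative, and constant year-on-year growth is an assumption rather than something the figure states.

```python
# Hypothetical helper names; constant yearly growth is an assumption.
def implied_annual_growth(total_factor: float, years: float) -> float:
    """Constant yearly multiplier g such that g ** years == total_factor."""
    return total_factor ** (1.0 / years)


def project(start_volume: float, yearly_factor: float, years: int) -> list:
    """Volume at years 0..years (index 0 is the start), multiplying by yearly_factor each year."""
    return [start_volume * yearly_factor ** t for t in range(years + 1)]


# The 40X figure above, spread over the four years 2009 -> 2013.
g = implied_annual_growth(40, 4)
print(f"yearly multiplier: {g:.2f}")  # 2.51, i.e. roughly +151% per year
print([round(v, 1) for v in project(1.0, g, 4)])  # relative volume per year, 1.0 = 2009
```

Exponential growth means the same multiplier applies every year, so the absolute amount added each year keeps getting larger.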
Data volumes at some of the world’s top companies:
Google processes 20 PB a day (2008)
Wayback Machine has 3 PB + 100 TB/month (3/2009)
Facebook has 2.5 PB of user data + 15 TB/day (4/2009)
eBay has 6.5 PB of user data + 50 TB/day (5/2009)
The amount of data to be analyzed has grown from terabytes to
millions of petabytes, that is, to the scale of zettabytes (1 ZB = 1,000,000 PB).
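The storage units in this list are easy to mix up. The sketch below assumes decimal (SI) units, where each step from TB to PB to EB to ZB is a factor of 1,000, and turns the daily ingest figures listed above into yearly growth, assuming a constant daily rate. The helper names are hypothetical.

```python
# Decimal (SI) storage units, assumed here: each step is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]
FACTOR = {unit: 1000 ** i for i, unit in enumerate(UNITS)}


def convert(amount: float, src: str, dst: str) -> float:
    """Convert an amount between decimal storage units, e.g. TB -> PB."""
    return amount * FACTOR[src] / FACTOR[dst]


def yearly_growth_pb(daily_tb: float, days: int = 365) -> float:
    """Data added in one year, in PB, for a constant daily ingest given in TB."""
    return convert(daily_tb * days, "TB", "PB")


print(convert(1, "ZB", "PB"))  # 1 ZB = 1,000,000 PB
print(yearly_growth_pb(15))    # Facebook figure above: 15 TB/day
print(yearly_growth_pb(50))    # eBay figure above: 50 TB/day
```

At 15 TB/day, for example, a year of new data comes to about 5.5 PB, more than twice the 2.5 PB base quoted for the same date.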
2. VELOCITY (Speed)
• Data is being generated fast and needs to be processed fast.
• Velocity is as critical as the volume of the data.
• Late decisions mean missed opportunities:
– E-Promotions: based on your demographics, your purchase history and
what you like, send promotions right now for items relevant to you.
– Healthcare monitoring: sensors monitor your activities and body, and
any abnormal measurement requires an immediate reaction.
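The healthcare example is a stream-processing problem: each reading must be judged as it arrives rather than in a nightly batch. A minimal sketch with Python generators follows; the function name, the heart-rate limits, the jump threshold and the window size are hypothetical illustration values, not clinical guidance.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def monitor(readings: Iterable[Tuple[int, float]], low: float, high: float,
            max_jump: float, window: int = 5) -> Iterator[str]:
    """Check (timestamp, heart_rate) readings one at a time, as they arrive.

    Alerts when a reading leaves [low, high], or when it jumps more than
    max_jump away from the mean of the previous `window` readings.
    """
    recent: deque = deque(maxlen=window)  # bounded memory: only the last `window` readings
    for ts, value in readings:
        if not low <= value <= high:
            yield f"t={ts}: out-of-range reading {value}"
        elif len(recent) == window and abs(value - sum(recent) / window) > max_jump:
            yield f"t={ts}: sudden change to {value}"
        recent.append(value)  # every reading, normal or not, joins the window


# Hypothetical limits for illustration only; not medical thresholds.
stream = [(0, 72), (1, 74), (2, 73), (3, 75), (4, 74), (5, 96), (6, 150)]
for alert in monitor(stream, low=40, high=140, max_jump=15):
    print(alert)
```

Because the generator handles one event at a time and keeps only a fixed-size window, its cost per reading stays constant no matter how long the stream runs, which is what lets a check like this keep up with fast data.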
3. VARIETY (Complexity)
• Big data can be any type of data, structured and unstructured, such as
text, sensor data, audio, video, click streams, log files and more. New
insights are found when these data types are analyzed together.
• Monitor hundreds of live video feeds from surveillance cameras to target
points of interest.
• Exploit the 80% data growth in images, video and documents to improve …
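To illustrate analyzing different data types together, the sketch below joins a structured customer table (CSV) with semi-structured click-stream log lines and counts page views per customer segment. The sample data, the log-line format, the regular expression and the function names are all hypothetical.

```python
import csv
import io
import re
from collections import Counter

# Structured source: a customer table (hypothetical sample data).
CUSTOMERS_CSV = """customer_id,segment
c1,student
c2,retiree
"""

# Semi-structured source: raw click-stream log lines (hypothetical format).
CLICK_LOG = """\
2013-05-01T10:00:01 user=c1 GET /products/laptop
2013-05-01T10:00:07 user=c2 GET /products/garden-chair
2013-05-01T10:01:12 user=c1 GET /products/headphones
malformed line that does not match the pattern
"""

LOG_PATTERN = re.compile(r"^(\S+) user=(\S+) GET (\S+)$")


def load_segments(text):
    """Parse the structured CSV into {customer_id: segment}."""
    return {row["customer_id"]: row["segment"] for row in csv.DictReader(io.StringIO(text))}


def parse_clicks(text):
    """Extract (customer_id, path) pairs from log lines, skipping lines that do not parse."""
    for line in text.splitlines():
        match = LOG_PATTERN.match(line)
        if match:
            yield match.group(2), match.group(3)


def clicks_per_segment(csv_text, log_text):
    """Join the two sources on customer id and count page views per segment."""
    segments = load_segments(csv_text)
    return Counter(segments.get(cid, "unknown") for cid, _ in parse_clicks(log_text))


print(clicks_per_segment(CUSTOMERS_CSV, CLICK_LOG))
```

Neither source alone answers "which segment browses most": the log has no segments and the table has no behavior, so the insight only appears after the join.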
BIG DATA MANAGEMENT
• We have moved on from an era in which an organization could implement a
database to meet a specific project need and be done. Nowadays, data has
become the fuel of growth & innovation. Effective data management has to
cover the whole cycle shown in the figure below:
Capture → Organize → Consolidate (Integrate) → Analyze → Act
BIG DATA TECHNOLOGIES
• With the evolution of computing technology, it is now possible to manage
immense volumes of data that previously could only have been handled by
supercomputers at great expense.
• In particular, the innovations of MapReduce, Hadoop and Big Table proved
to be the sparks that led to a new generation of data management. These
technologies address one of the most fundamental problems: the
capability to process massive amounts of data efficiently, cost-effectively
and in a timely fashion.
• Hadoop is an open source software project that enables the distributed processing
of large data sets across clusters of commodity servers. It is designed to scale up
from a single server to thousands of machines, with a very high degree of fault
tolerance.
Hadoop allows applications based on MapReduce (Hadoop’s processing framework) to run on
large clusters of commodity hardware. Hadoop is designed to parallelize data
processing across computing nodes to speed up computations and hide latency.
MapReduce was designed by Google as a way of efficiently executing a set of
functions against a large amount of data in batch mode. In the “map” phase, the
problem is split into tasks that run in parallel across a large number of systems,
each turning its share of the input into intermediate key/value pairs; the framework
places the tasks so as to balance the load and recovers from failures. After the
distributed computation is completed, another function called “reduce” aggregates
the intermediate values for each key to provide a result.
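The classic illustration of the map and reduce steps is a word count. The sketch below simulates the three stages (map, shuffle/group by key, reduce) in a single Python process. It is not Hadoop’s Java API, and the function names are illustrative; on a real cluster the framework would run the map and reduce calls on different nodes and take care of the grouping and failure recovery.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: turn one input record (a line of text) into (word, 1) pairs."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a cluster each map_phase call could run on a different node; here they run in sequence.
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data is big", "data is fuel"]))
# {'big': 2, 'data': 2, 'is': 2, 'fuel': 1}
```

Because each map call looks at only one record and each reduce call at only one key, both phases can be spread across as many machines as there are records or keys.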
Big Table was developed by Google as a distributed storage system for
managing structured data at very large scale.
Data is organized into tables with rows and columns. Unlike a traditional
relational database model, Big Table is a sparse, distributed, persistent,
multidimensional sorted map, indexed by row key, column key and timestamp.
It is designed to store huge volumes of data across commodity servers.
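The phrase “sparse, distributed, persistent multidimensional sorted map” is easier to picture with a toy model. The single-machine, in-memory class below is hypothetical and is not Google’s API: it stores only the cells that were written, addresses each cell by (row key, column key, timestamp), and keeps keys sorted so that a range of rows can be scanned in order. Distribution and persistence are not modeled, and the row and column names in the usage lines are made up.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple

# A cell is addressed by (row key, column key, timestamp). Timestamps are stored
# negated so that, within one row/column, the newest version sorts first.
CellKey = Tuple[str, str, int]


class SortedSparseMap:
    """Toy model of a sparse, sorted, multidimensional map (hypothetical class)."""

    def __init__(self) -> None:
        self._keys: List[CellKey] = []          # kept sorted at all times
        self._cells: Dict[CellKey, bytes] = {}  # only written cells exist (sparse)

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        key = (row, column, -timestamp)
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = value

    def get_latest(self, row: str, column: str) -> Optional[bytes]:
        """Newest version of one cell, or None if the cell was never written."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

    def scan(self, start_row: str, end_row: str) -> Iterator[Tuple[str, str, int, bytes]]:
        """Yield (row, column, timestamp, value) for start_row <= row < end_row, in key order."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._cells[self._keys[i]]
            i += 1


table = SortedSparseMap()
table.put("com.example/index", "contents:html", 1, b"<html>v1")
table.put("com.example/index", "contents:html", 2, b"<html>v2")
table.put("com.example/about", "anchor:home", 1, b"About us")
print(table.get_latest("com.example/index", "contents:html"))  # b'<html>v2'
print(list(table.scan("com.example/a", "com.example/b")))
```

Sparseness means a row with two columns costs two entries, not a full row of empty slots, and sorted keys are what make row-range scans cheap.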
Advantages of Big Data
• As organisations create and store more transactional data in digital form, they
can collect more accurate and detailed performance information on everything
from product inventories to sick days, and therefore expose variability and boost
performance. In fact, some leading companies are using their ability to collect and
analyse big data to conduct controlled experiments to make better management
decisions.
• Big Data allows ever-narrower segmentation of customers and therefore much
more precisely tailored products or services.
• Sophisticated analytics can substantially improve decision-making, minimise risks,
and unearth valuable insights that would otherwise remain hidden.
• Big Data can be used to develop the next generation of products and services. For
instance, manufacturers are using data obtained from sensors embedded in
products to create innovative after-sales service offerings such as proactive
maintenance to avoid failures in new products.
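One common way to run the controlled experiments mentioned above is an A/B test: show a change to a random subset of customers and check whether the difference in conversion rate is larger than chance would explain. The sketch uses a pooled two-proportion z-test, a standard choice for large samples. The choice of test, the function name and the numbers in the usage line are illustrative assumptions, not taken from the source.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare conversion rates of control (A) and variant (B).

    Returns (lift, z, two_sided_p) using the pooled two-proportion z-test,
    which relies on a normal approximation and so assumes large samples.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_b - p_a, z, p_value


# Hypothetical experiment: 200 of 10,000 control vs 260 of 10,000 variant customers convert.
lift, z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"lift={lift:.4f} z={z:.2f} p={p:.4f}")  # z is about 2.83, p is below 0.01
```

With small groups the same 0.6-point lift would not be distinguishable from noise, which is why access to large volumes of customer data makes such experiments practical.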
Analyzing new and fresh data can reveal new sources of economic value, provide
fresh insights into customer behavior & identify market trends early. But this influx
of new data creates great challenges for IT departments. To derive real business
value from Big Data, you need the right set of tools to capture & organize a wide
variety of data types from different sources.
By using the technologies described above, an enterprise can acquire, organize & analyze
all of its enterprise data, both structured & unstructured, to make the most …
“Today’s Big Data will not stay the same Tomorrow.”