Bigdata

BIG DATA
Sourabh Dattawad
Department of Computer Science
KLS Gogte Institute of Technology
Belgaum, India

Contents
• Introduction
• What is Big Data?
• Characteristics of Big Data
• What is Big Data analytics?
• How does Big Data work?
• Application of Big Data
• Big Data growth
• What’s trending
• Conclusion
• References

Introduction
• A decade ago, the amount of data produced was far smaller.
• Today the amount of data in the world is increasing
rapidly, outstripping not only our machines, but also
our imagination.

What can be done with this data?
• Discarding this data is not a good idea.
• Big data has the potential to help companies improve
operations and make faster, more intelligent and
accurate decisions.
• More accurate analyses will lead to more confident
and effective decision making. And better decisions
can mean cost reductions and reduced risk.

Definition
• Big Data is a term for a diverse field of data
analysis in which the datasets are so massive that
they become hard to store, process and analyze
using traditional databases and software.

Characteristics of Big Data
• Big Data is characterized by the following properties:

Volume
• It is the quantity of data generated, which determines
the value and potential of the data.
• Facebook receives more than 12 million photos every
hour.
• More than 400 million tweets are posted on Twitter
every day.

Velocity
• It is the rate at which data is generated.
• Every minute, 48 hours of new video are uploaded to
YouTube.
• Every minute, Google processes 2 million search
queries.
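
To make these rates easier to compare with the daily figures quoted under Volume, the per-minute numbers can be scaled to a full day. The sketch below only does that arithmetic; the constant and function names are illustrative and do not come from the slides.

```python
# Per-minute figures quoted on the Velocity slide.
VIDEO_HOURS_PER_MINUTE = 48        # hours of video uploaded to YouTube
SEARCHES_PER_MINUTE = 2_000_000    # Google search queries

MINUTES_PER_DAY = 24 * 60


def per_day(rate_per_minute):
    """Scale a per-minute rate to a per-day total."""
    return rate_per_minute * MINUTES_PER_DAY


video_hours_per_day = per_day(VIDEO_HOURS_PER_MINUTE)
searches_per_day = per_day(SEARCHES_PER_MINUTE)

print(video_hours_per_day)  # 69120 hours of video per day
print(searches_per_day)     # 2880000000 queries per day
```

At these rates, roughly 69 thousand hours of video and 2.88 billion search queries arrive every day.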

Variety
• It is the category to which the data belongs.
• The categories include health, social networking,
banking, etc.

What is Big Data analytics?
• Analyzing large data sets and drawing conclusions
from them is called Big Data analytics.
• Two real-life examples illustrate this:
– Google’s Flu Trends.
– Target Retailer.

Google’s Flu Trends
• Google predicted flu trends just by analyzing
search data.
• In the year 2009 a new flu virus, ‘H1N1’, was
discovered.
• Flu causes 250,000-500,000 deaths every year
worldwide.
• A swine flu pandemic is worse.
• Surveillance is carried out by the Centers for Disease
Control and Prevention (CDC).
• Problems faced by the CDC:
– Reports are published only weekly
– 1-2 week publication lag

What did they do?
• Google took the 50 million most common search terms
typed in the United States and compared their
frequency with CDC data on the spread of the flu.
• They processed 450 million different models in order
to test the search terms, and the resulting predictions
closely matched the statistics published by the CDC.
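
The core idea, comparing how often a search term is typed with the officially reported flu numbers, can be sketched with a simple correlation ranking. This is a simplified illustration, not Google's actual method: the weekly numbers are made up, and the names `pearson` and `rank_terms` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / sqrt(var_x * var_y)


def rank_terms(term_counts, reported_cases):
    """Sort search terms by how closely they track the reported cases."""
    scores = {
        term: pearson(counts, reported_cases)
        for term, counts in term_counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up weekly figures, for illustration only.
reported_cases = [10, 20, 40, 80, 60, 30]
term_counts = {
    "flu symptoms": [100, 200, 400, 800, 600, 300],
    "cough medicine": [50, 90, 100, 150, 160, 120],
    "football scores": [300, 280, 310, 290, 305, 295],
}

ranking = rank_terms(term_counts, reported_cases)
for term, score in ranking:
    print(f"{term}: {score:.2f}")
```

Terms whose search volume rises and falls with the reported cases end up at the top of the ranking, while unrelated terms score close to zero. Because searches can be counted as they happen, such terms could give a signal before the weekly reports appear.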

Target Retailer
• The retailer Target predicted pregnancies just by
analyzing the buying trends of its consumers.
• Story of a pregnant teenager.
• This shows how revealing real-time data can be.

How does Big Data work?
• Apache Hadoop - Apache Hadoop is the software
most commonly associated with Big Data. Apache
describes it as “a framework that allows for the
distributed processing of large data sets across
clusters of computers using simple programming
models”.
• With Hadoop, no data is too big. Processing that
takes more than 20 hours on traditional systems can
be completed in as little as 3 minutes.

• MapReduce - MapReduce is used to split the data
effectively. It is a software framework that splits
the input data set into independent chunks, which
are processed by map tasks in a completely parallel
manner.
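
The split, map and reduce steps can be sketched with the classic word-count example. This is a minimal single-machine illustration in plain Python, not Hadoop's API: a thread pool stands in for the cluster, and all function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_size):
    """Split the input into independent chunks of at most chunk_size lines."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in the chunk."""
    pairs = []
    for line in chunk:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(mapped_chunks):
    """Group the emitted values by key so each key can be reduced on its own."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_group(key, values):
    """Reduce step: add up the counts collected for one word."""
    return key, sum(values)


def word_count(lines, chunk_size=2):
    chunks = split_into_chunks(lines, chunk_size)
    # Each chunk is mapped independently; the pool stands in for cluster nodes.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_chunk, chunks))
    grouped = shuffle(mapped)
    return dict(reduce_group(key, values) for key, values in grouped.items())


lines = [
    "big data is big",
    "data is everywhere",
    "hadoop processes big data",
]
counts = word_count(lines)
print(counts)
```

No chunk depends on any other during the map step, which is what allows a framework such as Hadoop to spread the chunks across many machines.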
Simple Block Diagram

What’s trending
• By analyzing the Big Data of DNA, it may become
possible to cure genetic diseases like cancer.
• Data analysis alone may even predict where terrorists
will try to attack.

Conclusion
• Big Data is the next big thing. It's about letting the
data speak, and real-time data reflects what people
actually do; hence it is a revolution that will
transform how we think, live and work.

References
• Viktor Mayer-Schönberger, Kenneth Cukier, “Big Data:
A Revolution That Will Transform How We Live, Work,
and Think”.
• Cathy O'Neil, Rachel Schutt, “Doing Data Science”,
O'Reilly Media.
• Apache Hadoop, http://hadoop.apache.org

Thank You
