3. Big Data Definition
Big data describes a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques. In most enterprise scenarios the data is too big, moves too fast, or exceeds current processing capacity. The term "big data" is believed to have originated with Web search companies that had to query very large, distributed aggregations of loosely structured data.
4. An Example of Big Data
An example of big data might be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data consisting of billions to trillions of records about millions of people, all drawn from different sources (e.g., the Web, sales, customer contact centers, social media, mobile data, and so on). The data is typically loosely structured, and often incomplete and inaccessible.
When dealing with such large datasets, organizations face difficulties in creating, manipulating, and managing big data. Big data is a particular problem in business analytics because standard tools and procedures are not designed to search and analyze massive datasets.
6. Cost Problem
What does it cost to process 1 petabyte of data with 1,000 nodes?
1 PB = 10^15 B = 1 million gigabytes = 1 thousand terabytes.
At a rate of 15 MB/s, each node needs 9 hours to process 500 GB:
15 MB/s × 60 × 60 × 9 = 486,000 MB ≈ 500 GB.
A single 9-hour run on 1,000 nodes at $0.34 per node-hour costs 1,000 × 9 × $0.34 = $3,060.
For a single node, 1 PB = 1,000,000 GB ÷ 500 GB = 2,000 batches; 2,000 × 9 = 18,000 hours ÷ 24 ≈ 750 days.
The cost for 1,000 cloud nodes each processing 1 PB: 2,000 × $3,060 = $6,120,000.
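The arithmetic above can be checked with a short script. The 15 MB/s throughput and $0.34 per node-hour rate are the figures assumed on the slide, not measured values:

```python
# Back-of-the-envelope check of the cost figures above.
# Assumed inputs (from the slide): 15 MB/s per node, $0.34 per node-hour.

MB_PER_GB = 1000            # decimal units, matching 1 PB = 10^15 B
GB_PER_PB = 1_000_000

rate_mb_s = 15
hours_per_batch = 9
# Data one node processes in a 9-hour batch: ~486 GB, rounded to 500 GB.
batch_gb = rate_mb_s * 60 * 60 * hours_per_batch / MB_PER_GB

nodes = 1000
dollars_per_node_hour = 0.34
# Cost of one 9-hour run across all 1,000 nodes.
cost_per_run = nodes * hours_per_batch * dollars_per_node_hour   # $3,060

# A single node working through 1 PB in 500 GB batches.
batches_per_pb = GB_PER_PB / 500                                 # 2,000 batches
single_node_days = batches_per_pb * hours_per_batch / 24         # 750 days

# 1,000 nodes each processing a full petabyte: 2,000 runs.
total_cost = batches_per_pb * cost_per_run                       # $6,120,000

print(batch_gb, cost_per_run, single_node_days, total_cost)
```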
7. Importance of Big Data
Government: In 2012, the Obama administration announced the Big Data Research and Development Initiative: 84 different big data programs spread across six departments.
Private sector: Wal-Mart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data.
Facebook handles 40 billion photos from its user base.
The Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts worldwide.
Science: The Large Synoptic Survey Telescope will generate 140 terabytes of data every five days.
8.
The Large Hadron Collider produced 13 petabytes of data in 2010.
Medical computation, such as decoding the human genome.
A revolution in social science.
A new way of doing science (the microscope example).
9. Technology Players in this Field
Google
Oracle
Microsoft
IBM
Hadapt
Nike
Yelp
Netflix
Dropbox
Zipdial
11. Some Challenges in Big Data
While big data can yield extremely useful information, it also presents new challenges with respect to:
How much data to store?
How much will this cost?
Will the data be secure? and
How long must it be maintained?
12. Implementation of Big Data
Platforms for large-scale data analysis:
The Apache Software Foundation's Java-based Hadoop programming framework, which can run applications on systems with thousands of nodes; and
The MapReduce software framework, which consists of a Map function that distributes work to different nodes and a Reduce function that gathers the results and resolves them into a single value.
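The Map/Reduce pattern described above can be sketched in plain Python. This is an illustrative sketch of the programming model, not Hadoop's actual API; the word-count task, function names, and single-process "shuffle" step are assumptions for demonstration:

```python
from collections import defaultdict

# Map: each "node" turns its chunk of input into (key, value) pairs.
def map_words(chunk):
    return [(word, 1) for word in chunk.split()]

# Shuffle: group the intermediate values emitted by all mappers by key.
def shuffle(mapped_pairs):
    groups = defaultdict(list)
    for pairs in mapped_pairs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

# Reduce: resolve each key's list of values into a single value.
def reduce_counts(groups):
    return {key: sum(values) for key, values in groups.items()}

# Usage: count words across two input chunks (one per simulated node).
chunks = ["big data big analytics", "data analytics data"]
counts = reduce_counts(shuffle([map_words(c) for c in chunks]))
print(counts)  # {'big': 2, 'data': 3, 'analytics': 2}
```

In a real Hadoop deployment the map calls run on different nodes and the framework performs the shuffle over the network, but the division of labor is the same.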