Big data

  1. Big Data: A big step towards innovation, competition and productivity
  2. Contents
     • Big Data Definition
     • Example of Big Data
     • Big Data Vectors
     • Cost Problem
     • Importance of Big Data
     • Big Data Growth
     • Some Challenges in Big Data
     • Big Data Implementation
  3. Big Data Definition
     • Big data describes a volume of structured and unstructured data so massive that it is difficult to process using traditional database and software techniques.
     • In most enterprise scenarios the data is too big, moves too fast, or exceeds current processing capacity.
     • The term big data is believed to have originated with Web search companies, which had to query very large, distributed aggregations of loosely structured data.
  4. An Example of Big Data
     • An example of big data might be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data consisting of billions to trillions of records on millions of people, all from different sources (e.g. Web, sales, customer contact center, social media, mobile data, and so on). The data is typically loosely structured, and often incomplete and inaccessible.
     • When dealing with datasets this large, organizations face difficulties in creating, manipulating, and managing big data. Big data is a particular problem in business analytics because standard tools and procedures are not designed to search and analyze massive datasets.
  5. Big Data Vectors
  6. Cost Problem
     What is the cost of processing 1 petabyte of data with 1,000 nodes?
     • 1 PB = 10^15 B = 1 million gigabytes = 1 thousand terabytes
     • At a rate of 15 MB/s, each node takes about 9 hours to process 500 GB: 15 × 60 × 60 × 9 = 486,000 MB ≈ 500 GB
     • At $0.34 per node-hour, a single run costs 1,000 × 9 × $0.34 = $3,060
     • For a single node, 1 PB = 1,000,000 GB ÷ 500 GB = 2,000 runs × 9 h = 18,000 h ÷ 24 = 750 days
     • The cost for 1,000 cloud nodes each processing 1 PB: 2,000 × $3,060 = $6,120,000
     A sketch of this arithmetic follows the slide.
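The slide's figures can be reproduced with a small back-of-the-envelope script. This is a minimal sketch of the slide's own cost model; the node count, throughput, and $0.34 hourly price come from the slide, not from any real cloud provider's pricing.

```python
# Back-of-the-envelope cost model from the slide.
# Assumed figures (taken from the slide, not real cloud pricing):
NODES = 1_000                # cluster size
RATE_MB_S = 15               # per-node throughput in MB/s
PRICE_PER_NODE_HOUR = 0.34   # USD per node-hour

PB_IN_GB = 1_000_000         # 1 PB = 10^15 B = 1,000,000 GB
CHUNK_GB = 500               # work processed per node per run

# Time for one node to process one 500 GB chunk (~9.3 h; the slide rounds to 9).
hours_per_chunk = CHUNK_GB * 1000 / RATE_MB_S / 3600
hours_per_chunk = 9          # use the slide's rounded figure

# One cluster run: 1,000 nodes working 9 hours each.
cost_per_run = NODES * hours_per_chunk * PRICE_PER_NODE_HOUR
print(f"Single run: ${cost_per_run:,.0f}")               # $3,060

# Runs (and days) for a single node to get through a full petabyte.
runs_per_pb = PB_IN_GB // CHUNK_GB                        # 2,000 runs
days_single_node = runs_per_pb * hours_per_chunk / 24
print(f"One node, 1 PB: {days_single_node:,.0f} days")    # 750 days

# 1,000 nodes each processing 1 PB.
total_cost = runs_per_pb * cost_per_run
print(f"1,000 nodes, 1 PB each: ${total_cost:,.0f}")      # $6,120,000
```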
  7. Importance of Big Data
     • Government: In 2012, the Obama administration announced the Big Data Research and Development Initiative: 84 different big data programs spread across six departments.
     • Private sector: Wal-Mart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data. Facebook handles 40 billion photos from its user base. The Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts worldwide.
     • Science: The Large Synoptic Survey Telescope will generate 140 terabytes of data every 5 days.
  8. Importance of Big Data (continued)
     • The Large Hadron Collider produced 13 petabytes of data in 2010.
     • Medical computation, such as decoding the human genome.
     • A social science revolution.
     • A new way of doing science (the microscope example).
  9. Technology Players in This Field
     • Google
     • Oracle
     • Microsoft
     • IBM
     • Hadapt
     • Nike
     • Yelp
     • Netflix
     • Dropbox
     • Zipdial
  10. Big Data Growth
  11. Some Challenges in Big Data
      While big data can yield extremely useful information, it also presents new challenges with respect to:
      • How much data to store?
      • How much will this cost?
      • Will the data be secure?
      • How long must it be maintained?
  12. Implementation of Big Data
      Platforms for large-scale data analysis:
      • The Apache Software Foundation's Java-based Hadoop programming framework, which can run applications on systems with thousands of nodes; and
      • The MapReduce software framework, which consists of a Map function that distributes work to different nodes and a Reduce function that gathers the results and resolves them into a single value. A sketch of the pattern follows the slide.
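To make the Map/Reduce split concrete, here is a minimal single-machine sketch of the pattern in plain Python. It illustrates the programming model only; it is not Hadoop's actual API (a real Hadoop job would implement Mapper and Reducer classes in Java), and the word-count task and function names are illustrative.

```python
from collections import defaultdict

# Map phase: each "node" turns its chunk of input into (key, value) pairs.
def map_chunk(chunk):
    return [(word.lower(), 1) for word in chunk.split()]

# Shuffle phase: group intermediate values by key. In a real cluster the
# framework performs this step between map and reduce.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: resolve each key's list of values into a single value.
def reduce_counts(key, values):
    return key, sum(values)

# Simulate distributing work across nodes: one chunk per "node".
chunks = ["big data is big", "data beats opinions"]
intermediate = [pair for chunk in chunks for pair in map_chunk(chunk)]
results = dict(reduce_counts(k, v) for k, v in shuffle(intermediate).items())
print(results)  # {'big': 2, 'data': 2, 'is': 1, 'beats': 1, 'opinions': 1}
```

The design point the slide is making survives even in this toy version: the map and reduce functions know nothing about the cluster, so the framework is free to run them on thousands of nodes and handle distribution, shuffling, and fault tolerance itself.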
  13. Thank You!! By: Harshita Rachora, Trainee Software Consultant, Knoldus Software LLP
