Introduction to Big Data
Published in: Technology, Education
  • 1. Introduction to Big Data DEENA DAYALAN
  • 2. What is Big Data?  Per Wikipedia, big data is the term for a collection of data sets so large and complex that it becomes difficult to process them using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.  Per Gartner, “Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”  More loosely, any data that cannot be processed with current relational database technology can be considered big data.
  • 3. Big Data Age  We are living in the age of big data.  Data is collected from everywhere and stored in data warehouses.  Social networks – Facebook, Twitter, Flickr, Google, Amazon, etc.  Bank / credit card transactions – used to target promotional offers.  E-commerce – eBay, Amazon.
  • 4. Big Data Vectors (3Vs)
  • 5. 1 - Data Volume  The Large Hadron Collider (LHC) at CERN generates approximately one petabyte of data per second; CERN stores 25 petabytes of data per year.  The Large Synoptic Survey Telescope (LSST) will generate over 30 thousand gigabytes (30 TB) of images every night during its decade-long sky survey.  eBay has a 90-petabyte data warehouse.  Petabyte data sets are common these days, and exabyte is not far away.  Data volume is increasing exponentially.
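The LSST figure above can be sanity-checked with simple arithmetic. A minimal sketch, assuming (for illustration only) that every one of the 365 nights in each survey year produces the full 30 TB:

```python
# Back-of-the-envelope total for the LSST survey quoted above:
# ~30 TB of images per night over a decade-long survey.
TB_PER_NIGHT = 30
NIGHTS_PER_YEAR = 365  # assumption: every night observed
YEARS = 10

total_tb = TB_PER_NIGHT * NIGHTS_PER_YEAR * YEARS
total_pb = total_tb / 1024  # binary petabytes

print(f"Total: {total_tb} TB ~ {total_pb:.0f} PB")
```

Roughly 100 petabytes of images over the survey, consistent with the slide's claim that petabyte-scale data sets are becoming routine.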
  • 6. 2 - Data Velocity  Initially, companies analyzed data using a batch process: take a chunk of data, submit a job to the server, and wait for the result.  This works only when the incoming data rate is slower than the batch processing rate and the data is still useful despite the delay.  With new sources of data such as social and mobile applications, the batch process breaks down.  Data now streams into the server continuously, in real time, and the result is only useful if the delay is very short.  Twitter users send over 400 million tweets per day.
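The batch-versus-streaming contrast above can be sketched in a few lines. This is a toy illustration, not any particular framework's API; the sample "tweets" and the window size are made up:

```python
from collections import deque

def batch_process(records):
    """Batch model: collect everything, answer once at the end."""
    return sum(len(r) for r in records)

def stream_process(records, window=3):
    """Streaming model: emit a result after every arriving record,
    here a rolling average length over a small sliding window."""
    recent = deque(maxlen=window)
    for r in records:
        recent.append(len(r))
        yield sum(recent) / len(recent)

tweets = ["big data", "streaming beats batch here", "hello"]
print(batch_process(tweets))         # one answer, only after all data arrives
print(list(stream_process(tweets)))  # an answer per record, as data arrives
```

The batch function is blind until the last record lands; the streaming generator produces a usable (if approximate) answer the moment each record arrives, which is the property the slide says real-time sources demand.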
  • 7. 3 - Data Variety  Previously, data lived in relational database tables and Excel files.  Today's formats include plain text, photos, audio, video, web pages, GPS data, sensor data, relational databases, documents, SMS, PDF, Flash, social media data, etc.  One no longer has control over the input data format; structure can no longer be imposed up front, as in the past, to keep control over the analysis.  On Facebook we post images, audio files, comments (text), etc.  Google uses smartphones as sensors to determine traffic conditions, combining GPS data, traffic cameras, RFID tags in electronic payment systems, and internet connectivity.
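"Structure can no longer be imposed up front" is the idea behind schema-less storage in the NoSQL tools listed later. A minimal sketch with hypothetical mixed-format records (the record shapes and field names are invented for illustration): each record carries its own fields, and the code groups them by type instead of forcing one table schema:

```python
import json

# Hypothetical records from a mixed social feed -- text, photo, and
# GPS items with different fields; no shared schema is imposed.
raw = [
    '{"type": "text", "body": "hello"}',
    '{"type": "photo", "url": "img.jpg", "width": 640}',
    '{"type": "gps", "lat": 12.97, "lon": 77.59}',
]

records = [json.loads(r) for r in raw]

# Group heterogeneous records by their self-declared type.
by_type = {}
for rec in records:
    by_type.setdefault(rec["type"], []).append(rec)

print(sorted(by_type))  # ['gps', 'photo', 'text']
```

A relational table would need every column declared in advance; here each record's structure is discovered at read time, which is how document stores like MongoDB or CouchDB cope with variety.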
  • 8. Importance of Big Data  Amazon handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers. The core technology that keeps Amazon running is Linux-based, and as of 2005 it had the world's three largest Linux databases, with capacities of 7.8 TB, 18.5 TB, and 24.7 TB.  Facebook handles 50 billion photos from its user base.  In 2012, the Obama administration announced the Big Data Research and Development Initiative, which explored how big data could be used to address important problems faced by the government. The initiative was composed of 84 different big data programs spread across six departments.  Although the LHC keeps less than 0.001% of its sensor stream data, the data flow from all four LHC experiments represents a 25-petabyte annual rate before replication (as of 2012), nearly 200 petabytes after replication.
  • 9. Tools used in Big Data Scenarios NoSQL databases  MongoDB, CouchDB, Cassandra, Redis, BigTable, HBase, Hypertable, Voldemort, Riak, ZooKeeper Map Reduce  Hadoop, Hive, Pig, Cascading, Cascalog, MapR, Flume, Kafka, Azkaban, Oozie, Greenplum Storage  S3, Hadoop Distributed File System Servers  EC2, Google App Engine, Elastic Beanstalk, Heroku
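The MapReduce model these tools implement at scale fits in a few lines of plain Python. This is an in-memory sketch of the classic word-count example, not Hadoop's actual API; the input lines are illustrative:

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    """Map phase: emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    """Reduce phase: sum the counts collected for one word."""
    return (word, sum(counts))

lines = ["big data big tools", "data everywhere"]

# Map: run the mapper over every input line.
mapped = chain.from_iterable(mapper(l) for l in lines)

# Shuffle: group all emitted values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: one reducer call per key.
result = dict(reducer(w, c) for w, c in groups.items())
print(result)  # {'big': 2, 'data': 2, 'tools': 1, 'everywhere': 1}
```

Frameworks like Hadoop run the same three phases, but distribute the map and reduce calls across a cluster and spill the shuffle to disk, which is what makes the model work at petabyte scale.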
  • 10. Gartner Hype cycle 2013
  • 11. Sources and References     
  • 12. Questions?