©copyright Ankur Raina 2012
• 3 million lines of code are tracking your
  checked baggage.
• A billion lines of code go into the workings of
  the latest Airbus plane.
• A billion transistors per person.
• 4 billion mobile phone subscribers.
• St. Anthony Falls Bridge (Minneapolis) is fitted
  with 200 embedded sensors.
• 2001
 800,000 petabytes (0.8 zettabytes) of data

• 2020
 35 zettabytes of data

7 TB/day
10 TB/day
The Trouble begins here…
• 80% of the world’s information is
  unstructured.

• Unstructured information is growing at 15
  times the rate of structured information.



Contents
• What is Big Data?
• The 3Vs.
• What is a Big Data platform?
• Needle in a haystack problem.
• Big Data & Social Media.
• The Call Centre mantra.
• ABCs of Hadoop.
Big Data
Information that cannot be processed or analyzed
  using traditional processes or tools.
• Instrumentation
• Interconnection
• M2M interconnectivity
• Intelligent machines
Big Data Platform
• Lets you store the data in its native business
  object format & get value out of it through
  massive parallelism on readily available
  components.

• It is not a replacement for a data warehouse.


Service-Oriented Architecture (SOA)
This is what I need!!!




Social Media
We know…
• What people are saying.

But…
• Why are people saying what they are saying and
  behaving the way they are?


• Super Bowl 2011 (4064 tweets per second, Ttps; Feb 2011)
• Bin Laden’s death (5106 Ttps)
• Japan earthquake (6939 Ttps)
• Paraguay’s penalty shootout win over Brazil in
  the Copa America quarter-final peaked at
  7166 Ttps
• The same day, the U.S. vs. Japan final of the
  FIFA Women’s World Cup peaked at 7196 Ttps
• Singer Beyonce’s pregnancy announcement
  (8868 Ttps)
• In-motion analytics (Streams computing)
• At-rest analytics (BigInsights)


HADOOP
• Creator: Doug Cutting
• Top-level Apache Project.
• Inspired by Google’s work on GFS (Google
  File System).
• Function-to-data model, not data-to-function:
  the computation moves to where the data is stored.

Hadoop
• HDFS
• Map Reduce
• Hadoop Common Components


Hadoop Distributed File System
• Data is broken into blocks and distributed throughout the
  cluster.
• Data locality
• Mean Time To Failure (MTTF)
• Block size (64 MB default)
• Larger block sizes are available for larger files to reduce
  the amount of metadata (BigInsights: 128 MB).
• Redundancy
• NameNode server
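
The effect of block size on metadata can be illustrated with a little arithmetic. The sketch below is only an illustration, not part of Hadoop: `block_count` is a hypothetical helper, the 1 GB file is an invented example, and the replication factor of 3 is an assumption (it is the usual HDFS default, but clusters can configure it differently).

```python
MB = 1024 * 1024


def block_count(file_size_bytes: int, block_size_bytes: int) -> int:
    """Number of blocks needed to hold a file (the last block may be partial)."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


file_size = 1024 * MB   # hypothetical 1 GB file
replication = 3         # assumed replication factor

for block_size_mb in (64, 128):
    blocks = block_count(file_size, block_size_mb * MB)
    # Each block is one metadata entry; redundancy multiplies the stored copies.
    print(block_size_mb, "MB blocks:", blocks, "entries,",
          blocks * replication, "stored block replicas")
```

Doubling the block size halves the number of blocks that must be tracked for the same file, which is the metadata saving the slide refers to.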
Map Reduce
• The map job takes a set of data and
  converts it into another set of data in which
  individual elements are broken down into
  tuples (key/value pairs).
• The reduce job takes the output from a map as
  input and combines those tuples into a
  smaller set of tuples.
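
The map and reduce steps described above can be sketched with the classic word-count example. This is a single-process illustration in plain Python, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this sketch.

```python
from collections import defaultdict


def map_phase(line):
    """Map: break one input line into (word, 1) tuples."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values that share a key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key, values):
    """Reduce: combine the tuples for one key into a single smaller tuple."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data", "big cluster"]))
```

In a real cluster the map calls run in parallel on the nodes that hold the data blocks, and the grouping step moves data across the network; here everything runs in one process to keep the data flow visible.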

Map Reduce
• Job
• Tasks
• JobTracker
• TaskTracker agents
• Shuffle
• Combiner
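
To make the role of the combiner concrete, the sketch below compares how many tuples would cross the shuffle with and without local pre-aggregation. It is a plain-Python illustration under stated assumptions: the two input splits are invented, and `map_split`, `combine`, and `reduce_all` are hypothetical names, not Hadoop classes.

```python
from collections import Counter


def map_split(lines):
    """One map task: emit a (word, 1) tuple for every word in its split."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate one map task's output before the shuffle."""
    local = Counter()
    for key, count in pairs:
        local[key] += count
    return list(local.items())


def reduce_all(task_outputs):
    """Reduce: merge the tuples received from every map task."""
    totals = Counter()
    for pairs in task_outputs:
        for key, count in pairs:
            totals[key] += count
    return dict(totals)


splits = [["big data big data"], ["big cluster"]]  # invented input splits
raw = [map_split(split) for split in splits]
combined = [combine(pairs) for pairs in raw]

print("tuples shuffled without combiner:", sum(len(p) for p in raw))
print("tuples shuffled with combiner:", sum(len(p) for p in combined))
print(reduce_all(combined))
```

The final counts are the same either way; the combiner only reduces the volume of intermediate data sent to the reducers, which works here because addition is associative and commutative.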

Hadoop Common Components
• Set of libraries that support various Hadoop
  subprojects.
• /bin/hdfs dfs <args>
Command         Function
chmod           Changes the permissions for reading and writing to a given
                file or set of files.
chown           Changes the owner of a given file or set of files.
copyFromLocal   Copies a file from the local file system into HDFS.
copyToLocal     Copies a file from HDFS to the local file system.
cp              Copies HDFS files from one directory to another.
expunge         Empties the trash.
cat             Copies files to standard output.
ls              Displays a listing of files in a given directory.
mkdir           Creates a directory in HDFS.
mv              Moves files from one directory to another.
rm              Deletes a file and sends it to the trash (use the -skipTrash
                option to delete permanently).
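
As a rough sketch of how these commands are put together, the helper below builds `hdfs dfs` argument lists without running them. The helper name `hdfs_dfs` and all paths are hypothetical, and it assumes an `hdfs` executable is on the PATH (the slide shows it as `/bin/hdfs` relative to the Hadoop installation). Each command is passed with a leading dash, for example `-mkdir`.

```python
import shlex


def hdfs_dfs(command, *args):
    """Build the argument list for `hdfs dfs -<command> <args>` (not executed)."""
    return ["hdfs", "dfs", f"-{command}", *args]


# Hypothetical paths, for illustration only.
commands = [
    hdfs_dfs("mkdir", "/user/demo"),
    hdfs_dfs("copyFromLocal", "report.csv", "/user/demo/report.csv"),
    hdfs_dfs("ls", "/user/demo"),
    hdfs_dfs("rm", "-skipTrash", "/user/demo/report.csv"),
]

for argv in commands:
    # shlex.join (Python 3.8+) renders the list as a shell-safe command line.
    print(shlex.join(argv))
```

On a machine with Hadoop installed, each list could be handed to `subprocess.run(argv, check=True)`; the sketch only prints the command lines so that it runs anywhere.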