Big data
- 3. • 3 million lines of code track your
checked baggage.
• A billion lines of code go into the workings
of the latest Airbus plane.
• A billion transistors per person.
• 4 billion mobile phone subscribers.
• St. Anthony Falls Bridge (Minneapolis) is fitted
with 200 embedded sensors.
©copyright Ankur Raina 2012
- 5. • 2001: 8 lakh (800,000) petabytes of data
• 2020: 35 zettabytes of data
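To put the two figures on the same scale, the arithmetic can be sketched as below. It assumes decimal (SI) units, where 1 zettabyte equals 1,000,000 petabytes; the slide does not say which convention it uses, and the variable names are illustrative only.

```python
# Assumption: decimal (SI) units, so 1 ZB = 1,000,000 PB.
PB_PER_ZB = 1_000_000

data_2001_pb = 800_000   # 8 lakh petabytes, as stated on the slide
data_2020_zb = 35        # zettabytes, as stated on the slide

# Express the 2001 figure in zettabytes so both values share a unit.
data_2001_zb = data_2001_pb / PB_PER_ZB

# Ratio between the 2020 figure and the 2001 figure.
growth_factor = data_2020_zb / data_2001_zb

print(f"2001 volume: {data_2001_zb} ZB")
print(f"Growth 2001 -> 2020: roughly {growth_factor:.0f}x")
```

Under that assumption the 2001 figure is a little under one zettabyte, so the 2020 figure is a bit more than forty times larger.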
- 6. 7 TB/day
10 TB/day
- 7. The Trouble begins here…
• 80% of the world’s information is
unstructured.
• Unstructured information is growing at 15
times the rate of structured information.
- 9. Contents
• What is Big Data?
• The 3 Vs.
• What is a Big Data platform?
• Needle-in-a-haystack problem.
• Big Data & Social Media.
• The Call Centre mantra.
• ABCs of Hadoop.
- 10. Big Data
Information that cannot be processed or
analyzed using traditional processes or tools.
• Instrumentation
• Interconnection
• M2M interconnectivity
• Intelligent Machines
- 12. Big Data Platform
• Lets you store the data in its native business
object format & get value out of it through
massive parallelism on readily available
components.
• It is not a replacement for a data warehouse.
- 14. Social Media
We know…
• What are people saying?
But…
• Why are people saying what they are saying &
behaving the way they are behaving?
- 15. • Super Bowl 2011 (4064 tweets per second (TPS), Feb 2011)
• Bin Laden’s death (5106 TPS)
• Japan earthquake (6939 TPS)
• Paraguay’s penalty shootout win over
Brazil in the Copa America quarter-final
peaked at 7166 TPS
• The same day’s U.S. vs. Japan final in the FIFA
Women’s World Cup peaked at 7196 TPS
• Singer Beyonce’s pregnancy announcement
(8868 TPS)
- 16. • In-Motion Analytics (Streams Computing)
• At-Rest Analytics (BigInsights)
- 17. HADOOP
• Creator: Doug Cutting
• Top-level Apache Project.
• Inspired by Google’s work on its GFS (Google
File System).
• Function-to-data model, not a data-to-function
model: the computation is sent to where the data
resides.
- 19. Hadoop
Components of Hadoop:
• HDFS
• MapReduce
• Common Components
- 20. Hadoop Distributed File System
• Data is broken into blocks & distributed throughout the
cluster.
• Data locality.
• Mean Time To Failure (MTTF).
• Block size (64 MB default).
• Larger block sizes are available for larger files to reduce
the amount of metadata (BigInsights: 128 MB).
• Redundancy.
• NameNode server.
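The effect of block size on metadata can be illustrated with a small calculation. This is only a sketch: the 10 GB file size and the replication factor of 3 are assumptions chosen for illustration, not values from the slides, and `block_count` is a hypothetical helper rather than part of any Hadoop API.

```python
MB = 1024 * 1024

def block_count(file_size_bytes: int, block_size_bytes: int) -> int:
    """Number of blocks needed to hold a file (the last block may be partial)."""
    # Integer ceiling division avoids floating-point rounding.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes

file_size = 10 * 1024 * MB   # hypothetical 10 GB file
replication = 3              # assumed replication factor (the "Redundancy" bullet)

for block_mb in (64, 128):
    blocks = block_count(file_size, block_mb * MB)
    replicas = blocks * replication
    print(f"{block_mb} MB blocks: {blocks} blocks to track, {replicas} replicas stored")
```

Doubling the block size halves the number of blocks the NameNode has to keep track of for the same file, which is the metadata saving the bullet refers to.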
- 22. Map Reduce
• A Map job takes a set of data and
converts it into another set of data in which
individual elements are broken down into
tuples (key/value pairs).
• A Reduce job takes the output from a map as
input & combines those data tuples into a
smaller set of tuples.
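The two definitions above can be sketched with a word count in plain Python. This is an in-memory illustration of the idea, not Hadoop's API; the function names and the sample lines are made up for the example.

```python
from collections import defaultdict

def map_words(line):
    """Map: break one line of input into (word, 1) tuples."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_counts(pairs):
    """Reduce: combine tuples that share a key into one (word, total) entry."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# Hypothetical input lines.
lines = ["big data is big", "data at rest"]

# Run the map step over every line, then reduce all emitted tuples.
mapped = [pair for line in lines for pair in map_words(line)]
result = reduce_counts(mapped)
print(result)
```

The map step emits eight tuples for the two lines, and the reduce step collapses them into five, one per distinct word.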
- 23. Map Reduce
• Job
• Tasks
• JobTracker
• TaskTracker agents
• Shuffle
• Combiner
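How the combiner and shuffle steps fit between map and reduce can be sketched in plain Python. This is a single-process illustration under simplifying assumptions: each inner list stands in for the input of one map task, and the scheduling done by the JobTracker and TaskTracker agents is not modelled. All function names are hypothetical.

```python
from collections import Counter, defaultdict

def map_task(lines):
    """One map task: emit a (word, 1) tuple for every word in its input split."""
    return [(word, 1) for line in lines for word in line.split()]

def combine(pairs):
    """Combiner: pre-aggregate a single map task's output before it is shuffled."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def shuffle(all_pairs):
    """Shuffle: group values by key so each key is handled by one reduce call."""
    grouped = defaultdict(list)
    for key, value in all_pairs:
        grouped[key].append(value)
    return grouped

def reduce_task(key, values):
    """Reduce: sum the partial counts for one key."""
    return key, sum(values)

# Hypothetical job input: one list of lines per map task.
splits = [["a b a", "a"], ["b b c"]]

combined = [combine(map_task(split)) for split in splits]
shuffled = shuffle(pair for task_output in combined for pair in task_output)
totals = dict(reduce_task(key, values) for key, values in shuffled.items())
print(totals)
```

With the combiner, the first map task passes on two tuples instead of four, which is the point of the step: less data has to move during the shuffle.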
- 25. Hadoop Common Components
• Set of libraries that support various Hadoop
subprojects.
• /bin/hdfs dfs <args>
Command        Function
chmod          Changes the read & write permissions of a given file or set
               of files.
chown          Changes the owner of a given file or set of files.
copyFromLocal  Copies a file from the local file system into HDFS.
- 26. Command  Function
copyToLocal    Copies a file from HDFS to the local file system.
cp             Copies HDFS files from one directory to another.
expunge        Empties the trash.
cat            Copies the files to standard output.
ls             Displays a listing of files in a given directory.
mkdir          Creates a directory in HDFS.
mv             Moves files from one directory to another.
rm             Deletes a file & sends it to the trash (use the -skipTrash option
               to delete permanently).
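A few of the commands in the table can be strung together as below. The sketch only builds and prints the command lines; it does not run them. It assumes the `hdfs` executable is on the PATH, and the paths and the `hdfs_dfs` helper are hypothetical.

```python
import shlex

def hdfs_dfs(command, *args):
    """Build the argument list for `hdfs dfs -<command> <args>` without running it."""
    return ["hdfs", "dfs", f"-{command}", *args]

# Hypothetical paths, for illustration only.
commands = [
    hdfs_dfs("mkdir", "/user/demo"),
    hdfs_dfs("copyFromLocal", "report.txt", "/user/demo/report.txt"),
    hdfs_dfs("ls", "/user/demo"),
    hdfs_dfs("cat", "/user/demo/report.txt"),
    hdfs_dfs("rm", "-skipTrash", "/user/demo/report.txt"),
]

# Print each command as it would be typed in a shell.
for argv in commands:
    print(shlex.join(argv))
```

On a machine with a Hadoop client installed, each list could be passed to `subprocess.run(argv, check=True)` to execute it.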