Big data and Hadoop

An introduction to Big data, the problems associated with storing and analyzing it, and how Hadoop solves them with its HDFS and MapReduce frameworks. Includes a brief introduction to HDInsight, Hadoop on Windows Azure.

Published in: Technology

  1. Big data and Hadoop
     Learn how Hadoop deals with problems associated with Big data analysis
     Srikanth M V
  2. There are 30 billion pieces of content shared on Facebook every day.
  3. Wal-Mart handles more than 1 million customer transactions an hour.
  4. More than 5 billion people are calling, texting, tweeting and browsing websites using smartphones.
  5. The 3 Vs of Big Data
     • Volume: gigabytes, terabytes, petabytes or zettabytes
     • Velocity: the rate at which data flows into an organization
     • Variety: structured and unstructured
  6. So what is Big Data?
     • Big data is large and complex data sets collected from various sources such as sensors, social media, satellite images, audio, video, RFID, etc.
     • Big data is data that exceeds the processing capacity of conventional database systems.
     • How "big" is big? GB, TB, PB, ZB? No. Data is big when it exceeds the organization's capacity to handle, store and analyze it.
  7. Problem: storing and analyzing "Big Data"
  8. Solution*: move compute to data
     * One among many, but Hadoop is flexible, simple and reliable.
     "Hey there, I'm Hadoop and I can do that for you."
  9. • Created: 2005
     • Creators: Doug Cutting and Mike Cafarella
     • Contributors: Apache, Yahoo, Google
     • Language: Java
  10. How Hadoop deals with "Big data"
     • Primary components
       – HDFS – Hadoop Distributed File System
       – MapReduce
       – Hadoop YARN: job scheduling and resource management
       – Hadoop Common: access to the file system
  11. HDFS
     • Distributed, scalable, reliable and portable file system.
     • A Hadoop cluster is a set of Data Nodes and a Name Node.
     • The client divides the data to be processed into blocks.
     • Each block of data is replicated on 3 nodes*.
     • More nodes, more efficiency.
     • Robust: relies on software instead of hardware.
     [Diagram: HDFS cluster with one Name Node (master) and several Data Nodes (slaves), one per server; Somefile.txt is split into blocks B1, B2 and B3, and copies of each block are spread across the Data Nodes.]
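The block splitting and three-way replication described on the HDFS slide can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: the function names, the node names (`dn1` to `dn4`), the tiny block size and the round-robin placement are all made up for illustration and are not HDFS's actual placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the input into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, data_nodes: list[str],
                   replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical policy for illustration only; a real cluster decides
    placement itself.
    """
    if replication > len(data_nodes):
        raise ValueError("need at least as many data nodes as replicas")
    return {
        block_id: [data_nodes[(block_id + r) % len(data_nodes)]
                   for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(placement: dict[int, list[str]],
                           failed: set[str]) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(any(node not in failed for node in nodes)
               for nodes in placement.values())


# Toy input: a 10-byte "file" and a 4-byte block size (assumed values).
data = b"0123456789"
blocks = split_into_blocks(data, block_size=4)
nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(len(blocks), nodes)

for block_id, holders in placement.items():
    print(f"B{block_id + 1} ({len(blocks[block_id])} bytes) -> {holders}")
```

With three replicas per block, the file in this toy setup stays readable when any two nodes fail, which is the sense in which the design relies on software rather than hardware for robustness.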
  12. MapReduce
     • Divide and conquer
     • Parallel computing
     • Map(): performs sorting and filtering
     • Reduce(): performs a summary operation
     • Each node runs a Task Tracker, which communicates with the Job Tracker.
     • The output files are written to HDFS, from where the client can retrieve them.
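The Map and Reduce steps can be sketched as a word count in plain Python, with no Hadoop involved. The function names here are illustrative assumptions, and everything runs in one process; on a cluster the map and reduce calls would be spread across nodes. In this sketch the grouping and sorting of keys is shown as a separate shuffle step between the two phases.

```python
from collections import defaultdict


def map_phase(line: str):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key and sort the keys, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]):
    """Reduce: summarize all values for one key, here by summing them."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())


counts = word_count(["big data", "Big Hadoop data"])
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can run in parallel, which is the divide-and-conquer idea on the slide.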
  13. Hadoop Architecture
  14. Hadoop Secondary Components
     • Ambari
       – Web tool for provisioning, managing and monitoring clusters
     • HBase
       – Scalable distributed database that supports structured data for large tables
     • ZooKeeper
       – A high-performance coordination service for distributed applications
     • Pig
       – A high-level data flow language and execution framework for parallel computation
     • Hive
       – A data warehouse infrastructure that provides data summarization and ad hoc querying
     • Cassandra
       – A scalable multi-master database with no single points of failure
     • Chukwa
       – A data collection system for managing large distributed systems
     • Lucene and Solr
       – Search engines, currently not part of Hadoop
  15. Real-World Example of Big data Analytics using Hadoop
     [Diagram: numbered data-flow steps around a MySQL database, including other users.]
     1. Users interact with Facebook using data in textual, image and video formats.
     2. Facebook transfers the core data to a MySQL database.
     3. The MySQL data is replicated to Hadoop clusters.
     4. The data is processed using Hadoop MapReduce functions.
     5. The results are transferred back to MySQL.
     6. Facebook uses the data to create recommendations for you based on your interests.
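Steps 3 to 6 of the flow above can be mimicked end to end in a few lines. This is a toy sketch, not a description of Facebook's system: an in-memory SQLite database stands in for MySQL, a plain Python counter stands in for the MapReduce job, and the table names, column names and sample rows are all invented for illustration.

```python
import sqlite3
from collections import Counter


def run_pipeline(conn: sqlite3.Connection) -> None:
    # Step 3: copy rows out of the relational store (stand-in for replication).
    rows = conn.execute("SELECT user, topic FROM interactions").fetchall()

    # Step 4: count interactions per (user, topic), the kind of
    # aggregation a MapReduce job would perform in parallel.
    counts = Counter()
    for user, topic in rows:
        counts[(user, topic)] += 1

    # Step 5: write the aggregated results back to the relational store.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS topic_counts (user TEXT, topic TEXT, n INTEGER)"
    )
    conn.execute("DELETE FROM topic_counts")
    conn.executemany(
        "INSERT INTO topic_counts VALUES (?, ?, ?)",
        [(user, topic, n) for (user, topic), n in counts.items()],
    )
    conn.commit()


def recommend(conn: sqlite3.Connection, user: str):
    # Step 6: recommend the topic the user interacted with most often.
    row = conn.execute(
        "SELECT topic FROM topic_counts WHERE user = ? "
        "ORDER BY n DESC, topic ASC LIMIT 1",
        (user,),
    ).fetchone()
    return row[0] if row else None


# Hypothetical sample data (steps 1 and 2 are assumed to have happened).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interactions (user TEXT, topic TEXT)")
conn.executemany(
    "INSERT INTO interactions VALUES (?, ?)",
    [("alice", "music"), ("alice", "music"),
     ("alice", "sports"), ("bob", "travel")],
)
run_pipeline(conn)
print(recommend(conn, "alice"), recommend(conn, "bob"))
```

The point of the round trip is that the heavy aggregation happens outside the transactional database, which only stores the raw events and the finished results.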
  16. Why should an enterprise move to Big Data Analytics?
     • Enterprises will be able to harness relevant data and use it to make the best decisions:
       – Increasing the redemption rate
       – Determining optimum prices
       – Calculating risks in a minute, and understanding future possibilities to mitigate risk
       – Enabling new products
       – Identifying patterns, which helps identify trends in business
     The key lies in collecting quality data, not quantity.
  17. What is in it for us?
  18. Hadoop on Cloud
     • Provision scalable storage for storing Big data as blobs – PaaS
     • Provision Linux VMs on the cloud – IaaS
     • Language support for JS and C#
     • Business Intelligence: connect MS Excel to Hadoop Hive
     • Remote access to Hadoop jobs via REST API, WebHCat REST API
     • Easy-to-access management portal for monitoring Hadoop jobs
     • .NET SDK to execute Hive jobs on HDInsight
     • …and more
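As a rough illustration of remote access through the WebHCat REST API, the sketch below builds (but does not send) an HTTP request that submits a Hive query. The host name, user name, query and status directory are hypothetical placeholders, and port 50111 is assumed as the WebHCat default; a hosted cluster such as HDInsight would typically sit behind HTTPS and require credentials, which this sketch leaves out.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_hive_request(host: str, user: str, query: str,
                       status_dir: str, port: int = 50111) -> Request:
    """Build a POST request for WebHCat's Hive endpoint without sending it."""
    url = (f"http://{host}:{port}/templeton/v1/hive?"
           + urlencode({"user.name": user}))
    # `execute` carries the Hive statement; `statusdir` names the directory
    # where the job's output and status are written.
    body = urlencode({"execute": query, "statusdir": status_dir}).encode("ascii")
    return Request(url, data=body, method="POST")


# All values below are placeholders, not a real cluster.
req = build_hive_request(
    host="headnode.example.com",
    user="hdpuser",
    query="SELECT COUNT(*) FROM events;",
    status_dir="/tmp/events.output",
)
decoded = parse_qs(req.data.decode("ascii"))
print(req.get_method(), req.full_url)
print(decoded)
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen` against a reachable cluster; the response would then describe the submitted job.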
  19. Thank you
     • Questions?
     Vishwanath.srikanth@gmail.com
     http://Vishwanathsrikanth.wordpress.com
