
Cred_hadoop_presenatation



  1. Big Data
  2. Where does Big Data come from?
     • Web data
     • Social media
     • Clickstream data
     • Sensor data
     • Connected devices
  3. Big Data Challenges
     • Size of Big Data
     • Unstructured or semi-structured data
     • Analyzing Big Data
  4. How Hadoop solves the Big Data problem
     • Hadoop is built on a cluster of machines.
     • It handles unstructured and semi-structured data.
     • A Hadoop cluster can scale horizontally to meet storage requirements.
     • Hadoop clusters provide both storage and computation.
  5. Solving Big Data problems with Hadoop: enterprise use cases
  6. Retail
     • Challenges: Were higher-priced items selling in certain markets? Should inventory be re-allocated, or prices optimized, based on geography?
  7. Manufacturing
     • Challenges: Monitoring and predicting network failures.
  8. Hadoop - Introduction: HADOOP = HDFS + MAPREDUCE
  9. Services in Hadoop
     • NameNode: stores and maintains the metadata for HDFS.
     • Secondary NameNode: performs housekeeping functions for the NameNode.
     • DataNode: stores the actual HDFS data blocks.
     • JobTracker: manages MapReduce jobs and distributes individual tasks to TaskTrackers.
     • TaskTracker: instantiates and monitors individual map and reduce tasks.
  10. Hadoop Cluster Architecture
  11. Hadoop job management
  12. Hadoop fault tolerance
     • Data stored in HDFS is replicated to more than one DataNode, so even if one DataNode goes down there is a copy of the data on another node.
     • The replication factor defaults to 3 and is configurable.
     • The NameNode is a single point of failure in the cluster, so its logs and metadata are periodically backed up to the Secondary NameNode.
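The replication idea on this slide can be sketched in a few lines of Python. This is a simplified illustration under assumed helper names (`place_replicas`, `surviving_copies`), not HDFS's actual placement policy, which is rack-aware: each block lives on several distinct DataNodes, so losing one node still leaves replicas elsewhere.

```python
import random

def place_replicas(block_ids, datanodes, replication=3):
    """Assign each block to `replication` distinct DataNodes.
    Simplified: real HDFS placement also considers racks."""
    placement = {}
    for block in block_ids:
        placement[block] = random.sample(datanodes, replication)
    return placement

def surviving_copies(placement, failed_node):
    """Count the replicas of each block that survive one node failure."""
    return {block: sum(1 for node in nodes if node != failed_node)
            for block, nodes in placement.items()}
```

With the default replication factor of 3, any single node failure leaves at least two live copies of every block, which is why the cluster keeps serving data while the lost replicas are re-created.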
  13. HDFS - Hadoop Distributed File System
     • HDFS is the distributed file system for storing huge data sets on a cluster of commodity hardware with a streaming data-access pattern.
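A key HDFS mechanism behind this slide is that files are stored as fixed-size blocks (64 MB by default in Hadoop 1, 128 MB in Hadoop 2), which are then spread across DataNodes. A minimal sketch of the splitting step, using a toy block size for illustration:

```python
def split_into_blocks(data: bytes, block_size: int):
    """Split a byte payload into fixed-size blocks, the way HDFS
    splits a file before distributing it across DataNodes.
    Real block sizes are 64/128 MB; a tiny size is used here."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```

The last block may be shorter than the block size; joining the blocks back together reproduces the original file, which is what an HDFS client does when reading.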
  14. MapReduce Concept
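The MapReduce concept can be illustrated with the classic word-count example in plain Python. This is a single-process sketch of the three phases (map, shuffle, reduce) that the framework runs in parallel across the cluster; the function names are illustrative, not Hadoop API calls.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in records:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}
```

For example, `reduce_phase(shuffle(map_phase(["Big data", "big Hadoop"])))` yields `{"big": 2, "data": 1, "hadoop": 1}`. In a real job, the mappers and reducers run on different TaskTrackers and the shuffle moves data over the network.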
  15. Hadoop Streaming API
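Hadoop Streaming lets any executable act as a mapper or reducer by reading lines from stdin and writing tab-separated `key\tvalue` lines to stdout; the framework sorts the mapper output by key before the reducer sees it. A sketch of the two sides, written as functions over line iterables so the logic is easy to follow (in a real job each would be a standalone script reading `sys.stdin`):

```python
from itertools import groupby

def streaming_mapper(lines):
    """Mapper: emit one tab-separated 'word<TAB>1' line per word,
    mirroring what a Streaming mapper writes to stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def streaming_reducer(sorted_lines):
    """Reducer: Streaming delivers input sorted by key, so consecutive
    lines with the same word can be summed with groupby."""
    keyed = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(value) for _, value in group)}"
```

The two scripts would be wired together with the `hadoop-streaming` jar's `-mapper` and `-reducer` options; any language that can read stdin and write stdout works the same way.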
  16. Hadoop technology stack
  17. Hadoop Ecosystem Introduction
     • Sqoop: imports data from relational databases.
     • Flume: collection and import of log and event data.
     • MapReduce: parallel computation on server clusters.
     • HDFS: distributed, redundant file system for Hadoop.
     • Pig: high-level programming language for Hadoop computations.
     • Hive: data warehouse with SQL-like access.
  18. Data Processing Systems in Hadoop
     • Batch processing: MapReduce
     • Stream processing: Apache Spark, Apache Storm
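The batch/stream distinction on this slide can be shown with a toy event counter. This is only a conceptual sketch with made-up function names, not Spark or Storm code: the batch version sees the whole data set and computes once, while the streaming version updates its state and can emit a result after every event.

```python
def batch_count(records):
    """Batch processing: the whole data set is available up front,
    so the result is computed in one pass at the end."""
    totals = {}
    for record in records:
        totals[record] = totals.get(record, 0) + 1
    return totals

def stream_count(record_stream):
    """Stream processing: update state per event and yield a
    snapshot after each record, without waiting for the stream to end."""
    state = {}
    for record in record_stream:
        state[record] = state.get(record, 0) + 1
        yield dict(state)
```

After the last event, the streaming snapshot matches the batch result; the difference is that streaming results were available continuously along the way, which is the property Spark and Storm provide at cluster scale.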
  19. Thank you.
