
xGem BigData


Published on

What is Big Data?
Big Data Technologies
What is Hadoop?
Big Data Components
Hadoop Distributions
Hortonworks Data Platform
Log Analytics

Published in: Technology


  1. XGem
  2. XGem Big Data
  3. Agenda
     • What is Big Data?
     • Big Data Technologies
     • What is Hadoop?
     • Big Data Components
     • Hadoop Distributions
     • Hortonworks Data Platform
     • Log Analytics
  4. What is Big Data?
     Ernst and Young offers the following definition: Big Data refers to the dynamic, large and disparate volumes of data being created by people, tools and machines. It requires new, innovative, and scalable technology to collect, host and analytically process the vast amount of data gathered in order to derive real-time business insights that relate to consumers, risk, profit, performance, productivity management and enhanced shareholder value.
     The research firm Gartner defines Big Data as follows: Big Data is high-volume, high-velocity, and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making and process automation.
  5. The 5 V's of Big Data
     1. Velocity: the idea that data is being generated extremely fast, a process that never stops. Attributes include near or real-time streaming and local and cloud-based technologies that can process information very quickly.
     2. Volume: the scale of the data, or the increase in the amount of data stored.
     3. Variety: the diversity of the data. There is structured data, which fits neatly into rows and columns or relational databases, and unstructured data, which is not organized in a pre-defined way, for example tweets, blog posts, pictures, numbers, and even video data.
     4. Veracity: the conformity to facts and accuracy. Is the information real, or is it false?
     5. Value
  6. Big Data: Value
     Value isn't just profit. It may be medical or social benefits, customer, employee, or personal satisfaction, or crime prevention. The main reason people invest time in understanding Big Data is to derive value from it.
  7. Big Data Technologies
  8. What is Apache Hadoop?
     • Hadoop is an open-source software framework used to store and process huge amounts of data.
     • Owned by the Apache Software Foundation
     • Transforms commodity hardware into a service that:
       • Stores petabytes of data reliably (HDFS)
       • Allows huge distributed computations (MapReduce)
     • Key attributes:
       • Redundant and reliable
         • Doesn't stop or lose data even if hardware fails
       • Easy to program
       • Extremely powerful
         • Allows the development of big data algorithms & tools
       • Batch processing centric
       • Runs on commodity hardware
         • Computers & network
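The MapReduce model named on this slide can be illustrated with a word count. This is a minimal single-process sketch in plain Python, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, which the framework does between the two phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each call to `map_phase` looks at one line only and each call to `reduce_phase` looks at one key only, both steps can be spread over many workers, which is what makes the model suit batch processing on commodity hardware.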
  9. Who built Hadoop?
  10. Who uses Hadoop?
      [Figure: timeline covering 2006 to 2010; source: The Datagraph Blog]
  11. How HDFS Works
      [Figure: HDFS architecture, split into a namespace layer and a block storage layer]
      • Namenode: persistent namespace metadata & journal, namespace state, block map
        • Hierarchical namespace: file name → block IDs
        • Block map: block ID → block locations
      • Heartbeats & block reports
      • Datanodes: block ID → data, stored on JBOD disks
      • Horizontally scale IO and storage
      • Block placement shown on four datanodes: (b1, b5, b3), (b2, b3, b1), (b3, b5, b2), (b1, b5, b2)
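The three mappings on this slide (file name → block IDs, block ID → block locations, block ID → data) can be modeled with a few dictionaries. The sketch below is an illustrative in-memory model, not HDFS code: the class and method names (`DataNode`, `NameNode`, `receive_block_report`, `locate`) are hypothetical, and the journal, heartbeats and networking are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Block storage: maps block ID -> block data."""
    name: str
    blocks: dict[str, bytes] = field(default_factory=dict)

    def block_report(self) -> set[str]:
        """List every block ID this node currently stores."""
        return set(self.blocks)


@dataclass
class NameNode:
    """Namespace metadata: file name -> block IDs, block ID -> locations."""
    namespace: dict[str, list[str]] = field(default_factory=dict)
    block_map: dict[str, set[str]] = field(default_factory=dict)

    def receive_block_report(self, node_name: str, block_ids: set[str]) -> None:
        """Rebuild this node's entries in the block map from its report."""
        for locations in self.block_map.values():
            locations.discard(node_name)  # forget what the node held before
        for block_id in block_ids:
            self.block_map.setdefault(block_id, set()).add(node_name)

    def locate(self, file_name: str) -> list[tuple[str, list[str]]]:
        """Resolve a file to its blocks and the nodes holding each block."""
        return [
            (block_id, sorted(self.block_map.get(block_id, set())))
            for block_id in self.namespace[file_name]
        ]


# Block placement taken from the slide; the node and file names are made up.
layout = {
    "dn1": ["b1", "b5", "b3"],
    "dn2": ["b2", "b3", "b1"],
    "dn3": ["b3", "b5", "b2"],
    "dn4": ["b1", "b5", "b2"],
}
datanodes = [DataNode(name, {b: b"" for b in blocks}) for name, blocks in layout.items()]
namenode = NameNode(namespace={"/logs/a": ["b1", "b2"]})
for node in datanodes:
    namenode.receive_block_report(node.name, node.block_report())
```

In this model the namenode never stores file contents. It answers "which blocks, on which nodes" and the client would then read the data from the datanodes, which is why IO and storage can scale by adding datanodes.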
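  12. HDFS Data Reliability
      [Figure: re-replication of a bad or lost block replica]
      • Namenode: namespace state, block map
      • Periodically check block checksums
      • On a bad/lost block replica:
        1. replicate
        2. copy
        3. blockReceived
      • Block placement shown on four datanodes: (b1, b5, b3), (b2, b3, b4), (b3, b5, b2), (b1, b5, b2)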
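The checksum check and the three-step repair (replicate, copy, blockReceived) can be sketched as follows. This is a simplified in-memory illustration, not HDFS code: `zlib.crc32` stands in for the block checksum, the replication target of 3 is an assumption, and the names `scan_for_bad_replicas` and `repair` are hypothetical.

```python
from __future__ import annotations

import zlib

REPLICATION_TARGET = 3  # assumed replica count per block


def checksum(data: bytes) -> int:
    """Stand-in block checksum (CRC32 from the standard library)."""
    return zlib.crc32(data)


def scan_for_bad_replicas(
    nodes: dict[str, dict[str, bytes]], expected: dict[str, int]
) -> list[tuple[str, str]]:
    """Return (node, block ID) for every replica whose checksum does not match."""
    return [
        (node, block_id)
        for node, blocks in nodes.items()
        for block_id, data in blocks.items()
        if checksum(data) != expected[block_id]
    ]


def repair(
    nodes: dict[str, dict[str, bytes]],
    block_map: dict[str, set[str]],
    expected: dict[str, int],
) -> list[tuple[str, str, str]]:
    """Drop bad replicas, then re-replicate. Returns (block, source, target) copies."""
    bad = set(scan_for_bad_replicas(nodes, expected))
    for node, block_id in bad:
        del nodes[node][block_id]
        block_map[block_id].discard(node)

    copies = []
    for block_id, holders in block_map.items():
        while 0 < len(holders) < REPLICATION_TARGET:
            # Skip nodes that already hold the block or just returned a bad copy of it.
            candidates = sorted(
                n for n in nodes if n not in holders and (n, block_id) not in bad
            )
            if not candidates:
                break
            source, target = sorted(holders)[0], candidates[0]
            # 1. replicate: the namenode tells a healthy holder to replicate the block
            # 2. copy: the block data moves from source datanode to target datanode
            nodes[target][block_id] = nodes[source][block_id]
            # 3. blockReceived: the target reports in and the block map is updated
            holders.add(target)
            copies.append((block_id, source, target))
    return copies
```

A short usage example, with made-up node names and payload:

```python
nodes = {"dn1": {"b1": b"payload"}, "dn2": {"b1": b"payload"},
         "dn3": {"b1": b"corrupt"}, "dn4": {}}
block_map = {"b1": {"dn1", "dn2", "dn3"}}
expected = {"b1": checksum(b"payload")}
repair(nodes, block_map, expected)
```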
  13. What is the Hadoop framework?
  14. Hadoop framework Components
  15. Hadoop Distributions
  16. Agenda Hortonworks Solutions
  17. Log Analytics Systems Today
      [Figure labels: Network Device Logs, Log Analytics Platform]
      • Not all data can be captured
      • Not all captured data is valuable
      • Transport all data
  18. Expand Storage Options of Log Data
      [Figure labels: Network Device Logs, HDF, HDP, Log Analytics Platform]
      1. Integrate and enrich logs across data centers and security zones
      2. Content-based routing based on dynamic evaluation of content, attributes, priority
      3. Cost-effectively expand collection and grow timescale of logs collected
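Enrichment and content-based routing, as listed on this slide, can be sketched in a few lines. This is a plain Python illustration of the idea and not the HDF or HDP API: the `LogEvent` type, the rule list, the priority scale (lower means more urgent) and the destination names are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LogEvent:
    message: str
    attributes: dict[str, str] = field(default_factory=dict)
    priority: int = 5  # hypothetical scale: lower means more urgent


def enrich(event: LogEvent, data_center: str) -> LogEvent:
    """Tag the event with where it was collected before it is routed."""
    event.attributes["data_center"] = data_center
    return event


# Rules are evaluated in order at routing time; the first match wins.
RULES: list[tuple[Callable[[LogEvent], bool], str]] = [
    (lambda e: e.priority <= 2, "realtime_alerts"),                # by priority
    (lambda e: "denied" in e.message.lower(), "security_analytics"),  # by content
    (lambda e: e.attributes.get("zone") == "dmz", "security_analytics"),  # by attribute
]
DEFAULT_DESTINATION = "long_term_storage"


def route(event: LogEvent) -> str:
    """Pick a destination from the event's priority, content and attributes."""
    for matches, destination in RULES:
        if matches(event):
            return destination
    return DEFAULT_DESTINATION
```

Keeping the rules in a list rather than in nested conditionals means they can be changed at run time without touching `route`, and everything that matches no rule still lands in low-cost storage rather than being dropped.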
  19. Thanks!
