HDInsight Essentials Quick View

These slides provide highlights of my book HDInsight Essentials. Book link is here: http://www.packtpub.com/establish-a-big-data-solution-using-hdinsight/book

HDInsight Essentials Quick View: Presentation Transcript

  • 1. HDInsight Essentials, ISBN: 1849695369 / ISBN 13: 9781849695367, Rajesh Nadipalli, 05/01/2014
  • 2. Goals of this Book
      • Focus on Microsoft's new Hadoop distribution
      • Serve as Quick Reference
      • Provide an Overview of Hadoop
      • Address both cloud and on-premise setup for HDInsight
      • Highlight HDInsight differentiator
      • Provide Practical & Real world examples
  • 3. Book Table of Contents
      • Chapter 1: HDInsight in a Heartbeat
      • Chapter 2: Deployment HDInsight on premise
      • Chapter 3: HDInsight Azure cloud service
      • Chapter 4: Administer your cluster
      • Chapter 5: Ingest data to your cluster
      • Chapter 6: Transform data in your cluster
      • Chapter 7: Analyze & Report data from cluster
      • Chapter 8: Project Planning & Architectural Considerations
  • 4. CHAPTER 1 HIGHLIGHTS: HDINSIGHT IN A HEARTBEAT
  • 5. Big Data Problem Characteristics
  • 6. Hadoop Overview
      • Self Healing Distributed Storage
      • Fault Tolerant Distributed Computing + Abstraction for Parallel Processing
      • CORE HADOOP COMPONENTS
          • HDFS: Distributed Storage, replicated, self-healing and scalable
          • MapReduce: Parallel Processing, process local data for efficiency
  • 7. Hadoop Nodes Layout (diagram labels: Master Node, Slave Nodes; MapReduce Layer with JobTracker and three TaskTrackers; Distributed File System Layer with NameNode, Secondary NameNode, and three DataNodes)
  • 8. Hadoop Eco System
      • Data Sources: RDBMS Databases; Audio, Images; Log Files; Sensors, RFID; Social Media, Feeds
      • Flume, Sqoop
      • Hadoop Data Store: HDFS, HBase (NoSQL DB)
      • Data Processing: MapReduce
      • Data Access: Hive, Pig
      • Mahout: Machine Learning
      • Excel
      • Business Data Feeds
      • ZooKeeper (Distributed Process Management)
      • HCatalog (Metadata on Pig, Hive, MapReduce)
      • Oozie: Workflow, Scheduler
      • Infrastructure, Operations (Monitoring, Configuration)
  • 9. End to End Solution on Hadoop: Collect & Import to HDFS → Process (MapReduce) → Analyze (BI Tools) → Report & Publish
  • 10. Popular Hadoop Distributions
      • Amazon Elastic MapReduce (cloud, http://aws.amazon.com/elasticmapreduce/)
      • Cloudera (http://www.cloudera.com/content/cloudera/en/home.html)
      • EMC Pivotal HD (http://gopivotal.com/)
      • Hortonworks HDP (http://hortonworks.com/)
      • MapR (http://mapr.com/)
      • Microsoft HDInsight (cloud, http://www.windowsazure.com/)
  • 11. HDInsight Differentiator
      • Enterprise-ready Hadoop backed by Microsoft
      • Analytics using Excel
      • Integration with Active Directory
      • Integration with .NET and JavaScript
      • Connectors to RDBMS
      • Scale using cloud offering: the Azure HDInsight service enables customers to scale quickly and has a seamless interface between HDFS and Azure Storage Vault
      • JavaScript Console
  • 12. WordCount in HDInsight
  • 13. CHAPTER 2 HIGHLIGHTS: HDINSIGHT INSTALL ON PREMISE
  • 14. HDInsight Distribution
      • Apache Hadoop
          • Open Source Software
          • Community Development
      • Hortonworks Data Platform
          • Enterprise Hadoop Platform (HDP)
          • Leaders in Hadoop
          • Code committers to Hadoop
      • Microsoft HDInsight
          • Built on top of HDP
          • Integration with ASV, Excel, Power View, SQL Server, Active Directory
  • 15. Physical Install Options
      • Single node for development/test
      • Multi node for production
      • Diagram labels: NN (NameNode), SNN (Secondary NameNode), JT (JobTracker), DN / TT (DataNode / TaskTracker)
  • 16. Multi Node Install Steps
      • Pre-requisites
      • Networking Setup
      • Remote Scripting
      • Firewall Setup
      • Software Install (each node)
      • Hadoop Configuration
      • Verification
  • 17. CHAPTER 3 HIGHLIGHTS: HDINSIGHT AZURE SERVICE
  • 18. Azure Cloud Service: Create Storage, Create HDInsight cluster
  • 19. CHAPTER 4 HIGHLIGHTS: ADMINISTER YOUR CLUSTER
  • 20. HDInsight Cluster Management
  • 21. HDInsight Dashboard
  • 22. HDInsight Dashboard
  • 23. NameNode Status
  • 24. JobTracker Status
  • 25. CHAPTER 5 HIGHLIGHTS: INGEST DATA INTO YOUR CLUSTER
  • 26. Loading Data into your Cluster
      You have the following options…
      • Loading data using Hadoop commands
      • Loading data using Azure Storage Vault
      • Loading data using Interactive JavaScript
      • Shipping data to your Cluster
      • Loading data from RDBMS via Sqoop
  • 27. Loading via Azure Storage Explorer
  • 28. CHAPTER 6 HIGHLIGHTS: TRANSFORM YOUR DATA
  • 29. Transforming Data
      You have the following options…
      • MapReduce
      • Hive
      • Pig
      • Others
  • 30. Processing Data in Cluster (diagram labels: Map for Jan2012, Map for Feb2012, Map for Apr2013, …, One Reducer)
  • 31. Hive (diagram labels: JDBC/ODBC, Thrift Server, Command Line, Web GUI, Metastore, Driver (Parser, Planner, Executor), MapReduce, HDFS)
  • 32. Hive or Pig?
      • Raw Data in HDFS
          • Distributed Storage
          • Reliable
      • Data Processing via Pig
          • Pipelines
          • Iterative Processing
          • Research
      • Data Warehouse via Hive
          • BI Tools
          • Analysis
      • Diagram labels: Data Warehouse, HDFS
  • 33. CHAPTER 7 HIGHLIGHTS: ANALYZE & REPORT
  • 34. Analyze using Excel
  • 35. Analyze using Excel
  • 36. CHAPTER 8: PROJECT PLANNING & ARCHITECTURAL CONSIDERATIONS
  • 37. Executive & Stakeholder Buy-in; Discovery & Analysis; Design; Implementation; User Acceptance; Production Operations; Feedback, New Requirements
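
Slides 6, 12, and 30 present MapReduce only as titles and diagram labels: parallel map tasks over separate inputs, one reducer, and WordCount as the running example. As a rough illustration of that flow, the Python sketch below simulates it in a single process. It is not the book's WordCount code and does not call any Hadoop or HDInsight API; the function names and the sample input are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts per word; a single reducer sees every key."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(splits):
    """Run one map call per split, then shuffle and reduce the results."""
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    return reduce_phase(shuffle(mapped))


# Hypothetical input: two splits, each a list of text lines.
splits = [
    ["big data on hadoop", "hadoop stores data in hdfs"],
    ["hive and pig run on hadoop"],
]
counts = word_count(splits)
print(counts["hadoop"], counts["data"])
```

On a real cluster each split would be mapped on the node that holds its data, which is the "process local data for efficiency" point on slide 6; here the splits are simply processed one after another.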