Describe the Hadoop features provided in Windows Azure HDInsight

Published in: Technology
Windows Azure HDInsight Service

  1. Windows Azure HDInsight Service
     Hadoop on Windows Azure
     NEIL MACKENZIE
  2. Who Am I?
     Neil Mackenzie
     - Windows Azure Architect @ Satory Global
     - Windows Azure MVP
     - Blog: http://convective.wordpress.com/
     - Twitter: @mknz
     - Book: Microsoft Windows Azure Development Cookbook
  3. Goals and Agenda
     Goals
     - Introduce Windows Azure HDInsight Service to the Windows Azure developer
     - Introduce Windows Azure to the Hadoop user
     - Not a tutorial on how to use Hadoop features
     Agenda
     - Big Data
     - Windows Azure
     - Windows Azure HDInsight Service
  4. Big Data
     Problem:
     - How do we create value from enormous amounts of low-value data?
     Solution:
     - Analyze it using a lot of commodity hardware.
  5. Three Vs of Big Data
     Volume
     - How much data is there?
     Variety
     - What are the sources of the data?
     Velocity
     - How fast is the data being generated?
  6. MapReduce
     Distributed computational model for data analysis.
     - Map function:
       - Processes a key-value pair to generate intermediate pairs.
     - Reduce function:
       - Merges all intermediate values with the same intermediate key.
     Map and reduce functions allocated to many compute nodes with data stored locally.
     Raw MapReduce functions are written in Java.
  7. Apache Hadoop
     Modules:
     - Hadoop Distributed File System (HDFS)
     - MapReduce
     Related projects:
     - HBase – scalable, distributed database
     - Hive – data warehouse infrastructure
     - Mahout – scalable machine learning library
     - Pig – high-level data-flow language
     Other:
     - Sqoop – import and export to relational database
  8. Windows Azure
     Compute
     - PaaS: Cloud Services, Windows Azure Web Sites
     - IaaS: Virtual Machines
     Storage
     - Windows Azure Storage Service: blobs, tables, queues
     - Windows Azure SQL Database
     - IaaS: Microsoft SQL Server, MongoDB, Cassandra, etc.
     Connectivity
     - HTTP, TCP, UDP, Site-to-Site VPN
     Administration
     - Portal, Service Management API
  9. Windows Azure HDInsight Service
     Components:
     - HadoopCore – v1.0.1
     - HDFS & ASV
     - Pig – v0.9.3
     - Hive – v0.8.1
     - Sqoop – v1.4.2
     - Excel/Hive
     Note: this was formerly known as Hadoop on Azure.
  10. Hadoop Administration
     Portal
     - http://www.hadooponazure.com
     - Apply to join preview
     - Create and manage Hadoop cluster
       - 3 nodes for 5 days
     - Access the Interactive console
     Hive
     - Invoke Hive statements
     JavaScript
     - Invoke HDFS commands
     - Invoke Hive & Pig statements
  11. Distributed File Systems
     HDFS
     - Contents deleted when cluster deleted
     ASV
     - Azure Storage Vault
     - Data stored in Windows Azure Blob Storage
     - Configured on Hadoop on Azure portal
     - Contents survive deletion of Hadoop cluster
     - Supports multi-level structure, e.g.:
       - containername/input/file1
  12. Pig
     Hadoop feature to perform data-flow operations:
     - Execution environment
     - Language: Pig Latin
     Execution Environment
     - Local in local JVM or distributed on Hadoop cluster
     Pig Latin
     - High-level language
     - Describes data-flow operations
     - Automatically invokes MapReduce jobs
     - Much simpler than using MapReduce directly
  13. Pig Example
         records = LOAD 'asv://flightdata/input/flightdata.txt'
             AS (year:int, month:int, day:int, carrier:chararray,
                 origin:chararray, dest:chararray, depdelay:int, arrdelay:int);
         modified_records = FOREACH records GENERATE origin, depdelay;
         STORE modified_records INTO 'my_output' using PigStorage(',');
  14. Hive
     Hadoop feature to perform data warehouse operations
     HiveQL
     - High-level, SQL-like language
     - Supports equi-joins
     - Schema on read, NOT schema on write
     - Automatically invokes MapReduce jobs
     - Much simpler than using MapReduce directly
     Metadata store
     - Contains descriptions of tables
  15. Hive Example
         FROM flightdata_asv
         INSERT OVERWRITE TABLE origin_counts
             SELECT origin, COUNT(*) GROUP BY origin
         INSERT OVERWRITE TABLE dest_counts
             SELECT dest, COUNT(*) GROUP BY dest
  16. Sqoop
     Feature allowing import and export from SQL databases
     - Uses JDBC connector
     - Works with Windows Azure SQL Database
     - Table must exist before export
  17. Sqoop Example
     Exporting a table:
         sqoop.cmd export ^
             --connect "jdbc:sqlserver://sql_database_server.database.windows.net:1433;database=sql_database_instance;user=sqoop_login@sql_database_server;password=sqoop_login_password" ^
             --table sql_database_table ^
             --export-dir "/user/hive/warehouse/hive_table" ^
             --input-fields-terminated-by "\001"
  18. Excel and Hadoop on Azure
     Example of Microsoft business intelligence strategy
     - Expose Hadoop to existing tools
     HiveODBC connector for Excel
     - Create Hive queries from Excel
     - Invoke them from Excel
  19. More Information
     - Sign up for preview: http://www.hadooponazure.com
     - Support: http://social.msdn.microsoft.com/Forums/en-US/hdinsight
     - Avkash Chauhan’s blog: http://blogs.msdn.com/b/avkashchauhan/archive/tags/hadoop
     - Roger Jennings’ blog: http://oakleafblog.blogspot.com/2012/04/using-data-in-windows-azure-blobs-with.html
  20. Summary
     Hadoop:
     - De-facto solution to the Big Data problem
     Windows Azure HDInsight Service
     - Native Hadoop implementation
     - Managed Hadoop service for Windows Azure
     - Currently in preview
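
The MapReduce slide (slide 6) describes a map function that emits intermediate key-value pairs and a reduce function that merges all values sharing a key. As a rough illustration only, here is a single-process Python sketch of that model using a word count. The slides note that raw MapReduce functions are written in Java and distributed across nodes; the names `map_fn`, `reduce_fn`, `run_mapreduce` and the sample lines are hypothetical and not part of Hadoop.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: turn one (line_number, line_text) pair into intermediate (word, 1) pairs."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Reduce: merge all intermediate values that share the same key."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Toy single-process driver (hypothetical): map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for ikey, ivalue in mapper(key, value):
            groups[ikey].append(ivalue)  # the "shuffle" step: group values by key
    return dict(reducer(k, vs) for k, vs in groups.items())


# Made-up input: (line number, line text) pairs.
lines = [(0, "big data on azure"), (1, "big clusters process big data")]
counts = run_mapreduce(lines, map_fn, reduce_fn)
print(counts)
```

In a real cluster the grouping step is the shuffle between the map and reduce phases, and each phase runs on many nodes close to the data.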
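
The ASV slide (slide 11) gives the multi-level layout `containername/input/file1`, and the Pig example (slide 13) loads from `asv://flightdata/input/flightdata.txt`. Assuming the first segment after `asv://` names the blob container and the remainder is the path inside it, the split can be sketched with the Python standard library. The helper `split_asv_path` is hypothetical and is not an HDInsight API.

```python
from urllib.parse import urlsplit


def split_asv_path(uri):
    """Split an asv:// URI into (container, blob_path).

    Assumption: the segment right after 'asv://' is the container name.
    """
    parts = urlsplit(uri)
    if parts.scheme != "asv":
        raise ValueError("not an asv:// URI: " + uri)
    return parts.netloc, parts.path.lstrip("/")


container, blob_path = split_asv_path("asv://flightdata/input/flightdata.txt")
print(container, blob_path)
```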
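
To make the data flow in the Pig example (slide 13) concrete, the sketch below mirrors its three steps in plain Python: read records with the eight declared fields, keep only `origin` and `depdelay`, and write them out comma-separated. It assumes tab-delimited input, which is Pig's default when `LOAD` has no `USING` clause. The sample rows are invented, and the function `project_origin_delay` is hypothetical. Pig itself would compile the script into MapReduce jobs rather than run anything like this.

```python
import csv
import io

# Field order taken from the AS (...) clause of the Pig example.
FIELDS = ["year", "month", "day", "carrier", "origin", "dest", "depdelay", "arrdelay"]


def project_origin_delay(source, sink):
    """Mimic LOAD -> FOREACH ... GENERATE origin, depdelay -> STORE with ','."""
    reader = csv.reader(source, delimiter="\t")
    writer = csv.writer(sink, delimiter=",", lineterminator="\n")
    for row in reader:
        record = dict(zip(FIELDS, row))
        writer.writerow([record["origin"], int(record["depdelay"])])


# Invented sample rows standing in for flightdata.txt.
sample = io.StringIO(
    "2012\t4\t1\tAA\tSEA\tSFO\t12\t5\n"
    "2012\t4\t1\tUA\tSFO\tLAX\t-3\t0\n"
)
out = io.StringIO()
project_origin_delay(sample, out)
result = out.getvalue()
print(result)
```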
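
The Hive example (slide 15) is a multi-insert query: one `FROM` clause feeds two `GROUP BY` aggregations, so the source table is read once. A minimal Python sketch of the same idea, with invented rows and a hypothetical function name `count_origins_and_dests`, is:

```python
from collections import Counter


def count_origins_and_dests(rows):
    """One pass over the rows, two grouped counts (like the two INSERT clauses)."""
    origin_counts, dest_counts = Counter(), Counter()
    for row in rows:
        origin_counts[row["origin"]] += 1  # SELECT origin, COUNT(*) GROUP BY origin
        dest_counts[row["dest"]] += 1      # SELECT dest, COUNT(*) GROUP BY dest
    return origin_counts, dest_counts


# Invented rows standing in for the flightdata_asv table.
flights = [
    {"origin": "SEA", "dest": "SFO"},
    {"origin": "SEA", "dest": "LAX"},
    {"origin": "SFO", "dest": "LAX"},
]
origin_counts, dest_counts = count_origins_and_dests(flights)
print(dict(origin_counts), dict(dest_counts))
```

Sharing the scan is the design point: writing the two aggregations as separate queries would read the flight data twice.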