July 2012 HUG: Using Standard File-Based Applications and SQL-Based Tools with Hadoop


MapR makes Hadoop a more open platform by supporting industry-standard interfaces, including NFS and ODBC. The NFS interface enables users to leverage standard file-based applications, and makes it easier to get data into and out of the cluster, while the ODBC interface enables users to leverage standard BI tools and query builders. This talk covers the motivation for supporting industry-standard interfaces as well as several real-world use cases. In addition, this talk explains the technical details behind these capabilities and how they actually work.

Published in: Technology


  1. Using Standard File-Based Applications and SQL-Based Tools with Hadoop. © MapR Technologies - Confidential
  2. http://info.mapr.com/HUG-7-2012 Tomer Shiran (tshiran@maprtech.com), Director of Product Management, MapR Technologies
  3. The MapR Distribution for Apache Hadoop. The open, enterprise-grade distribution for Apache Hadoop: open source components (Hive, Pig, Cascading, HBase, ZooKeeper, Oozie, Flume, Sqoop, Whirr, …) plus enhancements to make Hadoop more open and enterprise-grade. Fastest growing distribution: thousands of clusters deployed. Now available as a service with Amazon Elastic MapReduce (EMR): http://aws.amazon.com/elasticmapreduce/mapr
  4. MapR: make Hadoop more open (this presentation); make Hadoop enterprise-grade.
  5. Not All Applications Use the Hadoop APIs. Applications and libraries that use files and/or SQL: 30 years, 100,000s of applications, 10,000s of libraries, 10s of programming languages. Applications and libraries that use the Hadoop APIs.
  6. Hadoop Needs Industry-Standard Interfaces. Hadoop API: MapReduce and HBase applications; mostly custom-built. NFS: file-based applications; supported by most operating systems. ODBC: SQL-based tools; supported by most BI applications and query builders.
  7. NFS
  8. Your Data is Your Data. HDFS-based Hadoop distributions do not (cannot) support NFS. Your data is your data, so make sure you can access it: why store your data in a system which cannot be accessed by 95% of the world’s applications and libraries? Access to HDFS source code != access to your data.
  9. The NFS Protocol (RFC 1813). Very simple protocol. Random reads/writes: read count bytes from offset offset of file file; write buffer data to offset offset of file file. HDFS does not support random writes, so it cannot support NFS. Protocol definitions shown on the slide: WRITE3res NFSPROC3_WRITE(WRITE3args) = 7; struct WRITE3args { nfs_fh3 file; offset3 offset; count3 count; stable_how stable; opaque data<>; }; READ3res NFSPROC3_READ(READ3args) = 6; struct READ3args { nfs_fh3 file; offset3 offset; count3 count; };
  10. Hadoop Was Designed to Support Multiple Storage Layers. MapReduce sits on the o.a.h.fs.FileSystem interface, which has several implementations: S3 (o.a.h.fs.s3native.NativeS3FileSystem), HDFS (o.a.h.hdfs.DistributedFileSystem), Local File System (o.a.h.fs.LocalFileSystem), FTP (o.a.h.fs.ftp.FTPFileSystem), and the MapR storage layer (com.mapr.fs.MapRFileSystem). The MapR storage layer is reachable through both the NFS interface and the Hadoop FileSystem API.
  11. One NFS Gateway
  12. Multiple NFS Gateways
  13. Multiple NFS Gateways with Load Balancing
  14. Multiple NFS Gateways with NFS HA (VIPs)
  15. Customer Examples: Import/Export Data. Network security vendor: network packet captures from switches are streamed into the cluster; new pattern definitions are loaded into online IPS via NFS. Online measurement company: clickstreams from application servers are streamed into the cluster. SaaS company: exporting a database to Hadoop over NFS. Ad exchange: bids and transactions are streamed into the cluster.
  16. Customer Examples: Productivity and Operations. Retailer: operational scripts are easier with NFS than DFS + MapReduce (chmod/chown, file system searches/greps, make, tab-complete); consolidate object store with analytics. Credit card company: user and project home directories on Linux gateways (local files, scripts, source code, …); administrators manage quotas, snapshots/backups, … Large Internet company: web servers serve MapReduce results (item relationships) directly from the cluster. Email marketing company: object store with HBase and NFS.
  17. ODBC
  18. ODBC (Open DataBase Connectivity). Open standard API for accessing a SQL-based backend; developed by Microsoft and Simba Technologies in 1992. Flagship API for SQL-based BI and reporting: Excel, Tableau, MicroStrategy, Crystal Reports, … Advanced ODBC drivers use the latest 3.52 specification.
  19. MapR ODBC Driver. MapR provides a Hive ODBC 3.52 driver: developed in partnership with ODBC inventor Simba Technologies; compliant with the latest ODBC 3.52 specification; 32- and 64-bit platform support; Windows and Linux. Enables direct SQL access to MapR-stored data by translating SQL to HiveQL. SQLizer enables seamless connectivity: provides an ANSI SQL-92 front-end; targeted at existing apps that generate standard SQL queries; transforms the SQL query into a HiveQL query.
  20. Example: Tableau
  21. Example: Tableau
  22. Example: Open source query builder (Kaimon)
  23. Example: Microsoft Excel
  24. Join MapR. Join the fastest growing Hadoop company. Open positions in every discipline: Engineers, Solution Architects, Product Management. Email jobs@mapr.com
  25. Time for Questions. Download slides or send me an email: http://info.mapr.com/HUG-7-2012 Download MapR to learn more: www.mapr.com/download
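
Slide 9 reduces NFS to two operations: read count bytes at an offset, and write a buffer at an offset. The sketch below shows what those semantics look like to an ordinary file-based application, using positional reads and writes from Python's standard library. It is only an illustration, not MapR code: the helper names `write_at` and `read_at` are made up here, and a temporary local directory stands in for an NFS mount point. It assumes a POSIX system, since `os.pread` and `os.pwrite` are not available on Windows.

```python
import os
import tempfile


def write_at(path: str, offset: int, data: bytes) -> int:
    """Write data at a byte offset without truncating the file (a random write)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # Positional write: mirrors the (file, offset, data) shape of WRITE3args.
        return os.pwrite(fd, data, offset)
    finally:
        os.close(fd)


def read_at(path: str, offset: int, count: int) -> bytes:
    """Read up to count bytes starting at a byte offset (a random read)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Positional read: mirrors the (file, offset, count) shape of READ3args.
        return os.pread(fd, count, offset)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Assumption: a temporary directory plays the role of the mounted cluster.
    with tempfile.TemporaryDirectory() as mount_point:
        path = os.path.join(mount_point, "events.log")
        write_at(path, 0, b"hello world")
        write_at(path, 6, b"MapR!")   # overwrite bytes in the middle of the file
        print(read_at(path, 0, 11))   # prints b'hello MapR!'
```

The second `write_at` call is the operation the slide says an append-only store cannot serve: it changes bytes in the middle of an existing file.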
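
Slide 10 lists several storage layers behind one FileSystem interface. A rough way to picture that design is a lookup from a path's URI scheme to the implementing class. The sketch below is a toy model in Python, not Hadoop's actual resolution logic: the class names come from the slide (with o.a.h expanded to org.apache.hadoop), while the scheme strings and the function name `resolve_filesystem` are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Scheme -> implementation class named on slide 10.
# Assumption: the scheme strings are illustrative and may not match a given deployment.
IMPLEMENTATIONS = {
    "hdfs": "org.apache.hadoop.hdfs.DistributedFileSystem",
    "s3n": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "file": "org.apache.hadoop.fs.LocalFileSystem",
    "ftp": "org.apache.hadoop.fs.ftp.FTPFileSystem",
    "maprfs": "com.mapr.fs.MapRFileSystem",
}


def resolve_filesystem(uri: str, default_scheme: str = "file") -> str:
    """Return the implementation class name for a URI (hypothetical helper)."""
    # Paths without a scheme fall back to the default storage layer.
    scheme = urlparse(uri).scheme or default_scheme
    try:
        return IMPLEMENTATIONS[scheme]
    except KeyError:
        raise ValueError(f"no FileSystem registered for scheme {scheme!r}") from None


if __name__ == "__main__":
    for uri in ("hdfs://namenode/logs", "maprfs:///logs", "/tmp/logs"):
        print(uri, "->", resolve_filesystem(uri))
```

The point of the design is that code written against the interface does not change when the storage layer underneath it does; only the mapping differs.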