



Using Standard File-Based Applications and SQL-Based Tools with Hadoop.








Usage Rights

© All Rights Reserved


NFS and ODBC Presentation Transcript

  • Using Standard File-Based Applications and SQL-Based Tools with Hadoop (©MapR Technologies - Confidential 1)
  • Tomer Shiran Director of Product Management, MapR Technologies©MapR Technologies - Confidential 2
  • The MapR Distribution for Apache Hadoop
      – The open, enterprise-grade distribution for Apache Hadoop
          • Open source components: Hive, Pig, Cascading, HBase, ZooKeeper, Oozie, Flume, Sqoop, Whirr, …
          • Enhancements to make Hadoop more open and enterprise-grade
      – Fastest-growing distribution
          • Thousands of clusters deployed
      – Now available as a service with Amazon Elastic MapReduce (EMR)
  • Recent News Amazon selects MapR to provide the enterprise-grade Hadoop distribution in EMR Google selects MapR to provide Hadoop on Google Compute Engine MapR launched open source Apache Drill project inspired by Google Dremel – Low latency queries©MapR Technologies - Confidential 4
  • MapR
      – Make Hadoop more open (the focus of this presentation)
      – Make Hadoop enterprise-grade
  • Not All Applications Use the Hadoop APIs
      – Applications and libraries that use files and/or SQL
          • 30 years
          • 100,000s of applications
          • 10,000s of libraries
          • 10s of programming languages
      – Applications and libraries that use the Hadoop APIs
  • Hadoop Needs Industry-Standard Interfaces
      – Hadoop API: MapReduce and HBase applications
          • Mostly custom-built
      – NFS: file-based applications
          • Supported by most operating systems
      – ODBC: SQL-based tools
          • Supported by most BI applications and query builders
  • NFS
  • Your Data is Your Data HDFS-based Hadoop distributions do not (cannot) support NFS Your data is your data – make sure you can access it – Why store your data in a system which cannot be accessed by 95% of the world’s applications and libraries?©MapR Technologies - Confidential 9
  • The NFS Protocol RFC 1813 WRITE3res NFSPROC3_WRITE(WRITE3args) = 7; struct WRITE3args { nfs_fh3 file; Very simple protocol offset3 offset; count3 count; stable_how stable; Random reads/writes opaque data<>; – Read count bytes from }; offset offset of file file – Write buffer data to READ3res NFSPROC3_READ(READ3args) = 6; offset offset of a file file struct READ3args { nfs_fh3 file; offset3 offset; HDFS does not support count3 count; random writes so it }; cannot support NFS©MapR Technologies - Confidential 10
  • Hadoop Was Designed to Support Multiple Storage Layers
      – MapReduce accesses storage through the o.a.h.fs.FileSystem interface (o.a.h = org.apache.hadoop)
      – FileSystem implementations:
          • HDFS: o.a.h.hdfs.DistributedFileSystem
          • S3: o.a.h.fs.s3native.NativeS3FileSystem
          • Local file system: o.a.h.fs.LocalFileSystem
          • FTP: o.a.h.fs.ftp.FTPFileSystem
          • MapR storage layer: com.mapr.fs.MapRFileSystem
      – The MapR storage layer is reachable through both the Hadoop FileSystem API and an NFS interface
  • One NFS Gateway
  • Multiple NFS Gateways
  • Multiple NFS Gateways with Load Balancing
  • Multiple NFS Gateways with NFS HA (virtual IPs, or VIPs)
  • Customer Examples: Import/Export Data
      – Network security vendor
          • Network packet captures from switches are streamed into the cluster
          • New pattern definitions are loaded into the online IPS (intrusion prevention system) via NFS
      – Online measurement company
          • Clickstreams from application servers are streamed into the cluster
      – SaaS company
          • Exporting a database to Hadoop over NFS
      – Ad exchange
          • Bids and transactions are streamed into the cluster
  • Customer Examples: Productivity and Operations
      – Retailer
          • Operational scripts are easier with NFS than with DFS + MapReduce (chmod/chown, file system searches/greps, make, tab completion)
          • Consolidate object store with analytics
      – Credit card company
          • User and project home directories on Linux gateways (local files, scripts, source code, …)
          • Administrators manage quotas, snapshots/backups, …
      – Large Internet company
          • Web servers serve MapReduce results (item relationships) directly from the cluster
      – Email marketing company
          • Object store with HBase and NFS
  • MapR Roadmap: What?
  • ODBC
  • ODBC ODBC – Open DataBase Connectivity – Open standard API for accessing a SQL-based backend – Developed by Microsoft and Simba Technologies in 1992 Flagship API for SQL-based BI and reporting – Excel, Tableau, MicroStrategy, Crystal Reports, … Advanced ODBC drivers use the latest 3.52 specification©MapR Technologies - Confidential 21
  • MapR ODBC Driver MapR provides a Hive ODBC 3.52 driver – Developed in partnership with ODBC inventor Simba Technologies – Compliant with latest ODBC 3.52 specification • 32- and 64-bit platform support • Windows and Linux Enables direct SQL access to MapR-stored data by translating SQL to HiveQL SQLizer enables seamless connectivity – Provides ANSI SQL-92 front-end – Targeted for existing apps that generate standard SQL queries – Transforms SQL query into HiveQL query©MapR Technologies - Confidential 22
  • Example: Tableau
  • Example: Tableau
  • Example: Open source query builder (Kaimon)
  • Example: Microsoft Excel
  • Time for Questions Download slides or send me an email – Download MapR to learn more – Contact EMC Greenplum Japan – Yoshiaki Hirabayashi – – Akihiko Kusanagi –©MapR Technologies - Confidential 27