July 2012 HUG: Using Standard File-Based Applications and SQL-Based Tools with Hadoop

MapR makes Hadoop a more open platform by supporting industry-standard interfaces, including NFS and ODBC. The NFS interface enables users to leverage standard file-based applications, and makes it easier to get data into and out of the cluster, while the ODBC interface enables users to leverage standard BI tools and query builders. This talk covers the motivation for supporting industry-standard interfaces as well as several real-world use cases. In addition, this talk explains the technical details behind these capabilities and how they actually work.


Presentation Transcript

  • Using Standard File-Based Applications and SQL-Based Tools with Hadoop (©MapR Technologies - Confidential)
  • http://info.mapr.com/HUG-7-2012 Tomer Shiran tshiran@maprtech.com Director of Product Management, MapR Technologies©MapR Technologies - Confidential 2
  • The MapR Distribution for Apache Hadoop The open, enterprise-grade distribution for Apache Hadoop – Open source components • Hive, Pig, Cascading, HBase, ZooKeeper, Oozie, Flume, Sqoop, Whirr, … – Enhancements to make Hadoop more open and enterprise-grade Fastest growing distribution – Thousands of clusters deployed Now available as a service with Amazon Elastic MapReduce (EMR) – http://aws.amazon.com/elasticmapreduce/mapr©MapR Technologies - Confidential 3
  • MapR
    – Make Hadoop more open (this presentation)
    – Make Hadoop enterprise-grade
  • Not All Applications Use the Hadoop APIs
    – Applications and libraries that use files and/or SQL: 30 years, 100,000s of applications, 10,000s of libraries, 10s of programming languages
    – Applications and libraries that use the Hadoop APIs
  • Hadoop Needs Industry-Standard Interfaces
    – Hadoop API: MapReduce and HBase applications; mostly custom-built
    – NFS: file-based applications; supported by most operating systems
    – ODBC: SQL-based tools; supported by most BI applications and query builders
  • NFS
  • Your Data is Your Data
    HDFS-based Hadoop distributions do not (cannot) support NFS
    Your data is your data: make sure you can access it
    – Why store your data in a system which cannot be accessed by 95% of the world’s applications and libraries?
    Access to HDFS source code != access to your data
  • The NFS Protocol RFC 1813 WRITE3res NFSPROC3_WRITE(WRITE3args) = 7; struct WRITE3args { nfs_fh3 file; Very simple protocol offset3 offset; count3 count; stable_how stable; Random reads/writes opaque data<>; – Read count bytes from }; offset offset of file file – Write buffer data to READ3res NFSPROC3_READ(READ3args) = 6; offset offset of a file file struct READ3args { nfs_fh3 file; offset3 offset; HDFS does not support count3 count; random writes so it }; cannot support NFS©MapR Technologies - Confidential 9
  • Hadoop Was Designed to Support Multiple Storage Layers
    MapReduce runs on the o.a.h.fs.FileSystem interface, which has several implementations:
    – S3: o.a.h.fs.s3native.NativeS3FileSystem
    – HDFS: o.a.h.hdfs.DistributedFileSystem
    – Local file system: o.a.h.fs.LocalFileSystem
    – FTP: o.a.h.fs.ftp.FTPFileSystem
    – MapR storage layer: com.mapr.fs.MapRFileSystem
    The MapR storage layer is accessible through both the Hadoop FileSystem API and the NFS interface.
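The storage-layer slide describes one interface with several interchangeable implementations. The sketch below is a Python analogy of that design, not Hadoop's Java API: the `FileSystem` class, `InMemoryFileSystem`, the `REGISTRY` dictionary and `get_filesystem` are all hypothetical names, and the in-memory class merely stands in for real storage layers. It illustrates how a job can address storage by URI scheme without knowing which implementation serves it.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class FileSystem(ABC):
    """Hypothetical stand-in for a pluggable storage interface."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class InMemoryFileSystem(FileSystem):
    """Toy implementation; a real one would talk to HDFS, S3, MapR, etc."""

    def __init__(self, name: str):
        self.name = name
        self._files = {}

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def read(self, path: str) -> bytes:
        return self._files[path]


# Scheme-to-implementation registry (the scheme names are assumptions
# chosen for illustration).
REGISTRY = {
    "hdfs": InMemoryFileSystem("hdfs-stand-in"),
    "maprfs": InMemoryFileSystem("mapr-stand-in"),
}


def get_filesystem(uri: str) -> FileSystem:
    """Pick the implementation registered for the URI's scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in REGISTRY:
        raise ValueError(f"no file system registered for scheme {scheme!r}")
    return REGISTRY[scheme]


def job(uri: str) -> bytes:
    """A 'job' that only sees the interface, never the implementation."""
    fs = get_filesystem(uri)
    path = urlparse(uri).path
    fs.write(path, b"result")
    return fs.read(path)


output = job("maprfs:///data/out.txt")
served_by = get_filesystem("maprfs:///data/out.txt").name
print(output, served_by)
```

Because the job depends only on the interface, swapping the storage layer means changing the registry entry rather than the job.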
  • One NFS Gateway
  • Multiple NFS Gateways
  • Multiple NFS Gateways with Load Balancing
  • Multiple NFS Gateways with NFS HA (VIPs)
  • Customer Examples: Import/Export Data
    Network security vendor
    – Network packet captures from switches are streamed into the cluster
    – New pattern definitions are loaded into online IPS via NFS
    Online measurement company
    – Clickstreams from application servers are streamed into the cluster
    SaaS company
    – Exporting a database to Hadoop over NFS
    Ad exchange
    – Bids and transactions are streamed into the cluster
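The "exporting a database to Hadoop over NFS" example from the import/export slide amounts to writing ordinary files into a mounted directory. The following is a rough Python sketch under stated assumptions: an in-memory SQLite database with a made-up `orders` table stands in for the customer's database, a temporary directory stands in for the NFS mount point, and `export_table` is a hypothetical helper.

```python
import csv
import os
import sqlite3
import tempfile


def export_table(conn, table, dest_dir):
    """Dump one table to <dest_dir>/<table>.csv using plain file I/O.

    `table` is assumed to be a trusted identifier, since it is interpolated
    into the SQL text.
    """
    cur = conn.execute(f'SELECT * FROM "{table}"')
    headers = [col[0] for col in cur.description]
    out_path = os.path.join(dest_dir, f"{table}.csv")
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(cur)
    return out_path


# Hypothetical source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "disk"), (2, "cable")])

# The temporary directory is a placeholder for a cluster path mounted over NFS.
with tempfile.TemporaryDirectory() as mount_dir:
    out_path = export_table(conn, "orders", mount_dir)
    with open(out_path, newline="") as f:
        rows = list(csv.reader(f))

conn.close()
print(rows)
```

Nothing in the export code is Hadoop-specific; with a real mount, only the destination directory would change.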
  • Customer Examples: Productivity and Operations
    Retailer
    – Operational scripts are easier with NFS than DFS + MapReduce
      • chmod/chown, file system searches/greps, make, tab-complete
    – Consolidate object store with analytics
    Credit card company
    – User and project home directories on Linux gateways
      • Local files, scripts, source code, …
      • Administrators manage quotas, snapshots/backups, …
    Large Internet company
    – Web servers serve MapReduce results (item relationships) directly from cluster
    Email marketing company
    – Object store with HBase and NFS
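The retailer example points out that chmod and grep-style operations become ordinary scripting once the cluster is mounted as a file system. Here is a small Python sketch of that idea using only the standard library. The temporary directory stands in for a mounted cluster path, the file names and contents are invented, and `grep_tree` is a hypothetical helper.

```python
import os
import stat
import tempfile
from pathlib import Path


def grep_tree(root, needle):
    """Return (file name, line number) for every line containing `needle`."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            with path.open(errors="replace") as f:
                for lineno, line in enumerate(f, 1):
                    if needle in line:
                        hits.append((path.name, lineno))
    return hits


# A temporary directory is used here in place of an NFS mount point.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "sub").mkdir()
    (Path(root) / "a.log").write_text("ok\nERROR disk full\n")
    (Path(root) / "sub" / "b.log").write_text("ERROR timeout\n")

    # grep-style search across the tree
    hits = grep_tree(root, "ERROR")

    # chmod-style permission change on one file
    target = Path(root) / "a.log"
    os.chmod(target, 0o640)
    mode = stat.S_IMODE(os.stat(target).st_mode)

print(hits, oct(mode))
```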
  • ODBC
  • ODBC
    ODBC: Open DataBase Connectivity
    – Open standard API for accessing a SQL-based backend
    – Developed by Microsoft and Simba Technologies in 1992
    Flagship API for SQL-based BI and reporting
    – Excel, Tableau, MicroStrategy, Crystal Reports, …
    Advanced ODBC drivers use the latest 3.52 specification
  • MapR ODBC Driver MapR provides a Hive ODBC 3.52 driver – Developed in partnership with ODBC inventor Simba Technologies – Compliant with latest ODBC 3.52 specification • 32- and 64-bit platform support • Windows and Linux Enables direct SQL access to MapR-stored data by translating SQL to HiveQL SQLizer enables seamless connectivity – Provides ANSI SQL-92 front-end – Targeted for existing apps that generate standard SQL queries – Transforms SQL query into HiveQL query©MapR Technologies - Confidential 19
  • Example: Tableau
  • Example: Tableau
  • Example: Open source query builder (Kaimon)
  • Example: Microsoft Excel
  • Join MapR Join the fastest growing Hadoop company Open positions in every discipline – Engineers – Solution Architects – Product Management Email jobs@mapr.com©MapR Technologies - Confidential 24
  • Time for Questions Download slides or send me an email – http://info.mapr.com/HUG-7-2012 Download MapR to learn more – www.mapr.com/download©MapR Technologies - Confidential 25