Using Standard File-Based Applications and SQL-Based Tools with Hadoop.

Published in Technology

  • 1. Using Standard File-Based Applications and SQL-Based Tools with Hadoop
    ©MapR Technologies - Confidential
  • 2. http://info.mapr.com/Japan-HUG-8-2012
    Tomer Shiran, tshiran@maprtech.com
    Director of Product Management, MapR Technologies
  • 3. The MapR Distribution for Apache Hadoop
    The open, enterprise-grade distribution for Apache Hadoop
      – Open source components
          • Hive, Pig, Cascading, HBase, ZooKeeper, Oozie, Flume, Sqoop, Whirr, …
      – Enhancements to make Hadoop more open and enterprise-grade
    Fastest growing distribution
      – Thousands of clusters deployed
    Now available as a service with Amazon Elastic MapReduce (EMR)
      – http://aws.amazon.com/elasticmapreduce/mapr
  • 4. Recent News
    Amazon selects MapR to provide the enterprise-grade Hadoop distribution in EMR
    Google selects MapR to provide Hadoop on Google Compute Engine
    MapR launched open source Apache Drill project inspired by Google Dremel
      – Low latency queries
  • 5. MapR
    [Diagram: two goals, "Make Hadoop more open" and "Make Hadoop enterprise-grade"; one of them is labeled "This presentation"]
  • 6. Not All Applications Use the Hadoop APIs
    [Diagram contrasting two groups]
    Applications and libraries that use files and/or SQL
      – 30 years
      – 100,000s applications
      – 10,000s libraries
      – 10s programming languages
    Applications and libraries that use the Hadoop APIs
  • 7. Hadoop Needs Industry-Standard Interfaces
    Hadoop API: MapReduce and HBase applications
      • Mostly custom-built
    NFS: File-based applications
      • Supported by most operating systems
    ODBC: SQL-based tools
      • Supported by most BI applications and query builders
  • 8. NFS
  • 9. Your Data is Your Data
    HDFS-based Hadoop distributions do not (cannot) support NFS
    Your data is your data – make sure you can access it
      – Why store your data in a system which cannot be accessed by 95% of the world’s applications and libraries?
  • 10. The NFS Protocol
    RFC 1813
    Very simple protocol
    Random reads/writes
      – Read count bytes from offset offset of file file
      – Write buffer data to offset offset of a file file
    HDFS does not support random writes so it cannot support NFS
    Protocol definitions shown on the slide:
        WRITE3res NFSPROC3_WRITE(WRITE3args) = 7;
        struct WRITE3args {
            nfs_fh3 file;
            offset3 offset;
            count3 count;
            stable_how stable;
            opaque data<>;
        };
        READ3res NFSPROC3_READ(READ3args) = 6;
        struct READ3args {
            nfs_fh3 file;
            offset3 offset;
            count3 count;
        };
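The two operations on slide 10 map closely onto positional file I/O. The following is a minimal sketch, assuming a POSIX system and using a local temporary file as a stand-in for a file on an NFS mount. The helper names `write_at` and `read_at` are illustrative only and are not part of any NFS or MapR API.

```python
import os
import tempfile


def write_at(fd, offset, data):
    """Write `data` at byte `offset` (same shape as an NFS WRITE: file, offset, data)."""
    return os.pwrite(fd, data, offset)


def read_at(fd, offset, count):
    """Read up to `count` bytes from `offset` (same shape as an NFS READ: file, offset, count)."""
    return os.pread(fd, count, offset)


def demo():
    # Assumption: a local temp file stands in for a file on an NFS-mounted path.
    fd, path = tempfile.mkstemp()
    try:
        write_at(fd, 0, b"hello world")
        # Random write: overwrite bytes in the middle of the existing file.
        write_at(fd, 6, b"MapR!")
        return read_at(fd, 0, 11)
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

A store that only allows appends could serve `read_at` but not the second `write_at` call, which is the limitation the slide attributes to HDFS.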
  • 11. Hadoop Was Designed to Support Multiple Storage Layers
    [Diagram: MapReduce sits on the o.a.h.fs.FileSystem interface, which has several implementations]
      – S3: o.a.h.fs.s3native.NativeS3FileSystem
      – HDFS: o.a.h.hdfs.DistributedFileSystem
      – Local File System: o.a.h.fs.LocalFileSystem
      – FTP: o.a.h.fs.ftp.FTPFileSystem
      – MapR storage layer: com.mapr.fs.MapRFileSystem
    [Diagram labels: Hadoop FileSystem API, NFS interface]
  • 12. One NFS Gateway
  • 13. Multiple NFS Gateways
  • 14. Multiple NFS Gateways with Load Balancing
  • 15. Multiple NFS Gateways with NFS HA (VIPs)
  • 16. Customer Examples: Import/Export Data
    Network security vendor
      – Network packet captures from switches are streamed into the cluster
      – New pattern definitions are loaded into online IPS via NFS
    Online measurement company
      – Clickstreams from application servers are streamed into the cluster
    SaaS company
      – Exporting a database to Hadoop over NFS
    Ad exchange
      – Bids and transactions are streamed into the cluster
  • 17. Customer Examples: Productivity and Operations
    Retailer
      – Operational scripts are easier with NFS than DFS + MapReduce
          • chmod/chown, file system searches/greps, make, tab-complete
      – Consolidate object store with analytics
    Credit card company
      – User and project home directories on Linux gateways
          • Local files, scripts, source code, …
          • Administrators manage quotas, snapshots/backups, …
    Large Internet company
      – Web servers serve MapReduce results (item relationships) directly from the cluster
    Email marketing company
      – Object store with HBase and NFS
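As an illustration of the retailer example on slide 17, once cluster storage is mounted over NFS an ordinary script can search it like any local directory. This is a sketch only: `grep_tree` is a hypothetical helper, and the mount point `/mapr/my.cluster.com` mentioned in the comment is an assumed path, not one taken from the slides.

```python
import os
import tempfile


def grep_tree(root, needle):
    """Return (relative path, line number, line) for each line containing `needle`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            with open(full, "r", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if needle in line:
                        rel = os.path.relpath(full, root)
                        hits.append((rel, lineno, line.rstrip("\n")))
    return hits


def demo():
    # Assumption: a temp dir stands in for an NFS mount such as /mapr/my.cluster.com.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "job.log"), "w") as fh:
            fh.write("start\nERROR disk full\ndone\n")
        return grep_tree(root, "ERROR")


if __name__ == "__main__":
    print(demo())
```

Nothing in the helper is Hadoop-specific, which is the point the slide makes: plain file APIs are enough when the data is reachable through a mount.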
  • 18.
  • 19. MapR Roadmap: What?
  • 20. ODBC
  • 21. ODBC
    ODBC
      – Open DataBase Connectivity
      – Open standard API for accessing a SQL-based backend
      – Developed by Microsoft and Simba Technologies in 1992
    Flagship API for SQL-based BI and reporting
      – Excel, Tableau, MicroStrategy, Crystal Reports, …
    Advanced ODBC drivers use the latest 3.52 specification
  • 22. MapR ODBC Driver
    MapR provides a Hive ODBC 3.52 driver
      – Developed in partnership with ODBC inventor Simba Technologies
      – Compliant with latest ODBC 3.52 specification
          • 32- and 64-bit platform support
          • Windows and Linux
    Enables direct SQL access to MapR-stored data by translating SQL to HiveQL
    SQLizer enables seamless connectivity
      – Provides ANSI SQL-92 front-end
      – Targeted for existing apps that generate standard SQL queries
      – Transforms SQL query into HiveQL query
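From an application's point of view, a SQL data source like the one on slide 22 is queried with ordinary SQL. The sketch below is written against Python's generic DB-API so that it runs anywhere: the in-memory `sqlite3` connection is only a stand-in, and the `clicks` table and the `page_hits` helper are made up for illustration. With an ODBC driver and a data source name configured, one would presumably pass a connection obtained from an ODBC library such as `pyodbc` instead. That substitution is an assumption and is not exercised here.

```python
import sqlite3


def page_hits(conn):
    """Run a standard SQL aggregate over any DB-API connection and return all rows."""
    cur = conn.cursor()
    cur.execute(
        "SELECT page, COUNT(*) AS hits FROM clicks "
        "GROUP BY page ORDER BY hits DESC"
    )
    rows = cur.fetchall()
    cur.close()
    return rows


def demo():
    # Assumption: sqlite3 stands in for an ODBC-backed connection.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE clicks (page TEXT)")
        conn.executemany(
            "INSERT INTO clicks VALUES (?)",
            [("/home",), ("/cart",), ("/home",)],
        )
        return page_hits(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```

The application code contains only generic SQL; any translation to HiveQL would happen inside the driver, as the slide describes.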
  • 23. Example: Tableau
  • 24. Example: Tableau
  • 25. Example: Open source query builder (Kaimon)
  • 26. Example: Microsoft Excel
  • 27. Time for Questions
    Download slides or send me an email
      – http://info.mapr.com/Japan-HUG-8-2012
    Download MapR to learn more
      – www.mapr.com/download
    Contact EMC Greenplum Japan
      – Yoshiaki Hirabayashi – Yoshiaki.Hirabayashi@emc.com
      – Akihiko Kusanagi – Akihiko.Kusanagi@emc.com