Using Standard File-Based Applications and SQL-Based Tools with Hadoop.

Transcript

  • 1. Using Standard File-Based Applications and SQL-Based Tools with Hadoop (©MapR Technologies - Confidential)
  • 2. Tomer Shiran (tshiran@maprtech.com), Director of Product Management, MapR Technologies. Slides: http://info.mapr.com/Japan-HUG-8-2012
  • 3. The MapR Distribution for Apache Hadoop
    The open, enterprise-grade distribution for Apache Hadoop
      – Open source components: Hive, Pig, Cascading, HBase, ZooKeeper, Oozie, Flume, Sqoop, Whirr, …
      – Enhancements to make Hadoop more open and enterprise-grade
    Fastest growing distribution
      – Thousands of clusters deployed
    Now available as a service with Amazon Elastic MapReduce (EMR)
      – http://aws.amazon.com/elasticmapreduce/mapr
  • 4. Recent News
    Amazon selects MapR to provide the enterprise-grade Hadoop distribution in EMR
    Google selects MapR to provide Hadoop on Google Compute Engine
    MapR launched the open source Apache Drill project, inspired by Google Dremel
      – Low latency queries
  • 5. MapR: make Hadoop more open; make Hadoop enterprise-grade (diagram; one of the two goals is marked "This presentation")
  • 6. Not All Applications Use the Hadoop APIs
    Applications and libraries that use files and/or SQL
      – 30 years
      – 100,000s of applications
      – 10,000s of libraries
      – 10s of programming languages
    Applications and libraries that use the Hadoop APIs
  • 7. Hadoop Needs Industry-Standard Interfaces
    Hadoop API
      – MapReduce and HBase applications
      – Mostly custom-built
    NFS
      – File-based applications
      – Supported by most operating systems
    ODBC
      – SQL-based tools
      – Supported by most BI applications and query builders
  • 8. NFS
  • 9. Your Data is Your Data
    HDFS-based Hadoop distributions do not (cannot) support NFS
    Your data is your data: make sure you can access it
      – Why store your data in a system which cannot be accessed by 95% of the world’s applications and libraries?
  • 10. The NFS Protocol (RFC 1813)
    Very simple protocol
    Random reads/writes
      – Read count bytes from offset offset of file file
      – Write buffer data to offset offset of file file
    HDFS does not support random writes, so it cannot support NFS
    Procedure definitions shown on the slide:
      READ3res NFSPROC3_READ(READ3args) = 6;
      struct READ3args {
        nfs_fh3 file;
        offset3 offset;
        count3 count;
      };
      WRITE3res NFSPROC3_WRITE(WRITE3args) = 7;
      struct WRITE3args {
        nfs_fh3 file;
        offset3 offset;
        count3 count;
        stable_how stable;
        opaque data<>;
      };
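The random-access semantics on this slide can be sketched from the client side. The following is a minimal illustration, not MapR code: it uses the POSIX-style calls `os.pwrite` and `os.pread`, which take the same (file, offset, count/data) triple as the NFS READ and WRITE procedures. It runs against a temporary local file so that it is self-contained; pointing `path` at a file under an NFS mount is an assumption left to the reader, and the helper names `write_at` and `read_at` are hypothetical.

```python
import os
import tempfile


def write_at(path, offset, data):
    """Write the buffer `data` at byte `offset` of `path` (a random write)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        return os.pwrite(fd, data, offset)
    finally:
        os.close(fd)


def read_at(path, offset, count):
    """Read up to `count` bytes starting at byte `offset` of `path`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, count, offset)
    finally:
        os.close(fd)


# Demo on a temporary local file (stand-in for a file on an NFS mount).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    write_at(path, 0, b"hello world")   # initial contents
    write_at(path, 6, b"MapR!")         # overwrite bytes 6..10 in place
    result = read_at(path, 6, 5)        # read back only the rewritten range
    whole = read_at(path, 0, 64)        # read the whole (11-byte) file
```

The second `write_at` call is the operation the slide says an append-only store cannot serve: it changes bytes in the middle of an existing file rather than adding to its end.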
  • 11. Hadoop Was Designed to Support Multiple Storage Layers (diagram)
    MapReduce
    Hadoop FileSystem API: o.a.h.fs.FileSystem interface
      – S3: o.a.h.fs.s3native.NativeS3FileSystem
      – HDFS: o.a.h.hdfs.DistributedFileSystem
      – Local File System: o.a.h.fs.LocalFileSystem
      – FTP: o.a.h.fs.ftp.FTPFileSystem
      – MapR storage layer: com.mapr.fs.MapRFileSystem (also exposes an NFS interface)
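The layering on this slide (one interface, several interchangeable storage back ends) can be sketched as follows. This is a toy illustration in Python, not the Hadoop Java API: the `FileSystem` base class, the `DictFileSystem` back end, the `REGISTRY` table, and the helper functions are all hypothetical names, and the back ends keep their files in memory. The point is only that calling code such as `copy` depends on the interface and never on a concrete storage layer.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class FileSystem(ABC):
    """Hypothetical stand-in for a common file system interface."""

    @abstractmethod
    def create(self, path, data):
        """Store `data` (bytes) at `path`."""

    @abstractmethod
    def open(self, path):
        """Return the bytes stored at `path`."""


class DictFileSystem(FileSystem):
    """Toy in-memory back end; a real one would talk to a storage service."""

    def __init__(self, label):
        self.label = label
        self.files = {}

    def create(self, path, data):
        self.files[path] = data

    def open(self, path):
        return self.files[path]


# One back end per URI scheme (scheme names chosen for illustration).
REGISTRY = {
    "hdfs": DictFileSystem("hdfs"),
    "maprfs": DictFileSystem("maprfs"),
    "file": DictFileSystem("local"),
}


def get_file_system(uri):
    """Pick the back end from the URI scheme."""
    return REGISTRY[urlparse(uri).scheme]


def copy(src_uri, dst_uri):
    """Copy between any two back ends using only the shared interface."""
    data = get_file_system(src_uri).open(urlparse(src_uri).path)
    get_file_system(dst_uri).create(urlparse(dst_uri).path, data)


get_file_system("hdfs:///logs/a.txt").create("/logs/a.txt", b"clickstream")
copy("hdfs:///logs/a.txt", "maprfs:///archive/a.txt")
copied = get_file_system("maprfs:///archive/a.txt").open("/archive/a.txt")
```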
  • 12. One NFS Gateway
  • 13. Multiple NFS Gateways
  • 14. Multiple NFS Gateways with Load Balancing
  • 15. Multiple NFS Gateways with NFS HA (VIPs)
  • 16. Customer Examples: Import/Export Data
    Network security vendor
      – Network packet captures from switches are streamed into the cluster
      – New pattern definitions are loaded into online IPS via NFS
    Online measurement company
      – Clickstreams from application servers are streamed into the cluster
    SaaS company
      – Exporting a database to Hadoop over NFS
    Ad exchange
      – Bids and transactions are streamed into the cluster
  • 17. Customer Examples: Productivity and Operations
    Retailer
      – Operational scripts are easier with NFS than DFS + MapReduce
        • chmod/chown, file system searches/greps, make, tab-complete
      – Consolidate object store with analytics
    Credit card company
      – User and project home directories on Linux gateways
        • Local files, scripts, source code, …
        • Administrators manage quotas, snapshots/backups, …
    Large Internet company
      – Web servers serve MapReduce results (item relationships) directly from the cluster
    Email marketing company
      – Object store with HBase and NFS
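As a rough illustration of the "file system searches/greps" point above: once cluster data is visible as ordinary files, a search needs nothing beyond standard file calls. The sketch below is an assumption-laden example, not a MapR tool. The function name `grep_tree` and the sample file names are hypothetical, and the demo searches a temporary local directory; in practice `root` would be a directory under an NFS mount of the cluster.

```python
import os
import tempfile


def grep_tree(root, needle):
    """Return (relative path, line number, line) for each line containing `needle`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "r", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if needle in line:
                        rel = os.path.relpath(path, root)
                        hits.append((rel, lineno, line.rstrip("\n")))
    return hits


# Demo on a temporary directory standing in for a mounted cluster path.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "jobs"))
    with open(os.path.join(root, "jobs", "part-00000"), "w") as f:
        f.write("item-1\trelated-7\nitem-2\trelated-9\n")
    with open(os.path.join(root, "notes.txt"), "w") as f:
        f.write("nothing to see here\n")
    hits = grep_tree(root, "item-2")
```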
  • 18. (no text on slide)
  • 19. MapR Roadmap: What?
  • 20. ODBC
  • 21. ODBC ODBC – Open DataBase Connectivity – Open standard API for accessing a SQL-based backend – Developed by Microsoft and Simba Technologies in 1992 Flagship API for SQL-based BI and reporting – Excel, Tableau, MicroStrategy, Crystal Reports, … Advanced ODBC drivers use the latest 3.52 specification©MapR Technologies - Confidential 21
  • 22. MapR ODBC Driver
    MapR provides a Hive ODBC 3.52 driver
      – Developed in partnership with ODBC inventor Simba Technologies
      – Compliant with latest ODBC 3.52 specification
        • 32- and 64-bit platform support
        • Windows and Linux
    Enables direct SQL access to MapR-stored data by translating SQL to HiveQL
    SQLizer enables seamless connectivity
      – Provides ANSI SQL-92 front-end
      – Targeted for existing apps that generate standard SQL queries
      – Transforms SQL query into HiveQL query
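From an application's point of view, the driver described above means that plain SQL goes in and rows come out, with no Hadoop-specific code. The sketch below shows that shape using Python's DB-API 2.0 conventions. It is an illustration under stated assumptions: the `clicks` table, the `top_pages` function, and the sample data are hypothetical, and an in-memory `sqlite3` database stands in for the cluster so the example runs anywhere. With an ODBC driver installed, the connection would instead come from an ODBC binding (for example `pyodbc.connect("DSN=...")` with a DSN name of your own); that step is not shown or tested here.

```python
import sqlite3


def top_pages(conn, limit):
    """Run a standard SQL aggregate over any DB-API 2.0 connection."""
    cur = conn.cursor()
    cur.execute(
        "SELECT page, COUNT(*) AS hits "
        "FROM clicks GROUP BY page ORDER BY hits DESC, page"
    )
    # fetchmany keeps the query itself free of vendor-specific LIMIT syntax.
    rows = [tuple(row) for row in cur.fetchmany(limit)]
    cur.close()
    return rows


# Stand-in backend: an in-memory SQLite database with a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (page TEXT)")
conn.executemany(
    "INSERT INTO clicks (page) VALUES (?)",
    [("/home",), ("/cart",), ("/home",), ("/help",), ("/home",), ("/help",)],
)
top = top_pages(conn, 2)
conn.close()
```

Because `top_pages` only touches the cursor interface, the same function body would work unchanged against any backend that accepts this query.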
  • 23. Example: Tableau
  • 24. Example: Tableau
  • 25. Example: Open source query builder (Kaimon)
  • 26. Example: Microsoft Excel
  • 27. Time for Questions
    Download slides or send me an email
      – http://info.mapr.com/Japan-HUG-8-2012
    Download MapR to learn more
      – www.mapr.com/download
    Contact EMC Greenplum Japan
      – Yoshiaki Hirabayashi: Yoshiaki.Hirabayashi@emc.com
      – Akihiko Kusanagi: Akihiko.Kusanagi@emc.com