July 2012 HUG: Using Standard File-Based Applications and SQL-Based Tools with Hadoop
3. The MapR Distribution for Apache Hadoop
The open, enterprise-grade distribution for Apache Hadoop
– Open source components
• Hive, Pig, Cascading, HBase, ZooKeeper, Oozie, Flume, Sqoop, Whirr, …
– Enhancements to make Hadoop more open and enterprise-grade
Fastest growing distribution
– Thousands of clusters deployed
Now available as a service with Amazon Elastic MapReduce (EMR)
– http://aws.amazon.com/elasticmapreduce/mapr
©MapR Technologies - Confidential 3
4. MapR
Make Hadoop more open
Make Hadoop enterprise-grade
This presentation
5. Not All Applications Use the Hadoop APIs
Applications and libraries
that use files and/or SQL
30 years
100,000s applications
10,000s libraries
10s programming languages
Applications and libraries
that use the Hadoop APIs
6. Hadoop Needs Industry-Standard Interfaces
Hadoop API
– MapReduce and HBase applications
– Mostly custom-built
NFS
– File-based applications
– Supported by most operating systems
ODBC
– SQL-based tools
– Supported by most BI applications and query builders
8. Your Data is Your Data
HDFS-based Hadoop distributions do not (cannot)
support NFS
Your data is your data – make sure you can access it
– Why store your data in a system which cannot be accessed
by 95% of the world’s applications and libraries?
Access to HDFS source code != access to your data
9. The NFS Protocol
RFC 1813
Very simple protocol
Random reads/writes
– Read count bytes from offset offset of file file
– Write buffer data to offset offset of a file file
HDFS does not support random writes so it cannot support NFS

    WRITE3res NFSPROC3_WRITE(WRITE3args) = 7;
    struct WRITE3args {
        nfs_fh3 file;
        offset3 offset;
        count3 count;
        stable_how stable;
        opaque data<>;
    };

    READ3res NFSPROC3_READ(READ3args) = 6;
    struct READ3args {
        nfs_fh3 file;
        offset3 offset;
        count3 count;
    };
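The offset-based semantics on this slide can be sketched with ordinary positional I/O. This is a minimal illustration, not NFS client code: the helper names `nfs_style_write` and `nfs_style_read` are hypothetical, and a local temporary file stands in for a file on an NFS mount. It assumes a POSIX system, since `os.pread` and `os.pwrite` are not available on Windows.

```python
import os
import tempfile


def nfs_style_write(fd: int, offset: int, data: bytes) -> int:
    """Write data at offset, mirroring the (file, offset, data) fields of WRITE3args."""
    return os.pwrite(fd, data, offset)


def nfs_style_read(fd: int, offset: int, count: int) -> bytes:
    """Read up to count bytes at offset, mirroring READ3args (file, offset, count)."""
    return os.pread(fd, count, offset)


def demo() -> bytes:
    # A local temporary file stands in for a file on an NFS mount (assumption).
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        nfs_style_write(fd, 0, b"hello world")
        # Overwrite bytes 6..10 in place: a random write, not an append.
        nfs_style_write(fd, 6, b"MapR!")
        return nfs_style_read(fd, 0, 11)


if __name__ == "__main__":
    print(demo())
```

The second write lands in the middle of existing data. An append-only store cannot serve that kind of in-place overwrite, which is the limitation the slide attributes to HDFS.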
10. Hadoop Was Designed to Support Multiple Storage Layers
MapReduce
o.a.h.fs.FileSystem Interface
– S3: o.a.h.fs.s3native.NativeS3FileSystem
– HDFS: o.a.h.hdfs.DistributedFileSystem
– Local File System: o.a.h.fs.LocalFileSystem
– FTP: o.a.h.fs.ftp.FTPFileSystem
– MapR storage layer: com.mapr.fs.MapRFileSystem
• Hadoop FileSystem API
• NFS interface
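The pluggable storage layers on this slide can be pictured as a lookup from URI scheme to an implementation class behind one common interface. The sketch below is only an illustration in Python, not Hadoop code. The class names come from the slide (with o.a.h expanded to org.apache.hadoop), while the scheme keys and the helper `resolve_filesystem` are assumptions made for the example.

```python
from urllib.parse import urlparse

# URI scheme -> implementation class name.
# Class names are taken from the slide; the scheme keys are assumptions
# based on how these file systems were commonly addressed.
FILESYSTEM_IMPLS = {
    "s3n": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "hdfs": "org.apache.hadoop.hdfs.DistributedFileSystem",
    "file": "org.apache.hadoop.fs.LocalFileSystem",
    "ftp": "org.apache.hadoop.fs.ftp.FTPFileSystem",
    "maprfs": "com.mapr.fs.MapRFileSystem",
}


def resolve_filesystem(uri: str) -> str:
    """Return the implementation class name registered for the URI's scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in FILESYSTEM_IMPLS:
        raise ValueError(f"no FileSystem implementation registered for {scheme!r}")
    return FILESYSTEM_IMPLS[scheme]


if __name__ == "__main__":
    for uri in ("hdfs://namenode:8020/logs", "maprfs:///user/data"):
        print(uri, "->", resolve_filesystem(uri))
```

Because MapReduce talks only to the common interface, swapping the storage layer is a matter of which implementation the scheme resolves to.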
15. Customer Examples: Import/Export Data
Network security vendor
– Network packet captures from switches are streamed into the cluster
– New pattern definitions are loaded into online IPS via NFS
Online measurement company
– Clickstreams from application servers are streamed into the cluster
SaaS company
– Exporting a database to Hadoop over NFS
Ad exchange
– Bids and transactions are streamed into the cluster
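With the cluster mounted over NFS, streaming data in can look like ordinary file appends. The following is a minimal sketch under assumptions: `append_events` is a hypothetical helper, the event fields are made up, and a temporary directory stands in for the NFS mount point of the cluster.

```python
import json
import os
import tempfile


def append_events(log_path: str, events: list) -> int:
    """Append one JSON line per event using plain file I/O; return the count written."""
    with open(log_path, "a", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event, sort_keys=True) + "\n")
    return len(events)


def demo() -> list:
    # A temporary directory stands in for the NFS-mounted cluster path (assumption).
    with tempfile.TemporaryDirectory() as mount:
        path = os.path.join(mount, "clickstream.log")
        append_events(path, [{"user": "u1", "url": "/home"}])
        append_events(path, [{"user": "u2", "url": "/cart"}])
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines()


if __name__ == "__main__":
    for line in demo():
        print(line)
```

The application server needs no Hadoop client library in this picture; it only writes to a path.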
16. Customer Examples: Productivity and Operations
Retailer
– Operational scripts are easier with NFS than DFS + MapReduce
• chmod/chown, file system searches/greps, make, tab-complete
– Consolidate object store with analytics
Credit card company
– User and project home directories on Linux gateways
• Local files, scripts, source code, …
• Administrators manage quotas, snapshots/backups, …
Large Internet company
– Web servers serve MapReduce results (item relationships) directly from the cluster
Email marketing company
– Object store with HBase and NFS
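The retailer example (chmod/chown, file system searches/greps) comes down to standard file operations once the cluster is a mounted directory. Here is a rough sketch, assuming a temporary directory in place of the NFS mount; `grep_tree` and `make_read_only` are hypothetical helpers, and the file names and contents are invented for illustration.

```python
import os
import stat
import tempfile


def grep_tree(root: str, needle: str) -> list:
    """Return sorted relative paths of files under root whose text contains needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, encoding="utf-8", errors="replace") as f:
                if needle in f.read():
                    hits.append(os.path.relpath(full, root))
    return sorted(hits)


def make_read_only(path: str) -> int:
    """chmod the file to r--r--r-- and return the resulting permission bits."""
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return stat.S_IMODE(os.stat(path).st_mode)


def demo():
    # A temporary directory stands in for the NFS-mounted cluster (assumption).
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "jobs"))
        first = os.path.join(root, "jobs", "part-00000")
        second = os.path.join(root, "jobs", "part-00001")
        with open(first, "w", encoding="utf-8") as f:
            f.write("item42\titem7\n")
        with open(second, "w", encoding="utf-8") as f:
            f.write("item1\titem2\n")
        return grep_tree(root, "item42"), make_read_only(first)


if __name__ == "__main__":
    print(demo())
```

The same calls would work unchanged against a real mount, which is the productivity point the slide makes about operational scripts.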
18. ODBC
ODBC – Open DataBase Connectivity
– Open standard API for accessing a SQL-based backend
– Developed by Microsoft and Simba Technologies in 1992
Flagship API for SQL-based BI and reporting
– Excel, Tableau, MicroStrategy, Crystal Reports, …
Advanced ODBC drivers use the latest 3.52 specification
19. MapR ODBC Driver
MapR provides a Hive ODBC 3.52 driver
– Developed in partnership with ODBC inventor Simba Technologies
– Compliant with latest ODBC 3.52 specification
• 32- and 64-bit platform support
• Windows and Linux
Enables direct SQL access to MapR-stored data by translating SQL to HiveQL
SQLizer enables seamless connectivity
– Provides ANSI SQL-92 front-end
– Targeted for existing apps that generate standard SQL queries
– Transforms SQL query into HiveQL query
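From a client's point of view, using an ODBC driver like the one described here means opening a connection through a configured data source and sending standard SQL. The sketch below assumes the third-party `pyodbc` package and a DSN that has already been set up for the driver. The DSN name, user, and table `web_logs` are hypothetical, and nothing here shows how the driver or SQLizer works internally.

```python
def build_connection_string(dsn: str, user: str = "") -> str:
    """Build a basic ODBC connection string from a DSN name and optional user."""
    parts = [f"DSN={dsn}"]
    if user:
        parts.append(f"UID={user}")
    return ";".join(parts)


def run_query(conn_str: str, sql: str) -> list:
    """Run one SQL statement over ODBC and return all rows."""
    # pyodbc is a third-party package; it is imported lazily so the helper
    # above can be used without it installed.
    import pyodbc

    # autocommit=True is an assumption: Hive-style backends generally
    # do not support transactions.
    conn = pyodbc.connect(conn_str, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical DSN and user; only the string is built here.
    print(build_connection_string("HiveDSN", "analyst"))
    # With a configured DSN, a query might be issued like this:
    # rows = run_query(build_connection_string("HiveDSN", "analyst"),
    #                  "SELECT url, COUNT(*) FROM web_logs GROUP BY url")
```

BI tools such as those listed on the ODBC slide do the equivalent of `run_query` on the user's behalf, which is why a standards-compliant driver is enough to connect them.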
24. Join MapR
Join the fastest growing Hadoop company
Open positions in every discipline
– Engineers
– Solution Architects
– Product Management
Email jobs@mapr.com
25. Time for Questions
Download slides or send me an email
– http://info.mapr.com/HUG-7-2012
Download MapR to learn more
– www.mapr.com/download