HUG slides on NFS and ODBC

Published in: Technology, Education

Transcript

  • 1. Using Standard File-Based Applications and SQL-Based Tools with Hadoop
      ©MapR Technologies
  • 2. Who am I?
      http://www.mapr.com/company/events/speaking/dc-hug-9-18-12
      § Keys Botzum
      § kbotzum@maprtech.com
      § Senior Principal Technologist, MapR Technologies
  • 3. The MapR Distribution for Apache Hadoop
      § The open, enterprise-grade distribution for Apache Hadoop
          – Open source components
              • Hive, Pig, Cascading, HBase, ZooKeeper, Oozie, Flume, Sqoop, Whirr, …
          – Enhancements to make Hadoop more open and enterprise-grade
      § Growing fast and a recognized leader
  • 4. MapR in the Cloud
      § Available as a service with Amazon Elastic MapReduce (EMR)
          – http://aws.amazon.com/elasticmapreduce/mapr
      § Available as a service with Google Compute Engine
  • 5. MapR
      Make Hadoop more open (this presentation)
      Make Hadoop enterprise-grade
          • High Availability
          • Scalability
          • Management tools – Web, CLI, REST
          • Data Protection – snapshots & mirroring
          • Performance
  • 6. Not All Applications Use the Hadoop APIs
      Applications and libraries that use files and/or SQL
          • 30 years
          • 100,000s applications
          • 10,000s libraries
          • 10s programming languages
          • These are not legacy applications, they are valuable applications
      Applications and libraries that use the Hadoop APIs
  • 7. Hadoop Needs Industry-Standard Interfaces
      Hadoop API
          • MapReduce and HBase applications
          • Mostly custom-built
      NFS
          • File-based applications
          • Supported by most operating systems
      ODBC
          • SQL-based tools
          • Supported by most BI applications and query builders
  • 8. NFS
  • 9. Your Data is Important
      § HDFS-based Hadoop distributions do not (cannot) properly support NFS
      § Your data is important, it drives your business – make sure you can access it
          – Why store your data in a system which cannot be accessed by 95% of the world's applications and libraries?
      § Access to HDFS source code != access to your data
  • 10. The NFS Protocol
      § RFC 1813
      § Very simple protocol
      § Random reads/writes
          – Read count bytes from offset offset of file file
          – Write buffer data to offset offset of a file file
      § HDFS does not support random writes so it cannot support NFS

      WRITE3res NFSPROC3_WRITE(WRITE3args) = 7;

      struct WRITE3args {
          nfs_fh3     file;
          offset3     offset;
          count3      count;
          stable_how  stable;
          opaque      data<>;
      };

      READ3res NFSPROC3_READ(READ3args) = 6;

      struct READ3args {
          nfs_fh3  file;
          offset3  offset;
          count3   count;
      };
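The READ3/WRITE3 arguments above (file, offset, count, data) correspond closely to positional file I/O on a client. The following is a minimal sketch in Python, assuming a Unix-like system; the helper names `read_at` and `write_at` are hypothetical, and a temporary local file stands in for a file on an NFS-mounted cluster path.

```python
import os
import tempfile


def write_at(path, offset, data):
    """Write `data` starting at byte `offset`, without touching other bytes."""
    fd = os.open(path, os.O_WRONLY)
    try:
        return os.pwrite(fd, data, offset)
    finally:
        os.close(fd)


def read_at(path, offset, count):
    """Read up to `count` bytes starting at byte `offset`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, count, offset)
    finally:
        os.close(fd)


# Assumption: this temporary file stands in for a file on an NFS mount.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"0123456789")
    demo_path = f.name

write_at(demo_path, 4, b"XY")      # overwrite two bytes in the middle
middle = read_at(demo_path, 3, 4)  # read four bytes starting at offset 3
os.remove(demo_path)
```

The write in the middle of the file is the operation the slide calls a random write: a storage layer that only supports appends cannot serve it.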
  • 11. Hadoop Was Designed to Support Multiple Storage Layers
      MapReduce
      Hadoop FileSystem API: o.a.h.fs.FileSystem interface
          • S3: o.a.h.fs.s3native.NativeS3FileSystem
          • HDFS: o.a.h.hdfs.DistributedFileSystem
          • Local File System: o.a.h.fs.LocalFileSystem
          • FTP: o.a.h.fs.ftp.FTPFileSystem
          • MapR storage layer: com.mapr.fs.MapRFileSystem
      NFS interface
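The idea on this slide, one file system interface with several interchangeable storage layers behind it, can be illustrated with a small sketch. This is not Hadoop's Java API: the class names, the `mem` scheme, and the registry below are hypothetical Python stand-ins that only show the pattern of choosing an implementation from the URI scheme.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Hypothetical stand-in for a common file system interface."""

    @abstractmethod
    def write(self, path, data):
        """Store `data` (bytes) at `path`."""

    @abstractmethod
    def read(self, path):
        """Return the bytes stored at `path`."""


class InMemoryFileSystem(FileSystem):
    """Toy storage layer that keeps file contents in a dictionary."""

    def __init__(self):
        self._files = {}

    def write(self, path, data):
        self._files[path] = bytes(data)

    def read(self, path):
        return self._files[path]


# Maps a URI scheme to the class implementing the interface for it.
REGISTRY = {}


def register(scheme, cls):
    REGISTRY[scheme] = cls


def get_file_system(uri):
    """Pick the storage layer from the scheme part of `uri`."""
    scheme = uri.split("://", 1)[0]
    return REGISTRY[scheme]()


register("mem", InMemoryFileSystem)
fs = get_file_system("mem://demo")
fs.write("/results/part-0", b"hello")
content = fs.read("/results/part-0")
```

Code written against the interface does not change when a different storage layer is registered under another scheme.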
  • 12. One NFS Gateway
      What about scalability and high availability?
  • 13. Multiple NFS Gateways
  • 14. Multiple NFS Gateways with Load Balancing
  • 15. Multiple NFS Gateways with NFS HA (VIPs)
  • 16. Customer Examples: Import/Export Data
      § Network security vendor
          – Network packet captures from switches are streamed into the cluster
          – New pattern definitions are loaded into online IPS via NFS
      § Online measurement company
          – Clickstreams from application servers are streamed into the cluster
      § SaaS company
          – Exporting a database to Hadoop over NFS
      § Ad exchange
          – Bids and transactions are streamed into the cluster
  • 17. Customer Examples: Productivity and Operations
      § Retailer
          – Operational scripts are easier with NFS than HDFS + MapReduce
              • chmod/chown, file system searches/greps, perl, awk, tab-complete
          – Consolidate object store with analytics
      § Credit card company
          – User and project home directories on Linux gateways
              • Local files, scripts, source code, …
              • Administrators manage quotas, snapshots/backups, …
      § Large Internet company recommendation system
          – Web servers serve MapReduce results (item relationships) directly from cluster
      § Email marketing company
          – Object store with HBase and NFS
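The retailer example mentions file system searches and greps as operations that become ordinary scripting once the cluster is mounted over NFS. A minimal sketch in Python of such a search follows; the function name `grep_tree` is hypothetical, and a temporary directory stands in for the mounted cluster path.

```python
import os
import tempfile


def grep_tree(root, needle):
    """Yield (path, line_number, line) for every line under `root` containing `needle`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "r", errors="replace") as fh:
                for number, line in enumerate(fh, 1):
                    if needle in line:
                        yield path, number, line.rstrip("\n")


# Assumption: this temporary directory stands in for an NFS-mounted cluster directory.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "job.log"), "w") as fh:
        fh.write("start\nERROR disk full\ndone\n")
    with open(os.path.join(root, "other.log"), "w") as fh:
        fh.write("all good\n")
    hits = list(grep_tree(root, "ERROR"))
```

Nothing in the script is Hadoop-specific, which is the point of the slide: the same code runs against a local disk or a mounted cluster.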
  • 18. ODBC
  • 19. ODBC
      § ODBC – Open DataBase Connectivity
          – Open standard API for accessing a SQL-based backend
          – Developed by Microsoft and Simba Technologies in 1992
      § Flagship API for SQL-based BI and reporting
          – Excel, Tableau, MicroStrategy, Crystal Reports, …
      § Advanced ODBC drivers use the latest 3.52 specification
  • 20. MapR ODBC Driver
      § MapR provides a Hive ODBC 3.52 driver
          – Developed in partnership with ODBC inventor Simba Technologies
          – Compliant with latest ODBC 3.52 specification
              • 32- and 64-bit platform support
              • Windows and Linux
      § Enables direct SQL access to MapR-stored data by translating SQL to HiveQL
      § SQLizer enables seamless connectivity
          – Provides ANSI SQL-92 front-end
          – Targeted for existing apps that generate standard SQL queries
          – Transforms SQL query into HiveQL query
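From the application's side, the value of a standard SQL front-end is that existing query code does not need to know what sits behind the connection. The sketch below, in Python, shows a report function written against the generic DB-API cursor interface. It is an illustration only: the `clicks` table and the query are made up, and an in-memory SQLite database stands in for the ODBC connection so the example can run anywhere. With an ODBC binding such as pyodbc, the connection would instead come from something like `pyodbc.connect("DSN=...")` with a data source name configured for the driver, which is not exercised here.

```python
import sqlite3

# Hypothetical report query in plain standard SQL.
REPORT_SQL = (
    "SELECT category, COUNT(*) AS n "
    "FROM clicks GROUP BY category ORDER BY category"
)


def run_report(conn, sql=REPORT_SQL):
    """Run `sql` on any DB-API style connection and return rows as tuples."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return [tuple(row) for row in cur.fetchall()]
    finally:
        cur.close()


# Assumption: SQLite stands in for the ODBC data source in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (category TEXT)")
conn.executemany(
    "INSERT INTO clicks VALUES (?)",
    [("ads",), ("search",), ("ads",)],
)
rows = run_report(conn)
conn.close()
```

The report function only issues standard SQL; any translation to HiveQL would happen inside the driver, out of the application's view.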
  • 21. Example: Tableau
  • 22. Example: Open source query builder (Kaimon)
  • 23. Example: Microsoft Excel
  • 24. In Summary
      § Open standards are important
      § Supporting existing applications and tools that support those standards is valuable
          – Preserves investment in tools
          – Preserves investment in custom applications that preceded Hadoop
          – Leverages skills you already have
  • 25. Join MapR
      § Join the fastest growing Hadoop company
      § Open positions in every discipline
          – Engineers
          – Solution Architects
          – Product Management
      § Email jobs@mapr.com
  • 26. Time for Questions
      § Download slides or send me an email
          – http://www.mapr.com/company/events/speaking/dc-hug-9-18-12
      § Download MapR to learn more
          – www.mapr.com/download