Hadoop in a Windows Shop - CHUG - 20120416

View the accompanying video on vimeo: https://vimeo.com/40781385

Presentation Transcript

  • Hadoop in a Windows Shop
    Abuna Demoz – Abuna@AdGooroo.com
    Brad Vah – Bvah@AdGooroo.com
    Mike Schiro – Mschiro@AdGooroo.com
    Twitter: @AdGooroo @abuna
  • Who Is AdGooroo?
    • Founded in 2004
    • We are the largest provider of Search Intelligence in the world
    • Our customers include:
      – Agencies
      – CMOs
      – Marketing Managers
      – Digital Ad Sales
      – Over 4,000 users
    • Global Scale
      – 50 Countries
      – 14 Search Engines
      – 14 Ad Networks
  • AdGooroo Insight Suite™
    ©2012 AdGooroo, LLC. All Rights Reserved.
  • Paid Search
    Natural Search
  • Why we deployed Hadoop
  • Hadoop Administration
  • Learning Curve
    • Where is Hadoop going to fit?
    • How do we leverage existing tools?
    • Linux can be less forgiving
      – rm -rf /*
    • Who names these things?
  • Integration Points
    • Active Directory != LDAP
    • Create a seamless user experience
    • Domjoin in 30 simple steps
      – Tip: It’s usually safe to blame Kerberos
  • Integration Points – Data Transfer
    • SMB works…mostly
      – Flaky connectivity
      – Relatively slow transfer for GigE
    • NFS
      – Client Services for NFS
      – Much faster transfer speeds
  • Integration Points – Data Transfer
    • MountableHDFS/HDFS_Fuse
      – Fuse -> NFS -> Windows
        • We tried it. You should not.
      – SCP (Windows) -> NFS -> Fuse
        • Messy, but it works.
        • Don’t often need to use it
  • Monitoring and Management
    • Operations Manager (MOM/SCOM)
      – Native Linux monitoring
      – Custom Management packs for Hadoop
    • Opalis
      – Workflow automation
    • Configuration Manager (SCCM)
      – Quest Management Xtensions for *nix
  • Final Thoughts
    • Hadoop and Windows can live together.
    • Microsoft is starting to figure out this whole “open-source” thing.
      – MSSQL connectors for Hadoop
      – ODBC driver for Hive
      – Interop initiatives
    • When in doubt, blame Kerberos.
    • Roll your own repo.
  • Hadoop Development
  • Environments
    • Windows
      – Visual Studio, SQL Server, etc.
      – Physical workstations
    • Linux
      – Getting reacquainted with an old friend
      – New suite of tools
      – Cloudera VM
        • RAMRAMRAMRAMRAMRAMRAMRAMRAM
  • Languages
    • Java
      – Straightforward transition from the .NET world
      – Hmm…How do I create that JAR again?
    • Python/Bash
      – Utilized a lot more than expected
    • HiveQL
      – Simple transition from SQL
      – Custom UDFs
  • Unexpected Roadblocks - AVRO
    • Assumption:
      – Works with .NET
        • Can serialize files to be read by Java Map/Reduce
    • Reality:
      – .NET compatibility not fully baked
        • Any files written in .NET could not be read in Java.
      – C# side is not reading nor writing the header
      – JIRA: AVRO-823
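A missing header of the kind described on this slide can be detected without any Avro library. The Avro specification defines an object container file as starting with four magic bytes: the ASCII characters `Obj` followed by the byte 1. The following is a minimal sketch of such a check, not the tooling the presenters used; the function names are hypothetical.

```python
# Magic bytes that open every Avro object container file, per the Avro spec:
# ASCII 'O', 'b', 'j', followed by the byte 0x01.
AVRO_MAGIC = b"Obj\x01"


def has_avro_magic(data):
    """Return True if the given bytes start with the Avro container magic."""
    return data[:len(AVRO_MAGIC)] == AVRO_MAGIC


def file_has_avro_magic(path):
    """Read only the first four bytes of a file and check them for the magic."""
    with open(path, "rb") as f:
        return has_avro_magic(f.read(len(AVRO_MAGIC)))
```

Running `file_has_avro_magic` over files produced on the .NET side before shipping them to the cluster would flag headerless output early. It only checks the magic bytes, so a passing file can still have a malformed schema or sync marker.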
  • Unexpected Roadblocks – Flume
    • Assumption:
      – We’ll use Flume for Windows
    • Reality:
      – Overkill for our needs
      – Implementation woes
    • Solution:
      – Custom log collector service
      – Converts data to AVRO file
  • Unexpected Roadblocks – Thrift
    • Assumption:
      – We’ll use Thrift to talk to HBase from .NET
    • Reality:
      – HBase.thrift does not support C# yet
    • Solution:
      – Convert Thrift Java code-gen to .NET
        • Some community work already done here (https://bitbucket.org/vadim/hbase-sharp)
  • As Advertised - Sqoop
    • Simple
    • Fast route to POC
      – Imports
      – Exports
    • Minor “gotchas”
      – Delimiters
      – Large exports to SQL Server
        • Use “--batch” mode
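The two "gotchas" on this slide, delimiters and batch mode, both come down to command-line arguments. As a rough illustration, the sketch below assembles a `sqoop export` argument list in Python. The flags `--connect`, `--table`, `--export-dir`, `--input-fields-terminated-by` and `--batch` are standard Sqoop options. The function name, host, database, table and HDFS path are hypothetical placeholders, and credentials are deliberately left out.

```python
def build_sqoop_export(jdbc_url, table, export_dir, field_delim=r"\t", batch=True):
    """Build the argument list for a `sqoop export` run (hypothetical helper).

    field_delim is passed as a literal escape sequence such as \\t, which
    Sqoop interprets itself; it must match how the HDFS files were written.
    """
    args = [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,
        "--export-dir", export_dir,
        "--input-fields-terminated-by", field_delim,
    ]
    if batch:
        # Ask the JDBC driver to batch the underlying INSERT statements.
        args.append("--batch")
    return args


if __name__ == "__main__":
    # Placeholder values for illustration only.
    cmd = build_sqoop_export(
        "jdbc:sqlserver://sqlserver.example.com:1433;databaseName=Reports",
        "keyword_stats",
        "/user/hive/warehouse/keyword_stats",
    )
    print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run` on a machine where Sqoop is installed. When exporting files written by Hive with its default settings, the field delimiter is Ctrl-A, so `field_delim` would need to be `r"\001"` rather than a tab.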
  • As Advertised - Hive
    • Very similar to SQL
    • “Quick” data analysis
      – Results without crippling your existing RDBMS
    • HBase storage handler
      – Provides easy point of entry to data and data manipulation
  • Final Thoughts
    • Don’t overthink it!
      – Just because you can doesn’t mean you should
    • Modularity
      – Easy to be overwhelmed by all the moving parts
      – Flatten the learning curve by taking it one piece at a time
  • We’re Hiring
    jobs@adgooroo.com
    abuna@adgooroo.com
    bvah@adgooroo.com
    mschiro@adgooroo.com