
Hadoop in a Windows Shop - CHUG - 20120416


View the accompanying video on Vimeo: https://vimeo.com/40781385

Published in: Technology

  1. Hadoop in a Windows Shop
     Abuna Demoz – Abuna@AdGooroo.com
     Brad Vah – Bvah@AdGooroo.com
     Mike Schiro – Mschiro@AdGooroo.com
     Twitter: @AdGooroo @abuna
  2. Who Is AdGooroo?
     • Founded in 2004
     • We are the largest provider of Search Intelligence in the world
     • Our customers include:
       – Agencies
       – CMOs
       – Marketing Managers
       – Digital Ad Sales
       – Over 4,000 users
     • Global Scale
       – 50 Countries
       – 14 Search Engines
       – 14 Ad Networks
  3. AdGooroo Insight Suite™
     ©2012 AdGooroo, LLC. All Rights Reserved.
  4. Paid Search
     Natural Search
  5. Why we deployed Hadoop
  6. Hadoop Administration
  7. Learning Curve
     • Where is Hadoop going to fit?
     • How do we leverage existing tools?
     • Linux can be less forgiving
       – rm -rf /*
     • Who names these things?
  8. Integration Points
     • Active Directory != LDAP
     • Create a seamless user experience
     • Domjoin in 30 simple steps
       – Tip: It’s usually safe to blame Kerberos
  9. Integration Points – Data Transfer
     • SMB works…mostly
       – Flaky connectivity
       – Relatively slow transfer for GigE
     • NFS
       – Client Services for NFS
       – Much faster transfer speeds
  10. Integration Points – Data Transfer
     • MountableHDFS/HDFS_Fuse
       – Fuse -> NFS -> Windows
         • We tried it. You should not.
       – SCP (Windows) -> NFS -> Fuse
         • Messy, but it works.
         • Don’t often need to use it
  11. Monitoring and Management
     • Operations Manager (MOM/SCOM)
       – Native Linux monitoring
       – Custom Management packs for Hadoop
     • Opalis
       – Workflow automation
     • Configuration Manager (SCCM)
       – Quest Management Xtensions for *nix
  12. Final Thoughts
     • Hadoop and Windows can live together.
     • Microsoft is starting to figure out this whole “open-source” thing.
       – MSSQL connectors for Hadoop
       – ODBC driver for Hive
       – Interop initiatives
     • When in doubt, blame Kerberos.
     • Roll your own repo.
  13. Hadoop Development
  14. Environments
     • Windows
       – Visual Studio, SQL Server, etc.
       – Physical workstations
     • Linux
       – Getting reacquainted with an old friend
       – New suite of tools
       – Cloudera VM
         • RAMRAMRAMRAMRAMRAMRAMRAMRAM
  15. Languages
     • Java
       – Straightforward transition from the .NET world
       – Hmm…How do I create that JAR again?
     • Python/Bash
       – Utilized a lot more than expected
     • HiveQL
       – Simple transition from SQL
       – Custom UDFs
  16. Unexpected Roadblocks – AVRO
     • Assumption:
       – Works with .NET
         • Can serialize files to be read by Java Map/Reduce
     • Reality:
       – .NET compatibility not fully baked
         • Files written in .NET could not be read in Java.
       – The C# side neither reads nor writes the header
       – JIRA: AVRO-823
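A missing header of the kind this slide describes can be spotted without either Avro library. An Avro object container file begins with the four bytes `Obj` followed by 0x01, so a quick look at the start of a file shows whether the writer emitted that part of the header. The following is a minimal Python sketch, not the speakers' tooling; `has_avro_header` and the sample byte strings are hypothetical names and data used only for illustration, and the check covers the magic bytes only, not the metadata map or sync marker that follow them.

```python
import io

# First four bytes of every Avro object container file: 'O', 'b', 'j', 0x01.
AVRO_MAGIC = b"Obj\x01"


def has_avro_header(stream):
    """Return True if the binary stream starts with the Avro container magic."""
    return stream.read(len(AVRO_MAGIC)) == AVRO_MAGIC


# Simulated file contents (illustrative only, not real Avro data).
with_header = io.BytesIO(AVRO_MAGIC + b"rest of file")
without_header = io.BytesIO(b"raw records only")

print(has_avro_header(with_header))     # True
print(has_avro_header(without_header))  # False
```

For a file on disk, open it with `open(path, "rb")` and pass the handle to the same function.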
  17. Unexpected Roadblocks – Flume
     • Assumption:
       – We’ll use Flume for Windows
     • Reality:
       – Overkill for our needs
       – Implementation woes
     • Solution:
       – Custom log collector service
       – Converts data to AVRO file
  18. Unexpected Roadblocks – Thrift
     • Assumption:
       – We’ll use Thrift to talk to HBase from .NET
     • Reality:
       – HBase.thrift does not support C# yet
     • Solution:
       – Convert Thrift Java code-gen to .NET
         • Some community work already done here (https://bitbucket.org/vadim/hbase-sharp)
  19. As Advertised – Sqoop
     • Simple
     • Fast route to POC
       – Imports
       – Exports
     • Minor “gotchas”
       – Delimiters
       – Large exports to SQL Server
         • Use “--batch” mode
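The two "gotchas" on this slide map onto two `sqoop export` options: `--input-fields-terminated-by` tells Sqoop how the HDFS files are delimited, and `--batch` enables batch mode for the underlying statement execution. Below is a minimal Python sketch that assembles such a command line. The helper `sqoop_export_command`, the host, the database, the table and the HDFS path are all hypothetical placeholders, not values from the talk.

```python
import shlex


def sqoop_export_command(connect, table, export_dir, delimiter=r"\t", batch=True):
    """Build the argument list for a `sqoop export` invocation."""
    args = [
        "sqoop", "export",
        "--connect", connect,
        "--table", table,
        "--export-dir", export_dir,
        # Tell Sqoop how fields in the HDFS input files are delimited.
        "--input-fields-terminated-by", delimiter,
    ]
    if batch:
        # Batch mode for the underlying JDBC statement execution.
        args.append("--batch")
    return args


cmd = sqoop_export_command(
    connect="jdbc:sqlserver://sqlhost:1433;databaseName=reports",  # hypothetical
    table="keyword_stats",                                         # hypothetical
    export_dir="/data/keyword_stats",                              # hypothetical
)
# Print a shell-safe rendering of the command for review.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Returning a list rather than a single string lets the same value be passed to `subprocess.run` without shell quoting problems around the `;` in the JDBC URL.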
  20. As Advertised – Hive
     • Very similar to SQL
     • “Quick” data analysis
       – Results without crippling your existing RDBMS
     • HBase storage handler
       – Provides easy point of entry to data and data manipulation
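The HBase storage handler mentioned on this slide is attached to a Hive table through `STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'`, together with a `hbase.columns.mapping` property that pairs Hive columns with the HBase row key and `family:qualifier` columns. The sketch below renders such a statement from Python. It assumes the HBase table already exists (hence `EXTERNAL`); the helper `hbase_backed_table_ddl` and every table and column name are hypothetical.

```python
def hbase_backed_table_ddl(hive_table, hbase_table, key_column, columns):
    """Render HiveQL that maps an existing HBase table into Hive.

    columns: list of (hive_column, hive_type, "family:qualifier") tuples.
    """
    column_defs = [f"{key_column} STRING"]
    mapping = [":key"]  # the first mapping entry binds the HBase row key
    for name, hive_type, hbase_column in columns:
        column_defs.append(f"{name} {hive_type}")
        mapping.append(hbase_column)
    # The mapping string must not contain whitespace, so join with bare commas.
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({', '.join(column_defs)})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{','.join(mapping)}')\n"
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}');"
    )


# Hypothetical table and column names, for illustration only.
print(hbase_backed_table_ddl(
    "ad_clicks", "ad_clicks_hbase", "row_key",
    [("keyword", "STRING", "d:keyword"), ("clicks", "INT", "d:clicks")],
))
```

Once a table like this is defined, it can be queried with ordinary HiveQL `SELECT` statements.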
  21. Final Thoughts
     • Don’t overthink it!
       – Just because you can doesn’t mean you should
     • Modularity
       – Easy to be overwhelmed by all the moving parts
       – Flatten the learning curve by taking it one piece at a time
  22. We’re Hiring
     jobs@adgooroo.com
     abuna@adgooroo.com
     bvah@adgooroo.com
     mschiro@adgooroo.com
