14.05.2012 Opening the tool box: Development, testing and deployment in the Hadoop ecosystem (Jean-Pierre König, MeMo News AG)
Published in: Technology, Business
Transcript

  • 1. Jean-Pierre König, MeMo News AG. OPENING THE TOOL BOX: DEVELOPMENT, TESTING AND DEPLOYMENT IN THE HADOOP ECOSYSTEM. 14.05.12 (image: http://www.flickr.com/photos/theaucitron/5810163712/sizes/l/in/photostream/)
  • 2. Development: THE APPLICATION (image: http://www.flickr.com/photos/oskay/2523189273/sizes/l/in/photostream/)
  • 3. Development: The Application is a ...
      • Distributed newsagent
      • GUI-less Java application
      • Spring-based 2-layer architecture
      • Services and data access objects
      • Client of Hadoop
      • Dependencies on ZooKeeper and HBase
  • 4. Development (2): We use Maven 3 for
      • Project structure: corporate POM and modules
      • Dependency management
      • Building the artifact
      • Diagram labels: Corporate POM; global; newsagent; tools; mapred; Loader (Client); Infrastructure; Model; Utils; Services; Data Access Objects
  • 5. Development: MAPREDUCE JOBS (image: http://www.flickr.com/photos/elasticsoul/61062372/sizes/l/in/photostream/)
  • 6. MapReduce
      • Java MR jobs for business processes
      • Input and output paths are either HDFS or HBase
      • MR job chaining by Azkaban
      • Pig and Hive for ad-hoc queries
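Job chaining means running each job only after the jobs it depends on have finished. The following is a minimal sketch of that ordering step in plain Java, not Azkaban's implementation or API. The class `JobChain` and the job names used with it are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: orders jobs so that each one runs after its dependencies. */
final class JobChain {
    private JobChain() {}

    /**
     * @param dependencies maps a job name to the names of the jobs it depends on
     * @return job names in a valid execution order (Kahn's topological sort)
     */
    static List<String> executionOrder(Map<String, List<String>> dependencies) {
        // Number of unfinished dependencies per job; TreeMap keeps the start order stable.
        Map<String, Integer> pending = new TreeMap<>();
        // Reverse edges: for each job, the jobs waiting on it.
        Map<String, List<String>> dependents = new HashMap<>();

        for (Map.Entry<String, List<String>> entry : dependencies.entrySet()) {
            pending.putIfAbsent(entry.getKey(), 0);
            for (String dep : entry.getValue()) {
                pending.putIfAbsent(dep, 0);
                pending.merge(entry.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(entry.getKey());
            }
        }

        // Jobs without dependencies can start immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : pending.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.remove();
            order.add(job);
            for (String next : dependents.getOrDefault(job, Collections.<String>emptyList())) {
                // A dependent becomes ready once its last dependency has run.
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }

        if (order.size() != pending.size()) {
            throw new IllegalStateException("Dependency cycle detected");
        }
        return order;
    }
}
```

With a hypothetical chain in which `aggregate` depends on `load` and `export` depends on `aggregate`, `executionOrder` returns `load`, `aggregate`, `export`. A circular dependency raises an `IllegalStateException`.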
  • 7. Development: HBASE (image: http://www.flickr.com/photos/isherwoodchris/6902155937/sizes/l/in/photostream/)
  • 8. HBase
      • HBase Schema Manager: github.com/jkoenig/hbase-schema-manager
      • Utilities to copy/move/rename column families and to copy complete tables with their data: github.com/memonews/hbase-utils
      • Stargate REST API without compression: github.com/memonews/hbase-stargate
  • 9. Hadoop, HBase, Zookeeper: TESTING (image: http://www.flickr.com/photos/42106306@N00/4380803535/sizes/m/in/photostream/)
  • 10. HBase
      • We use the Apache HBaseTestingUtility
      • It's in-memory: a complete Hadoop instance with DFS, ZK and HBase
      • It's very slow: consider it a long-running integration test (IT)

        public class ConfigurableHBaseClient {

            // Shared by all tests in subclasses; the mini cluster is started once per JVM.
            protected static HBaseTestingUtility TEST_UTIL;

            static {
                final Configuration conf = HBaseConfiguration.create();
                conf.addResource("hbase-default-test.xml");
                try {
                    TEST_UTIL = HBaseTestingUtilityFactory.getMiniCluster(1, conf);
                } catch (final Exception e) {
                    fail("Could not start hadoop mini cluster.");
                }
            }
        }
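Because the mini cluster is slow to start, it pays to start it once and share it across tests. The sketch below shows one common way to do that in plain Java, the initialization-on-demand holder idiom. It is an illustration only: `SlowFixture` and `SharedFixture` are hypothetical stand-ins and do not use any HBase classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a resource that is slow to start, such as a mini cluster. */
final class SlowFixture {
    /** Counts how often the expensive startup ran; useful to verify sharing. */
    static final AtomicInteger STARTS = new AtomicInteger();

    SlowFixture() {
        // The expensive startup work would happen here.
        STARTS.incrementAndGet();
    }
}

/** Hands out one shared SlowFixture, created lazily on first use. */
final class SharedFixture {
    private SharedFixture() {}

    /** The JVM initializes this nested class exactly once, on first access, in a thread-safe way. */
    private static final class Holder {
        static final SlowFixture INSTANCE = new SlowFixture();
    }

    static SlowFixture get() {
        return Holder.INSTANCE;
    }
}
```

Every test class that calls `SharedFixture.get()` receives the same instance, so the startup cost is paid once per JVM rather than once per test class.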
  • 11. MapReduce
      • Since business logic is involved, we use hadoop-mrunit for testing Map/Reduce jobs
      • It's in-memory testing
      • Parameterized Mapper/Reducer with a driver

        @Test
        public void reduceShouldWriteExactlyOneLinePerMap() throws IOException {
            final List<DoubleWritable> values = new ArrayList<DoubleWritable>();
            values.add(new DoubleWritable(399287729));
            this.driver.withInput(new Text("de.t-online/nachrichten"), values);
            this.driver.run();
            assertEquals(1, this.driver.getCounters().findCounter(MeMoCounters.SIGNALS_WRITTEN).getValue());
        }
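The idea behind driver-based testing is that a reducer is fed one key and its values in memory, and the test then inspects the output and counters. The sketch below reproduces that idea with plain Java only. It is a stand-in for illustration, not MRUnit's API: `SimpleReducer`, `ReduceContext`, `InMemoryReduceDriver` and `SumReducer` are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reducer contract: one key, its values, and a context to write to. */
interface SimpleReducer<K, V, O> {
    void reduce(K key, List<V> values, ReduceContext<O> context);
}

/** Collects output lines and named counters in memory instead of writing to a cluster. */
final class ReduceContext<O> {
    private final List<O> output = new ArrayList<>();
    private final Map<String, Long> counters = new HashMap<>();

    void write(O line) { output.add(line); }

    void increment(String counter) { counters.merge(counter, 1L, Long::sum); }

    List<O> getOutput() { return output; }

    long getCounter(String counter) { return counters.getOrDefault(counter, 0L); }
}

/** Minimal driver: passes the configured input to the reducer and returns the context. */
final class InMemoryReduceDriver<K, V, O> {
    private final SimpleReducer<K, V, O> reducer;
    private K key;
    private List<V> values = new ArrayList<>();

    InMemoryReduceDriver(SimpleReducer<K, V, O> reducer) { this.reducer = reducer; }

    InMemoryReduceDriver<K, V, O> withInput(K inputKey, List<V> inputValues) {
        this.key = inputKey;
        this.values = inputValues;
        return this;
    }

    ReduceContext<O> run() {
        ReduceContext<O> context = new ReduceContext<>();
        reducer.reduce(key, values, context);
        return context;
    }
}

/** Example reducer (hypothetical): sums the values and writes exactly one line per key. */
final class SumReducer implements SimpleReducer<String, Double, String> {
    static final String SIGNALS_WRITTEN = "SIGNALS_WRITTEN";

    @Override
    public void reduce(String key, List<Double> values, ReduceContext<String> context) {
        double sum = 0;
        for (double value : values) {
            sum += value;
        }
        context.write(key + "\t" + sum);
        context.increment(SIGNALS_WRITTEN);
    }
}
```

A test would build the driver with `new InMemoryReduceDriver<String, Double, String>(new SumReducer())`, call `withInput(...)` and `run()`, and then assert that `getCounter(SumReducer.SIGNALS_WRITTEN)` is 1 and that `getOutput()` holds exactly one line. No cluster or file system is involved, which is what keeps such tests fast.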
  • 12. Zookeeper
      • We use the Apache Zookeeper ClientBase
      • It's not in-memory but runs against the staging cluster
      • Prefix paths, e.g.: /test/memo/subscribers

        @Test
        public void getNumberOfSubscribersShouldSetWatchFlag() throws KeeperException, InterruptedException {
            final SubscriberDaoImpl subscriberDao = new SubscriberDaoImpl(zookeeperDao, DIR, null);
            subscriberDao.getNumberOfSubscribers(listener);
            verify(this.zookeeper, times(1)).getChildren(eq(DIR), eq(subscriberDao));
        }
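Prefixing znode paths keeps test data apart from real data when tests run against a shared cluster. A minimal sketch of such a prefixing helper follows, assuming absolute znode paths that start with "/". The class `ZnodePaths` is hypothetical and not part of the ZooKeeper client.

```java
/** Hypothetical helper: maps application znode paths into a test-only namespace. */
final class ZnodePaths {
    private final String prefix;

    /** @param prefix namespace such as "/test"; null or empty means no prefix */
    ZnodePaths(String prefix) {
        String normalized = prefix == null ? "" : prefix.trim();
        // Drop trailing slashes so that joining never produces "//".
        while (normalized.endsWith("/")) {
            normalized = normalized.substring(0, normalized.length() - 1);
        }
        // Ensure a leading slash on a non-empty prefix.
        if (!normalized.isEmpty() && !normalized.startsWith("/")) {
            normalized = "/" + normalized;
        }
        this.prefix = normalized;
    }

    /** Returns the given absolute path placed under the configured prefix. */
    String resolve(String path) {
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("znode path must be absolute: " + path);
        }
        if (path.equals("/")) {
            return prefix.isEmpty() ? "/" : prefix;
        }
        return prefix + path;
    }
}
```

With the prefix "/test", `resolve("/memo/subscribers")` yields the path from the slide, `/test/memo/subscribers`; with an empty prefix the path is returned unchanged. ZooKeeper connection strings also accept a chroot suffix, which may be an alternative where every path of a client should share one prefix.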
  • 13. Deployment: THE APPLICATION (image: http://www.flickr.com/photos/navalsurfaceforces/5553412190/sizes/l/in/photostream/)
  • 14. The Application
      • Automated build and restart via Capistrano
      • Build on every machine
      • There is a .m2 repository everywhere

        set :deploy_to, "/usr/share/memo-newsagent"
        set :keep_releases, 1

        after "deploy:setup" do
          run "mkdir -p /var/run/memo #{shared_path}/logs /var/log/memo/"
          ...
        end

        after "deploy:update_code" do
          run "cd #{current_release} && mvn install -Pfast > #{shared_path}/logs/build.log"
        end

        after "deploy", "rowlog:stop", "newsagent:restart", "rowlog:start"
  • 15. Deployment: MAPREDUCE JOBS (image: http://www.flickr.com/photos/navalsurfaceforces/6257239933/sizes/l/in/photostream/)
  • 16. Map Reduce Jobs
      • We use a Maven Hadoop plugin
      • hadoop:pack, à la mvn:package
      • hadoop:deploy (HDFS and target folder)
      • All dependencies are packed in. Careful: huge JARs without dependency management
      • See github.com/memonews/maven-hadoop
  • 17. DevOps: OTHER TOOLS IN USE (image: http://www.flickr.com/photos/damongman/4979871047/sizes/l/in/photostream/)
  • 18. Other Tools
      • Staging environment in-house, a 1-to-1 copy of production (virtualized)
      • Azkaban for MR job scheduling
      • Jenkins for (integration) tests and metrics
      • Git
      • Icinga for monitoring and alerting
      • Ganglia / Graphite for Hadoop metrics
      • Fliwi for automated cluster provisioning
  • 19. THANKS! jean-pierre.koenig@menonews.com
