Continuous Integration
on top of Hadoop
Wisely Chen and Neal Lee
Saturday, August 3, 13
Agenda
• Who I am
• Problem
• Solution
• Demo
• Q&A
Who I am
• Wisely Chen ( thegiive@gmail.com )
• Release manager of Yahoo![Taiwan] shopping and data team
• Loves to promote open source tech in Taiwan
• Hadoop Summit 2013 San Jose
• Ruby and Rails : Coscup 2006, Ubisunrise 2007, OSDC 2007
• Puppet : PHPConf 2012 , RubyConf 2012
• Release Practice : Webconf 2013, Coscup 2012
Who I am
• Neal Lee (@neal_lee)
• Data Engineer at Yahoo![Taiwan] data team
• Aims to build an easy-to-use, self-service BI platform connected to Hadoop
EC Data Team
[Diagram labels: Auction / Mall / Shopping Center sites; traffic / click / user-behavior tracking; Transactional data; Tracking data; Data Highway; Data Warehouse / Data Mart; Data Infra; BI Platform; Report; Recommendation; API; Machine Learning; Serve]
Problem : Debug
Problem : Performance
Solution
Continuous Integration
• A software engineering practice
• Maintain code repos
• Automate the build
• Make the build self-testing
• Everyone commits to the baseline every day
• Every commit should be a build
• Test in a clone of production environment
• Make it easy to get the latest deliverables
• Everyone can see the result of latest build
• Automate deployment
We focus on
• A software engineering practice
• Maintain code repos
• Automate the build
• Make the build self-testing
• Everyone commits to the baseline every day
• Every commit should be a build
• Test in a clone of production environment
• Make it easy to get the latest deliverables
• Everyone can see the result of latest build
• Automate deployment
CI on Hadoop Flow
Code → Unit Test → Performance Test → Deploy → Doc → Execution
One Click Deploy
Commit → Unit Test → Performance Test → Deploy → Doc → Execution
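The one-click flow can be read as a chain of gates: a commit triggers each stage in turn, and a failing stage stops the chain. The Python below is a minimal sketch under that assumption. The stage names come from the slide; `run_pipeline` and the placeholder callables are hypothetical and stand in for whatever jobs the CI server actually runs.

```python
from typing import Callable, List, Optional, Tuple

# A stage is a display name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[List[str], Optional[str]]:
    """Run stages in order and stop at the first one that returns False.

    Returns (names of the stages that passed, name of the failing stage or None).
    """
    passed: List[str] = []
    for name, step in stages:
        if not step():
            return passed, name
        passed.append(name)
    return passed, None


# Placeholder steps: each lambda stands in for a real job (hypothetical).
stages: List[Stage] = [
    ("Unit Test", lambda: True),
    ("Performance Test", lambda: False),  # simulate a failing gate
    ("Deploy", lambda: True),
    ("Doc", lambda: True),
    ("Execution", lambda: True),
]

passed, failed_at = run_pipeline(stages)
print(passed, failed_at)
```

With the simulated failure, nothing after the performance test runs, which is the property that keeps untested code away from the production cluster.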
Toolset
Commit → Unit Test → Performance Test → Deploy → Doc → Execution
[Tool labels recoverable from the slide: Vaidya, BASH]
System diagram
[Diagram labels: CI Master; GitHub; Alpha; CI Slave; Beta Cluster; Hadoop JobTracker; CI Slave; Hadoop node (x4); Slave Node; Prod Cluster; Gateway]
Unit Test
Commit → Unit Test → Performance Test → Deploy → Doc → Execution
PigUnit
• A simple xUnit framework
• No cluster setup is required in local mode
• Unit testing, regression testing, and rapid prototyping on the fly
Using PigUnit
• After
  • Coding
  • Write PigUnit test case
  • Run local PigUnit test
  • Push to cluster
  • Run Pig on cluster
  • Get the right result!
• Before
  • Coding
  • Manual local test
  • Push to cluster
  • Run Pig on cluster
  • Get the right result!
Unit tests are live docs
• A unit test is a runnable, live document
• Passing the test cases shows the code still meets previous requirements
Flexible
• Pig can use PigUnit
• MapReduce can use MRUnit
• Hive can use hive_test
Performance Test
Commit → Unit Test → Performance Test → Deploy → Doc → Execution
Vaidya
• Rule-based performance diagnosis of M/R jobs
• Extensible framework
• You can add your own rules
• Write complex rules using existing rules
Performance Test
[Diagram labels: Sampling data; Pig Job; Pig Job History; Pig Job Conf; Vaidya Rule; Vaidya; Performance result; Notify User; Next CI Stage]
1. Execute the Pig job with sampling data on the beta server.
2. Vaidya reads the job history, job conf, and rules to check for performance problems.
3. If everything is OK, create the performance result.
4. If the job has a performance issue, notify the user.
5. Go to the next CI stage.
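Steps 3 to 5 amount to a gate over the rule results. The Python below is a rough sketch of that decision, not the team's actual script. It assumes each rule reports an impact value between 0 and 1 that counts as an issue when it exceeds the rule's success threshold, and that the pipeline stops when any rule fails; the slides do not state either point. `RuleResult`, `performance_gate`, the second rule, and the impact numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RuleResult:
    title: str
    impact: float             # severity reported by the rule, 0.0 to 1.0 (assumed)
    success_threshold: float  # taken from the rule definition

    @property
    def has_issue(self) -> bool:
        # Assumption: an impact above the threshold means the rule failed.
        return self.impact > self.success_threshold


def performance_gate(results: List[RuleResult],
                     notify: Callable[[List[RuleResult]], None]) -> bool:
    """Return True when the job may move to the next CI stage."""
    issues = [r for r in results if r.has_issue]
    if issues:
        notify(issues)  # step 4: tell the user which rules failed
        return False
    return True         # steps 3 and 5: result is fine, continue


notified: List[str] = []
results = [
    RuleResult("Balanced Reduce Partitioning", impact=0.35, success_threshold=0.20),
    RuleResult("Hypothetical second rule", impact=0.05, success_threshold=0.50),
]
ok = performance_gate(results, lambda issues: notified.extend(r.title for r in issues))
print(ok, notified)
```

Passing the notifier in as a callable keeps the gate testable without a mail server or chat hook.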
Vaidya Rule
<DiagnosticTest>
  <Title><![CDATA[Balanced Reduce Partitioning]]></Title>
  <ClassName><![CDATA[org.apache.hadoop.vaidya.postexdiagnosis.tests.BalancedReducePartitioning]]></ClassName>
  <Description><![CDATA[This rule tests as to how well the input to reduce tasks is balanced]]></Description>
  <Importance><![CDATA[High]]></Importance>
  <SuccessThreshold><![CDATA[0.20]]></SuccessThreshold>
  <Prescription><![CDATA[advice]]></Prescription>
</DiagnosticTest>
• ClassName: the Java class that implements the test
• Description: checks whether the reduce input is balanced or not
• Importance: rule importance
• SuccessThreshold: diagnosis success threshold
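To make the rule fields concrete, here is a small Python sketch that reads a rule definition like the one on this slide using the standard library's `xml.etree.ElementTree`. It only illustrates the structure of the rule: Vaidya itself loads these rules in Java, and the `load_rule` helper and its dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Rule definition as shown on the slide (CDATA sections are read as plain text).
RULE_XML = """\
<DiagnosticTest>
  <Title><![CDATA[Balanced Reduce Partitioning]]></Title>
  <ClassName><![CDATA[org.apache.hadoop.vaidya.postexdiagnosis.tests.BalancedReducePartitioning]]></ClassName>
  <Description><![CDATA[This rule tests as to how well the input to reduce tasks is balanced]]></Description>
  <Importance><![CDATA[High]]></Importance>
  <SuccessThreshold><![CDATA[0.20]]></SuccessThreshold>
  <Prescription><![CDATA[advice]]></Prescription>
</DiagnosticTest>
"""


def load_rule(xml_text: str) -> dict:
    """Parse one DiagnosticTest element into a plain dictionary."""
    root = ET.fromstring(xml_text)

    def field(tag: str) -> str:
        # findtext returns None when the child element is missing.
        return (root.findtext(tag) or "").strip()

    return {
        "title": field("Title"),
        "class_name": field("ClassName"),
        "importance": field("Importance"),
        "success_threshold": float(field("SuccessThreshold")),
    }


rule = load_rule(RULE_XML)
print(rule["title"], rule["importance"], rule["success_threshold"])
```

A CI script could read the threshold this way to report which limit a job exceeded, though the slides do not say whether the team did so.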
Deploy
Commit → Unit Test → Performance Test → Deploy → Doc → Execution
Deploy
• Deploy to production cluster
• Easy to roll back
• Create a git tag
• Automatic doc generation
• Each release should map to a ticket
• Auto comment in Bugzilla
Auto comment in Bugzilla
[Screenshot callouts: Repo URL; Release Note; Issue status change]
Auto create git tag
[Screenshot callouts: Release Note ("[Bug xxx] log...."); Git Tag]
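The "[Bug xxx]" prefix is what lets a release map back to tickets. As a hedged illustration, the Python below builds a release note from commit subject lines and collects the bug IDs that an auto-comment step would need. The commit subjects, bug numbers, tag name, and the `build_release_note` helper are all made up for this sketch; the slides do not show how the note is actually passed to git or Bugzilla.

```python
import re
from typing import List, Tuple

# Matches subjects such as "[Bug 101] fix join key".
BUG_PREFIX = re.compile(r"^\[Bug (\d+)\]\s*(.+)$")


def build_release_note(tag: str, subjects: List[str]) -> Tuple[str, List[int]]:
    """Return (release note text, bug IDs in first-seen order)."""
    lines = [f"Release {tag}"]
    bug_ids: List[int] = []
    for subject in subjects:
        match = BUG_PREFIX.match(subject)
        if match is None:
            continue  # commits without a bug prefix are skipped in this sketch
        bug_id, message = int(match.group(1)), match.group(2)
        if bug_id not in bug_ids:
            bug_ids.append(bug_id)
        lines.append(f"[Bug {bug_id}] {message}")
    return "\n".join(lines), bug_ids


note, bugs = build_release_note(
    "release-example",  # hypothetical tag name
    [
        "[Bug 101] fix join key",
        "tidy whitespace",
        "[Bug 102] add sampling",
        "[Bug 101] add test",
    ],
)
print(note)
print(bugs)
```

Keeping the bug IDs separate from the note text means the same list can drive both the tag message and the ticket comments.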
DEMO
Demo
• Demo1 : Unit test fail
• Demo2 : Unit test success
• Demo3 : Check performance test
• Demo4 : Auto generate Doc
• Demo5 : Notify user
Demo
Conclusion
• CI will revolutionize your workflow
• CI will boost your productivity
Logic Debug
• Map/Reduce jobs often take a lot of time to execute
• Repeated Map/Reduce executions cost a lot of time during the logic debugging phase
• Need a way to find logic problems before executing the production job
• Coding → Manual Test → Exec → Get Bug
Performance
• Map/Reduce performance is hard to estimate before execution
• Production Grid computing resources are shared by all Yahoos
• Poor performance will affect other Yahoos' Grid jobs
• Putting poorly performing code on the production Grid is unacceptable
• We manually investigate job performance before actually executing the job on the production Grid
Coding → Manual Test → Manual investigation → Get Bug
 

Coscup 2013: Continuous Integration on top of Hadoop

  • 1. Continuous Integration on top of Hadoop Wisely Chen and Neal Lee Saturday, August 3, 13
  • 2. Agenda • Who I am • Problem • Solution • Demo • Q&A
  • 3. Who I am • Wisely Chen ( thegiive@gmail.com ) • Release manager of Yahoo![Taiwan] shopping and data team • Loves to promote open source tech in Taiwan • Hadoop Summit 2013 San Jose • Ruby and Rails: Coscup 2006, Ubisunrise 2007, OSDC 2007 • Puppet: PHPConf 2012, RubyConf 2012 • Release Practice: Webconf 2013, Coscup 2012
  • 4. Who I am • Neal Lee (@neal_lee) • Data Engineer at Yahoo![Taiwan] data team • Aims to build an easy-to-use self-service BI platform connecting to Hadoop.
  • 5. EC Data Team • Auction / Mall / Shopping Center sites • Traffic / click / user behavior tracking • Transactional data • Tracking data • Data Highway • Data Warehouse / Data Mart • Data Infra • BI Platform • Report • Recommendation API • Machine Learning • Serve
  • 10. Continuous Integration • A software engineering practice • Maintain code repos • Automate the build • Make the build self-testing • Everyone commits to the baseline every day • Every commit should be a build • Test in a clone of the production environment • Make it easy to get the latest deliverables • Everyone can see the result of the latest build • Automate deployment
  • 11. We focus on • A software engineering practice • Maintain code repos • Automate the build • Make the build self-testing • Everyone commits to the baseline every day • Every commit should be a build • Test in a clone of the production environment • Make it easy to get the latest deliverables • Everyone can see the result of the latest build • Automate deployment
  • 12. CI on Hadoop Flow Code Unit Test Performance Test Deploy Doc Execution
  • 13. One Click Deploy Commit Unit Test Performance Test Deploy Doc Execution
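
The stage sequence on the two slides above can be read as a pipeline that runs each stage in order and stops at the first failure. The following is a minimal Python sketch of that idea only. The stage names come from the slides, while `run_pipeline`, `build_demo_stages`, and the placeholder stage bodies are hypothetical and are not the authors' CI configuration.

```python
from typing import Callable, List, Tuple

# A stage is a (name, action) pair; the action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failing stage.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, action in stages:
        if not action():
            break  # a failed stage blocks everything after it
        completed.append(name)
    return completed


def build_demo_stages(unit_tests_pass: bool) -> List[Stage]:
    """Hypothetical stand-ins for the real stages triggered by a commit."""
    return [
        ("Unit Test", lambda: unit_tests_pass),
        ("Performance Test", lambda: True),
        ("Deploy", lambda: True),
        ("Doc", lambda: True),
        ("Execution", lambda: True),
    ]


if __name__ == "__main__":
    print(run_pipeline(build_demo_stages(unit_tests_pass=True)))
    print(run_pipeline(build_demo_stages(unit_tests_pass=False)))
```

With this shape, a failing unit test prevents the performance test, deployment, documentation, and execution stages from running at all.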
  • 15. System diagram CI Master GitHub Alpha CI Slave Beta Cluster Hadoop JobTracker CI Slave Hadoop node Hadoop node Hadoop node Hadoop node Slave Node Prod Cluster Gateway
  • 16. Unit Test Commit Unit Test Performance Test Deploy Doc Execution
  • 17. PigUnit • A simple xUnit framework • No cluster set up is required in local mode • Unit testing, regression testing, and rapid prototyping on the fly
  • 18. Using PigUnit • After • Coding • Write PigUnit test case • Run local PigUnit test • Push to cluster • Run Pig on cluster • Get the right result! • Before • Coding • Manual local test • Push to cluster • Run Pig on cluster • Get the right result!
  • 19. Unit test is live doc • A unit test is a runnable, live document • Passing the test cases shows the previous requirements are still met
  • 20. Flexible • Pig can use PigUnit • MapReduce can use MRUnit • Hive can use hive_test
  • 22. Vaidya • Rule based performance diagnosis of M/R jobs • Extensible framework • You can add your own rules • Write complex rules using existing rules
  • 23. Performance Test (diagram labels: Sampling data, Pig Job, Pig Job History, Pig Job Conf, Vaidya, Vaidya Rule, Performance result, Notify User, Next CI Stage) 1. Execute the Pig job with sampled data on the beta server 2. Vaidya reads the job history, job conf, and rules to check for performance problems 3. If OK, create the performance result 4. If the job has a performance issue, notify the user 5. Go to the next CI stage
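
The decision in steps 3 to 5 above can be sketched as a small gate function. This is only an illustration under stated assumptions: `reduce_skew_impact` is a simplified, hypothetical skew score (1 minus mean/max of per-reducer input records) and is not Vaidya's actual computation. The default threshold of 0.20 is borrowed from the rule shown on the next slide, and treating "impact at or below the threshold" as a pass is also an assumption.

```python
from typing import List


def reduce_skew_impact(reduce_input_records: List[int]) -> float:
    """Hypothetical skew score in [0, 1): 0.0 means perfectly balanced.

    Computed as 1 - mean/max over the per-reducer input record counts.
    This is a stand-in metric, not the formula Vaidya uses.
    """
    if not reduce_input_records or max(reduce_input_records) == 0:
        return 0.0
    mean = sum(reduce_input_records) / len(reduce_input_records)
    return 1.0 - mean / max(reduce_input_records)


def performance_gate(reduce_input_records: List[int],
                     success_threshold: float = 0.20) -> str:
    """Return the next action for the CI flow.

    "next_stage"  -> the job looks fine, continue the pipeline
    "notify_user" -> the job looks skewed, stop and tell the committer
    """
    impact = reduce_skew_impact(reduce_input_records)
    if impact <= success_threshold:
        return "next_stage"
    return "notify_user"


if __name__ == "__main__":
    print(performance_gate([100, 100, 100, 100]))  # evenly loaded reducers
    print(performance_gate([1000, 10, 10, 10]))    # one overloaded reducer
```

The point of the gate is that a skewed job is caught on the beta cluster with sampled data, before it can consume shared production resources.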
  • 24. Vaidya Rule <DiagnosticTest> <Title><![CDATA[Balanaced Reduce Partitioning]]></Title> <ClassName><![CDATA[org.apache.hadoop.vaidya.postexdiagnosis.tests.BalancedReducePartitioning]]></ClassName> <Description><![CDATA[This rule tests as to how well the input to reduce tasks is balanced]]></Description> <Importance><![CDATA[High]]></Importance> <SuccessThreshold><![CDATA[0.20]]></SuccessThreshold> <Prescription><![CDATA[advice]]></Prescription> </DiagnosticTest> Callouts: test Java class (ClassName) • checks whether the reduce input is balanced (Description) • rule importance (Importance) • diagnosis success threshold (SuccessThreshold)
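
To show how the fields of such a rule could be read by a CI script, here is a small sketch using Python's standard `xml.etree.ElementTree`. It assumes the element names on the slide are `DiagnosticTest`, `Description`, and `Prescription` (the extracted text garbles the "ti" ligature), and the rule text is copied from the slide, including its title spelling. The helper `load_rule` is hypothetical and is not part of Vaidya.

```python
import xml.etree.ElementTree as ET

# Rule text as shown on the slide, with the garbled ligatures restored.
RULE_XML = """
<DiagnosticTest>
  <Title><![CDATA[Balanaced Reduce Partitioning]]></Title>
  <ClassName><![CDATA[org.apache.hadoop.vaidya.postexdiagnosis.tests.BalancedReducePartitioning]]></ClassName>
  <Description><![CDATA[This rule tests as to how well the input to reduce tasks is balanced]]></Description>
  <Importance><![CDATA[High]]></Importance>
  <SuccessThreshold><![CDATA[0.20]]></SuccessThreshold>
  <Prescription><![CDATA[advice]]></Prescription>
</DiagnosticTest>
"""


def load_rule(xml_text: str) -> dict:
    """Parse one DiagnosticTest element into a plain dict.

    CDATA sections are returned by the parser as ordinary element text.
    """
    root = ET.fromstring(xml_text.strip())
    rule = {child.tag: (child.text or "").strip() for child in root}
    # The threshold is numeric; keep everything else as strings.
    rule["SuccessThreshold"] = float(rule["SuccessThreshold"])
    return rule


rule = load_rule(RULE_XML)

if __name__ == "__main__":
    for key, value in rule.items():
        print(f"{key}: {value}")
```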
  • 26. Deploy • Deploy to production cluster • Easy to roll back • Create a git tag • Automatic doc generation • Each release should map to a ticket • Auto comment in Bugzilla
  • 27. Auto comment in Bugzilla Repo url Release Note Issue status change
  • 28. Auto create git tag Release Note [Bug xxx] log.... Git Tag
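
One way to connect the release note, the `[Bug xxx]` log prefix, and the git tag is to scan commit subjects for bug references and assemble the note from them. The sketch below is hypothetical: the numeric bug ID format, the tag name, and the `build_release_note` helper are assumptions for illustration, and it only builds the text, without calling git or Bugzilla.

```python
import re
from typing import Dict, List

# Assumes commit subjects carry a prefix such as "[Bug 123]" (hypothetical format).
BUG_TAG = re.compile(r"\[Bug\s+(\d+)\]")


def build_release_note(tag: str, commit_subjects: List[str]) -> Dict[str, object]:
    """Collect referenced bug IDs and build a plain-text release note."""
    bug_ids: List[str] = []
    for subject in commit_subjects:
        for bug_id in BUG_TAG.findall(subject):
            if bug_id not in bug_ids:  # keep first-seen order, no duplicates
                bug_ids.append(bug_id)
    lines = [f"Release note for {tag}"] + [f"- {s}" for s in commit_subjects]
    return {"tag": tag, "bug_ids": bug_ids, "note": "\n".join(lines)}


if __name__ == "__main__":
    result = build_release_note(
        "release-20130803",  # hypothetical tag name
        ["[Bug 123] fix join key", "[Bug 456] tune reducers", "[Bug 123] add test"],
    )
    print(result["note"])
    print("Tickets to comment on:", result["bug_ids"])
```

The collected bug IDs are the tickets that would each receive an automatic comment, so every release maps back to a ticket.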
  • 30. Demo • Demo1: Unit test fail • Demo2: Unit test success • Demo3: Check performance test • Demo4: Auto generate Doc • Demo5: Notify user
  • 32. Conclusion • CI will revolutionize your workflow • CI will boost your productivity
  • 34. Logic Debug • A Map/Reduce job often takes a lot of time to execute • Repeated Map/Reduce executions cost a lot of time during the logic debugging phase • Need a way to find logic problems before executing the production job • Flow: Coding, Manual Test, Exec, Get Bug
  • 35. Performance • Map/Reduce performance is hard to estimate before execution • Production Grid computing resources are shared by all Yahoos • Bad performance will affect other Yahoos' Grid jobs • Putting badly performing code on the production Grid is unacceptable • We manually investigate the job performance before we actually execute it on the production Grid • Flow: Coding, Manual Test, Manual investigate, Get Bug