Jenkins World
#JenkinsWorld
Jenkins and Load Sharing Facility (LSF) Enables Rapid
Delivery of Device Driver Software
Brian Vandegriend
Product Verification Manager, Microsemi
Twitter: @BVandegriend
Overview
How to dynamically allocate hardware evaluation boards across software
developers/testers and Jenkins in order to simplify resource sharing and
increase automated regression throughput
Agenda:
•  Our Build/Test Environment
•  Old/New Methods for Allocating Boards to Users/Jenkins
•  LSF Concepts and Short Overview of the Tool
•  Integrating LSF with Jenkins
•  4 Tips for Increasing Reliability and Throughput
•  Benefits Realized by the Team
Our Build and Test Environment
•  Our group develops, tests and supports device
driver software/firmware for Optical/Ethernet
networking SOCs with 100 Gbps ports
–  Device driver written in C (150 KLOC)
–  Subversion used for revision control
•  CloudBees Jenkins Platform is used to
continuously build and test the device driver
–  1 master with 2 slave nodes (VM/bare-metal)
–  Production releases are shipped every 2 to 3 weeks
–  Continuous Delivery: Release process is automated
except for posting to web portal (manual approval)
•  Driver testing is done on lab-based boards
–  Over 500 automated system-level tests that have a
runtime of 200+ board-hours
[Figure: a packet generator/monitor exchanges packets with the SoC evaluation/test board. Other labels: FPGA; Intel-based COM Express running Linux]
Old Method: Manual sharing of boards
Problems with approach:
•  If an engineer is not using their board, it’s hard for other engineers
to use it.
•  If an engineer’s assigned board is used by someone else, they
typically hunt around trying to find a free board.
•  Software regressions can’t take advantage of idle boards and
run in parallel.
[Figure: 50 engineers at 4 sites (10, 10, 10 and 20) share the boards (… 25) in the SW Lab (Vancouver). Boards are manually assigned to engineers; regressions (Jenkins jobs) are statically assigned.]
New Method: Use IBM’s Load Sharing Facility (LSF) tool to
dynamically allocate boards
[Figure: users and Jenkins jobs submit tests to LSF, which allocates the SW Lab boards (… 25)]
1. User submits test to queue (FIFO)
2. Tool dispatches tests to free boards
3. Board becomes free when test finishes
4. Jenkins uses lower priority queue to make use of idle boards
Advantages of using a queue-based solution, such as LSF:
• Enhanced productivity − users do not have to find/reserve boards. The
system will grant a free board to the user based upon their needs.
• Higher reliability − "problematic" boards are taken offline and the system
will direct jobs to the other boards. Individual users are not impacted.
• Scalability − as users/boards are added, no re-adjustment of board
assignments is necessary.
• Shorter automated testing cycles and higher efficiency of boards
(A board can only run one test at a time)
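The queue-based allocation described above can be modelled in a few lines. This is a toy sketch, not LSF itself: the `BoardScheduler` class and the job names are hypothetical, and the priorities 60 and 10 are borrowed from the short and long queues shown in the `bqueues` output later in the deck.

```python
import heapq
import itertools


class BoardScheduler:
    """Toy model of queue-based board allocation (illustrative, not LSF)."""

    def __init__(self, boards):
        self.free_boards = list(boards)
        self.pending = []               # heap of (-priority, sequence, job)
        self._seq = itertools.count()   # keeps FIFO order within one priority
        self.running = {}               # board -> job currently on it

    def submit(self, job, priority):
        heapq.heappush(self.pending, (-priority, next(self._seq), job))
        self.dispatch()

    def dispatch(self):
        # A board runs one test at a time, so pair each free board with one job.
        while self.free_boards and self.pending:
            _, _, job = heapq.heappop(self.pending)
            self.running[self.free_boards.pop(0)] = job

    def finish(self, board):
        self.running.pop(board)
        self.free_boards.append(board)
        self.dispatch()


sched = BoardScheduler(["board-1", "board-2"])
sched.submit("jenkins-test-1", priority=10)   # low-priority regression queue
sched.submit("jenkins-test-2", priority=10)
sched.submit("jenkins-test-3", priority=10)   # waits: both boards are busy
sched.submit("user-debug", priority=60)       # user queue outranks Jenkins
sched.finish("board-1")                       # the freed board goes to the user job
print(sched.running)
```

The user job jumps ahead of the waiting Jenkins test, while the Jenkins tests already running are left alone, which is the behaviour the lower-priority Jenkins queue is meant to give.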
Comparing 2 Solutions:
Running LSF versus using Jenkins Slaves on Boards
Running LSF on Boards
•  Board resources are treated as one large
resource pool
•  If a board crashes, only 1 test result is
lost
•  Test balancing across jobs is not required
as tests are dynamically allocated to
boards by LSF
•  LSF can allocate all boards to automated
tests when users are not using them
Using boards as Jenkins Slaves
•  Boards are divided into 2 groups: 1 for
Jenkins slaves and 1 for users
•  If a board crashes, all test results are
lost
•  Tests need to be equally partitioned
across jobs to maximize throughput
•  Jenkins can’t take advantage of free
boards that users are not using
[Figure: left, LSF feeding 400 tests to 25 LSF hosts; right, the Jenkins scheduler feeding Job 1 … Job 10 (40 each) to 10 slaves, with the other 15 boards set aside]
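The point about test balancing can be illustrated with a small calculation. The sketch below compares a fixed split of tests across boards with handing the next test to whichever board frees up first. The round-robin split and the test durations are assumptions made up for illustration; neither Jenkins nor LSF is being simulated in any detail.

```python
import heapq


def static_makespan(durations, n_boards):
    """Fixed split: test i always runs on board i % n_boards."""
    chunks = [durations[i::n_boards] for i in range(n_boards)]
    return max(sum(chunk) for chunk in chunks)


def dynamic_makespan(durations, n_boards):
    """Queue style: the next test goes to whichever board becomes free first."""
    free_at = [0] * n_boards              # time at which each board is free
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


tests = [30, 1, 1, 1, 30, 1, 1, 1]        # minutes per test, hypothetical mix
print("static :", static_makespan(tests, 2))    # 62
print("dynamic:", dynamic_makespan(tests, 2))   # 33
```

With this particular mix the fixed split puts both long tests on the same board, so the run takes almost twice as long. A well-balanced fixed split would close the gap, which is the partitioning effort the comparison refers to.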
Prerequisites for using LSF for Sharing Boards
•  Boards have a version of Linux installed (CentOS, Red Hat, Fedora, and
so on) → LSF sees each board as a Linux server
•  Boards are fairly homogeneous in their configuration/hardware
–  Many different board types will lead to a fragmented pool
o  LSF can easily handle multiple resource types and allocate jobs based on
resource requests
–  For our project, we ensured we had chip fuse overrides that were
controllable through SW
•  Requires users to close their debug sessions when finished to allow the
board to be allocated to the next user
•  Timeouts are enforced by LSF to ensure boards are returned back to the pool
•  Successful adoption by the team relies on individuals to use the
system and not circumvent it by logging directly into boards
LSF Concepts
•  Each queue can enforce user limits and run times
–  Short queue typically has a job limit of 1, run time of 1 hour and highest priority
–  Long queue typically allows multiple jobs per user and has the lowest priority
•  LSF uses a priority-based, fair-share algorithm to dispatch jobs to hosts
•  Each host has a number of attributes which can be requested
[Figure: an LSF cluster of hosts grouped by project (Proj-1, Proj-2), served by the queues short, normal and long. LSF daemons (sbatchd/res/lim) run on each host.]
Resource attributes (examples):
–  Proj-1 host: Proj-1, atom_cpu, Num_devices=3, Greenhills, Fedora_Linux
–  Proj-2 host: Proj-2, i5_cpu, Num_devices=2, Fedora_Linux
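Requesting hosts by attribute amounts to filtering the pool. A minimal sketch of that idea follows, assuming boolean attributes are modelled as a set of tags and numeric ones as a dictionary. The `Host` class and `matching_hosts` function are hypothetical, and the pairing of board names with attribute sets is illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    tags: set = field(default_factory=set)        # boolean attributes, e.g. "Proj-1"
    numeric: dict = field(default_factory=dict)   # numeric attributes, e.g. Num_devices


def matching_hosts(hosts, required_tags=(), minimums=None):
    """Return hosts that carry every required tag and meet every numeric minimum."""
    minimums = minimums or {}
    return [
        h for h in hosts
        if set(required_tags) <= h.tags
        and all(h.numeric.get(key, 0) >= value for key, value in minimums.items())
    ]


hosts = [
    Host("board-73", {"Proj-1", "atom_cpu", "Greenhills", "Fedora_Linux"}, {"Num_devices": 3}),
    Host("board-38", {"Proj-2", "i5_cpu", "Fedora_Linux"}, {"Num_devices": 2}),
]
picked = matching_hosts(hosts, required_tags=["Proj-2", "i5_cpu"], minimums={"Num_devices": 2})
print([h.name for h in picked])   # ['board-38']
```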
Submitting Jobs using LSF
Running an interactive command on a board through LSF:
> bsub -Ip -q short -R Proj1 echo "hello world!"
Job <623549> is submitted to queue <short>.
<<Starting on board-105>>
hello world!
Request specific resource constraints:
> bsub -R "Proj1 && i5_cpu" -R "num_devices>=2" xterm
> bsub -q long -m board-105 test_cmd    # requests a specific board
To view job status:
> bjobs
JOBID   USER     STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME
549269  vandegr  RUN   short  swbuild01  board-105  xterm
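A script that submits tests on the user's behalf needs to build these command lines and read back the job ID. The sketch below does only that, using the `-q`, `-R` and `-m` options shown above and the confirmation line format from the example output. The helper names are hypothetical, and actually running the command (for instance with `subprocess.run`) is left out so the example works without an LSF installation.

```python
import re
import shlex

JOB_ID_RE = re.compile(r"Job <(\d+)> is submitted to queue <([^>]+)>")


def build_bsub(command, queue=None, resources=(), host=None):
    """Build a bsub argument list from the options shown on the slide."""
    args = ["bsub"]
    if queue:
        args += ["-q", queue]
    for requirement in resources:
        args += ["-R", requirement]
    if host:
        args += ["-m", host]
    return args + shlex.split(command)


def parse_submission(output):
    """Extract (job_id, queue) from bsub's confirmation line, or None if absent."""
    match = JOB_ID_RE.search(output)
    return (int(match.group(1)), match.group(2)) if match else None


cmd = build_bsub("test_cmd", queue="long",
                 resources=["Proj1 && i5_cpu", "num_devices>=2"])
print(shlex.join(cmd))
print(parse_submission("Job <623549> is submitted to queue <short>."))
```

Passing the arguments as a list rather than one shell string avoids quoting problems with resource expressions that contain `&&` or `>=`.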
Monitoring Jobs/Queues/Hosts using LSF
To view the list of queues and their status:
> bqueues
QUEUE_NAME  PRIO  STATUS       JL/U  NJOBS  PEND  RUN
short       60    Open:Active  1     1      0     1
normal      25    Open:Active  5     7      3     4
long        10    Open:Active  10    33     18    15
To view the host status:
> bhosts
HOST_NAME  STATUS   JL/U  MAX  NJOBS  RUN
swbuild01  ok       -     4    2      2
board-105  closed   -     1    1      1
board-106  unavail  -     1    0      0
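Output in this shape is easy to post-process. The sketch below parses the host listing into dictionaries and picks out boards that look offline. The rule used here is a heuristic assumed for illustration: a host counts as offline if it is `unavail`, or if it is `closed` without being full. The sample text is the listing from the slide.

```python
SAMPLE = """\
HOST_NAME STATUS JL/U MAX NJOBS RUN
swbuild01 ok - 4 2 2
board-105 closed - 1 1 1
board-106 unavail - 1 0 0
"""


def parse_bhosts(text):
    """Turn whitespace-separated host status output into a list of dicts."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def offline_boards(hosts):
    """Heuristic: a host is offline if unavailable, or closed while not full."""
    offline = []
    for host in hosts:
        full = host["MAX"].isdigit() and int(host["NJOBS"]) >= int(host["MAX"])
        if host["STATUS"] == "unavail" or (host["STATUS"] == "closed" and not full):
            offline.append(host["HOST_NAME"])
    return offline


print(offline_boards(parse_bhosts(SAMPLE)))   # ['board-106']
```

Here `board-105` is closed only because its single job slot is in use, so it is not reported.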
Boards/Jobs Administration using LSF
# Print out boards in LSF cluster:
> lshosts
HOST_NAME  model     cpuf  ncpus  maxmem  maxswp  RESOURCES
board-73   Atom      10.0  1      1.8G    3.9G    (proj1 atom)
board-38   Intel_i5  40.0  2      5.6G    3.9G    (proj2 i5)
# Take board offline:
> badmin hclose -C "hardware issue" board-38
# Kill/suspend/resume a job running in LSF:
> bkill/bstop/bresume <job_id>
Integrating LSF with Jenkins for Automated Testing
[Figure: Jenkins job, regression script, LSF and the SW Lab boards (… 25)]
1. Jenkins job starts the run_regression Python script, which creates the test list
2. Script dispatches tests to LSF as individual jobs with resource constraints
3. Script scans LSF status and log files to print real-time status to the console log
4. Script waits until all tests are finished
5. Script converts test log files into JUnit format, which is summarized by Jenkins
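The last step, converting results into JUnit format, might look roughly like the sketch below. It is not the run_regression script from the talk: the `to_junit` function, the suite name and the test names are all hypothetical. The XML layout (a `testsuite` element holding `testcase` elements with optional `failure` or `skipped` children) is the common JUnit report shape that Jenkins summarizes.

```python
import xml.etree.ElementTree as ET


def to_junit(suite_name, results):
    """results: iterable of (test_name, status, message); status is pass/fail/skip."""
    results = list(results)
    suite = ET.Element("testsuite", {
        "name": suite_name,
        "tests": str(len(results)),
        "failures": str(sum(1 for _, status, _ in results if status == "fail")),
        "skipped": str(sum(1 for _, status, _ in results if status == "skip")),
    })
    for name, status, message in results:
        case = ET.SubElement(suite, "testcase", {"classname": suite_name, "name": name})
        if status == "fail":
            ET.SubElement(case, "failure", {"message": message})
        elif status == "skip":
            ET.SubElement(case, "skipped", {"message": message})
    return ET.tostring(suite, encoding="unicode")


xml_report = to_junit("driver_regression", [
    ("test_port_init", "pass", ""),
    ("test_loopback", "fail", "timeout waiting for link"),
    ("test_jumbo_frames", "skip", "flaky: passed on re-run"),
])
print(xml_report)
```

Writing the returned string to a file in the job workspace and pointing the Jenkins JUnit report step at it is the usual way to get the summary.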
Enabling User-based Automated Testing through Jenkins
•  Regression script is also used in
Jenkins for users to run regression
tests against their SW/FW changes
•  Users can submit their project
workspaces to Jenkins for testing
•  Helps ensure that the software trunk
remains “green”
•  Parameterized build is used to pass
variables to the Python script
•  LSF runs user regressions in parallel
with the main (trunk) SW regressions
Making QA regressions available to all developers!
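Jenkins exposes the parameters of a parameterized build to build steps as environment variables, so a script can pick them up as defaults while still accepting command-line flags. A minimal sketch, assuming hypothetical parameter names (`USER_WORKSPACE`, `LSF_QUEUE`, `TEST_FILTER`) that are not taken from the talk:

```python
import argparse
import os


def parse_args(argv=None, env=os.environ):
    """Read regression settings from flags, falling back to build parameters."""
    parser = argparse.ArgumentParser(description="Run a driver regression")
    parser.add_argument("--workspace", default=env.get("USER_WORKSPACE", "trunk"))
    parser.add_argument("--queue", default=env.get("LSF_QUEUE", "long"))
    parser.add_argument("--test-filter", default=env.get("TEST_FILTER", "*"))
    return parser.parse_args(argv)


# Simulate the environment a parameterized build would provide.
args = parse_args([], env={"USER_WORKSPACE": "/home/dev/ws1", "LSF_QUEUE": "normal"})
print(args.workspace, args.queue, args.test_filter)
```

Keeping the flags alongside the environment defaults lets the same script be run by hand outside Jenkins.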
Tip: Using LSF to prevent corrupted boards from
killing your automated regression
Problem: Boards in a bad state will cause tests to fail in rapid succession
Solution: Automatically take boards offline that rapidly fail tests:
•  LSF monitors the job exit rate for boards, and closes the board if the rate
exceeds a configurable threshold
–  For example, LSF takes a board offline if 5 tests exit abnormally in less than 30 sec
•  Run a board initialization script prior to running the test
–  LSF executes a pre-execution script that puts the board into a known good state
–  If the board initialization script returns an error code, LSF takes the board offline
and finds a new board for the test
•  Re-queue tests that fail because of a board issue
–  LSF will re-run a test if it exits with a special error code (such as 99)
This technique has increased overall reliability of the system!
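The exit-rate rule and the re-queue rule can be pictured as follows. In the talk LSF provides both behaviours through its own configuration; the sketch below is only a model of the logic, with a hypothetical `ExitRateMonitor` class and the slide's example numbers (5 abnormal exits within 30 seconds, exit code 99) as defaults.

```python
from collections import defaultdict, deque

REQUEUE_EXIT_CODE = 99   # the slide's example of a "special" exit code


def should_requeue(exit_code):
    """A test that exits with the agreed code is re-run instead of reported."""
    return exit_code == REQUEUE_EXIT_CODE


class ExitRateMonitor:
    """Flags a board once max_exits abnormal exits fall within window_sec seconds."""

    def __init__(self, max_exits=5, window_sec=30.0):
        self.max_exits = max_exits
        self.window_sec = window_sec
        self._exits = defaultdict(deque)   # board -> timestamps of abnormal exits

    def record_abnormal_exit(self, board, now):
        """Record one abnormal exit; return True if the board should be closed."""
        times = self._exits[board]
        times.append(now)
        # Forget exits that have dropped out of the sliding window.
        while times and now - times[0] >= self.window_sec:
            times.popleft()
        return len(times) >= self.max_exits


monitor = ExitRateMonitor()
verdicts = [monitor.record_abnormal_exit("board-106", now=t) for t in (0, 5, 10, 15, 20)]
print(verdicts)   # [False, False, False, False, True]
```

A board that fails once every ten seconds never trips this threshold, so occasional genuine test failures do not take boards out of the pool.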
Tip: Automatically Reboot/Power-cycle Bad Boards
Problem:
•  Boards taken offline by LSF need to be quickly brought back online to maximize
test throughput
Solution:
•  A board monitoring script, which is run periodically by Jenkins, will reset/
power-cycle offline boards and bring them back online
–  If a soft reset fails, then an Ethernet-based remote power switch (from
Digital Loggers) is used to power-cycle the board
–  If power-cycling fails, the Jenkins job fails with an error and e-mail notification is
sent to the administrator
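The escalation described above (soft reset, then power cycle, then fail the job) can be sketched as a small function. The three callables are placeholders supplied by the caller; how a real soft reset or remote power switch is driven is not shown in the talk and is not assumed here.

```python
def recover_board(board, soft_reset, power_cycle, is_healthy):
    """Try a soft reset, then a power cycle; raise if the board stays unhealthy."""
    for step_name, action in (("soft reset", soft_reset), ("power cycle", power_cycle)):
        action(board)
        if is_healthy(board):
            return step_name
    raise RuntimeError(f"{board}: still offline after power cycle")


# Stand-ins for real hardware control, for demonstration only.
state = {"board-106": "hung"}


def fake_soft_reset(board):
    pass                              # has no effect on a hung board


def fake_power_cycle(board):
    state[board] = "ok"


def fake_is_healthy(board):
    return state[board] == "ok"


print(recover_board("board-106", fake_soft_reset, fake_power_cycle, fake_is_healthy))
```

Letting the exception propagate makes the script exit with a non-zero status, which marks the Jenkins build as failed so that its e-mail notification can fire.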
Tip: Qualify Boards Prior to using Them in Automated
Regression Testing
Problem:
•  A small percentage of our boards will have manufacturing defects (primarily
due to the SoC devices being removed/re-soldered)
•  Boards with hardware faults will cause a small number of tests to fail
intermittently/consistently, which takes a lot of time to track down
Solution:
•  As boards are added, run a battery of working tests with a stable SW release on
each board with 5 to 10 iterations
•  Weed out problematic/failing boards and use the rest for automated testing
–  Use LSF groups to create a group of “golden” boards
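The qualification pass boils down to a loop: run a known-good suite several times on each new board and keep only the boards that never fail. A minimal sketch, assuming a caller-supplied `run_suite(board)` that returns True on a clean run; the function names and the fake suite are hypothetical.

```python
from collections import Counter


def qualify_boards(boards, run_suite, iterations=5):
    """Return (golden, rejected); a board is golden only if every iteration passes."""
    golden, rejected = [], []
    for board in boards:
        passed = all(run_suite(board) for _ in range(iterations))
        (golden if passed else rejected).append(board)
    return golden, rejected


# Fake suite for demonstration: board-38 fails on its third run.
calls = Counter()


def fake_run_suite(board):
    calls[board] += 1
    return not (board == "board-38" and calls[board] == 3)


golden, rejected = qualify_boards(["board-73", "board-38"], fake_run_suite)
print(golden, rejected)   # ['board-73'] ['board-38']
```

Several iterations matter because an intermittent hardware fault may not show up on the first run.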
Tip: Handling Flaky Tests
Problem:
•  Flaky tests (those that can fail or pass with the same software code) make it
difficult for automated regressions to be “green” (no failures)
•  Our system suffers from flaky tests, similar to what is experienced by Google
–  Google Test blog article: “We see a continual rate of about 1.5% of all test runs
reporting a "flaky" result. There are many root causes why tests return flaky
results, including concurrency, relying on non-deterministic or undefined behaviors,
flaky third party code, infrastructure problems, etc.”
Workaround:
•  Regression script re-runs test failures 3 times and if all re-runs have passed,
then the test result is changed from “failed” to “skipped”
–  Test result is only modified if the test has been tagged as a flaky test
–  This method also helps to quickly identify new test failures as consistent or
intermittent failures.
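The re-run rule can be written down compactly. In this sketch `run_test` is a caller-supplied function returning True on a pass and `flaky_tests` is the set of tests tagged as flaky; both names, like `classify_failure` itself, are hypothetical.

```python
def classify_failure(test, run_test, flaky_tests, reruns=3):
    """Re-run a failed test; return (reported_result, failure_kind)."""
    outcomes = [run_test(test) for _ in range(reruns)]
    kind = "consistent" if not any(outcomes) else "intermittent"
    # Only a tagged flaky test whose re-runs all pass is downgraded to skipped.
    if all(outcomes) and test in flaky_tests:
        return "skipped", kind
    return "failed", kind


flaky = {"test_loopback"}
print(classify_failure("test_loopback", lambda t: True, flaky))    # ('skipped', 'intermittent')
print(classify_failure("test_port_init", lambda t: True, flaky))   # ('failed', 'intermittent')
print(classify_failure("test_port_init", lambda t: False, flaky))  # ('failed', 'consistent')
```

An untagged test stays failed even when every re-run passes, so a newly intermittent test is still surfaced for investigation.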
Tip: Handling Flaky Tests (continued)
•  Flaky tests in Jenkins can be represented by using the “Skip” column
–  It doesn’t seem possible to have Jenkins recognize new values for the TEST_RESULT
property: <property name="TEST_RESULT" value="SKIP"/>
Benefits realized by adopting LSF
•  Our solution using LSF and Jenkins for Automated Regression
testing has:
– Eliminated manual sharing of boards → tedious and inefficient
– Simplified users’ experience of finding a free board
– Increased reliability of the system through automatic power-cycling
of corrupted boards and running a board initialization script
– Improved quality of code commits as users run automated regression
tests on their changes through Jenkins/LSF before committing
– Shortened release cycles as automated regressions complete faster
by gaining access to more boards
©2016 Microsemi Corporation. All rights reserved. Microsemi and the Microsemi logo are registered trademarks of Microsemi Corporation. All other trademarks and service marks are the property of
their respective owners.  	
Questions?

Remote core locking-Andrea LombardoRemote core locking-Andrea Lombardo
Remote core locking-Andrea Lombardo
 
Automating Yourself Out of Trouble
Automating Yourself Out of TroubleAutomating Yourself Out of Trouble
Automating Yourself Out of Trouble
 

Jenkins and LSF Enable Rapid Delivery of Device Driver Software

  • 1. Jenkins World #JenkinsWorld Jenkins and Load Sharing Facility (LSF) Enables Rapid Delivery of Device Driver Software Brian Vandegriend
  • 2. Jenkins and Load Sharing Facility (LSF) Enables Rapid Delivery of Device Driver Software. Brian Vandegriend, Product Verification Manager, Microsemi. Twitter: @BVandegriend
  • 3. Overview
      How to dynamically allocate hardware evaluation boards across software developers/testers and Jenkins in order to simplify resource sharing and increase automated regression throughput
      Agenda:
      •  Our Build/Test Environment
      •  Old/New Methods for Allocating Boards to Users/Jenkins
      •  LSF Concepts and Short Overview of the Tool
      •  Integrating LSF with Jenkins
      •  4 Tips for Increasing Reliability and Throughput
      •  Benefits Realized by the Team
  • 4. Our Build and Test Environment
      •  Our group develops, tests and supports device driver software/firmware for Optical/Ethernet networking SoCs with 100 Gbps ports
         –  Device driver written in C with 150 KLOC
         –  Subversion used for revision control
      •  CloudBees Jenkins Platform is used to continuously build and test the device driver
         –  1 master with 2 slave nodes (VM/bare-metal)
         –  Production releases are shipped every 2 to 3 weeks
         –  Continuous Delivery: Release process is automated except for posting to web portal (manual approval)
      •  Driver testing is done on lab-based boards
         –  Over 500 automated system-level tests that have a runtime of 200+ board-hours
      [Diagram labels: Packet Generator / Monitor FPGA; Packets; SoC Evaluation / Test board; Intel-based COMExpress running Linux]
  • 5. Old Method: Manual sharing of boards
      Problems with approach:
      •  If an engineer is not using their board, it’s hard for other engineers to use it.
      •  If an engineer’s assigned board is used by someone else, they typically hunt around trying to find a free board.
      •  Software regressions can’t take advantage of idle boards and run in parallel.
      [Diagram labels: 10, 10, 10, 20; SW Lab (Vancouver); … 25; Jobs; 4 sites; 50 engineers. Boards are manually assigned to engineers; regressions are statically assigned.]
  • 6. New Method: Use IBM’s Load Sharing Facility (LSF) tool to dynamically allocate boards
      1  User submits test to queue (FIFO)
      2  Tool dispatches tests to free boards
      3  Board becomes free when test finishes
      4  Jenkins uses lower priority queue to make use of idle boards
      [Diagram labels: SW Lab; LSF; … 25; Jobs]
      Advantages of using a queue-based solution, such as LSF:
      •  Enhanced productivity: users do not have to find/reserve boards. The system will grant a free board to the user based upon their needs.
      •  Higher reliability: "problematic" boards are taken offline and the system will direct jobs to the other boards. Individual users are not impacted.
      •  Scalability: as users/boards are added, no re-adjustment of board assignments is necessary.
      •  Shorter automated testing cycles and higher efficiency of boards (a board can only run one test at a time)
  • 7. Comparing 2 Solutions: Running LSF versus using Jenkins Slaves on Boards
      Running LSF on Boards
      •  Board resources are treated as one large resource pool
      •  If a board crashes, only 1 test result is lost
      •  Test balancing across jobs is not required as tests are dynamically allocated to boards by LSF
      •  LSF can allocate all boards to automated tests when users are not using them
      Using boards as Jenkins Slaves
      •  Boards are divided into 2 groups: 1 for Jenkins slaves and 1 for users
      •  If a board crashes, all test results are lost
      •  Tests need to be equally partitioned across jobs to maximize throughput
      •  Jenkins can’t take advantage of free boards that users are not using
      [Diagram labels: LSF; … 25; … 10; … 15; Slaves; LSF Hosts; 40 Job 1; 40 Job 10; Jenkins Scheduler; . . . 400]
  • 8. Prerequisites for using LSF for Sharing Boards
      •  Boards have a version of Linux installed (CentOS, Red Hat, Fedora, and so on) → LSF sees each board as a Linux server
      •  Boards are fairly homogeneous in their configuration/hardware
         –  Many different board types will lead to a fragmented pool
            o  LSF can easily handle multiple resource types and allocate jobs based on resource requests
         –  For our project, we ensured we had chip fuse overrides that were controllable through SW
      •  Requires users to close their debug sessions when finished to allow the board to be allocated to the next user
      •  Timeouts are enforced by LSF to ensure boards are returned to the pool
      •  Successful adoption by the team relies on individuals using the system and not circumventing it by logging directly into boards
  • 9. LSF Concepts
      •  Each queue can enforce user limits and run times
         –  Short queue typically has a job limit of 1, run time of 1 hour and highest priority
         –  Long queue typically allows multiple jobs per user and has the lowest priority
      •  LSF uses a priority-based, fair-share algorithm to dispatch jobs to hosts
      •  Each host has a number of attributes which can be requested
      [Diagram labels: Cluster; Hosts (Proj-1, Proj-2); Queues (short, normal, long); LSF daemons run on host (sbatchd/res/lim)]
      Resource Attributes:
         –  Proj-1: atom_cpu, Num_devices=3, Greenhills, Fedora_Linux
         –  Proj-2: i5_cpu, Num_devices=2, Fedora_Linux
  • 10. Submitting Jobs using LSF
      Running an interactive command on a board through LSF:
      > bsub -Ip -q short -R Proj1 echo "hello world!"
      Job <623549> is submitted to queue <short>.
      <<Starting on board-105>>
      hello world!
      Request specific resource constraints:
      > bsub -R "Proj1 && i5_cpu" -R "num_devices>=2" xterm
      > bsub -q long -m board-105 test_cmd ; # requests a specific board
      To view job status:
      > bjobs
      JOBID   USER     STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME
      549269  vandegr  RUN   short  swbuild01  board-105  xterm
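A regression script that submits many tests would typically build these `bsub` command lines programmatically. The following is a minimal sketch of that idea, not the presenter's script: the function name `build_bsub_command` and its parameters are hypothetical, and it only uses the `bsub` options that appear on the slide (`-Ip`, `-q`, `-R`, `-m`).

```python
import shlex


def build_bsub_command(test_cmd, queue="long", resources=(), host=None,
                       interactive=False):
    """Return a bsub argument list (hypothetical helper, for illustration).

    Only the options shown on the slide are used: -Ip, -q, -R and -m.
    """
    argv = ["bsub"]
    if interactive:
        argv.append("-Ip")        # interactive job with a pseudo-terminal
    argv += ["-q", queue]         # target queue (short / normal / long)
    for requirement in resources:
        argv += ["-R", requirement]   # one -R per resource requirement
    if host is not None:
        argv += ["-m", host]      # pin the job to a specific board
    argv += shlex.split(test_cmd)  # the command to run on the board
    return argv


def as_shell_line(argv):
    """Render the argument list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

Returning a list rather than a string means the result can be handed directly to `subprocess.run` without shell quoting problems; `as_shell_line` is only for logging.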
  • 11. Monitoring Jobs/Queues/Hosts using LSF
      To view the list of queues and their status:
      > bqueues
      QUEUE_NAME  PRIO  STATUS       JL/U  NJOBS  PEND  RUN
      short       60    Open:Active  1     1      0     1
      normal      25    Open:Active  5     7      3     4
      long        10    Open:Active  10    33     18    15
      To view the host status:
      > bhosts
      HOST_NAME  STATUS   JL/U  MAX  NJOBS  RUN
      swbuild01  ok       -     4    2      2
      board-105  closed   -     1    1      1
      board-106  unavail  -     1    0      0
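Status tables like these are easy to consume from a script. The sketch below parses the simplified column layout shown on the slide into dictionaries; the helper names are hypothetical, and real `bqueues`/`bhosts` output has additional columns, so treat the sample text as an assumption rather than the exact tool output.

```python
def parse_lsf_table(text):
    """Parse a whitespace-separated status table into a list of dicts.

    Assumes the first non-empty line is the header and that no field
    contains spaces, which holds for the slide's sample output.
    """
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def queues_with_backlog(rows):
    """Names of queues that currently have pending jobs."""
    return [row["QUEUE_NAME"] for row in rows if int(row["PEND"]) > 0]


# Sample copied from the slide (simplified columns, not full bqueues output).
SAMPLE_BQUEUES = """
QUEUE_NAME  PRIO  STATUS       JL/U  NJOBS  PEND  RUN
short       60    Open:Active  1     1      0     1
normal      25    Open:Active  5     7      3     4
long        10    Open:Active  10    33     18    15
"""
```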
  • 12. Boards/Jobs Administration using LSF
      # Print out boards in LSF cluster:
      > lshosts
      HOST_NAME  model     cpuf  ncpus  maxmem  maxswp  RESOURCES
      Board-73   Atom      10.0  1      1.8G    3.9G    (proj1 atom)
      Board-38   Intel_i5  40.0  2      5.6G    3.9G    (proj2 i5)
      # Take board offline:
      > badmin hclose -C "hardware issue" board-38
      Kill/suspend/resume a job running in LSF:
      > bkill/bstop/bresume <job_id>
  • 13. Integrating LSF with Jenkins for Automated Testing
      1  Jenkins job starts run_regression Python script that creates test list
      2  Script dispatches tests to LSF as individual jobs with resource constraints
      3  Script scans LSF status and log files to print out real-time status to console log
      4  Script waits until all tests are finished
      5  Script converts test log files into JUnit format which is summarized by Jenkins
      [Diagram labels: SW Lab; LSF; … 25; Job]
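The dispatch-and-wait part of this flow can be sketched as a polling loop. This is an illustrative outline only, not the presenter's run_regression script: `submit` and `poll` are hypothetical callables standing in for whatever wraps `bsub` and `bjobs`, and the sketch assumes `poll` returns the LSF job states `DONE` or `EXIT` once a job has finished.

```python
import time


def run_regression(tests, submit, poll, interval=0.0, report=print):
    """Dispatch every test as its own job, then wait for all of them.

    submit(test) -> job_id and poll(job_id) -> status string are
    hypothetical hooks supplied by the caller.
    """
    pending = {submit(test): test for test in tests}   # job_id -> test name
    results = {}
    while pending:
        for job_id in list(pending):
            status = poll(job_id)
            if status in ("DONE", "EXIT"):             # job has finished
                test = pending.pop(job_id)
                results[test] = "passed" if status == "DONE" else "failed"
                # Progress line for the Jenkins console log.
                report(f"{test}: {results[test]} ({len(pending)} still running)")
        if pending:
            time.sleep(interval)                       # avoid busy polling
    return results
```

Passing the LSF interaction in as callables keeps the loop testable without a cluster; in practice `interval` would be several seconds.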
  • 14. Enabling User-based Automated Testing through Jenkins
      •  Regression script is also used in Jenkins for users to run regression tests against their SW/FW changes
      •  Users can submit their project workspaces to Jenkins for testing
      •  Helps ensure that the software trunk remains “green”
      •  Parameterized build is used to pass variables to the Python script
      •  LSF runs user regressions in parallel with the main (trunk) SW regressions
      Making QA regressions available to all developers!
  • 15. Tip: Using LSF to prevent corrupted boards from killing your automated regression
      Problem: Boards in a bad state will cause tests to fail in rapid succession
      Solution: Automatically take boards offline that rapidly fail tests:
      •  LSF monitors the job exit rate for boards, and closes the board if the rate exceeds a configurable threshold
         –  For example, LSF takes a board offline if 5 tests exit abnormally in less than 30 sec
      •  Run a board initialization script prior to running the test
         –  LSF executes a pre-execution script that puts the board into a known good state
         –  If the board initialization script returns an error code, LSF takes the board offline and finds a new board for the test
      •  Re-queue tests that fail because of a board issue
         –  LSF will re-run a test if it exits with a special error code (such as 99)
      This technique has increased overall reliability of the system!
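The exit-rate rule in the example (5 abnormal exits in less than 30 seconds) amounts to a sliding-window count per board. The sketch below illustrates that logic in isolation; it is not how LSF implements or configures the feature, and the class name, method names and defaults are hypothetical. The re-queue code 99 is the example value from the slide.

```python
from collections import deque


class ExitRateMonitor:
    """Flag a board once too many jobs exit abnormally in a short window."""

    def __init__(self, max_failures=5, window_sec=30.0):
        self.max_failures = max_failures
        self.window_sec = window_sec
        self._failures = {}          # board name -> deque of failure times

    def record_exit(self, board, timestamp, exit_code):
        """Record a job exit; return True if the board should be closed."""
        if exit_code == 0:
            return False             # normal exits do not count
        times = self._failures.setdefault(board, deque())
        times.append(timestamp)
        # Drop failures that are no longer inside the window.
        while times and timestamp - times[0] >= self.window_sec:
            times.popleft()
        return len(times) >= self.max_failures


def should_requeue(exit_code, requeue_code=99):
    """True if the test signalled a board problem rather than a real failure."""
    return exit_code == requeue_code
```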
  • 16. Tip: Automatically Reboot/Power-cycle Bad Boards
      Problem:
      •  Boards taken offline by LSF need to be quickly brought back online to maximize test throughput
      Solution:
      •  A board monitoring script, which is run periodically by Jenkins, will reset/power-cycle offline boards and bring them back online
         –  If a soft reset fails, then an Ethernet-based remote power switch (from Digital Loggers) is used to power-cycle the board
         –  If power-cycling fails, the Jenkins job fails with an error and e-mail notification is sent to the administrator
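The recovery sequence is an escalation: try the cheapest fix first and fail loudly only when everything has been tried. A minimal sketch follows, assuming three hypothetical callables (`soft_reset`, `power_cycle`, `is_online`) that wrap the real reset mechanism, the remote power switch, and the board health check; none of them come from the presentation.

```python
def recover_board(board, soft_reset, power_cycle, is_online):
    """Bring an offline board back, escalating from soft reset to power cycle.

    Raises RuntimeError if the board is still offline afterwards, so that a
    Jenkins job running this script fails and can notify the administrator.
    """
    if is_online(board):
        return "online"                      # nothing to do
    soft_reset(board)
    if is_online(board):
        return "recovered by soft reset"
    power_cycle(board)                       # last resort: cut the power
    if is_online(board):
        return "recovered by power cycle"
    raise RuntimeError(f"{board} is still offline after power-cycling")
```

An uncaught exception makes the Python process exit with a non-zero status, which Jenkins treats as a failed build step.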
  • 17. Tip: Qualify Boards Prior to using Them in Automated Regression Testing
      Problem:
      •  A small percentage of our boards will have manufacturing defects (primarily due to the SoC devices being removed/re-soldered)
      •  Boards with hardware faults will cause a small number of tests to fail intermittently/consistently, which takes a lot of time to track down
      Solution:
      •  As boards are added, run a battery of working tests with a stable SW release on each board with 5 to 10 iterations
      •  Weed out problematic/failing boards and use the rest for automated testing
         –  Use LSF groups to create a group of “golden” boards
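The qualification step can be expressed as a simple filter: a board is only accepted if a known-good test battery passes on every iteration. The sketch below assumes a hypothetical `run_battery(board)` callable that returns True when the whole battery passes once; the function name and the default of 5 iterations (the lower end of the range on the slide) are illustrative.

```python
def qualify_boards(boards, run_battery, iterations=5):
    """Split boards into (golden, rejected) lists.

    A board is golden only if the battery passes on every iteration;
    all() stops at the first failing iteration for that board.
    """
    golden, rejected = [], []
    for board in boards:
        passed = all(run_battery(board) for _ in range(iterations))
        (golden if passed else rejected).append(board)
    return golden, rejected
```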
  • 18. Tip: Handling Flaky Tests
      Problem:
      •  Flaky tests (those that can fail or pass with the same software code) make it difficult for automated regressions to be “green” (no failures)
      •  Our system suffers from flaky tests, similar to what is experienced by Google
         –  Google Test blog article: “We see a continual rate of about 1.5% of all test runs reporting a "flaky" result. There are many root causes why tests return flaky results, including concurrency, relying on non-deterministic or undefined behaviors, flaky third party code, infrastructure problems, etc.”
      Workaround:
      •  Regression script re-runs test failures 3 times and if all re-runs have passed, then the test result is changed from “failed” to “skipped”
         –  Test result is only modified if the test has been tagged as a flaky test
         –  This method also helps to quickly identify new test failures as consistent or intermittent failures.
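The workaround above can be sketched as a small classification function. This is one reading of the slide rather than the presenter's code: `run_test` is a hypothetical callable returning True on a pass, and `flaky_tests` is the set of tests that have been tagged as flaky.

```python
def classify_failure(test, run_test, flaky_tests, reruns=3):
    """Re-run a failed test and decide how to report it.

    Returns (result, kind): result is "skipped" only when every re-run
    passed AND the test is tagged flaky, otherwise "failed"; kind tells
    whether the failure looks intermittent or consistent.
    """
    outcomes = [bool(run_test(test)) for _ in range(reruns)]
    kind = "intermittent" if any(outcomes) else "consistent"
    if all(outcomes) and test in flaky_tests:
        return "skipped", kind
    return "failed", kind
```

Untagged tests stay "failed" even when all re-runs pass, so a newly flaky test is still surfaced; the `kind` value is what lets a new failure be triaged quickly.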
  • 19. Tip: Handling Flaky Tests (continued)
      •  Flaky tests in Jenkins can be represented by using the “Skip” column
         –  It doesn’t seem possible to have Jenkins recognize new values for the TEST_RESULT property: <property name="TEST_RESULT" value="SKIP"/>
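One way to make a reclassified test land in the “Skip” column is to emit a `<skipped/>` element for it in the JUnit-style XML report. The sketch below shows that with Python's standard `xml.etree.ElementTree`; the function name and the result strings are hypothetical, and the report contains only the elements needed for pass/fail/skip counts.

```python
import xml.etree.ElementTree as ET


def to_junit_xml(suite_name, results):
    """Build a minimal JUnit-style report from {test_name: result}.

    result is one of "passed", "failed" or "skipped" (illustrative values).
    """
    failures = sum(1 for r in results.values() if r == "failed")
    skipped = sum(1 for r in results.values() if r == "skipped")
    suite = ET.Element("testsuite", name=suite_name, tests=str(len(results)),
                       failures=str(failures), skipped=str(skipped))
    for name, result in results.items():
        case = ET.SubElement(suite, "testcase", classname=suite_name, name=name)
        if result == "failed":
            ET.SubElement(case, "failure", message="test failed")
        elif result == "skipped":
            ET.SubElement(case, "skipped")   # reported as skipped, not failed
    return ET.tostring(suite, encoding="unicode")
```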
  • 20. Benefits realized by adopting LSF
      •  Our solution using LSF and Jenkins for Automated Regression testing has:
         –  Eliminated manual sharing of boards → tedious and inefficient
         –  Simplified users’ experience of finding a free board
         –  Increased reliability of the system through automatic power-cycling of corrupted boards and running a board initialization script
         –  Improved quality of code commits as users run automated regression tests on their changes through Jenkins/LSF before committing
         –  Shortened release cycles as automated regressions complete faster by gaining access to more boards
  • 22. Questions?
      Microsemi Corporation (MSCC) offers a comprehensive portfolio of semiconductor and system solutions for communications, defense and security, aerospace and industrial markets. Products include high-performance and radiation-hardened analog mixed-signal integrated circuits, FPGAs, SoCs and ASICs; power management products; timing and synchronization devices and precise time solutions, setting the world's standard for time; voice processing devices; RF solutions; discrete components; enterprise storage and communication solutions, security technologies and scalable anti-tamper products; Ethernet solutions; Power-over-Ethernet ICs and midspans; as well as custom design capabilities and services. Microsemi is headquartered in Aliso Viejo, Calif., and has approximately 4,800 employees globally. Learn more at www.microsemi.com
      ©2016 Microsemi Corporation. All rights reserved. Microsemi and the Microsemi logo are registered trademarks of Microsemi Corporation. All other trademarks and service marks are the property of their respective owners.
      Microsemi makes no warranty, representation, or guarantee regarding the information contained herein or the suitability of its products and services for any particular purpose, nor does Microsemi assume any liability whatsoever arising out of the application or use of any product or circuit. The products sold hereunder and any other products sold by Microsemi have been subject to limited testing and should not be used in conjunction with mission-critical equipment or applications. Any performance specifications are believed to be reliable but are not verified, and Buyer must conduct and complete all performance and other testing of the products, alone and together with, or installed in, any end-products. Buyer shall not rely on any data and performance specifications or parameters provided by Microsemi. It is the Buyer’s responsibility to independently determine suitability of any products and to test and verify the same. The information provided by Microsemi hereunder is provided “as is, where is” and with all faults, and the entire risk associated with such information is entirely with the Buyer. Microsemi does not grant, explicitly or implicitly, to any party any patent rights, licenses, or any other IP rights, whether with regard to such information itself or anything described by such information. Information provided in this document is proprietary to Microsemi, and Microsemi reserves the right to make any changes to the information in this document or to any products and services at any time without notice.
      Microsemi Corporate Headquarters, One Enterprise, Aliso Viejo, CA 92656 USA. Within the USA: +1 (800) 713-4113. Outside the USA: +1 (949) 380-6100. Sales: +1 (949) 380-6136. Fax: +1 (949) 215-4996. email: sales.support@microsemi.com. www.microsemi.com