SCALING MOBILE TESTING ON AWS: EMULATORS ALL THE WAY DOWN
Kim Moir, Mozilla, @kmoir
URES, November 13, 2015
Scaling mobile testing on AWS: Emulators all the way down

This talk will explore the evolution of Mozilla's continuous integration infrastructure for Firefox for Android: from our early device lab, to running tests on reference cards in custom racks, to our current implementation running on emulators in AWS. In addition, I'll discuss how we reduced the cost of running our tests in AWS by using spot instances and fine-tuning the selection of instance types. Finally, I'll discuss how we analyzed regression data to prune the number of tests we run, to extend the capacity of our test pools and reduce costs. To give you some scope, our continuous integration farm consists of 6700 machines running 150,000 combined daily build and test jobs that are triggered by an average of 300 pushes. This talk was given at the USENIX Release Engineering Summit (URES) in Washington, DC on November 13, 2015.


  1. SCALING MOBILE TESTING ON AWS: EMULATORS ALL THE WAY DOWN. Kim Moir, Mozilla, @kmoir. URES, November 13, 2015. Good morning. My name is Kim Moir and I’m a release engineer at Mozilla. Today I’m going to discuss how we scale our Android testing on AWS. Show of hands - how many of you test on Android? On a continuous integration farm? References: Androids by etnyk, Attribution-NonCommercial-NoDerivs 2.0 Generic license https://www.flickr.com/photos/etnyk/5588953445/sizes/l
  2. A little about me. I live in Ottawa, Ontario, Canada. My hobbies include running and making ice cream, which complement each other well. This picture shows a release engineering ice cream flavour - coffee ice cream with chocolate chip cookies soaked in Kahlúa. Before I was a release engineer at Mozilla, I worked at IBM as a release engineer on Eclipse. That makes 12 years working in open source release engineering. I’m really excited to be here today to share my stories and learn from all of you.
  3. Here’s a picture of where the amazing Mozilla release engineering team works. As you can see, we are quite distributed across the world, and many of us work remotely from our homes.
  4. Mozilla is a non-profit. Our mission is to promote openness, innovation & opportunity on the web. You’re probably familiar with the products we build, such as Firefox for Desktop, Android, iOS and Firefox OS. Firefox for iOS was actually released yesterday - so go and try it out! Note that we ship Firefox on four platforms and with ~97 locales on the same day as US English.
  5. We have a continuous integration farm running 24x7 on commit. Our release cadence is every six weeks for Firefox for Android. We release betas every week. https://wiki.mozilla.org/RapidRelease I’ll talk a little bit about our environment in general before I delve into our Android test environment.
  6. DAILY • 350 pushes • 4700 build jobs • 150,000 test jobs. Here are some recent numbers on the aggregate jobs we run (all products, not just Firefox for Android). Today, about 66% of build jobs and 80% of test jobs are run on AWS. Only our performance tests are left running on raw devices. They can’t run on emulators because performance there is not constant. Each time a developer lands a change, it invokes a series of builds and associated tests on relevant platforms. Within each test job there are many actual test suites that run. References: September pushes (8188) https://secure.pub.build.mozilla.org/buildapi/reports/pushes?starttime=1441090800&endtime=1443682800 September jobs https://secure.pub.build.mozilla.org/buildapi/reports/waittimes?starttime=1441090800&endtime=1443682800 Builds Oct 4 - Oct 10 (15560 builds) https://secure.pub.build.mozilla.org/buildapi/reports/waittimes?starttime=1443942000&endtime=1444460400 Builds Tuesday Oct 6 (2814 builds) https://secure.pub.build.mozilla.org/buildapi/reports/waittimes?starttime=1444104000&endtime=1444190400
  7. 15 MINUTE SERVICE. We have a commitment to developers that build/test jobs should start within 15 minutes of being requested. We don’t have a perfect record on this, but our numbers are good. We have metrics that measure this every day so we can see which platforms need additional capacity. We adjust capacity as needed, and remove old platforms as they become less relevant in the marketplace. References: Pizza picture by djwtwo, Attribution-NonCommercial-ShareAlike 2.0 Generic (CC BY-NC-SA 2.0) https://www.flickr.com/photos/djwtwo/9864611814/sizes/l/
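The daily wait-time metric described on this slide could be computed roughly as follows. This is a minimal sketch, not Mozilla's reporting code: the `(platform, requested_at, started_at)` record shape, the function name and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict

SLA_SECONDS = 15 * 60  # the 15-minute commitment from the slide


def sla_by_platform(jobs):
    """Return {platform: fraction of jobs that started within the SLA}.

    `jobs` is an iterable of (platform, requested_at, started_at) tuples
    with times in seconds. This record shape is hypothetical.
    """
    totals = defaultdict(int)
    on_time = defaultdict(int)
    for platform, requested_at, started_at in jobs:
        totals[platform] += 1
        if started_at - requested_at <= SLA_SECONDS:
            on_time[platform] += 1
    return {platform: on_time[platform] / totals[platform] for platform in totals}


# Made-up sample: one Android job waited 20 minutes, the others 5 minutes.
sample_jobs = [
    ("android-4.3", 0, 300),
    ("android-4.3", 0, 1200),
    ("linux64", 100, 400),
]
report = sla_by_platform(sample_jobs)
```

A platform whose fraction drops over several days would be a candidate for additional capacity.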
  8. + many Mozilla tools. Here are some of the projects that we use in our infrastructure. Buildbot is our continuous integration engine. However, we are in the process of migrating to TaskCluster. TaskCluster is a set of components that manages task queuing, scheduling, execution and provisioning of resources. It was designed to run automated builds and tests at Mozilla. We use Puppet for configuration management of all our Buildbot servers and of the Linux and Mac machines. So when we provision new hardware, we just boot the device and it puppetizes based on its role, which is defined by its hostname. Our repository of record is hg.mozilla.org, but developers also commit to git repos and these commits are transferred to the hg repository. We also use a lot of Mozilla tools that allow us to scale. These tools are open source as well, and I have links to these repos at the end of the talk. References: octokitty http://www.flickr.com/photos/tachikoma/2760470578/sizes/l/
  9. DEVICES • 6700+ in total • 1900+ for builds • 4700+ for tests • 75% AWS. These numbers are for both Android and desktop devices. The pools overlap. On AWS: 80% for tests and 66% for builds. References: https://secure.pub.build.mozilla.org/builddata/reports/slave_health/index.html * https://secure.pub.build.mozilla.org/slavealloc/ui/#silos
  10. HISTORY OF MOBILE TESTING AT MOZILLA. Before I talk about where we are today, I’d like to step back and talk about how our mobile testing evolved over the years. Here’s a picture from 2009 of a mobile pedalboard. This was our first attempt at mobile test automation. It was used to report Fennec performance data on Nokia N810s. Picture by Aki Sasaki https://www.flickr.com/photos/drkscrtlv/3590117065/sizes/l
  11. Picture by Aki Sasaki https://www.flickr.com/photos/drkscrtlv/3590924524/sizes/l http://escapewindow.dreamwidth.org/205930.html
  12. In 2010, we moved on to testing on Android 2.2 on Tegras. Tegras are bare reference boards. We stored the Tegras in shoe racks from Bed Bath and Beyond. These shoe racks were stored in a room that was shielded from wireless interference. The shoe racks allowed us to position the phones so they weren’t too close together, on a material that didn’t get too hot and did not conduct electricity. These racks also allowed us to easily take dead phones out, open them, remove batteries, reimage and replace them. Picture from John O’Duinn’s blog http://oduinn.com/blog/2010/02/11/unveiling-mozillas-faraday-cage/ http://oduinn.com/images/2013/blog_2013_RelEngAsForceMultiplier.pdf
  13. In 2012, we started running continuous integration tests on Android reference cards in specially designed racks. We started with 800 of them, but only use about 200 today. The cards are called pandas. These were used to run Android 4.0 tests for correctness, debug and performance. References: Pictures of Panda chassis from Dustin’s blog https://blog.mozilla.org/it/2013/01/04/mozpool/2012-11-09-08-30-03/
  14. They had a custom relay board to allow us to reboot them remotely. Pictures of Panda chassis from Dustin’s blog https://blog.mozilla.org/it/2013/01/04/mozpool/2012-11-09-08-30-03/
  15. Many racks of pandas. These devices are not as stable as desktop devices, and are prone to failure. Given their numbers, dealing with machines that fail all the time would be very expensive if they were managed by humans. We wrote some software called mozpool to automatically reimage and reboot them. Pictures of Panda chassis from Dustin’s blog https://blog.mozilla.org/it/2013/01/04/mozpool/2012-11-09-08-30-03/
  16. WHAT DID WE LEARN? What did we learn over these iterations of our mobile testing infrastructure? Each successive mobile testing solution became more reliable (fewer infra failures) and easier to manage via automated tools. Manufacturers EOL reference cards. Old reference cards don’t support new Android versions. Hardware does not scale for peak load. It is time consuming and expensive to adjust automation infrastructure for every new hardware iteration. Picture https://www.flickr.com/photos/wocintechchat/21909333504/sizes/l from http://www.wocintechchat.com/blog/wocintechphotos #WOCtechchat Picture: computer history museum https://www.flickr.com/photos/indigoprime/2239342335/sizes/o/
  17. We have bursty traffic, both by time of day and by time of year. Example: the number of jobs running per hour in a typical week. You can see that the number of jobs run each day varies as time zones wake up, and the large trough is the weekend.
  18. BRANCHING. We have many different branches in Hg at Mozilla. Our Hg branches are all named after different tree species. Developers push to different branches depending on their purpose. Different branches have different scheduling priorities within our continuous integration engine. So for instance, if a change is landed on a mozilla-beta branch, the builds and tests associated with that change will have machines allocated to them at a higher priority than if a change was landed on a cedar branch, which is just for testing purposes. Picture by Aurelio Asiain, Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0) https://flic.kr/p/v27AD
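Branch-based scheduling priority can be illustrated with a small priority queue. This is a sketch under assumptions, not how Buildbot is configured at Mozilla: the numeric priorities, the default value and the `schedule` helper are hypothetical. Only the branch names mozilla-beta and cedar come from the talk.

```python
import heapq

# Hypothetical priorities (lower number = allocated first).
BRANCH_PRIORITY = {"mozilla-beta": 0, "cedar": 5}
DEFAULT_PRIORITY = 3


def schedule(pending):
    """Yield (branch, job) pairs by branch priority, first-come first-served
    among jobs whose branches share a priority."""
    heap = []
    for arrival, (branch, job) in enumerate(pending):
        priority = BRANCH_PRIORITY.get(branch, DEFAULT_PRIORITY)
        # The arrival index breaks ties so equal-priority jobs keep their order.
        heapq.heappush(heap, (priority, arrival, branch, job))
    while heap:
        _, _, branch, job = heapq.heappop(heap)
        yield branch, job


pending = [("cedar", "build-1"), ("mozilla-beta", "build-2"), ("cedar", "test-1")]
ordered = list(schedule(pending))
```

Here the mozilla-beta job is handed a machine first even though it arrived after a cedar job.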
  19. What do we need to test? Here’s a picture of Android device fragmentation as of August 2015. Source: http://opensignal.com/reports/2015/08/android-fragmentation/
  20. And here is current Android adoption (October 2015). Android “Kit Kat” 4.4 has about a 40% adoption rate. Android "Jelly Bean" versions (4.1–4.3.1) have a combined share of 30.2%. Source: https://en.wikipedia.org/wiki/Android_version_history
  21. ANDROID TEST PLATFORMS • Android 2.3, 4.0, 4.2 (x86), 4.3 • Test types • correctness • debug • performance. Obviously, we cannot test on all those platforms and devices; it’s not feasible. We limit our testing to the platforms listed here.
  22. In 2012, we started moving our build and test infrastructure to Amazon. We first implemented this for desktop Firefox jobs on Linux. We then implemented it for Android. AWS gives us scalable infrastructure for bursty traffic with an API to manage it all. Scalable. Deals with bursty load. APIs! Picture by Tim Norris, Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0) https://www.flickr.com/photos/tim_norris/2600844073/sizes/o/
  23. AWS TERMINOLOGY • EC2 - Elastic Compute Cloud - machines as VMs • EBS - Elastic Block Store - network attached storage • Region - separate geographical area • Availability zone - multiple, isolated locations within a region. I’m going to talk a bit about some AWS terms for those of you who may not be familiar with them. Notes: AWS instance types http://aws.amazon.com/ec2/instance-types/
  24. MORE AWS TERMS • AMI - Amazon machine image • instance type - VM with defined specifications and cost per hour. For example: - AMIs - Amazon has standard ones that you can modify, or you can create your own - pricing on instance types can depend on the region - m3.medium currently costs around $0.07/hr in most regions (Nov 2015 costs) - some instance types may not be available in all availability zones
  25. PUPPET VS AMIS. AMIs are Amazon machine images. Golden AMIs: we create golden image AMIs via cron each night. These images are generated from our puppet configs. We have different images defined for different instance types and the roles that they perform. For example, test and build instances have different libraries and configuration in puppet. Originally we used puppet to manage all of our build and test instances. It was too slow to puppetize the spot instances. Solution: create golden AMIs from configs each night via cron. These are used to instantiate the new spot instances. We also use the same pool AMI to run Android tests and Linux tests; they just run in different directories. Another reason for nightly regeneration is pre-populating VCS caches to reduce first time startup load. Picture by shaireproductions, Creative Commons Attribution 2.0 Generic (CC BY 2.0) https://flic.kr/p/dTfsCs
  26. USE SPOT INSTANCES • Use spot instances vs on demand instances • much cheaper • not instantiated as quickly • terminated if outbid while running. Amazon has many different types of instances. Initially, we used on demand instances. They instantiate quickly but cost more per hour than other options. Spot instances are Amazon’s way of auctioning off excess capacity. You can bid for an instance, and if nobody else bids for it at a price above your offer, the spot instance will be instantiated for you. However, if you’re running a spot instance and someone bids a higher price than you did, your instance can be killed. That’s okay, because we have configured our build farm to retry jobs that fail, and a very small percentage are killed this way (< 1%). Since the spot instances aren’t available as quickly as the on-demand instances, some tests don’t start within 15 minutes, but that’s okay. Spot instances are instantiated every time with the AMI you specify. Other notes: smart spot bidding library https://bugzilla.mozilla.org/show_bug.cgi?id=972562
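The bid-and-terminate behaviour described on this slide can be sketched as a toy simulation. It is a deliberate simplification of the spot model as described in the talk, not the AWS API: `run_on_spot`, the hourly granularity and the sample prices are all assumptions.

```python
def run_on_spot(bid, hourly_prices):
    """Simulate one spot request against a list of hourly market prices.

    The instance starts in the first hour where the market price is at or
    below the bid, and is terminated in the first later hour where the
    market price rises above the bid.
    Returns (hours_run, terminated_by_outbid).
    """
    hours_run = 0
    started = False
    for price in hourly_prices:
        if not started:
            if price <= bid:
                started = True
                hours_run = 1
        elif price > bid:
            return hours_run, True  # outbid: the job would need a retry
        else:
            hours_run += 1
    return hours_run, False


# Made-up prices: the request waits one hour, runs two, then is outbid.
outcome = run_on_spot(0.05, [0.06, 0.04, 0.03, 0.07, 0.02])
```

A CI system that retries terminated jobs, as the slide describes, only needs the second element of the result to decide whether to reschedule.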
  27. Minimum viable instance type. Run more tests in parallel on a cheaper instance type rather than upgrading the instance type. Most tests run on m3.medium, but some need more. Limit the subset of tests run on more expensive instance types to those that actually need it. Our tests have a timeout for a suite of tests. If they don’t complete within this timeout, they fail and retry. Due to the scale of our operations, it’s much cheaper to run more tests in parallel on a cheaper instance type than to run on a more expensive instance type. For example, our Android 4.3 reftests invoke 48 parallel jobs. For instance, we have Android tests that run on emulators on AWS. Some of the reference tests required a c3.xlarge to run. The correctness tests were fine to run on m3.medium. Picture by kenny magic, Creative Commons Attribution 2.0 Generic (CC BY 2.0) https://www.flickr.com/photos/kwl/4247555680/sizes/l
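The cost trade-off can be made concrete with some rough arithmetic. Only the m3.medium price (about $0.07/hr) and the figure of 48 parallel jobs come from the talk; the c3.xlarge price and both per-chunk durations are made-up values chosen to illustrate the comparison.

```python
M3_MEDIUM_PRICE = 0.07  # USD per hour, from the talk (Nov 2015)
C3_XLARGE_PRICE = 0.21  # USD per hour, hypothetical value for illustration


def suite_cost(chunks, hours_per_chunk, price_per_hour):
    """Total cost of running every chunk of a suite on one instance type."""
    return chunks * hours_per_chunk * price_per_hour


# Assume (hypothetically) the larger instance halves the time per chunk.
cheap = suite_cost(48, 1.0, M3_MEDIUM_PRICE)
fast = suite_cost(48, 0.5, C3_XLARGE_PRICE)
```

Under these assumptions the suite costs about $3.36 on the cheaper type versus about $5.04 on the larger one, so the larger type only pays off for tests that cannot finish within their timeout otherwise.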
  28. WHERE’S THE CODE? • The tools we use are all open source • https://github.com/mozilla/build-cloud-tools • Which use boto libraries (Python interface to AWS) https://github.com/boto/boto The code we use to interact with AWS APIs resides here.
  29. SMARTER BIDDING ALGORITHMS • Important scripts • aws_stop_idle.py • aws_watch_pending.py - aws_stop_idle stops instances that are no longer needed given our current capacity (idle for a certain time period; the threshold depends on whether the instance is on-demand or spot) - aws_watch_pending activates instances given the criteria on the next slide
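The idle-stop rule could look roughly like the sketch below. It is not the code in aws_stop_idle.py: the threshold values, the dictionary record shape and the function name are assumptions. The talk only says that the threshold differs between on-demand and spot instances.

```python
# Hypothetical idle thresholds in seconds, keyed by instance kind.
IDLE_THRESHOLD = {"spot": 10 * 60, "on-demand": 60 * 60}


def instances_to_stop(instances, now):
    """Return ids of instances idle for longer than their kind's threshold.

    Each instance is a dict with "id", "kind" and "last_job_finished"
    (seconds); this record shape is made up for the example.
    """
    return [
        instance["id"]
        for instance in instances
        if now - instance["last_job_finished"] > IDLE_THRESHOLD[instance["kind"]]
    ]


fleet = [
    {"id": "i-spot-1", "kind": "spot", "last_job_finished": 0},
    {"id": "i-od-1", "kind": "on-demand", "last_job_finished": 0},
    {"id": "i-spot-2", "kind": "spot", "last_job_finished": 1000},
]
# At t=1200s the first spot instance has been idle for 20 minutes.
to_stop = instances_to_stop(fleet, now=1200)
```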
  30. REGIONS AND INSTANCES • Run instances in multiple regions • Start instances in cheaper regions first • Automatically shut down inactive instances • Start instances that have been recently running • Bid on similar instance types. If you look at aws_watch_pending.py, these are some of the rules that it implements. We use machines in multiple AWS regions in case one region goes down, and also for cost savings (some regions are cheaper). Currently we only use us-east-1 and us-west-2. Since all of our CI infrastructure resides in California, we don’t use most other regions. This is unlike some companies that need to have instances available instantly. For instance, I recently saw a talk by Bridget Kromhout (http://bridgetkromhout.com/speaking/2014/beyondthecode/), an operations engineer from DramaFever, a company that provides international movie content on demand. They use every single AWS region because their customer base is so distributed. You get better build times and lower costs if you start instances that have recently been running (they still retain artifact dirs, and there are billing advantages).
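Two of these rules (prefer recently running instances, then cheaper regions) can be sketched as a sort key. This is only an illustration: the relative precedence of the two rules, the candidate record shape and the prices are assumptions, and the real logic lives in aws_watch_pending.py.

```python
def launch_order(candidates):
    """Order launch candidates: recently running instances first,
    then by ascending hourly price.

    Each candidate is a dict with "name", "region", "price" and
    "recently_running"; this shape is hypothetical.
    """
    # False sorts before True, so negate the flag to put recent ones first.
    return sorted(candidates, key=lambda c: (not c["recently_running"], c["price"]))


candidates = [
    {"name": "a", "region": "us-west-2", "price": 0.08, "recently_running": False},
    {"name": "b", "region": "us-east-1", "price": 0.07, "recently_running": False},
    {"name": "c", "region": "us-west-2", "price": 0.09, "recently_running": True},
]
order = [c["name"] for c in launch_order(candidates)]
```

Candidate "c" is chosen first despite its higher price, because a recently stopped instance may still hold cached artifacts.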
  31. LIMIT POOL SIZE. The size of the AWS pools allocated to different instance types is limited, so if the number of requests spikes we have higher pending counts, but not a huge spike in our AWS bill. The bidding algorithm does not automatically bring up machines for all pending jobs. It adds some more capacity, waits, re-evaluates the pending count, and adds some more if needed. This is similar to a thermostat system that heats your house by gradually adding more heat. Picture - Ottawa Arboretum - Creative Commons Attribution-NonCommercial 2.0 Generic (CC BY-NC 2.0) https://www.flickr.com/photos/rohit_saxena/4552766281/sizes/l
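The thermostat-style behaviour can be sketched as a function that is called once per evaluation cycle. This is an assumption-laden illustration rather than the actual bidding algorithm: the 25% step fraction, the parameter names and the pool limit of 50 are invented for the example.

```python
import math


def instances_to_add(pending_jobs, running, pool_limit, step_fraction=0.25):
    """Return how many instances to start in this evaluation cycle.

    Only a fraction of the pending count is requested per cycle, and the
    request is capped by the remaining room in the pool.
    """
    headroom = max(pool_limit - running, 0)
    wanted = math.ceil(pending_jobs * step_fraction)
    return min(wanted, headroom)


# 100 pending jobs, 10 instances running, pool capped at 50 instances.
first_cycle = instances_to_add(100, 10, 50)
# Later cycle: the pool is nearly full, so the cap wins over demand.
near_limit = instances_to_add(100, 45, 50)
```

Calling this repeatedly, with a wait between calls, adds capacity gradually and never exceeds the pool limit, which keeps a spike in requests from becoming a spike in the bill.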
  32. LIMIT EBS USE • EBS is network attached storage for the EC2 VM • Much cheaper to use the disk that comes with the instance type
  33. SUMMARY: AWS • Golden master of AMIs regenerated daily • Use spot instances • Smarter bidding algorithms • Optimize use of regions, instance type and capacity • Limit pool size and increase capacity gradually • Use instance storage vs EBS to save $. With these changes, we reduced our initial AWS bill by 70% (as of last year). However, today we also use AWS S3 for backend storage (we migrated all of our FTP data to S3), which has significantly increased our bill compared with our initial implementation.
  34. EMULATOR ENVIRONMENT (1) • Android 4.3 (AOSP 4.3.1_r1, JLS36I); standard 2.6.29 kernel • 1 GB of memory • 720×1280, 320 dpi screen • 128 MB VM heap • 600 MB /data and 600 MB /sdcard partitions • front and back emulated cameras; all emulated sensors • standard crashreporter, logcat, anr, and tombstone support. So now that we’ve talked about our AWS environment, let’s talk about our move to emulators. From https://gbrownmozilla.wordpress.com/2015/04/23/android-4-3-opt-tests-running-on-trunk-trees/
  35. EMULATOR ENVIRONMENT (2) • Run the emulator that comes with the Android SDK, load the custom image, and install the Firefox apk • We run tests on a variety of instance types (m3.medium, m3.xlarge, c3.xlarge) http://developer.android.com/tools/devices/emulator.html
  36. This is a screenshot of the emulator starting up. We have tooling in our test suites that creates a screenshot when the emulator starts, or when a test fails. The screenshots, logs and other testing artifacts are uploaded to Amazon S3 storage and are available to developers when their tests fail.
  37. This screenshot is of an Android test suite test failure. Most of the time the logs that are uploaded with the screenshot are more useful. Example log http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/61c91375333e3265c832cff6f1ff314fb9b70c6a2d15386f0a303c7226cfd1ed7209680d88ac032332907a43cfcf4f03c5f02e5531101ae3b855c699ce1e4e02
  38. ACCESS TO DEVICES • Access to processes via adb (Android debug bridge) • Allows us to kill errant processes • Some test types require root permissions to copy files to certain locations or for other privileged operations http://developer.android.com/tools/help/adb.html
  39. MIGRATION PROCESS • Moved correctness tests, then debug tests • Many intermittent issues • Debug tests were problematic • They take longer and consume more resources
  40. MIGRATION LESSONS • Use more powerful instance types • Specify timeouts that are longer for individual tests • Skip tests on certain (slow) platforms • Split the tests into smaller tests • Optimize or simplify the test https://gbrownmozilla.wordpress.com/2015/05/26/handling-intermittent-test-timeouts-in-long-running-tests/
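Two of these lessons, skipping tests on slow platforms and splitting a suite into smaller pieces, can be sketched together. This is not the harness code used at Mozilla: the skip table, platform name, test names and round-robin split are all hypothetical.

```python
# Hypothetical skip list: tests known to time out on a slow platform.
SKIP_ON_PLATFORM = {"android-4.3-debug": {"test_large_canvas"}}


def split_into_chunks(tests, total_chunks, platform):
    """Drop tests skipped on `platform`, then deal the rest round-robin
    into `total_chunks` lists so each chunk can run as its own job."""
    skipped = SKIP_ON_PLATFORM.get(platform, set())
    runnable = [test for test in tests if test not in skipped]
    chunks = [[] for _ in range(total_chunks)]
    for index, test in enumerate(runnable):
        chunks[index % total_chunks].append(test)
    return chunks


tests = ["test_a", "test_b", "test_large_canvas", "test_c", "test_d"]
chunks = split_into_chunks(tests, 2, "android-4.3-debug")
```

Smaller chunks finish well inside the suite timeout, at the price of more jobs to schedule.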
  41. PERFORMANCE TESTS • Autophone is a Mozilla project measuring page load performance and testing video playback on real Android devices • Provision, verify, recover, run tests and identify the status of a variety of phones. We retain a small pool of real devices for performance tests. From https://wiki.mozilla.org/Auto-tools/Projects/Autophone: Verify that a phone is working correctly: sd card is writable and not full, etc. Attempt to recover a phone that reports errors, rerunning the current test/test framework. Provide at least a high-level status for all phones: whether they are idle, running a test, or disabled/broken. Support a large number of phones, potentially split amongst several host machines.
  42. EMULATORS IN AWS: THE GOOD. When we want to test a new Android version, we just need a new emulator image, not a new hardware stack. There is no lead time associated with procuring and installing new hardware in the data centre. Reliability increased, with fewer retries (2% vs 18% on Pandas). Some of that reliability stems from the fact that with the emulator, tests run from the same fresh Android image each time. When the tests ran on devices, the reimaging process took a long time, and the devices had to be re-imaged every so often, which was a more manual process. Emulators scale to deal with daily job spikes. We don’t have to write and maintain software to manage a pool of devices. We can just use the Amazon APIs to provision resources for our CI system. Picture by SaturatedEyes - Creative Commons Attribution-NonCommercial 2.0 Generic (CC BY-NC 2.0) https://www.flickr.com/photos/shuttershuk/7099823113/sizes/l
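To get a feel for what the retry rates mean for capacity, here is a rough calculation. It assumes, as a simplification not stated in the talk, that every attempt is retried independently with the quoted probability, so the number of attempts per job follows a geometric distribution.

```python
def expected_runs(retry_rate):
    """Expected attempts per job when each attempt is retried
    independently with probability `retry_rate` (geometric model)."""
    if not 0 <= retry_rate < 1:
        raise ValueError("retry_rate must be in [0, 1)")
    return 1.0 / (1.0 - retry_rate)


emulator_runs = expected_runs(0.02)  # retry rate quoted for emulators
panda_runs = expected_runs(0.18)     # retry rate quoted for Pandas
```

Under this model a job needs about 1.02 attempts on emulators versus about 1.22 on Pandas, so the same pool serves noticeably more distinct jobs.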
  43. EMULATORS IN AWS: THE BAD • More tests running in parallel (tests run slower, added more tests) • No performance tests because we’re running emulators on emulators. Tests run slower because we’re running tests on emulators on emulators. More tests need to run in parallel because they take longer. Example: Android 4.3 debug tests need to run about 2x as many jobs as they did when running on raw devices. No performance tests (we have a separate pool of raw devices for this purpose). As a side note, Amazon has a new offering from this summer called Device Farm, which allows you to run tests on multiple devices. We don’t use it because it works through an API that doesn’t support the test harnesses that we use. Also, it doesn’t allow root access to the device. Also, the pricing ($250 a month for a single dedicated device) is much more expensive than spot instances. Picture by Tuncay - Creative Commons Attribution 2.0 Generic (CC BY 2.0) https://www.flickr.com/photos/tuncaycoskun/15809887756/sizes/l
  44. SUMMARY: EMULATORS ON AWS • Determine what testing can be done on emulator vs real device • Use minimum viable instance type • Run more tests in parallel. You may need a larger instance type to speed up longer running tests. Minimize the number of tests that need to run on real hardware. Running tests on real devices in continuous integration is much more complicated and painful than running them on emulators. Real hardware also does not allow you to upgrade easily for the next Android version.
  45. FUTURE WORK • Android 5.0 on emulator • Make it better
  46. QUESTIONS?
  47. WHERE’S THE CODE? • Cloud tools: https://github.com/mozilla/build-cloud-tools • buildbot configs https://github.com/mozilla/build-buildbot-configs • buildbotcustom https://github.com/mozilla/build-buildbotcustom • Mozharness https://github.com/mozilla/build-mozharness • Mozpool https://github.com/mozilla/mozpool • Puppet configs https://github.com/mozilla/build-puppet
  48. LEARN MORE • @MozRelEng • http://planet.mozilla.org/releng/ • Mozilla Releng wiki https://wiki.mozilla.org/ReleaseEngineering • IRC: channel #releng on moznet
  49. MORE READING 1 • Laura's talks on monitoring complex systems http://vimeo.com/album/3108317/video/110088288 • Armen’s talk on our hybrid infrastructure https://air.mozilla.org/problems-and-cutting-costs-for-mozillas-hybrid-ec2-in-house-continuous-integration/ • Move to AWS starting in 2012 • http://atlee.ca/blog/posts/blog20121002firefox-builds-in-the-cloud.html • http://johnnybuild.blogspot.ca/2012/08/migrating-linux32-and-linux64-builds-to.html • http://atlee.ca/blog/posts/blog20121214behind-the-clouds.html • http://rail.merail.ca/posts/firefox-unit-tests-on-ubuntu.html
  50. MORE READING 2 • AWS spot instances vs reserved instances • http://atlee.ca/blog/posts/now-using-aws-spot-instances.html • http://rail.merail.ca/posts/firefox-builds-are-way-cheaper-now.html • http://rail.merail.ca/posts/ec2-spot-instances-experiments.html • http://taras.glek.net/blog/2014/05/09/how-amazon-ec2-got-15x-cheaper-in-6-months/ • http://taras.glek.net/blog/2014/03/05/more-and-faster-c-i-for-less-on-aws/ • AWS networking • http://atlee.ca/blog/posts/aws-networks-and-burning-trees.html • http://rail.merail.ca/posts/using-dns-to-query-aws.html
  51. MORE READING 3 • Scaling • http://atlee.ca/blog/posts/bursty-load.html • jacuzzis • http://atlee.ca/blog/posts/initial-jacuzzi-results.html • http://hearsum.ca/blog/experiments-with-smaller-pools-of-build-machines/ • Caching • http://atlee.ca/blog/posts/cache-em-all.html • Geoffrey Brown’s blog on Android tests https://gbrownmozilla.wordpress.com/
