This document summarizes the trials and tribulations of managing a large-scale mobile device test automation lab at Yahoo. It discusses why testing on real mobile devices is important, Yahoo's CI/CD pipeline for mobile apps, and the details of Yahoo's mobile device test lab, including its supported test frameworks and its more than 300 iOS and Android devices. It also outlines the challenges of running such a lab at scale, including treating it as a production system, hardware issues, understanding user needs, operational processes, and keeping teams engaged over time.
1. Mobile Device Test Automation
The trials and tribulations of managing a mobile device test automation lab at scale
Heemeng Foo, Senior Manager, Engineering Services (Mobile Excellence), email: heemeng@yahoo-inc.com
3. Why test on real mobile devices?
Simulators/Emulators cover only functional aspects and idealized hardware
What’s missing are hardware-related issues, e.g., screen size, sensors, SoC, GPU, and how they work together
“Bloatware” introduced by OEMs
Devices with low-end hardware, e.g., iPod Touch, Moto E, reveal issues you don’t see on flagship devices
4. CI / CD @ Yahoo
Cloud-based CI/CD pipeline for building iOS/Android/Java/Node/Python apps
Available to any developer, product manager, designer, or team
Anyone is able to create a build runway with a “simple” config file
Runs 24/7/365
Mobile build runway: Git -> Build -> Test -> Dogfood
http://spectrum.ieee.org/view-from-the-valley/computing/software/yahoos-engineers-move-to-coding-without-a-net (bit.ly/yahoo-build)
5. Mobile Device Testing @ Yahoo
We have ~ 60 iOS/Android apps
Automation: Mobile Device Lab hooked up to CI/CD pipeline
Manual: Mobile Device Library open 24/7/365, badge access
Device Lab in operation since 2013
● 300+ iOS/Android devices, > 1.5 million test runs to date
● ~ 50 Mac minis, ~ 40 Linux hosts (bare metal / OpenStack VMs)
● Test Frameworks supported:
● Robotium
● Espresso
● UI Automator (Android)
● UI Automation (iOS)
● XCUITest
● Appium
6. Why support so many frameworks?
Different teams have a different mix of devs and test engineers
Dev heavy
● frameworks integrated with the dev environment, e.g., Espresso, UI Automation
● frameworks in the dev language, e.g., Espresso (Java), XCUITest (Objective-C, Swift)
● Core principle: any test automation should be runnable locally
Test-engineering heavy, covering API/Web/iOS/Android: Appium
9. Disclaimer
We didn’t build it all: the initial platform was licensed
I have a great team and support
As a company, Yahoo has invested heavily in CI/CD and developer productivity
Great developer community in the company
11. Challenges / Lessons
It’s a production system
Hardware is not software
Understanding the user
Operationalize, Operationalize, Operationalize
Keeping team engaged
12. It’s a Production System
Monitoring, bake-in testing, deployment windows all apply
Canary in the coal mine
Operations comms - users should not be surprised
Regular internal end-to-end tests
Health checks and redundancy
Splunk (or ELK) is very useful
Lots of planning needed for major system updates to maintain uptime
13. Hardware is not software
Batteries explode
iOS: once upgraded, can’t downgrade
iPhone pre-orders
iOS upgrade popups
Mobile hardware moves fast - follow blogs
WiFi AP Density
14. Understanding the user
Every mobile test engineer and Device Lab DevOps engineer knows how to build a simple iOS/Android app
End-to-end canary apps: Git -> Build -> Test
Get involved in a project (and actually use your system)
Reference implementations speak louder than documentation, i.e., point devs to working code
15. Operationalize, Operationalize, Operationalize
Runbooks / SOPs
Maps
Tools to test modules in isolation
Tools to test devices in isolation, e.g., remote screen capture
Automate as much as possible (but just enough)
Good morning, everyone. I know it’s almost lunch, so I’ll try to be quick.
My name is Heemeng Foo. I represent the Mobile Excellence team at Yahoo, and I’m the owner of the Mobile Device Test Automation Lab, or Device Lab for short.
The problem with calling it a “Device Lab” in a talk like this is that, for most people, a device lab is just a room full of test devices, so I had to STRETCH the name to make it clearer: hence, mobile device test automation lab. (smile) This platform is part of our CI/CD system, and it allows developers and teams to test their apps on real mobile devices.
A little bit about the Mobile Excellence team at Yahoo: we’re a team of engineers and architects in charge of some key mobile SDKs, build systems, and the company’s mobile engineering standards, e.g., release checklists, telemetry, and metrics. Mobile test automation standards are part of that charter.
As for me, I’ve worked in mobile for the last 16 years, starting with SMS, then mobile web and iOS/Android apps, and more recently IoT. And the title of my talk is …
Before I dive into the “Trials and Tribulations”, or what we as managers call “Challenges and lessons” (smile), I’d like to give everyone a little context about mobile testing at Yahoo.
First of all, why test on real mobile devices? Why not just test on simulators and emulators? Surely that’s good enough, right? (smile) We all know that doesn’t cut it.
The reason is that simulators and emulators model idealized hardware. The mobile world is not yet as mature as the PC or the web, so screen sizes, SoCs/GPUs, and how they work together still matter.
Just to give you an idea of what I mean, this is our Weather app. When the iPhone 6 first came out, we got the device on launch day and found a line across the middle of the view. This had never shown up on any previous iPhone model. We fixed it and released a new version that same day.
On Android, some smartphone vendors (I’m not going to mention names here, smile) have a history of bloatware and running things in the background that you won’t find on a Nexus. This showed up in our automation runs when we compared the wait times between actions.
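The wait-time comparison can be sketched as follows. This is a minimal illustration, not the lab’s actual tooling: it assumes each automation run records the wait (in milliseconds) between consecutive UI actions, and the device names, sample values, and 1.5x threshold are all hypothetical.

```python
from statistics import median


def flag_slow_devices(wait_times_ms, baseline, threshold=1.5):
    """Compare each device's median inter-action wait time against a baseline device.

    wait_times_ms: dict mapping a (hypothetical) device name to a list of waits in ms.
    Returns {device: ratio} for devices whose median exceeds the baseline median
    by more than `threshold` times.
    """
    base = median(wait_times_ms[baseline])
    flagged = {}
    for device, samples in wait_times_ms.items():
        if device == baseline:
            continue
        ratio = median(samples) / base
        if ratio > threshold:
            flagged[device] = round(ratio, 2)
    return flagged


# Hypothetical sample data: a stock Nexus device as the baseline vs. two OEM devices.
runs = {
    "nexus_5": [210, 190, 205, 220, 200],
    "vendor_a": [410, 380, 450, 400, 395],
    "vendor_b": [230, 215, 240, 225, 210],
}
print(flag_slow_devices(runs, baseline="nexus_5"))  # {'vendor_a': 1.95}
```

The median is used rather than the mean so that a single stalled action does not dominate the comparison.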
Most developers like to develop for flagship devices, e.g., the iPhone 7 or Pixel, but nobody likes to test on low-end hardware. However, you can find a lot of performance bottlenecks on low-end hardware. Low end used to mean a single core and 256 MB of RAM; now it’s more like dual core and 512 MB, especially the devices you can get at Walmart or Target.
A funny story: four years ago I was involved in testing our relaunched Flickr Android app on a really low-end HTC. Because of all the image processing the app was doing, you could literally see the GC trying to catch up with all the swapping in and out. We didn’t support those low-end Android devices for the initial launch. (smile)
One of the great things about being a developer at Yahoo is that we have a single build platform for iOS, Android, etc.
Any developer, PM, designer, or team can construct their own build pipelines through a self-service portal. So instead of having to manage your own Jenkins boxes or VMs, it’s all managed by a dedicated team.
For mobile, a pipeline consists of Git -> Build -> Test -> Dogfood.
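The stage ordering can be illustrated with a minimal sketch in which each stage gates the next and the runway stops at the first failure. The stage actions here are hypothetical stand-ins; the talk does not describe the runway’s config format or internals.

```python
from typing import Callable, List, Optional, Tuple

# A stage is a (name, action) pair; the action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_runway(stages: List[Stage]) -> Tuple[List[str], Optional[str]]:
    """Run stages in order, stopping at the first failure.

    Returns (names of stages that passed, name of the failed stage or None).
    """
    passed: List[str] = []
    for name, action in stages:
        if not action():
            return passed, name
        passed.append(name)
    return passed, None


# Hypothetical mobile runway: the Test stage fails, so Dogfood never runs.
runway = [
    ("git", lambda: True),
    ("build", lambda: True),
    ("test", lambda: False),
    ("dogfood", lambda: True),
]
print(run_runway(runway))  # (['git', 'build'], 'test')
```

Gating Dogfood on Test is what keeps a build with failing device tests from reaching internal users.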
There’s an article on IEEE Spectrum that talks a little about this approach. It focuses more on developers writing their own tests, but one of the key enablers for that is the build platform.
At Yahoo, in order to do mobile development at scale, we also have to be able to test at scale.
To do that, we have a two-pronged strategy: the Device Lab, which handles automated tests, and the Device Library, a self-service, badge-access location where test devices are available for manual testing.
[Talk about Device Lab]
I figured that at this point everyone would be falling asleep, so I thought I’d put up some moving pictures.
The image on the left is one of our devices chugging away on an automation run.
The one in the middle is one of three Tapster robots we acquired and set up for testing iOS push notifications (APNS). It involves some OCR and rule-based logic.
The last mess of wires is our battery consumption lab.
Now to the meat of the talk.
I would summarize the key challenges and lessons as follows:
It is a production system that needs to run 24/7/365. We have teams around the world, especially in Asia, who are working while we sleep.
Hardware is not software: devices literally blow up. (smile) We were lucky we didn’t have to deal with the Samsung Note 7 issue.
Understanding the user: as with any other product, this is a critical success factor for us.
Operationalize: I always ask my team how we can do this in less time with less effort, because there’s always more work than we have time to do.
Lastly, keeping the team engaged: with all the ops work we do, this is a challenge.
Although it is part of the build system, it has to be treated like a production system since teams rely on the platform for release certification. This means Monitoring …
One thing we realized early on was that it’s useful to have a canary app in the build pipeline, building at regular intervals. It’s a very simple app with no external dependencies (e.g., no network access) that we use to check the health of the pipeline. This has frequently let us know about issues in the build pipeline before the CI/CD folks did. (smile) The other good thing about it is that users have a tendency to say your platform has a problem when in fact their test code has issues, and the canary helps us tell the two apart.
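A minimal sketch of the triage a canary enables: when a user’s job fails, look at the most recent canary run before it. If the canary passed, the user’s test code is the first suspect; if it failed, the platform is. The `CanaryResult` type, the 30-minute freshness window, and the labels are hypothetical, and a passing canary does not prove every device profile is healthy, so treat the label as a starting point for investigation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class CanaryResult:
    finished_at: datetime
    passed: bool


def triage_failure(failed_at: datetime, canaries: List[CanaryResult],
                   max_age: timedelta = timedelta(minutes=30)) -> str:
    """Classify a failed user job using the latest canary run that finished before it."""
    recent = [c for c in canaries
              if c.finished_at <= failed_at and failed_at - c.finished_at <= max_age]
    if not recent:
        return "unknown"  # no fresh canary signal; investigate manually
    latest = max(recent, key=lambda c: c.finished_at)
    return "test_code" if latest.passed else "platform"


t0 = datetime(2017, 1, 1, 9, 0)
history = [
    CanaryResult(t0, passed=True),
    CanaryResult(t0 + timedelta(minutes=15), passed=False),
]
print(triage_failure(t0 + timedelta(minutes=10), history))  # test_code
print(triage_failure(t0 + timedelta(minutes=20), history))  # platform
print(triage_failure(t0 + timedelta(hours=2), history))     # unknown
```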
Ops comms is absolutely critical. We went from users telling us about issues to us telling users about issues before they ran into them. The canary app is what made that possible.
Regular internal end-to-end tests: the canary app helped us find issues in the pipeline.
Once we had the canary app, we were able to run health checks on all the devices and automatically take bad ones out of rotation, most of the time before a user hits them. To do that, we needed redundancy: for a particular hardware profile, e.g., an iPhone 6 running iOS 10.1, we keep a minimum of 3 and a maximum of about 10 devices.
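The redundancy rule can be sketched as a pool check: after a round of health checks, failing devices are pulled from rotation, and any hardware profile that drops below the minimum of 3 is flagged so it can be topped up before users notice. This is a hypothetical illustration; the health-check signal itself (for example, a canary run per device), the serials, and the profile strings are assumptions.

```python
from typing import Dict, List, Tuple

MIN_PER_PROFILE = 3  # floor from the talk: at least 3 devices per hardware/OS profile

Device = Tuple[str, str]  # (serial, profile), e.g. ("i1", "iPhone 6 / iOS 10.1")


def apply_health_checks(devices: List[Device], healthy: Dict[str, bool]):
    """Split devices into in-service pools and a quarantine list.

    A device missing from `healthy` is treated as failing (conservative default).
    Returns (pools by profile, quarantined serials, profiles below MIN_PER_PROFILE).
    """
    pools: Dict[str, List[str]] = {}
    quarantined: List[str] = []
    for serial, profile in devices:
        if healthy.get(serial, False):
            pools.setdefault(profile, []).append(serial)
        else:
            quarantined.append(serial)
    profiles = {profile for _, profile in devices}
    short = sorted(p for p in profiles if len(pools.get(p, [])) < MIN_PER_PROFILE)
    return pools, quarantined, short


lab = [("i1", "iPhone 6 / iOS 10.1"), ("i2", "iPhone 6 / iOS 10.1"),
       ("i3", "iPhone 6 / iOS 10.1"), ("i4", "iPhone 6 / iOS 10.1"),
       ("n1", "Nexus 5 / Android 6.0"), ("n2", "Nexus 5 / Android 6.0"),
       ("n3", "Nexus 5 / Android 6.0")]
checks = {"i1": True, "i2": True, "i3": False, "i4": True,
          "n1": True, "n2": False, "n3": True}
pools, quarantined, short = apply_health_checks(lab, checks)
print(quarantined)  # ['i3', 'n2']
print(short)        # ['Nexus 5 / Android 6.0']
```

Losing one of four iPhones leaves that profile at the floor, while losing one of three Nexus devices drops it below, which is why keeping more than the minimum per profile matters.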
Splunk has been very helpful to us. Some issues are extremely intermittent and difficult to isolate and fix. With Splunk, you can beacon any data point into a store and make it instantly searchable.
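One simple way to beacon data points is to write each event as a single JSON object per line; both Splunk and ELK can parse such events into searchable fields. A minimal sketch using Python’s standard `logging` module, assuming a log forwarder is configured separately to ship the output to the index; the logger name, event name, and field names are hypothetical.

```python
import io
import json
import logging


class JsonLineFormatter(logging.Formatter):
    """Format each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        event = {"ts": round(record.created, 3),
                 "level": record.levelname,
                 "event": record.getMessage()}
        # Extra key/value fields attached via beacon() below.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event, sort_keys=True)


def make_beacon_logger(stream) -> logging.Logger:
    logger = logging.getLogger("device_lab.beacon")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger.handlers = [handler]  # replace any handlers from earlier calls
    logger.propagate = False
    return logger


def beacon(logger: logging.Logger, event: str, **fields) -> None:
    """Emit one searchable data point; `fields` become top-level JSON keys."""
    logger.info(event, extra={"fields": fields})


# Hypothetical usage: in production the stream would be a file that a forwarder tails.
buf = io.StringIO()
log = make_beacon_logger(buf)
beacon(log, "adb_disconnect", serial="0123ABC", host="macmini-17", usb_port=4)
print(buf.getvalue().strip())
```

For intermittent issues, the value comes from attaching enough context (device serial, host, port) to every event that a later search can correlate failures that look unrelated at first.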
Lastly, we have a target to minimize downtime for our users. This means that any major system update, e.g., a glibc update, requires extensive testing and planning to ensure nothing breaks and uptime is maintained.
So I don’t suppose I need to explain the first point. (smile) We were lucky not to have deployed the Galaxy Note 7, so we dodged that bullet, but we do still have Nexus 5 batteries that have swollen from all the charge/discharge cycles we put them through.
iOS is fun [read]
We make it a point to have new iOS devices in our hands on launch day. This means staying up for pre-orders and standing in line to get a few units.
It takes a couple of days to hook them up for automation, so we usually send a note to all iOS devs inviting them to borrow the devices to test their apps.
We’re constantly following blogs and announcements on hardware.
WiFi AP density: most WiFi APs have a limit of maybe 40 simultaneously connected devices, and we have more than 70.
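The density problem comes down to ceiling arithmetic. A sketch using the talk’s rough figures (about 40 clients per AP, more than 70 devices); the `reserve_per_ap` safety margin is a hypothetical parameter, and real per-AP limits vary by model and configuration.

```python
def access_points_needed(devices: int, per_ap_limit: int = 40, reserve_per_ap: int = 0) -> int:
    """Minimum number of APs so that no AP exceeds its usable client count.

    reserve_per_ap holds back slots on each AP (e.g., for reconnect churn).
    """
    usable = per_ap_limit - reserve_per_ap
    if usable <= 0:
        raise ValueError("reserve_per_ap must be smaller than per_ap_limit")
    return -(-devices // usable)  # integer ceiling division


print(access_points_needed(70))                    # 2
print(access_points_needed(70, reserve_per_ap=8))  # 3
```

Even at exactly 70 devices, one AP is not enough; two is the floor, and keeping some headroom on each AP pushes it to three.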
We usually put the newest member of the team (or me) on ops duty, using the runbook to fix issues. This is great for finding gaps in the documentation.
Maps: when you have 300+ devices spread around the data center, you need a map.
Tools to test things remotely: very important, unless you want to live in the data center. (smile)
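The talk doesn’t describe the lab’s own remote screen capture tool, so here is a stand-in sketch built on two common open-source tools: Android’s `adb` (`exec-out screencap -p` streams a PNG to stdout) and libimobiledevice’s `idevicescreenshot` for iOS. It assumes the script runs on the host (Mac mini or Linux box) the device is attached to, with those tools on the PATH and the device already set up for them; the function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def screencap_command(platform: str, device_id: str, out_path: Path) -> List[str]:
    """Build the screenshot command for one device.

    Android: `adb exec-out screencap -p` writes a PNG to stdout.
    iOS: libimobiledevice's `idevicescreenshot` writes the image to a file.
    """
    if platform == "android":
        return ["adb", "-s", device_id, "exec-out", "screencap", "-p"]
    if platform == "ios":
        return ["idevicescreenshot", "-u", device_id, str(out_path)]
    raise ValueError(f"unsupported platform: {platform}")


def capture(platform: str, device_id: str, out_path: str, timeout: int = 30) -> Path:
    """Take a screenshot on the host the device is attached to and return the file path."""
    path = Path(out_path)
    cmd = screencap_command(platform, device_id, path)
    result = subprocess.run(cmd, capture_output=True, timeout=timeout, check=True)
    if platform == "android":
        path.write_bytes(result.stdout)  # PNG bytes streamed over adb
    return path
```

Keeping command construction separate from execution makes the per-platform logic testable without a device attached.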
Automate as much as possible, but don’t go overboard, because you have to maintain that code!
Is it worth it?
It depends. For us, it has helped scale testing and implement some interesting measurements, e.g., battery consumption testing.
But not every app needs it. Most companies can do well with vendors like Sauce Labs.