How long has the Selenium API, Selenium RC, been around? If your answer was 7 years then you are spot on! WebDriver is 6 years old this year. What a long time it has been! Over that time we have seen many languages move from supporting one API or the other to supporting both. The code base has slowly but surely merged, from having a massive uber jar to only having a large uber jar that allows you to run Selenium Grid and Selenium Server and make all the remote calls that you want to. A lot of this has happened since 2009, when Jason Huggins and Simon Stewart announced that they wanted to merge the Selenium and WebDriver code. At that conference Simon had the pleasure of showing WebDriver built into Opera. This was a monumental moment for the project, since it meant that the browser automation tool that we have come to love was being built into the browser. Thinking back on it, moving WebDriver into the browser seems like an obvious step for the project. The next time we saw this was with the ChromeDriver that Google released. This project was added to the waterfall for the browser, which means that it needs to pass all the tests; otherwise whatever broke it needs to be backed out and reapplied at a later stage. So what about Firefox? Of course we are going to make sure that it is in there too, but I will talk about that in a bit.
Why would we want a standard for browser automation? Selenium, the original creation of Jason Huggins, started as a way to make sure that just 2 browsers worked the same against a web app he was working on. Since then we have gained many, many more browsers, and each of them is gaining market share all the time. This is a good thing for the internet, since many competitors mean faster innovation, but it also means that projects like Selenium have a harder and harder time making sure that we support browsers in a standard way. We have seen support for browsers like Safari degrade because Apple is closing down the approaches we were using. There are times when Selenium uses a security hole to drive the browser in an automated way. Closing those holes is the right thing to do, but it makes our work harder.
The other reason is that we need to try to offload a lot of work to the browser vendors! A lot of the work currently done on the project is done purely by volunteers, in their spare time. There is a regular flow of commits from around 15 people. On the screen we can see a few of the people whose job it is to make sure that it works as intended. There are some whose employers encourage them to work on the project, and they do so for most of their day. With browsers moving to a faster release cycle, as Mozilla and Google have, changes to the browser are happening so fast that it is extremely hard to keep up. Getting the people who know the browser to support everything needed to make it automatable is the right way forward.
The other main reason, and this is the original reason Mozilla was created, is this: what would happen if a new browser were to be created in the future? When Selenium WebDriver becomes a standard, all new browsers that enter the market and want to be standards compliant will have to implement it. It also means that all browsers already on the market would need to implement a version of the API so that people can test their apps. This is especially important for mobile browsers, as more and more end users are switching to things like tablets and mobile phones for their day-to-day internet usage!
Anyone and everyone can contribute to the standard. One of the things that really drew me to working at Mozilla was the ability to make sure that everything I did was out in the open, no matter how easy or hard it is. To be brutally honest, working in the open is a lot harder than working on closed source. Working with the W3C also means that a lot of the work we are doing has to be out in the open, so that we can get as much visibility and buy-in as possible from other companies that might be interested. Being out in the open also means that if someone has a software patent that deals with this type of technology, they can find out easily and stake any claims. We have seen this happen with a number of different technologies in browsers, like touch events and widgets, where Apple has claimed a patent on them when the W3C process was nearly complete.
Simon Stewart and I are the current editors of the standard. We will be writing a lot of the standard, but our main job is making sure that the relevant parts of the standard are written and that any questions get answered or passed over to the most relevant person. We are in discussions with a number of the different browser vendors and, as I said earlier, we need to get them involved where possible. Google, Mozilla and Opera now have a vested interest in the standard and are working to make sure that it becomes part of the browser. Microsoft have shown some interest but would like the first version of the specification to be available.
This is going to be a very long process, since we need to get buy-in from a number of companies. We also need to see if there are any patents out there that may influence the way things work. We are hoping that we can have all of this done within the next 2 years. Yes, I did say years. Creating a standard is probably as hard as, if not harder than, trying to pass a law. Everyone will have an opinion, and to make sure that we get this in we have to listen to everyone's thoughts on the process. I do mean that we should listen to them, since the quickest way to have someone turn against this project is to make sure that they do not have a proper voice! That's not in the spirit of OSS!
So what is the next thing that needs to be done? Well, everything really, but let me start by breaking down each of the items and asking the relevant owners within the Selenium project to write up what it currently does, and then start a discussion from that. We are still in the early stages, and I really mean that. We have only had one meeting, and that created more questions than answers. We have also split up a lot of the work and given it to Selenium committers. A good example is the Interaction API that Eran Messeri wrote. Getting him to fill in that whole section for the browser is a really important step in fleshing it out.
And when we are fleshing out the spec we want to make sure that we are creating something that mirrors what we are doing now. As you all know, there are roughly two parts that make Selenium WebDriver work. There is the client code and there is the server code. In this instance the client code is the Java, .NET, Ruby or Python code. Then there is the browser or the RemoteWebDriver. They mirror each other as much as possible. We call this the Janus API. These APIs currently speak to each other via the JSON Wire Protocol, but we will not hold browsers to this form of transport layer. Currently everything in the Selenium project works this way, as do ChromeDriver and OperaDriver. We will document this, but if people don't follow it then that is fine too. The flexibility will mean that we are more likely to gather support.
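To make the client and server split a little more concrete, here is a minimal sketch, in Python, of the kind of messages a client binding might build for the server side under the JSON Wire Protocol. It only constructs the requests and does not talk to a real server. The function names and the session id "abc123" are made up for illustration; the paths and payload keys follow the JSON Wire Protocol as the Selenium project documents it, so treat this as a sketch under those assumptions and not as the project's actual client code.

```python
import json


def new_session(browser_name):
    """Build the request a client sends to ask the server for a new browser session."""
    body = {"desiredCapabilities": {"browserName": browser_name}}
    return ("POST", "/session", json.dumps(body))


def navigate_to(session_id, url):
    """Build the request that tells the browser in this session to load a URL."""
    body = {"url": url}
    return ("POST", "/session/{}/url".format(session_id), json.dumps(body))


def get_title(session_id):
    """Build the request that asks for the current page title (no request body)."""
    return ("GET", "/session/{}/title".format(session_id), None)


if __name__ == "__main__":
    # "abc123" is a hypothetical session id; a real one comes back from the server.
    requests_to_send = [
        new_session("firefox"),
        navigate_to("abc123", "http://example.com"),
        get_title("abc123"),
    ]
    for method, path, payload in requests_to_send:
        print(method, path, payload)
```

Every language binding builds the same small set of messages, which is why the client side and the browser side can mirror each other so closely, whatever transport a browser vendor ends up choosing.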
We at least know the world is not flat and that we want to finish with a standard way that will eventually come to light.
A standard for browser automation
The road to a standard David Burns @automatedtester