Why you shouldn't start learning Selenium by automating Google's products

Published: by Creative Commons Licence

Whenever I see questions on StackOverflow about UI automation against Google's products (especially Gmail, Google Maps, YouTube, etc.), the first thing that pops into my head is: do those OPs work for Google? I guess not. Then why are so many people interested in automating Google's applications through the UI?

My wild guess is that there are mainly two situations:

  • Someone who has just started learning Selenium and has decided to automate Google search as a starting point. Since the Google homepage has a reasonably clean and simple UI, people have reason to believe that Google could be the best site for learning Selenium. After playing with the "Google Search" scenario, they might go on to try automating Gmail or YouTube.
  • Someone who needs to access Google services like Gmail or Google Maps as part of the job, which can be either a testing project or an application involving browser automation.

Well, whatever the reason is, people should rarely need to automate Google's applications at the UI level, especially those who have just started learning Selenium.

Don't automate Google's products

Google search use case is too easy

Almost all Selenium 101 tutorials use the Google search scenario, which seems like one of the easiest use cases.

  1. Send keys to Google's search box
  2. Submit
  3. Check the title in browser window

This is indeed a reasonable starting point for people who have just started coding with Selenium, as it provides an intuitive way to demonstrate what Selenium does - "Selenium automates browsers"[1].
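
For reference, the three steps can be sketched in Ruby roughly as follows. This is only an illustration under some assumptions: the method name google_search_title is made up for this example, the search box is assumed to be locatable by name "q", and driver is assumed to be any object that behaves like a Selenium WebDriver driver.

```ruby
# A minimal sketch of the typical "Google search" tutorial.
# `driver` is expected to respond to get, find_element and title,
# as a Selenium WebDriver driver does.
def google_search_title(driver, query)
  driver.get('https://www.google.com')

  # Assumption: the search box can be located by name "q".
  search_box = driver.find_element(name: 'q')

  search_box.send_keys(query) # 1. send keys to the search box
  search_box.submit           # 2. submit
  driver.title                # 3. read the title of the browser window
end

# Possible usage with the selenium-webdriver gem installed:
#
#   require 'selenium-webdriver'
#   driver = Selenium::WebDriver.for :firefox
#   puts google_search_title(driver, 'selenium')
#   driver.quit
```

Note that nothing here waits for the results page, so the title read in step 3 may still be the old one. Tutorials tend to hide exactly this kind of detail.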

However, it looks so easy purely because it was deliberately simplified for educational purposes, so this use case doesn't have to be the first Selenium program recommended to novice users. More importantly, the dynamic DOM structure and the instant search feature make the Google homepage definitely not the easiest thing for them. Without revealing what's under the hood, it might give newbies the false impressions that

  • Selenium projects are like some auto-generated scripts - let's copy n' paste these statements.
  • Google's products are extremely straightforward to automate - let's try automating Gmail or Youtube next.

Instead of giving a use case that could potentially cause trouble, it might be better to find a scenario as simple as the one below for learning purposes.

  1. Search [selenium] on StackOverflow
  2. Find out how many questions are tagged with "selenium"

Everything else is too difficult

Even though Google's products are technically automatable, doing so requires a certain amount of time and effort[2]. Learning Selenium or performing certain tasks by automating them through the UI would certainly be a poor choice.

  • DOM is too complex

    Google is well-known for its meaningless, minified DOM structures. Not only do straightforward approaches like id or name not work well, but even advanced XPath/CSS selectors can barely be used, because most things in the DOM are not human readable. Difficulty locating elements can be common in any UI automation, but automating Google's products makes it inevitable. Strong XPath/CSS selector skills are crucial to get the job done.

    What about text-free XPaths based on position relationships, like //div[3]/div[1]/div[3]/span[10]/a? They are even worse and are likely to be broken frequently by Google's changes. Brittle XPaths shouldn't be used in any kind of program in the first place anyway.

  • Some are Ajax powered

    Dynamic content handling techniques, like the WebDriverWait waiting mechanism, are essential for UI automation, but Ajax-powered Google products (Google Maps, YouTube, etc.) would definitely be considered among the hardest cases. For those JavaScript- and Ajax-heavy applications, wait times may vary, and debugging them is difficult due to other constraints, like dynamic DOM structures.

  • DOM changes too frequently

    Just like many other actively developed web applications, the DOM of Google's products changes constantly, either deliberately in order to prevent scripting or as part of the development process. Even assuming the best available locators have been used, these changes may still frequently break existing Selenium code and make it highly unmaintainable.

  • Source code is unavailable

    Having both read and write access to the internal issue tracker, source code, documentation and private APIs can be extremely useful during the UI automation process. On one hand, modifying source code for testing purposes, like adding class names, is a common practice for making elements easier to locate. On the other hand, monitoring UI-related bug/feature tickets greatly helps developers track down what has changed. Unfortunately, automators outside Google won't be able to take advantage of this.
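
To make the waiting technique from the Ajax bullet above more concrete, here is a simplified stand-in written in plain Ruby. It is not Selenium's own implementation (in the Ruby bindings the real class is Selenium::WebDriver::Wait); the names wait_until and WaitTimeout are hypothetical and exist only for this sketch.

```ruby
# Simplified illustration of an explicit wait: poll a condition until it
# returns something truthy, or give up after a timeout.
# WaitTimeout and wait_until are hypothetical names used for this sketch.
class WaitTimeout < StandardError; end

def wait_until(timeout: 5, interval: 0.2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  loop do
    result = yield
    return result if result # condition met: hand back whatever it produced

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise WaitTimeout, "condition not met within #{timeout}s"
    end

    sleep interval # back off briefly before polling again
  end
end

# Possible usage with a real driver ('.result' is a made-up selector):
#
#   first_result = wait_until(timeout: 10) do
#     driver.find_elements(css: '.result').first
#   end
```

The polling itself is the easy part. On Ajax-heavy pages the hard part is deciding which condition to poll for, and that condition is exactly what breaks when the DOM changes.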

There are APIs

What if you are not only learning Selenium, but also trying to achieve something from those Google services, for instance, to verify emails in Gmail as part of the testing process?

The answer is to use APIs, or existing libraries that wrap the APIs. Google provides APIs for Gmail, YouTube, Drive, Maps and more[3]. This is the intended and more correct way to access data. People who have just started UI automation might think about and solve problems the way end users would. However, using Selenium doesn't mean that absolutely all actions have to be performed at the UI level, which, most of the time, would be an inefficient approach.

For example, ruby-gmail is a Ruby library that allows accessing emails in Gmail. After installing the gem with gem install ruby-gmail, writing a program to get the inbox count is just a matter of seconds:

require 'gmail'

# Replace the placeholders with real credentials.
gmail = Gmail.new('GMAIL_ADDRESS', 'GMAIL_PASSWORD')
puts "Inbox count: #{gmail.inbox.count}"
gmail.logout # close the session when done

It was against the "Terms of Service"

Although this might no longer be true for Google applications[4], and nobody really cares about user agreements, it's still worth noting that older versions of Google's "Terms of Service" contained a specific statement regarding automated access. The same goes for YouTube, where such a statement still exists in the current YouTube Terms of Service, section 4-H.

Who might still do it?

  • UI automation team within Google

    If such a team exists, their biggest advantage is access to the source code. With the ability to modify source code, UI automation can be a lot easier, without fighting against nasty DOM structures. Additionally, they can easily track down which feature/bug ticket changed the DOM structure and caused previous Selenium WebDriver code to fail. On the other hand, instead of automating at the UI level, they might also use something at lower levels, like private APIs, to interact with certain front-end functionality.

  • People with no other choice

    Sometimes going through the UI might be the only way to achieve a task that must be done. If Google doesn't provide an API for a particular product and the task is so important that it must be done, then just do it at the UI level. The project might be extremely difficult to implement and super fragile to maintain, but it should still be achievable. However, from my experience, I can't think of any valid use cases for this.

What should you automate then?

Your own (or your company's) applications

Don't procrastinate. Instead of spending time poking around and looking for something easy enough, it is probably better to start right away on what you are supposed to do. While gradually coding up the project, you will find yourself asking many questions:

  • How can I find elements with dynamic ID?
  • How can I find elements with absolutely nothing identifiable?
  • How can I avoid unnecessary waiting?
  • How can I create more generic expected conditions?
  • How can I DRY up my code?
  • How can I make my classes more extensible?
  • How can I abstract my pages?
  • How can I …

As time goes by, you will become more familiar with the Selenium API, and the quality of the project will improve automatically, as long as you are keen on getting the answers to those questions and willing to ask more.
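
As one possible answer to the "How can I abstract my pages?" question above, a page object keeps locators in one place and exposes intent-revealing methods. The sketch below is only an illustration: the LoginPage class, its locators and the imaginary login form are all hypothetical, and driver is assumed to behave like a Selenium WebDriver driver (find_element taking a locator type and a value).

```ruby
# A tiny page object for an imaginary login form.
# All locators below are hypothetical and only serve the example.
class LoginPage
  USERNAME = [:id, 'username'].freeze
  PASSWORD = [:id, 'password'].freeze
  SUBMIT   = [:css, 'button[type="submit"]'].freeze

  def initialize(driver)
    @driver = driver
  end

  # Fills in the form and submits it; returns self so calls can be chained.
  def log_in(user, password)
    type(USERNAME, user)
    type(PASSWORD, password)
    @driver.find_element(*SUBMIT).click
    self
  end

  private

  # Shared helper, so clearing and typing are written only once (DRY).
  def type(locator, text)
    field = @driver.find_element(*locator)
    field.clear
    field.send_keys(text)
  end
end

# Possible usage with a real driver:
#
#   LoginPage.new(driver).log_in('alice', 's3cret')
```

When the markup changes, only the locator constants need updating, which is the main reason for choosing this structure.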

Additionally, creating your own demos could also be a good alternative for learning Selenium. For example, "The Internet" is a demo application written for a similar purpose, which provides lots of examples of common web functionality, like hovering, frames, JavaScript alerts, etc.

Demos of JavaScript UI frameworks

There are common JavaScript frameworks for building web user interfaces, like Ext JS, Dojo Toolkit and qooxdoo. They provide heaps of interesting demos on their websites, which can be used for learning Selenium WebDriver. More importantly, you can download them and create your own playground to get a better understanding of how the DOM is generated and structured. To start with, even writing some code to automate their homepages would be fun, like navigating, searching, etc.

Further reading

[1]: "What is Selenium?" section.

[2]: Comment about Gmail here made by Ross Patterson.

[3]: Products — Google Developers

[4]: Current version of Google Terms of Service.

[5]: Relationships between different versions of Selenium#WebDriver.