Published 28 September 2016

E2E tests – a real world experience

Today I want to share my experience of incorporating E2E (end-to-end) tests into the daily work of a fairly large project. I will describe my perception of E2E before the project started, the initial challenges we faced, the ups and downs during the course of the project, and finally my perception of E2E after the project.

I want to begin by clarifying that neither I nor the other team members had any hands-on experience with E2Es prior to the project, and there are probably many other approaches to E2E than the one we took. If you read this and think that we were doing it in a bad or wrong way, I would be more than happy to receive your input on the matter!

Thoughts prior to the project

As I’ve written, I had zero hands-on experience of E2E prior to the project, but I did have my perception of it. E2E is a high-level form of the standard integration test. An integration test typically tests how bits and pieces of the system work together, i.e. how they integrate with each other. Such tests might call remote services but might as well work purely locally. E2E is a very similar concept but operates at a higher level: it tests how the system works from one end, in the form of a user interaction, through the middle (which might be a database) and then back to the other end which, for example, can be visual feedback to the user.

E2Es would not be a substitute for unit tests or other integration tests. They would test the normal flow a user takes to finish a task and would therefore not cover edge cases, variations of input parameters and similar testing patterns. They would serve as a good base of documentation that is readable for non-technical people, i.e. the test cases (scenarios) should be written in as plain English as possible. In addition, they seemed helpful in giving quick indications that a defect had been introduced somewhere in a feature, defects that would not be found by running a unit test suite. All in all, I went in with high hopes and was happy to get to work with E2Es hands-on.

Project setup

We were a team of around 10 developers building an AngularJS website. It was a purely client-side project that communicated with a platform that handled business rules, storage of data and more. These are the parts of our setup that matter for the E2Es:

  • AngularJS
    A front-end framework that helps you create dynamic web sites. Often used in “SPAs” (single-page applications).
  • Gherkin
    A business-readable, domain-specific language that lets you describe software’s behaviour without detailing how that behaviour is implemented.
  • Cucumber
    A tool that executes scenarios/specifications and then generates reports that tell whether the system behaves according to the scenario/specification.
  • Protractor
    An end-to-end test framework for AngularJS applications.

Challenges

Setting aside all of these frameworks and tools, which were new to us, we had one big challenge: data. We did not control any of the data since we were a thin client application talking to a business platform. This platform did not provide any means of exposing test data to us, for reasons outside the scope of this blog. This left us with a single approach: mock out the platform. This of course narrowed the scope of our E2Es, i.e. our E2Es consisted of a user interaction at one end, a mocked-out platform in the middle and then visual feedback (in most cases) at the other end. Sure, this is not a pure E2E, but it would have to do.

To tackle this challenge, we used $httpBackend, a service from AngularJS’s ngMock module, to intercept calls to the platform and return controlled, business-valid data. The data was provided using specification by example: it was written, in a readable way, in the Given (Gherkin) statements of each test. It was then paired with a specific platform URL or a URL regular expression and stored in the browser’s session storage. When an E2E test ran and the application requested a matching URL, we read the corresponding data from session storage and returned it through the $httpBackend service. We built a mini-framework which included so-called “MockRegistrators” for each logical group of contracts. Once that foundation was laid, this approach proved to be an easy and reliable way to provide and control test data.
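The matching step described above can be sketched roughly as follows. This is not the project's actual code: MockRegistry, MemoryStore and the example URLs are hypothetical names chosen for illustration, an in-memory map stands in for the browser's session storage so the sketch runs outside a browser, and the wiring into $httpBackend is left out entirely.

```typescript
// Hypothetical sketch: none of these names come from the original project.
type MockResponse = { status: number; body: unknown };
type UrlMatcher = string | RegExp;

interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in for sessionStorage (assumption: same getItem/setItem shape).
class MemoryStore implements KeyValueStore {
  private items = new Map<string, string>();
  getItem(key: string): string | null {
    const value = this.items.get(key);
    return value === undefined ? null : value;
  }
  setItem(key: string, value: string): void {
    this.items.set(key, value);
  }
}

class MockRegistry {
  private store: KeyValueStore;
  private matchers: { key: string; matcher: UrlMatcher }[] = [];

  constructor(store: KeyValueStore) {
    this.store = store;
  }

  // Called from a Given step: remember which URL (or pattern) returns which data.
  register(matcher: UrlMatcher, response: MockResponse): void {
    const key = `mock:${this.matchers.length}`;
    this.matchers.push({ key, matcher });
    this.store.setItem(key, JSON.stringify(response));
  }

  // Called when the application requests a URL: first registered match wins.
  resolve(url: string): MockResponse | undefined {
    for (const { key, matcher } of this.matchers) {
      const hit = typeof matcher === "string" ? matcher === url : matcher.test(url);
      if (hit) {
        const raw = this.store.getItem(key);
        return raw === null ? undefined : (JSON.parse(raw) as MockResponse);
      }
    }
    return undefined;
  }
}

// Usage: one exact URL and one URL pattern, both invented for this example.
const registry = new MockRegistry(new MemoryStore());
registry.register("/api/customers/42", { status: 200, body: { name: "Ada" } });
registry.register(/^\/api\/orders\/\d+$/, { status: 200, body: { items: [] } });
console.log(registry.resolve("/api/orders/7"));
```

Keeping the data as serialized strings in a key-value store mirrors the idea of writing it in one place (the Given step) and reading it in another (the intercepted request). A URL that matches nothing resolves to undefined, which a real setup would have to handle explicitly.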

E2E in the daily development cycle

Our client was clear on this point: they wanted E2Es so that they could rely on the system. There are many reasons behind this that are outside the scope of this blog, but essentially, the implementation of E2Es was sanctioned and required from the top. With this in mind we set out to implement E2Es for each new feature; it was part of our Definition of Done.

In the beginning it was not a big deal. Once our E2E framework was in place and we started writing E2Es more frequently, it went really smoothly and we all liked it. I’d estimate that E2Es took up around 20% of the total time spent on a feature, and it was time well spent. Then we started on more complicated features, and it quickly became evident that the time spent on E2Es increased due to the complexity of setting up mocked data, writing good business-readable test scenarios and the larger amount of code that needed to be written. I’d estimate that at this stage we spent around 30% of a feature’s total time writing E2Es.

As the number of features grew, we started to see our E2Es fail more frequently when we changed the code. A test would fail because a static text in the UI had changed, or because some attribute that we used for DOM selection had changed. We made it part of our routine to run the E2Es as we would run unit tests, i.e. continuously or at least before checking in code. The feedback loop for these tests was very long compared to unit tests; it could take minutes to boot up the web driver (Selenium) and run all the tests. We could feel the overall goodwill towards the E2Es declining. At this point we decided that E2Es would NOT fail a CI build, a decision we should have made from the get-go. For me, it is now evident that an E2E test suite should never fail a CI build.
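As a minimal sketch of that decision, the rule can be expressed as a verdict function in which only suites marked as blocking can fail the build, while failures in other suites are merely reported. The names SuiteResult and buildVerdict and the numbers are made up for illustration; this is not a TeamCity configuration.

```typescript
// Hypothetical sketch of a "report but do not block" policy.
type SuiteResult = { suite: string; failed: number; blocking: boolean };
type Verdict = { pass: boolean; warnings: string[] };

function buildVerdict(results: SuiteResult[]): Verdict {
  // Only failures in blocking suites (e.g. unit tests) fail the build.
  const blockingFailures = results.filter(r => r.blocking && r.failed > 0);
  // Failures in non-blocking suites (e.g. E2E) are surfaced as warnings.
  const warnings = results
    .filter(r => !r.blocking && r.failed > 0)
    .map(r => `${r.suite}: ${r.failed} failing (reported, not blocking)`);
  return { pass: blockingFailures.length === 0, warnings };
}

const verdict = buildVerdict([
  { suite: "unit", failed: 0, blocking: true },
  { suite: "e2e", failed: 3, blocking: false },
]);
console.log(verdict.pass, verdict.warnings);
```

The E2E results stay visible in the report, so a red run can still be investigated, but a crashed web driver cannot stop a release on its own.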

More and more features were added and the feedback loop eventually grew far too long, so we divided the E2Es into logical slices. These slices grouped features together so that we could, for example, run all tests that affect a certain area of the website. In Cucumber, this grouping is done with “tags”. This reduced the feedback loop dramatically. We started adding TeamCity build configurations for these tags as well, giving us a more granular level of CI reporting. The problem, though, was that these builds were failing a lot, which defeated their purpose.
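The idea of slicing by tag can be illustrated with a small sketch. Cucumber itself does this selection from tags written in the feature files; the code below only mimics the principle, and the Scenario type, the selectByTags function, the scenario names and the tag names are all invented for the example.

```typescript
// Hypothetical sketch: selecting a slice of scenarios by tag.
type Scenario = { name: string; tags: string[] };

// Keep the scenarios that carry at least one of the wanted tags.
function selectByTags(scenarios: Scenario[], wanted: string[]): Scenario[] {
  return scenarios.filter(s => s.tags.some(t => wanted.indexOf(t) !== -1));
}

const suite: Scenario[] = [
  { name: "Customer can log in", tags: ["@login"] },
  { name: "Customer can pay an invoice", tags: ["@invoices", "@payments"] },
  { name: "Customer can list invoices", tags: ["@invoices"] },
];

// Run only the slice that touches the invoice area of the site.
const invoiceSlice = selectByTags(suite, ["@invoices"]);
console.log(invoiceSlice.map(s => s.name));
```

A scenario can carry several tags, so the same test can belong to more than one slice and be picked up by more than one build configuration.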

Random failures

If you have read this far, you will have noticed that we had issues with failing tests. Some failures were of course real bugs, but there were numerous randomly failing tests. Random failures make tests non-deterministic, and non-deterministic tests are evil and totally useless. Here are a few of the random failures that we have yet to find a solution to:

  • WebDriverError: Error communicating with the remote browser. It may have died.
  • Error: Step timed out after 45000 milliseconds
  • Error: Error while waiting for Protractor to sync with the page

In addition to these we had plenty of timing issues: certain HTML elements were unclickable while they were in the middle of a Bootstrap dialog animation, or the resolution used on a specific TeamCity agent was too low, which caused some content to be hidden. These timing issues are random in nature and make the tests non-deterministic. In the end, we did not have a single green suite of E2Es for months.

At the point of worthlessness

As I’ve already written, the number of features and scenarios kept growing and the random failures were frequent. This, in addition to fragile tests, forced us to make an effort to find the root cause of the random failures and to make our suite more robust. We overcame neither the randomness in our own timings nor the Protractor/WebDriver errors. Although we did stabilize our own tests to some degree, too many of them were still failing randomly, and our suite never saw the sweet green colour that we strove so hard for. At this point the team:

  • Did not trust the tests
  • Did not care if a test went red
  • Did not run E2Es before check in, except for the newly written ones in the feature

This essentially made the tests worthless; they had defeated their own purpose. That is a shame, especially considering that they took, as a rough estimate, a third of the total development time.

A time for reflection

For something that started out as hopeful and valuable, both for business and for the team, to degenerate into something close to a necessary evil makes me sad. Let me put the key points up in a list and see why this happened:

  • The tests were non-deterministic
  • The non-deterministic nature of the suite made us lose trust in the tests…
  • …and stop caring whether they turned red or green
  • The tests were expensive
  • They were difficult to debug. Console logs were frequently used.
  • There were long feedback loops
  • It was difficult to write robust tests. Many times one has to rely on DOM selection (e.g. when clicking on buttons or inspecting visual feedback), and when the DOM changed a test could fail.
  • It was difficult to isolate tests. Although it is an E2E, one would strive for isolated test cases that do not affect each other more than necessary.
  • They were difficult to maintain. I’d expect a person with no experience of the frameworks used to struggle with the test code, more than is normal when being new.

All these bullets probably have their own solutions, but we struggled to find them. Maybe our initial approach was incorrect, but at the time of writing I don’t know what we should have done instead. Feedback and ideas are always appreciated. I also think that if we could find a way to stop the tests from failing randomly, or at least make them reasonably robust, we could still see the benefit and live with the other drawbacks. I do believe that E2Es have value, both in terms of business value and when it comes to understanding a feature to be implemented.

As a final word of advice, if you run E2Es as part of your build chain, do not fail a build in case of a failing E2E. Use unit tests for that kind of guard since they are much more deterministic in nature. After all, you do not want a build version to fail just because the WebDriver fails, do you?

 
