Posted February 17, 2015

Ad Testing Made Easy (well, easier!)

At first glance, testing ad copy or landing pages in paid search appears to be very simple: just create a new ad, right?  See which one drives more clicks and/or conversions.  If the test ad loses, pause it; if it wins, pause the control.

After spending millions of dollars testing ads across Google and Bing, I can assure you that it is NOT that simple.  However, it can be a straightforward and repeatable process.  First, I’ll outline a few administrative considerations so that the search engines and small tests do not work against you.  I will then outline a general test framework, and finally provide guidance on how to interpret results.

Administrative First Steps

What you should consider…

  • Set your ads to rotate indefinitely
  • Disregard small tests


The first consideration is a no-brainer: if Google is set to optimize rotation (e.g., for clicks), your test will be tainted by the engine’s own ad-serving preferences.  For a clean test, baseline metric share should be 50/50 (more on this later).  It’s also important that tests are large enough to return relevant results, rapidly.  With this in mind, order your testing calendar by clicks, conversion metrics, or your own KPIs.  The early learnings you get from bigger adgroup tests will help you avoid meaningless tests on smaller adgroups.
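To make the "order your testing calendar" idea concrete, here is a minimal Python sketch.  The adgroup names, numbers, and the `MIN_CLICKS` cutoff are all hypothetical; substitute whatever KPI and threshold fit your account:

```python
# Hypothetical adgroup stats; "clicks" stands in for whichever KPI you prioritize
adgroups = [
    {"name": "brand-core", "clicks": 48_000, "conversions": 2_100},
    {"name": "nonbrand-generic", "clicks": 3_200, "conversions": 90},
    {"name": "competitor", "clicks": 650, "conversions": 12},
]

MIN_CLICKS = 1_000  # disregard adgroups too small to test meaningfully

# Testing calendar: biggest adgroups first, tiny ones filtered out
calendar = sorted(
    (ag for ag in adgroups if ag["clicks"] >= MIN_CLICKS),
    key=lambda ag: ag["clicks"],
    reverse=True,
)

for ag in calendar:
    print(ag["name"], ag["clicks"])
```

Running the biggest adgroups first means your earliest tests are also the ones that reach a meaningful sample size fastest.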

Test Framework

What you should consider…

  • Duplicate your control ad
  • Duplicate your control ad
  • Determine a threshold for win/loss
  • Follow your heart


The first two bullets are duplicated for effect, but the point cannot be overstated.  If you want a clean test, you need to standardize all aspects of the test subjects (control and test).  Time active is a nontrivial component in determining how/when an ad gets served by the SE (something you can’t influence externally beyond the rotate setting), so duplicating the control ad ensures that both the test and control are “fresh” at the start of the test.

Win/loss thresholds should consider 2 metrics: test balance and business impact.  Test balance refers to the baseline metric share between the test and control, and answers the question “Was each ad given an opportunity to succeed?”  Generally this implies a 50/50 impression share, but this could mean click-throughs if actual visits are easier to evaluate (think display + landing page test).  Business impact is a measure of test maturity, and can be measured through spend, clicks, conversions, or other relevant KPIs.  Brand-related adgroups will generally spend 90% less and receive 10x more clicks than non-brand adgroups, so a good business impact metric will consider individual adgroup performance and performance differences between adgroups.
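As a minimal sketch of how these two checks might be encoded: the balance check below verifies the roughly 50/50 impression share, and a two-proportion z-test is one common way to formalize a win/loss threshold on conversion rate (the post doesn’t prescribe a specific statistic, and all the numbers here are hypothetical):

```python
import math

def balanced(impr_a, impr_b, tolerance=0.10):
    """Test balance: was each ad given an opportunity to succeed?
    Returns True if impression share is within tolerance of 50/50."""
    share_a = impr_a / (impr_a + impr_b)
    return abs(share_a - 0.5) <= tolerance

def two_proportion_z_test(conv_a, clicks_a, conv_b, clicks_b):
    """Compare control vs. test conversion rates; returns (z, two-sided p)."""
    p_a = conv_a / clicks_a
    p_b = conv_b / clicks_b
    p_pool = (conv_a + conv_b) / (clicks_a + clicks_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / clicks_a + 1 / clicks_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical control vs. test numbers
if balanced(10_200, 9_800):  # ~51/49 impression split: acceptable
    z, p = two_proportion_z_test(120, 10_200, 168, 9_800)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

Only once the balance check passes is the conversion comparison worth reading; a lopsided impression split means the losing ad may simply never have been shown.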

There will be tests where the metrics don’t point to an obvious winner.  On the other side, it’s also easy to get caught up in personal subjective preference (everyone wants their tests to win).  Ultimately, your heart should be in your business and, as such, should guide you to the best business outcome.

Interpreting Results

What you should consider…

  • How good are your hypotheses?


What does it mean if your hypothesis wins?  How do you know if your hypothesis won at all?  This comes down to solid hypothesis development.  The better you are able to apply the scientific method to your hypothesis (reducing tests to one variable), the easier it will be to see how your test won, and how you can then apply that learning in other adgroups.  Yes, the most important part of interpreting results is determined before the test even starts!

On the other side, don’t be afraid of 100% greenfield tests where EVERYTHING changes.  Sometimes you need to shake things up, and big wins come from big risks.  Maybe you offer a new product and want to build awareness across all your adgroups, or want to find a way to make your ads speak less like a salesperson and more like your ideal customer.  Localized tests like these in high-impact adgroups let you make night-and-day changes in a controlled fashion, and make for great hypotheses.  However, systematic, incremental testing is much more sustainable for controlling test loss and identifying test lift.

Ad testing is fun!  These steps will help you manage uncertainty and design/run clean tests with actionable results.