I remember learning about the scientific method in school, running controlled experiments where we were taught the importance of using a control to test a hypothesis. Measuring significance, back-testing unexpected results, measuring uplift: when I started working in publishing, I didn’t realise I would be applying the methods I learned colouring flowers and making ice melt faster to my role in subscriptions marketing.
Magazine subscriptions in 2019 is an incredibly interesting but complex field to work in. Gone are the days when circulation growth could be achieved solely through large direct mail campaigns. Today, marketers are looking for subscription growth wherever their audiences are, and audiences are on social media, listening to podcasts, watching videos and speaking to Alexa. We all have only so much time, budget and resource available to find the one thing that will make the biggest difference to building quality, lasting readership. And that is why I fly the flag for good, old-fashioned testing.
We know it costs more to win a subscriber than to keep one. But what difference would it make to your acquisitions if you increased the conversion rate on your payment form by 15%, or found a 20% better open rate for your renewal emails? Sometimes we can find ourselves concentrating so much on finding new channels that we neglect the ones we already have.
Not convinced yet? Behemoths like Microsoft, Amazon, and Google each conduct more than 10,000 controlled experiments every year. And Obama’s team ran a CTA button test (“sign up” vs “learn more”*) that contributed to an extra $60 million in donations.
So, in that light, I want to take us all back to our science lessons, get back to testing basics, and share my top tips:
How to run a great A/B test
- First, a good test always compares a variation against the current experience. It is really important, however, to make sure you know what you’re testing: a good, focused question will help you measure the impact of the change you make. To that end, I recommend testing one thing at a time. This is slower, and in some ways frustrating, but it lets you learn what does (and doesn’t) make a difference in your channel, building up an understanding of the impact of each element as you go.
- Goals. An important aspect of any test is establishing clear goals. It sounds incredibly obvious, but think how often we run a test because it’s easy (change the button colour, anyone?) instead of investing the effort into devising a test that will have a bigger impact (for example, testing a one-page checkout). And remember, any test is only as good as your ability to measure its effect! Wanting to improve brand affinity is a perfectly reasonable goal, but if you run a test without any way of knowing whether it worked (exit surveys? heatmaps?) then you’ve wasted an awful lot of effort.
- Duration. People are always asking me how long to run a test for, and I never want to be one of those people who just responds with an unhelpful “it depends”. But honestly… it depends. It depends on how many variations you’re testing. It depends on how big a change you’re testing (small tweaks vs huge changes). And it depends on plenty of other things besides: is there an unexpected traffic source (did your site get picked up by a big news outlet, sending you lots of new visitors out of the blue)? Are you testing during a slow time of year, or over Christmas, or when there’s no new content on your website? So, you see, it depends! But there are easy, free tools online (just search for ‘test duration calculator’) to give you a clue, and there’s a rough sketch of what those calculators do after this list. You also need to make duration rules that suit your organisation and culture. At the London Review of Books, for example, we have a new issue every two weeks, and so our web traffic changes considerably over a fortnight. To that end, no test runs for less than two weeks, to make sure we account for those large variations.
- Which One Won? A simple uplift in conversion might seem like a winner, but it’s a bit more complex than that! There is an equation for statistical significance using the p-value (it’s complicated maths and I don’t get it either; don’t worry, there are calculators online that do the hard stuff for you, and a little sketch of the sum after this list). The equation essentially asks, “if I ran this test 100 times, how many times would I get the same result?” Say, for instance, you ran an A/B test with 8,000 visitors in each group: the control got 160 conversions (a 2% conversion rate) and the test got 178 conversions (a 2.23% conversion rate), an 11.5% relative uplift. On paper, version B clearly looks like the winner, but it’s only 55% statistically significant. If you flipped a coin 100 times you could easily land on heads 55 times: would that be enough to base a lasting decision on, when it could so easily be down to chance? Prevailing testing wisdom looks for at least 90% significance before declaring a winner worth adopting. If you flipped a coin 100 times and got heads 90 times, you would be pretty sure something more than chance was at play!
- Software. You don’t need to invest in expensive testing software to run good tests. Google Optimize is free. Most ESPs have A/B testing functionality built in. AdWords and Facebook allow for split testing right in their systems. You can even put two different response URLs in page ads. Don’t let a lack of expensive equipment hold you back.
- Test one thing at a time so you learn what gave the desired impact.
- Re-test anything that seems completely unexpected (or even old winning tests from a long time ago – things change).
- When you start off, try an A/A test. It will give you a baseline conversion rate for your testing, and it gives you a measure of whether your testing tech and site are all functioning properly. (If you have enough traffic, run an A/A/B test so you don’t waste any time.)
- Consider multivariate testing if you want to test several things at once and understand how the relationship between elements affects conversions. (It can save time and be really informative, but only do this if you have a lot of traffic and only if the different elements all contribute to conversion; otherwise you are better off with an A/B test.)
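On the duration question, here is a minimal sketch of what those free ‘test duration calculator’ tools typically do under the hood: the standard two-proportion sample-size formula. The 90% confidence and 80% power defaults, the 2% baseline conversion rate and the 5,000-visitors-a-day figure are all my own illustrative assumptions, not numbers from any particular tool.

```python
from statistics import NormalDist

def visitors_needed(baseline_rate, relative_uplift, confidence=0.90, power=0.80):
    """Rough visitors needed per variation to detect a relative uplift in
    conversion rate, using the standard two-proportion sample-size formula."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-tailed
    z_beta = NormalDist().inv_cdf(power)
    n = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
    return int(n) + 1

# Illustrative numbers only: a 2% checkout conversion rate, hoping to spot a
# 15% relative uplift, on a page getting 5,000 eligible visitors a day
# split across two variations.
per_variation = visitors_needed(0.02, 0.15)
days = per_variation * 2 / 5000
print(f"{per_variation:,} visitors per variation, roughly {days:.0f} days of traffic")
```

Whatever number comes out, the duration rules described above still apply: a calculator answer of ten days would still get rounded up to a full fortnight at the LRB, so a whole issue cycle is covered.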
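And here is a minimal sketch of the significance sum itself, as a two-tailed two-proportion z-test run on the checkout example above. Online calculators make different assumptions (one- or two-tailed, pooled or unpooled variance, different tests entirely), so the exact confidence figure they report will vary; the point is that, with these numbers, every version of the sum falls well short of the 90% bar.

```python
from statistics import NormalDist

def ab_confidence(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-tailed two-proportion z-test: returns both conversion rates and the
    confidence (1 - p-value) that the difference is not just chance."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = (pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b)) ** 0.5
    z = (rate_b - rate_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_a, rate_b, 1 - p_value

# The example above: 8,000 visitors in each group, 160 vs 178 conversions.
rate_a, rate_b, confidence = ab_confidence(160, 8000, 178, 8000)
print(f"Control {rate_a:.2%} vs test {rate_b:.2%}: {confidence:.0%} confident it isn't chance")
```

However you run it, the answer is nowhere near the 90% threshold, so version B does not get crowned.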
Why people stop testing
The truth is, getting that magic 90% statistical significance sometimes feels exactly that way: like magic! Our team’s Marketing Testing channel in Slack has more facepalm / depressed / agony emoticons than any other. Button tests fail to reach any significant reading (tell us your secret, Obama!). Image tests repeatedly produce inconclusive results. Copy tests bring uplift for three days and then a decrease the next. It’s enough to drive anyone crazy! Word on the street is that only about 10% of A/B tests produce a significant result, and that comes from everyone from Netflix to Google. So it’s no wonder people give up: you put all of this time and resource into setting up a wonderful test, with a strong hypothesis, sound methodology and enthusiastic testers, only to find nothing out nine times out of ten.
The important thing to remember is that a losing test is really a winning test! I have tested the button on my checkout page about three times and, you know what, not one of those tests has had a significant result. Brilliant, I say! No more testing that button: I know it doesn’t make a difference. Testing different covers on my checkout page has made no discernible difference either. Well, great: I can use whatever cover matches my campaign with no adverse consequences! But wait a minute, my little circle saying “pay just £1 an issue” vs “save 75%” did have a statistically significant result. That’s one to keep iterating on. So, you see, all of my losing tests help me figure out which areas are worth iterating on, and which aren’t.
Over five years, tests on my checkout page have led to twice as many annual conversions and triple the annual revenue. Testing might be painful sometimes, but it’s worth it.
Recording and learning from tests
An element of testing that is sometimes overlooked is how much more widely applicable the results can be. Do people in the US respond better to certain copy than people in the UK on your PPC ads? You can test that in your email copy, too. Maybe you learn that your audience responds better to sans serif fonts, or to longer-form copy. Eventually a trend might emerge that can shape your business, proving useful everywhere from offline marketing to video and making a lasting positive contribution.
A couple of final tips: first, I urge you to be open about where testing ideas come from. Absolutely get ideas from colleagues, but make sure any department can make a testing suggestion. See what your competitors do, and sign up to newsletters with real-life case studies for when inspiration fails you. Finally, make it fun! Run bets on outcomes or give prizes for the best test idea, wherever, or whoever, it comes from. Everyone should be invested in the outcome of your testing.
* Oh, and so you know: it turns out Obama donors wanted to “learn more”, not “sign up”!