
FEATURE 

How tagging makes the digital world go round

Media businesses are brilliant at producing content, with deep expertise in their subject matter. But, writes Tom Jackson, they haven’t always been as brilliant at categorising that content to make it as discoverable as possible on digital platforms.

By Tom Jackson

‘Harry Kane’ is one of 7,000 different terms to have been used to tag Sun and Times articles so far.

Optimising the categorisation workflow has been a fundamental focus for News UK over the last twelve months. Home to news brands including The Times, The Sunday Times and The Sun, as well as a portfolio of broadcast brands including talkSPORT, Virgin Radio UK, Times Radio and TalkTV, News UK reaches almost 40 million people, most of them on digital platforms.

So, ensuring the infrastructure is in place to tag digital news articles with consistent and comprehensive metadata – including video and audio – is fundamental to News UK’s audience and revenue growth strategy.

Metadata tagging is not often the first thing that puts a spring in the step of journalists and broadcasters. It is seen as a dry and dusty area, the province of taxonomists and librarians. In the increasingly important world of data, though, it is often these under-loved areas that lubricate the systems needed to power new capabilities. Everyone should care about tagging. Getting this right is often seen as a technology challenge. In fact, as with so many projects, the human element – getting people to care enough – is the hard bit.

The benefits of tagging

In news media, publishers want to package up content in the most enticing way possible for their audiences. Newsrooms want their staff to focus on storytelling though, not having to manually place every piece of content on the page. Data-led personalisation and auto-curation are therefore on many roadmaps. The systems and algorithms used for this rely on consistent and accurate topic metadata and this is where tagging comes in.

The audience needs to find the content, so SEO is critical. Content providers need to prove to the search engines that they are an authority on certain topics. It is no good if their product consistently has theatre reviews ending up in the fashion section, or health news on the sports pages. The same issue will be a problem internally too. Teams need to be able to rely on analytics reporting and they will only get a true picture of how their coverage of a recent election is going down, for example, if the analysis only includes articles about that election. Again, this is where tagging comes in.

The past and its problems

Realising these benefits has been a key driver of digital growth at News UK in recent times. With patchy and sometimes non-existent tagging, though, this has been harder than it should have been. Data engineers and data scientists have had to hack their way around dirty data and, in some cases, have just had to give up.

Inconsistent tagging has led to visible problems in News UK’s products too. In the past, given variations in the tags applied on The Sun, you could end up with multiple pages on the same topic. At one point, for example, there were two topic pages on the O2 entertainment venue, one whose title began with a capital O and the other with a zero. On The Times, keyword tags were assigned post-publication using Google’s natural language AI. However, these tags were not routinely checked by newsroom staff. As a result, on the ‘Strikes’ topic page, correctly placed articles on industrial action sat alongside articles on different kinds of strikes – possible North Korean or Russian nuclear strikes, for example.

These problems no longer occur today as the majority of News UK content is now being tagged according to a single topic taxonomy. AI models are still used to suggest tags but journalists check and correct the suggestions, ensuring consistency and accuracy.

Future use cases

When the tagging initiative was conceived, benefits to News UK were assumed to be mainly in the three areas mentioned above: SEO, analytics and personalisation. Since the project began, there have been many occasions when News UK has been glad it now has clean tagging to support new use cases. Here are some recent examples.

The Times’ digital products are still organised in the same way as the print newspaper. A project is under way to create new navigation, a new home page and new topic-specific channel pages that will make more sense to an increasingly digital audience. When the project is complete, there will be sixty channel pages as opposed to the eleven current section pages. It will be too onerous to curate all of those pages manually all of the time. Auto-curation is no longer a nice-to-have; it is an imperative for this new project, and good tagging is the key that unlocks it.

To target advertising, News UK uses a third-party service that auto-categorises articles. If an article is deemed to be about beach holidays, adverts for tour operators are more likely to be served. The service used is fully automated; no humans are in the loop, so it misclassifies content sometimes. This can mean adverts are targeted ineffectively or that valuable inventory goes unsold. News UK is currently looking at whether using its own proprietary tagging to target advertising could give higher revenue yields given the improved accuracy.

As is the case with most news media businesses, News UK is looking to extend tagging beyond text-based articles. The Times produces a highly regarded range of podcasts but there is more that could be done to weave them in alongside related articles. Audio can be converted to text which can then be run through the AI models to be tagged in the same way as articles and The Times is looking at how best to make use of this.

The tagging system

The tagging system that allows News UK’s journalists and broadcasters to apply taxonomy terms to their content has three main elements: a single topic taxonomy stored and managed in a third-party system; third-party AI models to auto-suggest terms based on the content; and a News UK developed back-end and APIs for integrating with the content management systems used.

The Sun’s Cost of Living topic page relies on accurate tagging. Photograph: News UK.

The taxonomy News UK uses is proprietary, but it is not all home-grown. Around 7,000 different terms have been used to tag Sun and Times articles so far. Where possible, wheels have not been reinvented and the taxonomy takes in terms from a combination of sources. Some come from specialist third parties: players and teams from a sports data provider, for example. Others, such as celebrities and politicians, come from large open data sources like Wikidata. The remaining gaps are filled with News UK’s own terms, specific to the different titles’ output.

When journalists cannot find a term in the taxonomy, they can suggest a new one. For the tagging to be valuable, tight control needs to be retained, particularly when it comes to adding new terms. News UK uses a third-party tool to manage its taxonomy. Other companies may have a specific person using that tool and playing the role of librarian. At News UK, a small group of people in the respective Audience and SEO teams play that role jointly and are the only ones who can approve taxonomy additions.
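The suggest-then-approve control described above can be illustrated with a minimal sketch. This is not News UK's third-party tool: the `Taxonomy` class, its method names and the `seo_editor` and `reporter` user names are all hypothetical, chosen only to show how new terms might be held back until one of a small group of approvers accepts them.

```python
class Taxonomy:
    """Hypothetical controlled vocabulary with gated additions."""

    def __init__(self, terms, approvers):
        self._terms = set(terms)          # terms journalists can tag with
        self._approvers = set(approvers)  # the only users allowed to add terms
        self._pending = set()             # suggestions awaiting a decision

    def __contains__(self, term):
        return term in self._terms

    def suggest(self, term):
        """Record a journalist's suggestion without making it usable yet."""
        if term not in self._terms:
            self._pending.add(term)

    def approve(self, term, user):
        """Move a pending suggestion into the taxonomy, if the user may do so."""
        if user not in self._approvers:
            raise PermissionError(f"{user} cannot approve taxonomy additions")
        if term not in self._pending:
            raise KeyError(term)
        self._pending.remove(term)
        self._terms.add(term)


taxonomy = Taxonomy({"Harry Kane"}, approvers={"seo_editor"})
taxonomy.suggest("Cost of Living")       # suggested, but not yet available
taxonomy.approve("Cost of Living", "seo_editor")
```

Keeping suggestions in a separate pending set means a typo or near-duplicate never reaches the live taxonomy without someone in the approving group having looked at it.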

To help minimise the effort for journalists using the tagging system, AI models are run across the content and topics are auto-suggested. The journalists check these, rejecting any that are wrong and adding any that are missing. Things may change given the power of generative AI but, for now, humans are still needed in the loop to ensure the accuracy of the tags assigned.

To reduce the checking overhead, the News UK system focuses on tagging for so-called ‘aboutness’. Thresholds are set in the AI models so that the only terms returned are those that the article is principally about: on average, three to four terms per article. Passing mentions of people or places are ignored as, so far, there have been no use cases for capturing these and they just add noise for those who are doing the checking.
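As a rough illustration of thresholding for aboutness, the sketch below keeps only suggestions whose score clears a cut-off. The article does not say how the AI models score terms, so the `Suggestion` type, the 0 to 1 score scale, the threshold of 0.6 and the cap of four terms are all assumptions made for the example.

```python
from typing import NamedTuple


class Suggestion(NamedTuple):
    term: str
    score: float  # assumed relevance score between 0 and 1


def principal_terms(suggestions, threshold=0.6, max_terms=4):
    """Return the terms an article is principally about, strongest first.

    Suggestions scoring below the threshold (passing mentions) are dropped.
    """
    kept = [s for s in suggestions if s.score >= threshold]
    kept.sort(key=lambda s: s.score, reverse=True)
    return [s.term for s in kept[:max_terms]]


suggested = [
    Suggestion("Harry Kane", 0.92),
    Suggestion("London", 0.20),  # passing mention, filtered out
    Suggestion("England football team", 0.81),
    Suggestion("Premier League", 0.65),
]
print(principal_terms(suggested))
```

Raising the threshold gives journalists fewer suggestions to check at the cost of more missing terms to add by hand; lowering it does the reverse.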

The devil is in the detail when it comes to ensuring the accuracy of the terms assigned. How should we deal with words with two meanings (homonyms), for example? If cancer is suggested as a term for an article, is it the disease or the sign of the zodiac? If Amazon is suggested, is it the business or the rainforest? News UK’s system helps those doing the tagging by colour-coding homonyms. When the journalist hovers over one of these terms, the different options are provided to help make sure they have chosen the right one.
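One way to picture the homonym check is as a lookup for labels shared by more than one taxonomy term. The sketch below is illustrative only: the `Term` structure, the identifiers and the tiny in-memory taxonomy are hypothetical, and the real system's data model is not described in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    term_id: str      # hypothetical unique identifier
    label: str        # what the journalist sees
    description: str  # disambiguating hint shown on hover


TAXONOMY = [
    Term("t1", "Cancer", "disease"),
    Term("t2", "Cancer", "sign of the zodiac"),
    Term("t3", "Amazon", "business"),
    Term("t4", "Amazon", "rainforest"),
    Term("t5", "Harry Kane", "footballer"),
]


def options_for(label, taxonomy=TAXONOMY):
    """All terms sharing this label, ignoring case."""
    wanted = label.casefold()
    return [t for t in taxonomy if t.label.casefold() == wanted]


def is_homonym(label, taxonomy=TAXONOMY):
    """True when a label is ambiguous and should be flagged in the interface."""
    return len(options_for(label, taxonomy)) > 1


for term in options_for("Amazon"):
    print(term.term_id, term.label, "-", term.description)
```

Because each meaning has its own identifier, the tag stored against the article is unambiguous even though the labels are identical.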

Conclusion

News UK’s data strategy director, Will Sach, recently said, “Data is not the new oil; it’s the new renewable energy source.” News UK is using data to power many of its new capabilities and good tagging of content underpins much of that new energy source. Success has been achieved here partly through implementing a great technology solution. More importantly though, the project has been successful on the human side, changing the behaviour in busy newsrooms. The benefits are increasingly visible across News UK and all are clear that to drive digital growth in the business, tagging really matters.


This article was first published in InPublishing magazine. If you would like to be added to the free mailing list to receive the magazine, please register here.