InPublishing: ACAP: the future of online publishing

When you buy a book, how do you know what you’re allowed to do with it? You can read it, obviously. What else? Turn to the small print and you might see something about "may not be stored in a retrieval system" or "all rights reserved". Not that anyone pays much attention, and the occasional bit of photocopying is commonplace. For the humble reader, the rules are irrelevant: you read the book and move on.

Websites are worse. They don’t just have a paragraph of small print which nobody reads, they have ten thousand words of legal mumbo-jumbo in terms and conditions which nobody reads. People have an idea in their heads about what they can or can’t do, and that’s good enough for most readers (although it can be a bit scary for publishers).

Now imagine you’re the most prolific reader in the world, a search engine robot (or "spider"). These are the software programs that build the indexes behind search engines like Google and MSN Search. They go out onto the web, following links from one page to another and copying everything they find for processing, so that the pages come up when someone types in a relevant search. They operate on a massive scale, handling millions of pages at a time, and are completely automated. If those uber-wordy terms and conditions say anything about what search engines are allowed to do, how is the robot supposed to know, especially when the average human would need a lawyer’s help to work it out?

As we have seen many times before, the old ways of doing things just aren’t working in the internet world. It’s a real dilemma for publishers. As soon as you post something on the internet, a bunch of robots are quite likely to swing by, copy it, index it and then… who knows? Where are the rules? The robots haven’t read your terms and conditions, and the only way of telling them what your preferences are (a convention called the "Robots Exclusion Protocol", or "robots.txt") is unsophisticated and interpreted inconsistently by different robots.

What are publishers to do?

The issue of how to communicate permissions online has been bothering publishers for a while; some have been so troubled by it that they have simply avoided publishing their content online at all rather than risk losing control of it. So what should they do? There are plenty of options. Sit on the sidelines and moan about it? Simply not publish online (not really a realistic long-term option)? Lobby the politicians to change the law? Just put up with things the way they are, regardless of the effect on their business? Lock up their content using digital rights management (DRM)? Sue the search engines?

Different publishers have adopted all of these approaches, and the energy and money devoted to the issue are a sure sign that it matters and isn’t going to go away. But despite the lawsuits, the lobbying, the DRM and all the rest, we’re still not much closer to a solution. None of the approaches is ideal, and none comes close to meeting the differing needs of all publishers.

What’s really needed is a way of tailoring the solution to suit the specific interests of the publisher. Just like in the "real" world, the first step in getting people to obey rules is to tell them what the rules are.

Birth of ACAP

A group of publishers have realised this, and also acknowledged that if they are facing a problem of such magnitude they need to be ready to do something about finding the solution. And so they are. It’s called the Automated Content Access Protocol, or ACAP.

At the beginning of 2006, the major Europe-based publishing trade associations came together to discuss the issues and dilemmas thrown up for publishers by the search engines and the difficulties of automating rights-based relationships online. They established a Working Party to look at ways in which mutually beneficial and well balanced relationships can be established between publishers and users, particularly search engine operators.

This led to a major pilot project, now well under way, which aims to address issues like the ones outlined above, along with many others where machine-readable permissions are important. It also aims to head off future legal conflict between search engines and content providers such as publishers and broadcasters, and to open up content to everyone by supporting a wide range of potential business models.

The ACAP project is led by the World Association of Newspapers (WAN), the European Publishers Council (EPC) and the International Publishers Association (IPA). It involves developing a standard by which publishers, broadcasters and other owners of content published on the web can provide permissions information about the access and use of their content in a form that automated systems, such as search engine robots, can recognise and interpret. The search engine operator can then systematically comply with the permissions granted by the owner.

Our Search Engine Dilemma

If there seems to be a strong focus on search engines here, it’s no coincidence: they are currently attracting much of publishers’ attention. All sectors of publishing have a "search engine dilemma". The search engines are an unavoidable and valued port of call for anyone seeking an audience on the internet. They sit between internet users and the content those users are seeking, and they have found brilliantly simple and effective ways to make money from the audience they attract. As a result, they have become so dominant that even the largest website owners are not large enough to have any serious impact on their commercial fortunes.

However, it’s obviously not a simple issue: publishers well recognise the benefits of powerful search technology to both users and providers of content. Few, if any, publishers want to block the search engines outright, and few could afford to. At the same time, publishers are aware that search engines, following their own business logic, are inevitably and gradually moving into a publisher-like role: first merely pointing, then caching, and finally aggregating, "publishing" and perhaps even creating content themselves, while using other people’s content as the basis for many of their services.

With the current technology, there can be none of the differentiation of terms of access and use that characterises content-trading relationships in publishing environments, whether electronic or physical. The search engines can and do reasonably argue that, since their systems are completely automated, and they cannot possibly enter into and manage individual and different agreements with every website they encounter, there is no practical alternative to their current modus operandi.

The search engines are able to (in fact, are forced to) make their own rules and decide for themselves whose interests are worth considering. As a result, publishers withhold from search much content that users might otherwise be able to find.

Lots in common

Publishers and search engines are natural bedfellows. There are countless ways in which we can enhance each other’s businesses, and this mutual opportunity is currently under-exploited by both sides. If publishers and search engines are to establish orderly business relationships, they need to collaborate both to fill the technical gap and to secure the political support needed to implement the solution. Since search engine operators rely on robot "spiders" to manage their automated processes, publishers’ websites need to start speaking a language that the operators can teach their spiders to understand. What is required is a standardised way of describing the permissions that apply to a website or web page, so that they can be decoded by a dumb machine without the help of an expensive lawyer.

There is a widespread misapprehension that a mechanism already exists for just this purpose: robots.txt. When challenged on the digital rights and permissions issue, search engines say that robots.txt already gives content providers everything they need to grant or refuse access. Robots.txt may be widely implemented, but it does not do the job publishers need. It was devised when the web was young, and has been left behind by the ever-increasing sophistication of online publishing and search. While robots.txt will almost certainly continue to play an important role, it allows only a simple choice between allowing and disallowing access. ACAP will provide a standard mechanism for expressing conditional access, which is what is now required.
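To make that limitation concrete, here is a minimal sketch using Python’s standard-library `urllib.robotparser` module, which answers exactly the question robots.txt was designed for. The site, the robots.txt rules and the robot name `ExampleBot` are hypothetical, invented purely for illustration; this shows plain robots.txt, not ACAP syntax.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative news site (example only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Allow: /
"""


def check_access(robots_txt: str, agent: str, url: str) -> bool:
    """Return the only answer robots.txt can give: may this robot fetch this URL?"""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/news/today.html",
        "https://example.com/archive/2006/story.html",
    ):
        # Each check yields a bare True or False, with no conditions attached.
        print(url, check_access(ROBOTS_TXT, "ExampleBot", url))
```

Whatever the rules say, the result for a given robot and URL is a single yes or no. There is no room in that answer for a condition such as "index this page, but show only a short extract", which is the kind of conditional permission ACAP is intended to express.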

Guiding Principles

The objective is for ACAP to become a widely implemented standard, ultimately embedded into website and content creation software. In order for it to achieve this, ACAP has some guiding principles. The standard created needs to be:

* enabling not obstructive: facilitating normal business relationships, not interfering with them, while providing content owners with appropriate mechanisms for managing permissions for access and use of their content.
* flexible and extensible: the technical approach should not impose limitations on individual business relationships which might be agreed between content owners and search engine operators; and it should be compatible with different search technologies, so that it does not become rapidly obsolete.
* granular: able to manage permissions associated with arbitrary levels of content from a single digital object to a complete website, to many websites managed by the same content owner.
* universally applicable: the technical approach should initially be suitable for implementation by all text-based content industries, and so far as possible should be extensible to (or at the very least interoperable with) solutions adopted in other media.
* able to manage both generic and specific terms: able to express default terms that a content owner might choose to apply to any search engine operator, and equally able to express the terms of a specific licence between an individual search engine operator and an individual content owner.
* as fully automated as possible: requiring human intervention only where this is essential to make decisions which cannot be made by machines.
* efficient: inexpensive to implement, by enabling seamless integration with electronic production processes and simple maintenance tools.
* open standards based: a pro-competitive development open to all, with the lowest possible barriers to entry for both content owners and search engine operators.
* based on existing technologies and existing infrastructure: wherever suitable solutions exist, we should adopt and (where necessary) extend them – not reinvent the wheel.

The first technical working group, comprising publishers, search engines and other technical partners, met in March to move the project forward.

The publisher participants in the project represent a broad cross-section of the global publishing business, including newspaper publishers, a major news agency, magazine, book and journal publishers, all with significant online business activities. Three of the major search engine operators are working collaboratively with the project, as is the British Library.

The project has also attracted interest and support from more than a dozen other companies and organisations, many of which will contribute additional use cases, helping to ensure that the technical framework meets the requirements of as broad a range of use cases as possible.

The main work of the project so far has been the preparation of use cases by each of the publisher participants and by the British Library, all of which were discussed at a Technical Workshop in London in March 2007. As a result of the Workshop, ACAP has identified a number of key issues that need to be resolved. The ACAP team is now working to address these issues; none appear insoluble.

The relationship between content providers and search engines is vital. ACAP is not about good and evil. It’s about providing the right solution at the right time. It’s about providing a more expressive, automated and enabling technical solution for the demands of digital content publishing in the 21st Century. Significant progress has already been made and ACAP’s objectives appear to be in sight.
