AI SPECIAL 

What we’ve learnt so far at Infopro Digital

The B2B publisher has used AI to launch a new data service for its flagship brand. Thomas Lake, director of product and technology, outlines what they’ve learnt from the experience.

By Thomas Lake

Q: What have been your key learnings?

A: We have learnt that specific AI tools often outperform generic ones — don’t just rely on LLMs like GPT-4 to solve every use-case or technical challenge. A combination of specialised tools, each performing a specific task, often yields better results.

This way of thinking parallels the recent proliferation of AI agents: tools designed for distinct tasks. These are often more effective than the generalist ‘co-pilots’ that became popular at the start of 2024. Large companies like Salesforce and HubSpot are betting big on agents, and on the ability to connect multiple agents into a workflow to meet a variety of needs.

We also learnt that creating a minimum viable product (MVP) with generative AI using our editorial content is relatively straightforward. However, monitoring and refining the outputs is vastly more resource-intensive. Understanding why an AI gives a particular response can be complex, and fine-tuning models requires continuous iteration.

A final lesson has been the value of remaining LLM-agnostic. Testing different models and being able to switch for specific purposes is vital, as the field evolves so rapidly, and no single model is best for every scenario.
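To illustrate what staying LLM-agnostic can look like in code, here is a minimal sketch. It assumes every model can be wrapped as a plain function from prompt to text. The names `ModelRouter`, `model_a` and `model_b` are hypothetical stand-ins, not real vendor clients and not our production code.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumption: any model can be wrapped as a function from prompt to text.
ModelFn = Callable[[str], str]


@dataclass
class ModelRouter:
    """Routes each task to whichever model is currently configured for it."""

    default: ModelFn
    by_task: Dict[str, ModelFn]

    def run(self, task: str, prompt: str) -> str:
        # Fall back to the default model when no task-specific one is set.
        model = self.by_task.get(task, self.default)
        return model(prompt)


# Hypothetical stand-ins for real model clients.
def model_a(prompt: str) -> str:
    return f"[model-a] {prompt}"


def model_b(prompt: str) -> str:
    return f"[model-b] {prompt}"


router = ModelRouter(default=model_a, by_task={"summarise": model_b})
print(router.run("summarise", "Q3 filing"))
print(router.run("classify", "Q3 filing"))
```

Because callers only ever see `router.run`, swapping the model behind a task is a one-line configuration change rather than a rewrite.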

Q: In which use-cases have you had the best results?

A: One of our most mature and successful uses of AI so far has been the enhancement of Counterparty Radar, a data service from our flagship brand, Risk.net.

Counterparty Radar aggregates regulatory filings from US mutual funds, exchange-traded funds (ETFs), US life insurers and European UCITS funds, to provide a unique snapshot of trades in the OTC derivatives market. It shows which firms are making the biggest trades, how they rank against peers, and crucially, which dealer counterparty they are trading with. Aggregating this data helps our subscribers understand the competitive landscape, benchmark their activity, and identify prospective clients.

Initially, the product aggregated feeds from the US Securities and Exchange Commission (SEC), which were published in a structured XML format and were straightforward to parse. However, to provide the insight our customers require, we had to provide equivalent data for European funds, which was buried in non-standard, unstructured PDFs from multiple jurisdictions.

To extract the correct information accurately, we developed a multi-tool AI workflow in conjunction with an external partner. First, we use an object recognition model to identify the tables within the PDFs. Once we have the tables, we use optical character recognition (OCR) tools to extract the text. With the data located and extracted into a structured format, an LLM cleans up the data to produce the final output. This process enables us to pass thousands of filings through the workflow and automatically build a unique dataset.
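The three-stage shape of that workflow can be sketched as follows. This is an illustrative outline only: `TableRegion`, `run_pipeline` and the `fake_*` stages are hypothetical names, and the stubs stand in for a real table-detection model, OCR tool and LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Rows = List[List[str]]


@dataclass
class TableRegion:
    """Where a detected table sits in the PDF (hypothetical structure)."""

    page: int
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def run_pipeline(
    pdf_path: str,
    detect_tables: Callable[[str], List[TableRegion]],
    ocr_region: Callable[[str, TableRegion], Rows],
    clean_rows: Callable[[Rows], Rows],
) -> List[Rows]:
    """Run detect -> OCR -> clean-up for every table found in one filing."""
    tables = []
    for region in detect_tables(pdf_path):
        raw_rows = ocr_region(pdf_path, region)
        tables.append(clean_rows(raw_rows))
    return tables


# Stub stages so the sketch runs without any models or files.
def fake_detect(pdf_path: str) -> List[TableRegion]:
    return [TableRegion(page=1, bbox=(0.0, 0.0, 100.0, 50.0))]


def fake_ocr(pdf_path: str, region: TableRegion) -> Rows:
    return [[" Dealer A ", "1,000"], ["", ""]]


def fake_clean(rows: Rows) -> Rows:
    # Stand-in for the LLM clean-up step: trim cells and drop empty rows.
    return [[c.strip() for c in r] for r in rows if any(c.strip() for c in r)]


tables = run_pipeline("filing.pdf", fake_detect, fake_ocr, fake_clean)
print(tables)
```

Passing each stage in as a function keeps the stages independent, so one tool can be replaced without touching the others.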

One of the main challenges we faced was table object recognition within the PDFs. Generic LLMs such as GPT-4 struggled to interpret complex table structures accurately, for example when distinguishing between two-column tables. By training a specialised object recognition model, we improved our table recognition accuracy by more than 30%.

Integrating OCR tools into our workflow, rather than relying on LLMs, further ensured consistency when converting data into structured formats like CSV. The LLMs often introduced errors and misjudged where new rows should begin.
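To show why a rule-based step gives consistent row breaks, here is a minimal sketch of one deterministic approach: grouping OCR words into rows by vertical position before writing CSV. It assumes the OCR tool returns each word with x and y coordinates. The `Word` shape, the `y_tolerance` value and the sample data are illustrative assumptions, not a description of our production code.

```python
import csv
import io
from typing import List, Tuple

# Assumption: OCR output is a list of (x, y, text) tuples, one per word.
Word = Tuple[float, float, str]


def words_to_rows(words: List[Word], y_tolerance: float = 5.0) -> List[List[str]]:
    """Group words into rows: a new row starts when y jumps past the tolerance."""
    rows: List[List[str]] = []
    current: List[Tuple[float, str]] = []
    row_y = 0.0
    for x, y, text in sorted(words, key=lambda w: (w[1], w[0])):
        if current and abs(y - row_y) > y_tolerance:
            # Close the finished row, ordering its cells left to right.
            rows.append([t for _, t in sorted(current)])
            current = []
        if not current:
            row_y = y  # Anchor the row on its first word.
        current.append((x, text))
    if current:
        rows.append([t for _, t in sorted(current)])
    return rows


def rows_to_csv(rows: List[List[str]]) -> str:
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Made-up sample words; each word is treated as one cell for simplicity.
sample = [
    (10, 100, "Dealer"),
    (200, 101, "Notional"),
    (10, 120, "BankA"),
    (200, 119, "500"),
]
print(rows_to_csv(words_to_rows(sample)))
```

The same input always produces the same rows, which is the consistency a free-form LLM conversion does not guarantee.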

Earlier this year, we successfully launched EU UCITS funds data into Counterparty Radar. This achievement significantly expanded our coverage, providing both US and EU fund data for our customers, a unique offering in the market. This expansion has already driven new sales, and Counterparty Radar can now count some of the world’s largest banks among its customers.

Three best-practice tips

  1. Specific tools for specific jobs. It would have been great to use a single tool to locate, extract and format the data perfectly every time. AI is not there yet. Embrace a variety of tools to meet nuanced needs effectively and know there is much more to AI than OpenAI.
  2. Start with the business case. It doesn’t help to add a sprinkle of AI onto an existing product if it isn’t going to change the underlying value proposition. Think about what AI can enable in terms of new frontiers for your brand, which couldn’t otherwise have been achieved.
  3. Find your champions. Many AI use-cases require nuanced, domain-specific knowledge to validate. Identifying and supporting internal champions — especially those with domain expertise — can make all the difference. In publishing, this often means working with journalists who are eager to experiment and push boundaries. Their enthusiasm and expertise can help accelerate AI adoption and success.

Thomas and the other contributors to our AI Special will take part in an ‘AI Special – Q&A’ webinar on Tuesday, 28 January.


This article was included in the AI Special, published by InPublishing in December 2024.