Since the explosion in popularity of generative artificial intelligence (AI), several scholarly publishers have forged agreements with technology companies looking to use content to train the large language models (LLMs) that underlie their AI tools. A new tracker aims to catalogue what deals are being made — and by whom.
“We were seeing announcements of these deals, and we got to thinking that this is starting to become a pattern,” says Roger Schonfeld, a co-creator of the tracker and vice-president of libraries, scholarly communication and museums at Ithaka S+R, a higher-education consulting firm in New York City. “We wanted to shine some light on not just the individual deals, but also what the overall pattern was starting to look like — and provide a source for the community.”
Schonfeld and his colleagues launched the Generative AI Licensing Agreement Tracker in October. It includes information about licensing deals — confirmed and forthcoming — between technology companies and six major academic publishers, including Wiley, Sage and Taylor & Francis. Schonfeld says that the list documents only public agreements, and that there are probably several others that remain undisclosed.
Many publishers are considering questions such as how licensing — or not licensing — content to generative-AI companies will affect revenue, and the risks or benefits of being among the first to act in this space, Schonfeld says. “Every publisher of a certain scale and above is absolutely grappling with this issue.”
Growing trend
Several big publishers have cashed in on AI licensing deals this year. In May, Informa, the parent company of the UK academic publisher Taylor & Francis, announced that it had made a US$10-million deal to license content to Microsoft. The next month, the US academic publisher Wiley announced to its investors that it had earned $23 million from a deal with an unnamed firm developing generative-AI models. In September, the company said that it expected to earn another $21 million from such agreements this financial year. Nature’s news team contacted several other publishers, including Elsevier and Springer Nature, Nature’s publisher, about whether they had plans for licensing deals, but received no comment. (Nature’s news team is editorially independent of its publisher.)
“We are providing data and content under license for the purposes of training AI, such as LLMs, so that those models become more accurate and relevant for the benefit of everyone who uses them,” a spokesperson for Taylor & Francis said in a statement. “Licensing activities such as this are a key responsibility for research publishers and part of our ongoing commitment to ensuring authors’ ideas make the fullest possible contribution.”
The spokesperson says that royalties will be paid to authors, and that there are strict boundaries attached to their AI partnership agreements. For example, data and content can be used only for training and are under no circumstances permitted to be reproduced in an equivalent format.
A Wiley spokesperson said that royalties will be paid to book authors and other publishing partners, and that it is monitoring AI-model developers for use of copyrighted material without permission. Several of the publishers contacted by Nature said that they had put measures in place to prevent AI tools from scraping their content from the web without permission.
Some publishers haven’t yet entered into any agreements. Among them is the American Association for the Advancement of Science (AAAS), a non-profit academic publisher that publishes Science. Meagan Phelan, communications director for the Science family of journals in Washington DC, says that the AAAS might consider licensing its content to technology companies in the future, if they meet certain criteria. These include the firm’s trustworthiness and the usefulness of the tools that will be created with the content.