Although many tech companies maintain that training on copyrighted material is fair use, they are rushing to acquire all types of data to train AI models. High-quality content is in particular demand, including material that is not yet available on the internet for scraping and that could come from licensing publisher archives.
In the following table, VIP+ provides an overview of all confirmed content licensing agreements between technology companies and publishers for data used to train AI models. It provides all publicly announced or reported details about the specific data publishers have licensed, deal values, and other types of value exchanged in agreements.
In addition to payment, deal terms often include other forms of value exchange, such as giving publishers privileged access to tools or development teams to help them create new AI-powered products. This suggests that publishers who license their content are typically interested in actually using the tools that content helps build.
AI Companies
OpenAI is the most prolific licensee, having established content and product partnerships with several major publishers since launching ChatGPT and DALL-E in fall 2022, which kick-started the emerging market for AI training data.