There’s Gold in Your Media Archives!

Nota Staff

Summary: How data from legacy media fuels the advancement of AI

Emerging data marketplaces aim to capitalize on publisher data, all for the purpose of training AI solutions. In fact, industry giants such as Apple and OpenAI are pouring billions into publisher agreements. These deals provide apps and foundational AI companies with a wealth of top-tier data, enabling them to make chatbots more robust, make forecasts more precise, and inform users of current events faster. The best news, in our opinion? These marketplaces represent a potential windfall for struggling media brands and news outlets.

  • Data marketplaces that focus on publisher data are just now being created
  • Companies like Apple and OpenAI have invested billions in publisher data deals
  • Billions are already being spent, even though the full value of the market has yet to be determined

AI’s Impact on the Evolution of Data Marketplaces

New marketplaces for publisher data are popping up, aimed directly at training AI models. By drawing on the vast archives and real-time updates of news organizations, AI developers are rapidly accelerating learning and training capabilities across industries.

Media company archives are a strong option for anyone looking to secure impactful data assets. These assets offer up-to-the-minute news updates, historical insights and more, which could significantly accelerate AI learning and training across all verticals. While the value of the data is not yet fully understood, companies like Apple and OpenAI have reportedly spent millions on these data sets. The deals range from specific training use cases to full-scale licensing for all future applications. As end-user expectations of AI tools rise, platforms need access to large volumes of high-quality data in order to deliver on what their solutions promise.

Here’s an example of why media archives are ideal for training AI models: imagine a legacy travel company that wants to enhance its offerings by strengthening its AI chatbot. It should consider purchasing current and legacy data from leading travel publications. This would enable its tools to offer hyper-granular travel recommendations, tips and tricks, and more. The bot would be able to answer specific questions and provide detailed help with a consumer’s upcoming trip. Answering questions immediately reduces friction and keeps audiences on the travel company’s website longer.

With some of the largest organizations showing interest in this sector, it’s critical that media companies and publishers consider how they can bolster their own datasets to train AI tools, or contribute to the evolving ecosystem where data can be shared, purchased, and utilized to enhance AI’s capabilities.

Several data marketplaces have started to emerge, such as Dawex, Ocean Protocol and Snowflake, which provide platforms for trading and accessing diverse datasets. According to IoT Analytics, AI is projected to drive significant growth in the data management market, which is expected to reach $513.3 billion by 2030. Data marketplaces are positioned to become gatekeepers to the next frontier of innovation in the technology sector.

Navigating the Regulatory Landscape

That said, it’s worth calling out that published works from well-known academics, analysts, artists, authors, journalists and other experts are considered the most valuable. These works have been developed by experts and unique thinkers, then carefully reviewed and edited. Because of this, AI companies and solution developers want to train their models on them, regardless of copyright status. Therein lies a particularly challenging situation.

There’s an ongoing, industry-agnostic discussion around how to train AI effectively while protecting original creators’ work, authenticity, income, and future ability to create. Coupled with this hurdle is a slew of relevant domestic and international data usage and privacy regulations (to name a few: GDPR, CCPA, DGA, PIPL). The regulatory landscape for data licensing in AI training is becoming increasingly complex, but it is also ripe with opportunity.

From where we sit, it’s pretty clear that businesses can unearth and introduce new revenue streams. It’s easy to see a near-term future with more than one marketplace for companies to participate in, and different ways of connecting with each one. We envision integrations within content management systems (CMSs) or other content creation platforms, standalone marketplaces, and niche specialists or resellers. Over time, a wealth of data will be accessible and actionable for any organization.

By establishing fair and transparent methods of compensation and collaboration, tech companies, publishers and creators can all benefit from the advancements in AI, fostering sustainability in the digital economy.

Recommendations for Companies

For companies interested in either fueling or capitalizing on emerging AI data marketplaces, there are several recommendations to consider:

  • Get Outside Help: Work with emerging companies in the AI space to clean up your archives and digitize content that isn’t yet online.
  • Identify Valuable Data Sources: Focus on collecting data that AI developers find valuable, such as professionally written and edited content, which AI researchers prize for its quality.
  • Keep a Pulse on Evolving Regulations: Organizations need to stay current on evolving regulations and adopt data governance practices that ensure compliance and build trust. Collaboration between regulators, industry stakeholders, and international bodies will be essential in shaping a balanced and effective regulatory framework for AI.

Future Considerations 

As the AI industry and data marketplaces continue to mature, media companies and publications should stay updated on the changing dynamics and opportunities marketplaces present. By doing so, they’ll position themselves to unlock new revenue streams and partnerships. Additionally, companies that navigate the regulatory landscape effectively and capitalize on these opportunities stand to benefit immensely from this new wave of AI-driven data utilization.
