Data for Generative AI

Generative AI models, such as large language models and image generation models, are transforming the relationship between humans and content. Generative models require large amounts of training and validation data, including images, text, audio, video, and structured data. A model's performance, and its capacity to adapt accurately to context and provide relevant outputs, depend heavily on the quality of the input data. Data is not always easy to obtain, and in many cases private data is needed to fine-tune a model's performance. Enabling access to high-quality data can drive innovation and make industries more competitive.

Europe's public and private partners are investing heavily in deploying the Common European Data Spaces, which are at the core of implementing the European data strategy. Data spaces offer access to data for primary and secondary use while guaranteeing data sovereignty and preserving European values and rights. But what requirements must data spaces meet to accelerate the growth of competitive European Generative AI models? Access to vast amounts of high-quality (private and public) data, synthetic data, and data augmentation are key topics to address. Generative AI can itself produce high-quality synthetic data, which can then be offered through data spaces. Technical, governance, business, regulatory, ecosystem, and ethical issues must also be discussed. The session will explore what is new about Generative AI in data spaces and ecosystems, and what it requires of them.

List of session speakers:

Session chairs: Edward Curry and Ana García Robles
Bjoern Juretzki, European Commission
Joachim de Greeff, TNO
Georg Rehm, DFKI
Natalie Bertels, imec
Mike Matton, VRT