Speakers, abstracts and biographies

Please find below a list of speakers, abstracts and biographies for the Beyond Deepfakes: advancing watermarking in Generative AI workshop.

Title: Watermarking in the Age of Generative AI: lessons learned from the past and current challenges

Abstract: Watermarking has a long and eventful history, punctuated by early enthusiasm, theoretical breakthroughs, and some disillusionment. In this talk, I will revisit nearly three decades of research on digital watermarking, drawing key lessons from the past and examining how they apply - or are being overlooked - in the context set by generative AI. As new applications emerge, from model attribution to content provenance, we risk repeating old mistakes and missing the actual challenges of watermarking in a generative AI setting. I argue that rather than reinventing the wheel, we must build on established knowledge, adapting and extending it to develop solutions suited to the challenges of watermarking AI-generated content. Through a series of practical examples and conceptual insights, I outline both promising directions to follow and some traps to avoid.

Speaker biography: Mauro Barni is Full Professor at the University of Siena, where he leads the Visual Information Processing and Protection (VIPP) group. His research spans digital watermarking, multimedia forensics, adversarial signal processing and security of machine learning. He is among the pioneers of signal processing in the encrypted domain, and has contributed foundational work to the security and robustness of media protection and authentication systems. Over the last three decades, Prof. Barni has played a key role in developing the theoretical and practical underpinnings of digital watermarking, particularly in modeling imperceptibility, robustness, and security trade-offs. His recent work focuses on adversarial machine learning and the protection of intellectual property in AI through function and model watermarking.

He has authored or co-authored over 350 scientific publications and holds multiple patents in the field. His work has received more than 20,000 citations (Google Scholar), with an h-index of 71. His contributions have been recognized with several awards, including the IEEE Signal Processing Magazine Best Column Award, the IEEE TGRS Best Paper Award, and the EURASIP Technical Achievement Award. He is a Fellow of the IEEE, EURASIP, and AAIA.


Title: SynthID-text: Scalable watermarking for identifying large language model outputs

Abstract: Text watermarking can help address the growing need to tag AI-generated content in order to promote transparency, maintain the integrity of the information ecosystem, and improve the training of future AI models. This talk will compare text watermarking to other detection approaches and discuss the primary objectives of a watermark: detectability, quality preservation, robustness, and low latency. We will then introduce SynthID-text, a scalable watermarking method that preserves text quality and enables high detection accuracy with minimal latency overhead. We will present how SynthID-text generates and detects watermarked text, and how it compares to other methods. Finally, we will discuss the outlook for the further development and use of text watermarks.

Speaker biography: Abigail See is a Research Scientist at Google DeepMind in London. After an undergraduate degree in mathematics, she completed her PhD on open-ended neural text generation at the Stanford Natural Language Processing group. At Google DeepMind, she has worked on watermarking for AI-generated text, the AlphaEvolve coding agent, and radiology report generation.


Title: Zero-bit watermarking and the detection of AI-generated content

Abstract: This technical talk focuses on the application of zero-bit watermarking to detecting AI-generated content. It begins with a definition of traditional watermarking, which involves embedding a mark within the host content, and details one application: monitoring the use of registered pictures from photo agencies. The talk then examines whether this technology can be of any use for detecting AI-generated content. Among the new challenges, the main pitfall is the evaluation of the probability of false alarms, a difficulty that has been underestimated in the scientific literature.

Speaker biography: Teddy Furon is a Director of Research and head of the ARTISHAU team, which works on the security of AI/ML, at the Inria centre at Rennes University, France. He has worked on digital watermarking since 1998, serving as a technical adviser for MovieLabs, ContentArmor, Technicolor, and Meta. He co-founded two startups in this field: Imatag and Label4.ai.



Title: Watermarking AI-generated images

Abstract: As the outputs of image generation models become increasingly difficult to tell apart from 'real' images, there is a strong need to robustly label both 'AI images' and 'real images' as such. In this talk we focus on the former, exploring how a trusted model provider could use imperceptible watermarks to mitigate misuse of their generated content. We will start with a quick overview of different watermarking paradigms and the inner workings of diffusion/flow-matching image generators. Afterwards, we will introduce a number of recent developments around embedding watermarks as part of the image generation process, motivating each by the limitations of the one before it. Finally, we arrive at the question of watermarking few-step image generation models (such as FLUX.1 [schnell]), where all previous approaches produced insufficient results in our experiments. We propose an added step in the encoding and decoding processes, and demonstrate that it lets us watermark few-step models just as we would 'regular' models.

Speaker biography: Bence is a co-founder of Garandor, a startup focused on local data ownership. He works on imperceptible watermarking of both pre-existing and AI-generated images.


Title: Watermarking and the Future of Trust in Generative AI

Abstract: As generative AI reshapes the digital landscape, watermarking has emerged as a leading proposal to promote transparency and trust in an increasingly fraught online ecosystem. This talk examines the purpose of watermarking, global policy developments, and its adoption across the tech industry. It also highlights key limitations—technical, practical, and governance-related—that make watermarking only part of the solution. To build trust online, we must consider complementary approaches such as content provenance, platform transparency tools, and digital literacy. A pro-innovation policy agenda should support flexible, interoperable, and open solutions—enabling trust without stifling progress.

Speaker biography: Ayesha Bhatti is Head of Digital Policy for the UK and EU at ITIF’s Center for Data Innovation, where she currently works to encourage European laws to grant AI the same access rights to public information as humans. Ayesha is a licensed attorney in the state of New York, and holds a master’s degree in computer science from Birkbeck, University of London. Prior to joining ITIF, she worked at a technology consultancy as a software engineer.


Title: What about textual deepfake detection?

Abstract: Common Crawl is a multi-petabyte longitudinal dataset containing over 100 billion web pages, widely used as a source of language data for sequence model training and in web science research. In this talk I'll give a short introduction to Common Crawl (CC): how its constituent archives are created, their structure, and what kind of tooling you need to work with them. We have local copies of a dozen CC archives attached to Cirrus, one of EPCC's multiprocessor compute resources. I'll briefly describe some recent work I've done using these archives, including a surprising insight they provide into how the Web has been changing from human-authored to automatically constructed. CC was the single largest data source used to train GPT-3, the base LLM for the first ChatGPT releases. CC will continue to provide new monthly archives, which are very likely to provide the bulk of the data used to train new versions of the big commercial LLMs. There is a very real risk that, as the role of ChatGPT and its competitors in producing text for the Web grows, the proportion of information found on the Web, and thus in new archives, that is both new and true will decrease, drowned out by LLM-generated text. Unfortunately, to date none of the big players has shown any (public) interest in watermarking or otherwise identifying their output. I'll finish with a summary of the state of play with respect to categorising text as human-authored or LLM output.

Speaker biography: Henry S. Thompson is Professor of Web Informatics in the University of Edinburgh School of Informatics. His research interests have ranged widely, originally in the areas of speech and language processing, more recently focusing on understanding and articulating the architectures of the Web. He retains an interest in the fundamental goals of Cognitive Science as originally conceived, believing that both sub-symbolic and symbolic mechanisms have a part to play in an adequate account of human cognition. He is committed to the proposition that our professional responsibilities and the success of our intellectual pursuits both depend on keeping our humanity clearly in view. He was a member of the SGML Working Group of the World Wide Web Consortium (W3C), which designed XML, a major contributor to the core concepts of XSLT and W3C XML Schema, and a member of the XML Core and XML Processing Model Working Groups at the W3C. He was elected five times to the W3C TAG (Technical Architecture Group), serving from 2005 to 2014, and continues to contribute to Web standards through the W3C and IETF.


Host: Mirella Lapata

Host biography: Mirella Lapata is a Professor of Natural Language Processing in the University of Edinburgh School of Informatics. Her work is centred on enabling computers to understand, reason with, and generate natural language. She is recognized for her contributions to the field, having received several prestigious accolades. Mirella was the inaugural recipient of the British Computer Society and Information Retrieval Specialist Group's Karen Spärck Jones Award. She is also a Fellow of the Royal Society of Edinburgh, the Association for Computational Linguistics (ACL), and Academia Europaea. Throughout her career, Mirella has earned multiple best paper awards at leading natural language processing conferences. She has contributed to the academic community by serving on the editorial boards of several respected journals, including the Journal of Artificial Intelligence Research, the Transactions of the ACL, and Computational Linguistics. In 2018, she served as president of SIGDAT, an association promoting the study of linguistic data and corpus-based approaches to natural language processing. Mirella has received many significant funding awards, including an ERC Consolidator Grant, a Royal Society Wolfson Research Merit Award, and a UKRI Turing AI World-Leading Researcher Fellowship, affirming her status as a top-tier researcher in artificial intelligence and natural language processing.


Related links