Dragnet Data Protection: Dilemmas of Web Scraping and Public Interest Research

  • Panel
  • Orangerie
  • Friday 22.05 — 11:50 - 13:05

Organising Institution

University College London

International

Indiscriminate scraping of data from digital environments, a new form of ‘dragnet’ data collection, has both fuelled the latest generation of AI research and practice and led to widespread public controversy around extraction from the public sphere and surveillance of online communities. Universities are more important than ever when research methods are societally controversial, as they can remain able to discover and innovate by establishing and maintaining a social license. They do this through their focus on the public interest, their key educational and outreach roles, and their care and leadership in meeting regulatory and ethical obligations. Yet universities are struggling on multiple fronts to work out what their own collection and use of this data should look like. At the same time, large firms, especially content delivery networks (CDNs) such as Cloudflare and Akamai, are operating bot blockers which both protect sites against unwanted commercial scraping and hinder bona fide researchers' ability to understand, appraise, and archive the digital world. In this panel, we bring together practitioners, computational researchers and legal scholars to discuss the ways forward for ethics and legality in web scraping.

Questions to be answered

  1. How does data protection understand and apply to web scraping, and what tensions does interpreting it present?
  2. What issues are researchers facing with studying the Web due to the rise of anti-bot measures, and should bona fide research institutions be technically facilitated to scrape?
  3. Should researchers use datasets collected for AI training? Should they publish datasets, knowing they might be used this way?
  4. What safeguards in research involving scraped data should be in place, both during scraping and when the data has been gathered? What role for data rights?

Moderator

Michael Veale

UCL Faculty of Laws - International

Prof Michael Veale is Professor of Technology Law and Policy, and Vice-Dean (Education Innovation) at the Faculty of Laws, University College London. His research focusses on how to understand and address challenges of power and justice that digital technologies and their users create and exacerbate, in areas such as privacy-enhancing technologies and machine learning. This work is regularly cited by legislators, regulators and governments.

Speaker

Sophie Stalla-Bourdillon

Brussels Privacy Hub - Belgium

Sophie Stalla-Bourdillon is co-Director of the Brussels Privacy Hub. She is also a visiting professor at the University of Southampton Law School, where she held the chair in IT Law and Data Governance until 2022. She was Principal Legal Engineer at Immuta Research for six years. Sophie is the author and co-author of several legal articles, chapters and books on data protection and privacy. She was Editor-in-chief of the Computer Law and Security Review, a leading international journal of technology law, for almost a decade and is now Honorary Editor. She has also served as a legal and data privacy expert for the European Commission, the Council of Europe, the Organization for Security and Co-operation in Europe, and the Organisation for Economic Co-operation and Development.

Speaker

Alexandra Potts

University College London - International

Alexandra Potts is the Chief Privacy Officer at University College London. As UCL's DPO, Alexandra leads the Data Protection and Freedom of Information Team. She is a very experienced in-house lawyer with 16 years' post-qualification experience, the latter 10 of which have been primarily focused on data protection law. Alexandra has worked across private and public sector organisations and so understands the practical impacts of applying data protection law, and the real-world challenges of AI and data protection compliance. Alexandra holds undergraduate and postgraduate degrees from UCL (Faculty of Laws) and completed her training contract in the City of London. She went from there to enlist as a Captain in Army Legal Services, completing training at the Royal Military Academy Sandhurst. Alexandra’s roles in the Army included training special forces soldiers in the law of armed conflict, as well as serving as an infantry platoon commander with 2 R Anglian when they were Theatre Reserve Battalion. In her civilian career, Alexandra has worked as an in-house lawyer at The Royal British Legion and the Bank of England. Prior to working at UCL, she was Head of Legal at an ad-tech start-up.

Speaker

Thomas Vandamme

Université libre de Bruxelles - Belgium

Thomas Vandamme is a postdoctoral researcher at JurisLAB (Faculty of Law, Université libre de Bruxelles). A trained engineer, he completed his doctoral thesis in Engineering and Technology at ULB in 2025, titled "Algorithmic Confusion: A Transversal Study of Computational Trade Mark Similarity". His work investigates how AI systems assess legally relevant similarities between trademarks, and the impacts these tools have in practice. Throughout his doctoral and postdoctoral research, large-scale scraping of public legal databases has been central to his methodology.

Speaker

Luc Rocher

University of Oxford - United Kingdom

Professor Luc Rocher is an associate professor and UKRI Future Leaders Fellow at the Oxford Internet Institute. Luc leads the Synthetic Society Lab, a research group working to make technology and digital power accountable to the public, and to guide the development of accountable, sustainable, and safe algorithms that serve the public interest. Luc's research on the limitations of anonymisation practices has been referenced by the European Commission, OECD, World Bank, WEF, FTC, by European data protection authorities, and in US legal cases, and led to changes to the UK’s Data Protection Act. Prior to joining Oxford, Luc received a PhD from the Université catholique de Louvain in 2019 and worked as a researcher at the Data Science Institute and Computational Privacy Group of Imperial College London, at the ENS de Lyon, and at the MIT Media Lab.