Tackling AI and data protection

Applying key data protection principles to AI models

The growth of AI continues at a tremendous rate. While many people are jumping in with both feet, others have growing concerns about the implications for individuals and their personal data.

Generative AI and Large Language Models

Generative artificial intelligence refers to systems, such as ChatGPT, which can be used to create new content like text, images, video, audio, code and so on. Recent breakthroughs in generative AI have huge potential to affect our whole approach to creating content.

ChatGPT, for instance, relies on a type of machine learning model called a Large Language Model (LLM). LLMs are typically very large deep neural networks, trained on vast datasets such as published web pages. Recent advances in the technology have enabled LLMs to become much faster and more accurate.

What are the main concerns?

With increased capabilities and the growth in adoption of AI come both existing and emergent risks. We are at a tipping point, where governments and industry alike are keen to realise the benefits to drive growth. The public, too, are keen to try out AI models for themselves.

There’s an obvious risk of jobs being displaced, as certain tasks carried out by humans are replaced by AI technologies. Concerns recognised in the technical report accompanying GPT-4 include:

  • Generating inaccurate information
  • Harmful advice or buggy code
  • The proliferation of weapons
  • Risks to privacy and cyber security

Others fear the risks posed when training models using content which could be inaccurate, toxic or biased – not to mention illegally sourced!

The full scope and impact of these new technologies are not yet known, and new risks continue to emerge. But there are some questions that need to be answered sooner rather than later, such as:

  • What kinds of problems are these models best suited to solving?
  • What datasets should (and should not) be used to create and train generative AI models?
  • What approaches and controls are required to protect the privacy of individuals?
  • What are the main data protection concerns?

Data inputs

The datasets used to train generative AI systems are likely to contain personal data that may not have been lawfully obtained. In many AI models, the data used may be obtained by “scraping” (the automated gathering of data online), a practice which can conflict with core privacy principles.

Certain information may also have been used without consideration of intellectual property rights, where the rights holders were neither approached nor gave their consent for its use.

The Italian Data Protection Authority (Garante) temporarily blocked ChatGPT, citing the unlawful collection of personal data and the absence of systems to verify the age of minors. Some observers have pointed out these concerns are broadly similar to those behind the enforcement notice issued to Clearview AI.

Data outputs

AI not only ingests personal data, but may also generate it. Algorithms can produce new data that may unexpectedly expose personal details, leaving individuals with limited control over their data.

There are many other concerns, such as transparency, algorithmic bias, inaccurate predictions and the risk of discrimination. Fundamentally, there are concerns that appropriate accountability for AI is often lacking.

Key considerations for organisations looking to adopt AI

We need to understand what people across the business are already doing with AI, or planning to do. Get clarity about any personal data they are using, particularly any sensitive or special category data. Make sure they are aware of the potential risks and know what questions to ask, rather than diving straight in.

We suggest you start by talking to business leaders and their teams to identify emerging uses of AI across your business. It’s a good idea to carry out a Data Protection Impact Assessment (DPIA) to assess privacy risks and identify proportionate privacy measures.

Rather than adopting huge ‘off-the-shelf’ generative AI models like ChatGPT (and whatever comes next), businesses may consider adopting smaller, more specialised AI models trained on the most relevant, compliantly gathered datasets.

Differing regulatory approaches

EU – The EU has adopted the world’s first Artificial Intelligence Act. Its aim is to ban unacceptable use of artificial intelligence and introduce specific rules for AI systems proportionate to the risk they pose. It’s taking a ‘harm and risk’ approach which will impose extensive requirements on those developing and deploying high-risk AI systems, yet be lighter touch for low risk/low harm AI applications.
Some have questioned whether existing data protection and privacy laws are adequate for addressing AI risks, which can amplify existing privacy problems and add new complexities to them (see the IAPP EU AI Cheat Sheet).

UK – Despite calls for targeted regulation, the UK has no EU-equivalent legislation and currently looks unlikely to get any in the foreseeable future. The Government says it’s keen not to rush in and legislate on AI, fearing that specific rules introduced too swiftly could quickly become outdated or ineffective. For the time being the UK is sticking to a non-statutory, principles-based approach, focusing on the following:

  • Safety, security, and robustness;
  • Appropriate transparency and explainability;
  • Fairness;
  • Accountability and governance; and
  • Contestability and redress.

Key regulators such as the Information Commissioner’s Office (ICO), the Financial Conduct Authority (FCA) and others are being asked to take the lead. Alongside this, a new advisory service, the AI and Digital Hub, has been launched.

There’s a recognition that advanced general-purpose AI may require binding rules. The Government’s approach is set out in its response to the consultation on last year’s AI Regulation White Paper. ICO guidance can be found here: Guidance on AI and data protection. Also see Regulating AI: The ICO’s strategic approach, April 2024.

US – In the US, a number of AI guidelines and frameworks have been published. The National AI Research and Development Strategic Plan was updated in 2023, and stresses a co-ordinated approach to international collaboration in AI research.

As for the rest of the world, the IAPP has helpfully published a Global AI Legislation Tracker.

Wherever you operate, it is vital that data protection professionals seek to understand how their organisations are planning to use AI, now and in the future. Evaluate how the models work and assess any data protection and privacy risks before adopting them.