Advances in AI – how should data protection teams respond?

April 2023

Will generative AI models like ChatGPT radically change life as we know it?

Everyone seems to be talking about AI lately, following an explosion of new tools such as ChatGPT, the Whisper API, Microsoft Copilot, Siri’s ‘Bobcat’ upgrade and AI features in Google Workspace.

While many people are keen to jump in and try them out, there are growing concerns about jobs being replaced by ‘robots’, inaccurate or undesirable results and, for those thinking about data protection, what types of data (including personal and special category data) are being used to train these models.

Whilst the rapid growth in AI may seem inevitable and perhaps unstoppable, some leading industry figures have called for the development and training of powerful AI systems to be suspended for six months, due to fears they pose a threat to humanity.

Elon Musk, amongst others, signed an open letter warning that the escalating development of AI systems has spun out of control; the signatories want to give the industry time to assess the risks.

What are Generative AI and Large Language Models?

Generative artificial intelligence refers to algorithms, such as ChatGPT, which can be used to create new content: text, images, video, audio, code and so on.

Recent breakthroughs in generative AI have huge potential to affect our whole approach to creating content.

ChatGPT, for instance, relies on a type of machine learning model called a Large Language Model (LLM). LLMs are typically very large deep neural networks, trained on vast datasets such as published web pages. Recent technology advances have enabled LLMs to become much faster and more accurate.
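
To make the data flows concrete, here is a minimal sketch of how an application might send a prompt to a hosted LLM. It assumes the OpenAI Python client as it stood in early 2023, and an API key held in an environment variable; the point for data protection teams is that whatever goes into the prompt, personal data included, is transmitted to the provider and processed on their servers.

    import os
    import openai

    # Minimal sketch, assuming the OpenAI Python client (circa early 2023)
    # and an API key in the OPENAI_API_KEY environment variable.
    openai.api_key = os.environ["OPENAI_API_KEY"]

    # Everything in 'messages' leaves your organisation. If it contains
    # personal data, that disclosure needs a lawful basis like any other
    # processing activity.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": "Summarise this customer complaint: ..."}],
    )

    print(response["choices"][0]["message"]["content"])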

What are the main data worries?

With increased capability and growing adoption of AI come both existing and emergent risks. We may be reaching a tipping point: governments and industry alike are keen to realise the benefits of AI to drive growth, and the public are inspired to try out models like ChatGPT for themselves and find out more.

There’s an obvious risk of jobs being displaced, as certain tasks carried out by humans are replaced by AI technologies.

Concerns recognised in the technical report accompanying GPT-4 include:

  • Generating inaccurate information
  • Giving harmful advice or producing buggy code
  • Aiding the proliferation of weapons
  • Creating risks to privacy and cyber security

Others fear the risks posed by training models on content which could be inaccurate, toxic or biased – not to mention illegally sourced!

The full scope and impact of these new technologies are not yet known, and new risks continue to emerge. But there are some questions that perhaps need to be answered sooner rather than later, such as:

  • What kinds of problems are these models best capable of solving?
  • What datasets should (and should not) be used to create and train generative AI models?
  • What approaches and controls are required to protect the privacy of individuals?

What are the main data protection concerns?

The datasets used to train generative AI systems are likely to contain personal data which may not have been lawfully obtained. Some content has also been used without regard for intellectual property rights, where the owners were neither approached nor gave their consent for its use.

The Italian Data Protection Authority (the Garante) has blocked ChatGPT, citing its unlawful collection of personal data and the absence of systems to verify the age of minors. Some observers have pointed out that these concerns are broadly similar to those which led to Clearview AI receiving an enforcement notice.

Key data protection considerations for businesses

We need to understand what people are doing with AI, or planning to do, across the business. Talk with business leaders and their teams to identify emerging uses, and make sure people are aware of the potential risks and know to ask questions rather than diving straight in.

We need to understand the specific AI models the business is considering adopting, and get clarity about any personal data they use, particularly any sensitive or special category data. It’s a good idea to carry out a Data Protection Impact Assessment (DPIA) to assess the privacy risks and identify proportionate privacy measures.
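
As a simple illustration of the kind of control a DPIA might identify, the sketch below screens outgoing text for obvious personal data before it is allowed to reach an external model. The patterns are hypothetical and deliberately crude; genuine personal data detection needs far more than a couple of regular expressions.

    import re

    # Illustrative patterns only; real PII detection requires far more
    # than a couple of regular expressions.
    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
    PHONE = re.compile(r"\b(?:\+44\s?|0)\d{4}\s?\d{6}\b")  # rough UK format

    def redact(text: str) -> str:
        """Mask obvious personal data before text leaves the organisation."""
        text = EMAIL.sub("[EMAIL REDACTED]", text)
        return PHONE.sub("[PHONE REDACTED]", text)

    prompt = "Draft a reply to jane.doe@example.com, tel 07700 900123."
    print(redact(prompt))
    # Draft a reply to [EMAIL REDACTED], tel [PHONE REDACTED].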

Rather than adopting huge ‘off-the-shelf’ generative AI models like ChatGPT (and whatever comes next), businesses may consider adopting smaller, more specialised AI models trained on the most relevant, compliantly gathered datasets.

The ICO updated its ‘Guidance on AI and data protection’ in March 2023.

The prospects for further regulation

We might also look to new regulation. The EU is still working on its new AI Act, which will regulate the use of certain types of AI. However, there is as yet no date for it to take effect in EU member states.

Over in the States, an initial approach to AI regulation emerged in 2022, but it was very limited in scope. Broader AI regulatory initiatives are likely in 2023/24, and individual states are looking at AI rules of their own.

The National Institute of Standards and Technology (NIST) released its Artificial Intelligence Risk Management Framework (AI RMF 1.0) in January 2023: guidance for organisations designing, developing, deploying or using AI systems, to help them manage the many risks of AI technologies.

The UK has no equivalent law and currently looks unlikely to get one in the foreseeable future. The UK Government recently published a white paper entitled ‘AI regulation: a pro-innovation approach’. The white paper outlines a framework underpinned by five principles to guide and inform the responsible development and use of AI across the UK economy:

  • Safety, security, and robustness;
  • Appropriate transparency and explainability;
  • Fairness;
  • Accountability and governance; and
  • Contestability and redress.

So, to my final thoughts. It’s vital that we seek to understand how AI models work, and assess any privacy risks, before adopting them within our organisations.