For several years, discussions about artificial intelligence (AI) largely remained within the confines of university classrooms, research labs, and academic journals. As these technologies became more widespread, that changed radically. AI is now a central topic of public debate, discussed in relation to justice, healthcare, sustainable transport, financial markets, national security, elections, the green transition and much more. It is widely presented as a powerful accelerator of innovation across all areas of human activity.
One important factor contributing to AI's media prominence is generative artificial intelligence, particularly large language models (such as GPT-4). Generative AI is designed to produce text in a natural way. It can analyse, interpret and answer questions; it can complete sentences, translate text into different languages, generate code, write articles and much more. Its functioning is complex, but a simplified explanation can be given. Given a sentence as input, the system writes one step at a time, predicting the words or phrases most likely to follow that input (and those that preceded it), based on patterns learned from its training data. With this approach, generative AI can even write entire articles and scientific essays.
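The next-word mechanism described above can be illustrated with a deliberately tiny sketch. The snippet below builds a toy bigram model from a ten-word corpus and greedily picks the most frequent continuation. Real large language models learn probabilities over tokens with neural networks trained on vastly more text; this is only a conceptual illustration of "most likely next word", not how any production system is implemented.

```python
# Toy bigram "language model": for each word, count which words follow it
# in a tiny training corpus, then generate text by greedily choosing the
# most frequent continuation.
corpus = "the cat sat on the mat and the cat slept".split()

# Count how often each word follows each preceding word.
follows = {}
for prev, nxt in zip(corpus, corpus[1:]):
    follows.setdefault(prev, {}).setdefault(nxt, 0)
    follows[prev][nxt] += 1

def most_likely_next(word):
    """Return the word most frequently seen after `word`, or None."""
    candidates = follows.get(word)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

def generate(prompt, length=4):
    """Greedily extend the prompt one word at a time."""
    words = prompt.split()
    for _ in range(length):
        nxt = most_likely_next(words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)
```

On this corpus, `generate("the", 2)` yields `"the cat sat"`: "cat" follows "the" most often, and "sat" is the first recorded continuation of "cat". Real models sample from a learned probability distribution rather than always taking the single most frequent word, which is why their output varies between runs.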
The attention given to this new technology is justified by the fact that it offers something genuinely new compared with previous generations of AI, something immediately and directly perceptible to the general public. That something is a 'real property of the species': language, which is indispensable for most knowledge work and can be used for many different purposes.
Within the scientific community, a debate is developing around a basic question: do these systems merely combine and reproduce, in a very sophisticated way, information learned from their training data, or do they actually do something more? At issue is whether these models are creative and original, whether they really learn, or whether they simply memorise the training data and reproduce it in a purely statistical manner.

If we wanted to provide an answer from a legal perspective, we could say that these machines are indeed creative, and also that they are not. That is to say, large language models are creative because they generate coherent and contextually relevant textual narratives with an approach that goes beyond simply reproducing existing information. They are not creative because, without human input (prompts), they cannot generate anything. This leads to two observations. The first is that the doctrine of the 'free marketplace of ideas' is inapplicable. The second concerns the relational and ethical value of generating synthesised information.
Regarding the first aspect, we can observe that the information society has undergone a momentous transformation in recent years. Thanks to online platforms, search engines, and social networks, it has become an extensive network of freely circulating ideas, thoughts, and beliefs. Access to this network is cheap, fast, and fair, in that it is accessible to the vast majority of people. The expansion of the web has been supported by the doctrine that ideas should be allowed to emerge freely, potentially reaching the entire world. Thus, the freedoms to express thoughts, to inform, and to be informed are no longer limited as they were with traditional media. The free marketplace of ideas has produced well-known critical issues related to disinformation, information pluralism and the concentration of power. But it has become part of the legal system, as affirmed most recently by the EU's Digital Services Act, which excludes liability (under certain conditions) for those responsible for the 'mere conduit' of information produced by others.
Large language models open up a new scenario that no longer involves a free marketplace of ideas or, indeed, a market at all. There is no competition through which the strongest ideas can emerge. Instead, there is a new 'source' that distils tailor-made information for each individual through a sophisticated process of training and customisation based on individual profiling. This new source does not exercise freedom of expression, but it still participates in that endeavour by simulating language to engage with human freedom of thought. For that reason, it must be assigned some responsibility, as is the case with the press, radio and television.
Certainly, constitutional rules include strong guarantees that safeguard freedom of expression, and interpretive techniques will enable current principles and rules to be adapted as these technologies evolve. Nevertheless, for the sake of legal certainty, it would be desirable for legislators to examine the new questions of liability arising from the creation of 'synthesised information'. The challenge we face is how to overcome the cultural assumption that ties authorship to the category of the person, an individual who creates or originates something such as a book, painting, piece of music, theory or any other form of artistic or intellectual expression. We have separated the ability to understand from the 'technological' ability to produce ideas and opinions. We have created authors without personhood: they have no freedom, but they simulate exercising it. We now have the task of identifying regulatory models to manage the responsibilities that derive from these new forms of authorship.

As for the second aspect, we can observe that language simulation occurs in human-machine interaction. Ideas, opinions and convictions will increasingly be formed in this environment, which is capable of transforming streams of consciousness and shaping the right to self-determination. On that topic, it is useful to recall Gregory Bateson's definition of information as "a difference that makes a difference". This definition emphasises the role of information in bringing together two worlds: the observed world and the world perceived by the observer. It creates a relationship between what we perceive from the external world and what we conceptualise internally as a representation of reality. Information has the potential to be meaningful and to become our knowledge of the world.
But what happens if this potential is mediated, if not replaced, by a technology that can affect our representation of reality? We cannot know what this new technology will be used for. We cannot know what use will be made of the knowledge extracted from millions of human prompts, an outlet for projects, intentions, fears, expectations, desires and so on. But we can expect the business model to be consistent with the profit-oriented logic of private companies.
Undoubtedly, human beings have always related to the outside world through the tools they have produced over time, tools that have eventually reshaped human existence. The novelty of synthesised information is that it creates such an intimate and profound interaction between technology and human identity that it can reshape human existence from within. This puts regulators in the position of having to address challenges that would seem insurmountable without a strong and widely shared ethical component.

