
Generative AI in Perspective: An Overview

Author: Miriam Finglass | Translation Project Manager at Context

In our recent post “Where is the translation industry right now on the AI hype curve?”, we shared our thoughts on AI and translation. To put the current AI boom into perspective, here we give an overview of developments in the field and look at some of the terms commonly encountered in relation to AI and machine learning.

Artificial intelligence is not new. Alan Turing was one of the first to conduct substantial research in what he termed “machine intelligence”, publishing his seminal paper “Computing Machinery and Intelligence” in 1950 (Turing, 1950). In it, he proposed an experiment called “The Imitation Game”, now known as the “Turing Test”, in which a machine was considered intelligent if a human interrogator could not distinguish it in conversation from a human being. It was AI pioneer Arthur Samuel who popularised the term “machine learning”, describing it in 1959 as the “programming of a digital computer to behave in a way which, if done by human beings or animals, would be described as involving the process of learning” (Samuel, 1959). In other words, machine learning (ML) involves computers learning from experience, giving them the ability to learn without being explicitly programmed. Samuel appeared on television in 1956 to demonstrate a computer playing checkers against a human, having used machine learning techniques to program the computer to learn the game. The term “artificial intelligence” itself was coined by American cognitive and computer scientist John McCarthy for the Dartmouth research conference held later that same year, one of the first dedicated events in the AI field (McCarthy et al., 1955). See Karjian’s timeline of the history and evolution of machine learning for more details on the development of AI over the last eight decades.

How can machines learn without explicit instructions? The answer is data. In ML, machines are trained on large amounts of data. Most machine learning involves developing algorithms (sets of rules or processes) that use statistical techniques to analyse and draw inferences from patterns in data (Xlong, 2023). After training and testing, the ML algorithms or models have learned from existing data and can make decisions or predictions for unseen data. The more data the models analyse, the better they become at making accurate predictions. ML models have been built for a range of tasks and find application in many different fields, including image recognition, speech recognition, recommendation systems, data analysis, fraud detection, medical diagnostics and many more. They are also used in natural language processing (NLP), the branch of AI that enables computers to understand, generate and manipulate human language, including tasks such as machine translation (MT), text classification and summarisation.
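To make this train-then-predict workflow concrete, here is a minimal sketch in Python using the scikit-learn library (assumed to be installed); the digits dataset and the k-nearest-neighbours classifier are illustrative choices only.

# A minimal sketch of the ML workflow: train on existing data, then predict for unseen data.
# Assumes scikit-learn is installed; the dataset and classifier are illustrative choices.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# A small image-recognition dataset: 8x8 pixel images of handwritten digits.
X, y = load_digits(return_X_y=True)

# Hold back part of the data so the model is tested on examples it has never seen.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# "Training": the algorithm draws inferences from patterns in the training data.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# The trained model can now make predictions for unseen data.
print("Accuracy on unseen data:", accuracy_score(y_test, model.predict(X_test)))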

ML models that are trained to recognise and generate plausible human language are called language models. Language models assign a probability distribution over words or word sequences. Simply put, they look at all the possible words and their likelihoods of occurring to predict the most likely next word in a sentence based on the preceding words (Kapronczay, 2022). They do this by splitting text into units called tokens, which are converted to numerical representations, and, based on the context, estimating the probability of a token or sequence of tokens occurring next. The simplest language models are n-gram models. An n-gram is a sequence of n words, e.g. a 3-gram is a sequence of three words. These models estimate the likelihood of a word based on the context of the previous n-1 words. One of the main limitations of n-gram models is their inability to use long contexts when calculating the probability of the next word. Language models are the basis of the technology behind autocomplete, speech recognition and optical character recognition, and are also used in machine translation. For more information on types of language models and how they work, see Voita (2023).
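As an illustration of the n-gram idea, the following sketch builds a tiny bigram (2-gram) model in Python: it counts which words follow which, then turns the counts into a probability distribution over the next word. The toy corpus is invented purely for demonstration.

# Illustrative bigram (2-gram) model: the probability of the next word is estimated
# from counts of what follows the previous word in the training text.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat lay on the rug .".split()

# Count how often each word follows each one-word context.
counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    counts[prev][word] += 1

def next_word_distribution(prev):
    # P(word | prev), estimated from relative frequencies.
    total = sum(counts[prev].values())
    return {word: c / total for word, c in counts[prev].items()}

print(next_word_distribution("the"))  # {'cat': 0.5, 'mat': 0.25, 'rug': 0.25}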

Most ML models today are based on artificial neural networks (ANNs). These are ML models inspired by the neural networks in the human brain. The origins of ANNs go back to the work of Warren McCulloch and Walter Pitts, who published the first mathematical model of a neural network in 1943, providing a way to describe brain functions in abstract terms and to create algorithms that mimic human thought processes (Norman, 2024). An artificial neural network is a statistical, computational ML model made up of layers of artificial neurons (Mazurek, 2020). Data is passed between the neurons via the connections or synapses between them. A simple neural network consists of three layers: an input layer, a hidden layer and an output layer. The input layer accepts data for calculation and passes it to the hidden layer, where all calculations take place. The result of these calculations is sent to the output layer. Each synapse has a weight, a numerical value that determines the strength of the signal transmitted and how much it affects the final result of the calculation. During the training process, a training algorithm measures the difference between the actual and target output and adjusts the weights depending on the error, so that the ANN learns from its errors to predict the correct output for a given input (DeepAI.org). In this way, ANNs can be developed into special-purpose, task-specific systems. The first artificial neural network was developed in 1951 by Marvin Minsky and Dean Edmonds. The Perceptron, developed by Frank Rosenblatt in 1958, was a single-layer ANN that could learn from data and became the foundation for modern neural networks.
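To illustrate the error-driven weight adjustment described above, here is a minimal sketch of the perceptron learning rule in Python with NumPy: the weights are nudged whenever the predicted output differs from the target. The toy task (learning logical AND) and the learning rate are invented for illustration; a real ANN would have hidden layers and be trained with backpropagation.

# Minimal sketch of error-driven weight adjustment (single-layer perceptron).
# The toy task of learning logical AND is purely illustrative.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # inputs
t = np.array([0, 0, 0, 1])                      # target outputs (logical AND)

weights = np.zeros(2)
bias = 0.0
learning_rate = 0.1

for epoch in range(10):
    for x, target in zip(X, t):
        prediction = 1 if x @ weights + bias > 0 else 0
        error = target - prediction            # difference between target and actual output
        weights += learning_rate * error * x   # adjust weights in proportion to the error
        bias += learning_rate * error

print(weights, bias)  # e.g. [0.2 0.1] -0.2, a rule that now reproduces AND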

ANNs with at least two hidden layers are referred to as deep neural networks and were first developed in the late 1960s. In 2012, an event set off an explosion of deep learning research and implementation: AlexNet, an ML model based on a deep neural network architecture, won the ImageNet Large Scale Visual Recognition Challenge, a competition that evaluated ML algorithms on object detection and image classification. AlexNet (Krizhevsky et al., 2012) achieved an error rate more than 10.8 percentage points lower than that of the runner-up. Its success was largely based on the depth of the model and the use of multiple GPUs (graphics processing units) in training, which reduced the training time and allowed a bigger model to be trained. Deep learning transformed computer vision and went on to drive progress in many areas, including NLP.
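For readers who like to see what “deep” means in practice, the sketch below defines a network with two hidden layers using the PyTorch library (assumed to be installed); the layer sizes are illustrative and the model is left untrained.

# Sketch of a deep neural network in the sense used above: at least two hidden layers.
# Layer sizes are illustrative; the model is defined but not trained here.
import torch.nn as nn

deep_net = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),  # hidden layer 1
    nn.Linear(256, 128), nn.ReLU(),  # hidden layer 2
    nn.Linear(128, 10),              # output layer, e.g. scores for 10 image classes
)
print(deep_net)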

Neural network architectures also transformed language models. Neural language models use deep learning to predict the likelihood of a sequence of words. They differ from n-gram models in the way they compute the probability of a token given the previous context: neural models encode the previous context as a vector representation and use this to generate a probability distribution over the next token. As a result, neural language models are able to capture context better than traditional statistical models and can handle more complex language structures and longer-range dependencies between words. For further details on the mathematics behind these models, see Voita (2023).
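The sketch below shows the mechanics in miniature: the previous context is encoded as a single vector (here simply the mean of the token embeddings), which is then mapped to a probability distribution over the whole vocabulary via a softmax. The weights are random and untrained, and the vocabulary and dimensions are invented for illustration.

# How a neural language model turns a context into a probability distribution
# over the next token. Weights are random (untrained); the vocabulary is invented.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "."]
embed_dim = 8

embeddings = rng.normal(size=(len(vocab), embed_dim))       # one vector per token
output_weights = rng.normal(size=(embed_dim, len(vocab)))   # maps context vector to scores

def next_token_probs(context_tokens):
    ids = [vocab.index(tok) for tok in context_tokens]
    context_vector = embeddings[ids].mean(axis=0)   # encode context as a single vector
    logits = context_vector @ output_weights        # one score per vocabulary item
    exp = np.exp(logits - logits.max())
    return dict(zip(vocab, exp / exp.sum()))        # softmax -> probability distribution

print(next_token_probs(["the", "cat"]))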

Machine translation (MT) based on artificial neural networks is referred to as neural machine translation (NMT), which outperformed statistical machine translation (SMT) systems in 2015. NMT models learn from parallel corpora using artificial neural networks, carrying out translation as a computational operation. NMT offers improved quality and fluency for many language combinations in a variety of domains compared to previous MT systems, although the apparent fluency can sometimes make errors more difficult to identify. NMT models are, for example, the technology behind Google Translate, DeepL and Microsoft Bing Translator.
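As a small example of using an off-the-shelf NMT model, the sketch below calls a publicly available OPUS-MT English-to-German model through the Hugging Face transformers library (assumed to be installed, along with a backend such as PyTorch); the model name and sentence are illustrative.

# Illustrative use of an off-the-shelf NMT model via the Hugging Face transformers library.
# Assumes transformers and a backend are installed; the model name is one public example.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
result = translator("Machine translation has improved considerably in recent years.")
print(result[0]["translation_text"])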

Generative AI models, the driver of the current AI boom, are capable of generating text, images, video or other data. They are often thought of as the models we can interact with using natural language. But how do they differ from the earlier technology discussed above? These models also work by learning the patterns and structure of their training data and using this to generate new data, and they are still based on neural architectures. The difference is that, before generative AI emerged, limitations of computer hardware and data meant that neural networks were usually trained as discriminative models: they were used to distinguish between classes of data, i.e. to classify rather than generate, a good example being their application in computer vision. The availability of more powerful computer hardware and even larger datasets has since made it possible to train models that can generate data. In general, generative AI models are very large and multi-purpose, whereas traditional models tend to be smaller and task-specific. For a detailed discussion on distinguishing generative AI from traditional AI/ML models, and on common network architectures for generative AI models, see Zaamout (2024).

Large language models (LLMs) are deep neural networks trained on enormous amounts of data and capable of generating what appears to be novel, human-like content. They are the current technology behind many NLP tasks. They function in the same way as smaller language models, i.e. they assign a probability distribution over words as described above. The main differences are the amount of data on which they are trained and the type of neural network architecture, with most current models using the Transformer architecture, discussed in more detail below.

OpenAI introduced the first GPT model, a type of LLM, in 2018. GPT stands for generative pre-trained transformer. The transformer architecture is a neural network model developed by Google in 2017 (Vaswani et al., 2017) that has since revolutionised the field of NLP and deep learning, thanks to its attention mechanisms. Attention is a mathematical technique that enables a model to focus on the important parts of a sentence or input sequence, allowing it to take better account of context, capture relationships between words that are far apart from each other and resolve ambiguities for words with different contextual meanings (Shastri, 2024). Transformer models are also capable of processing input data in parallel, making them faster and more efficient. Pre-training involves training a model on a large amount of data before fine-tuning it on a specific task. GPT models are pre-trained on a vast dataset of text containing millions of websites, articles, books etc., learning patterns and structures that give them a general understanding of the language. After pre-training, the model is fine-tuned on specific tasks, for example translation, text summarisation, question answering or content generation. Following the first GPT, OpenAI has introduced successive releases, the most recent being GPT-4o. GPTs can be used to write many different types of content, including essays, emails, poetry, plays, job applications or code. ChatGPT, the chatbot service developed by OpenAI, is based on task-specific GPT models that have been fine-tuned for instruction following and conversational tasks, such as answering questions. Although it is a conversational, general-purpose AI model and not an MT system, it can be used for translation.
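For the mathematically curious, here is a minimal NumPy sketch of the scaled dot-product attention at the heart of the transformer: every position in the input is compared with every other, and the resulting weights decide how much each one contributes to the updated representation. The matrix sizes and random values are illustrative only.

# Minimal sketch of scaled dot-product attention, the core operation of the transformer.
# Input sizes and values are illustrative.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # how relevant each position is to each other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: attention weights per position
    return weights @ V                               # weighted mix of the value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                              # e.g. 4 tokens, 8-dimensional vectors
Q = rng.normal(size=(seq_len, d_model))
K = rng.normal(size=(seq_len, d_model))
V = rng.normal(size=(seq_len, d_model))
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8): one updated vector per token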

Studies have shown positive results for generative models and LLMs in the translation of well-resourced languages but poor quality for low-resource languages. For example, Hendy et al. (2023) tested three GPT models on high- and low-resource languages, finding competitive translation quality for the high-resource languages but limited capabilities for the low-resource languages. Castilho et al. (2023) investigated how online NMT systems and ChatGPT deal with context-related issues and found that the GPT system outperformed the NMT systems in contextual awareness, except in the case of Irish, a low-resource language, where it performed poorly. It should also be remembered that such studies are limited to small-scale test sets and may not generalise across language pairs, specific domains and text types.

Some drawbacks of generative AI and GPTs/LLMs also need to be considered.

  • Transformer models are computationally expensive, requiring substantial computational resources during training and inference (when the model is used to generate predictions), and training times and costs are high.
  • LLMs come at a high cost to the environment. They have a high carbon footprint, and as generative models have become larger and larger to improve performance, their energy requirements have become immense. Large amounts of water are also needed to cool data centres, and demand has grown for the rare earth minerals required to manufacture GPUs.
  • Due to their highly complex architecture and the “black box” nature of the models’ internal workings, interpreting and explaining why certain predictions are made is difficult.
  • Due to the way the attention mechanisms of transformers work, transformer models are very sensitive to the quality and quantity of the training data and may inherit and amplify societal biases present in the data (Vanmassenhove, 2024).
  • LLMs require a large amount of training data. In the case of machine translation, a lack of data generally means poor-quality results for low-resource languages.
  • Hallucinations, i.e. the generation of text that is unfaithful to the source input or nonsensical (Ji et al., 2023), occur across models used for natural language generation (NLG). In the case of machine translation, LLMs, like traditional NMT models, can produce hallucinated translations. Since LLMs tend to generate fluent and convincing responses, their hallucinations are more difficult to identify, posing a risk of harmful consequences. Guerreiro et al. (2023) found that the types of hallucination differed between traditional NMT models and GPTs. Hallucinations in LLMs also extend to deviations from world knowledge or facts. For more information on hallucinations in the field of NLG, see Ji et al. (2023).

The EU AI Act, the first binding regulation on AI in the world, was adopted by the European Council in May 2024. It aims to “foster the development and uptake of safe and trustworthy AI systems across the EU’s single market by both private and public actors. At the same time, it aims to ensure respect of fundamental rights of EU citizens and stimulate investment and innovation on artificial intelligence in Europe”. There are questions as to whether the Act will be effective in protecting the environment from the impact of AI (see Warso and Shrishak, 2024, and Laranjeira de Pereira, 2024), but it is clear that, at this point in the development of AI, proper consideration must be given, and action taken, on its social and environmental consequences.

New developments in AI are happening at an ever-increasing pace and bring both opportunities and challenges to the translation industry and many others. We will continue to monitor changes in this space, as well as the environmental repercussions of AI.

How has AI impacted your role or industry? What has your experience been?