Where is the translation industry right now on the AI hype curve?
Authors: Angelika Zerfass | Ulrike Fuehrer | Miriam Finglass
Context AB and AI
May 2024 saw the inauguration of the Context Advisory Board as an information and consultancy resource for the operational Context team.
On that occasion, internationally renowned translation technology expert and Context Advisory Board member Angelika Zerfass made a welcome and insightful contribution to the Context discussion on AI and translation.
Key Takeaways from the Discussion
Here are the thoughts and thought-provoking nuggets we took away from Angelika’s presentation:
- AI is machine learning. Machines are trained on large amounts of data and use statistics to discern patterns in that data, so that they can make decisions or predictions about unseen data.
- Generative AI is capable of generating text, images, video or other data. It has been made possible by the availability of more powerful computer hardware and immense datasets.
- AI is pattern matching. It is very useful in areas such as radiography/healthcare where, for example, X-ray patterns can be established in seconds to feed into diagnosis and patient care. As it stands, however, it is largely inadequate in situations where contextual knowledge and understanding are crucial.
- AI hallucinates. Where it has no content, it makes something up using the most probable combination (of words, sounds, pixels…). While the result looks plausible to the human user at first glance, these most probable combinations may state something that is simply not true (see the toy example after this list).
- AI tools do not understand, cannot evaluate and do not know when something is incorrect, biased, inappropriate or untrue.
- AI systems have been shown to produce text (and images) that perpetuate gender, racial and other biases.
- Hence content quality on the internet may have peaked in recent years. As AI propagates its own mistakes and myths, content quality stands to deteriorate: content may look great and yet bear no relation to reality.
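To make the "most probable combination" idea concrete, here is a toy next-word predictor. The tiny corpus, the counts and the function are all invented for illustration; real large language models are trained on billions of words and use far richer context, but the underlying principle is the same: output a statistically probable continuation, with no notion of whether it is true.

```python
# A toy next-word predictor: a minimal sketch of the statistical
# pattern matching described above. All data here is invented.
from collections import Counter, defaultdict

corpus = ("the capital of france is paris . "
          "the capital of italy is rome . "
          "the capital of spain is madrid .").split()

# Count continuations for two-word contexts, with a one-word fallback.
trigram = defaultdict(Counter)
bigram = defaultdict(Counter)
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    trigram[(a, b)][c] += 1
for a, b in zip(corpus, corpus[1:]):
    bigram[a][b] += 1

def complete(prompt):
    """Append the single most probable next word to the prompt."""
    words = prompt.split()
    candidates = trigram.get(tuple(words[-2:])) or bigram.get(words[-1])
    if not candidates:
        return prompt  # no pattern at all: nothing to predict
    return prompt + " " + candidates.most_common(1)[0][0]

# A pattern the model has seen: the completion happens to be correct.
print(complete("the capital of spain is"))    # ... madrid

# A pattern it has NOT seen: it falls back to words observed after
# "is" and fluently asserts something false (a "hallucination").
print(complete("the capital of germany is"))  # ... paris
```

The first prompt is completed correctly only because that pattern was in the training data; the second is completed just as fluently with "paris", a confident statement that is simply not true. The model has no mechanism for noticing the difference.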
Where are the Human Competencies required?
- While large language models (LLMs) and other AI tools can generate images, videos, songs, texts and translations, they rely on human created and curated content as training material.
- Human translations continue to be an essential component in the quality segment of the market.
- Human intervention on machine translated output is still required, and new linguistic profiles can add value in:
- light or full post-editing of machine translated content
- continuous development of QA tools for machine translated output (a minimal example follows this list)
- clean TMs, term lists, added metadata
- editing content created in the target language: checking facts, ensuring consistency, eliminating bias
- determining which texts are suitable for machine translation post-editing and which are not, and possibly pre-editing texts to make them more suitable for machine translation
- We’ll need to hear from linguists as to the quality of the machine translated output and how that might vary by domain, language pair or text type. We’ll need their feedback on the post-editing effort needed and their experience of the translation process, considering job motivation and satisfaction.
- For smaller languages, insufficient training data is available; humans are crucial here as subject matter experts, product experts and language experts.
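As a concrete illustration of the QA tooling mentioned above, here is a minimal sketch of one such check: verifying that the approved translations from a term list actually appear in machine translated output. The term pairs, sentences and function are invented for illustration; production QA tools also handle inflection, tokenisation, casing and many other checks.

```python
# A minimal terminology QA check on machine translated output.
# The English-German term list below is a hypothetical example
# (source term -> approved target term).
term_list = {
    "user manual": "Benutzerhandbuch",
    "power supply": "Netzteil",
}

def check_terminology(source, target):
    """Report source terms whose approved translation is missing."""
    issues = []
    for src_term, approved in term_list.items():
        if src_term in source.lower() and approved.lower() not in target.lower():
            issues.append(f"'{src_term}' should be translated as '{approved}'")
    return issues

source = "Please consult the user manual before replacing the power supply."
mt_output = "Bitte lesen Sie das Handbuch, bevor Sie die Stromversorgung austauschen."

for issue in check_terminology(source, mt_output):
    print("Terminology issue:", issue)
```

Run on this (invented) sentence pair, the check flags both terms: the MT output used "Handbuch" and "Stromversorgung" instead of the approved "Benutzerhandbuch" and "Netzteil". This is exactly the kind of systematic verification where tooling supports, rather than replaces, the linguist.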
Environmental
The environmental impact of AI is huge. In their study, Strubell et al (2019) look at machine learning models based on the transformer neural architecture, commonly used for machine translation. The graphics processing unit (GPU) emissions generated when training a large model were equivalent to the output of 1.5 cars over the 20-year lifetime of those cars. And that is only the training: it does not include the power and cooling requirements for the computers, or the carbon emissions generated each time one of these systems is used (a back-of-envelope sketch of how such estimates are built up follows below). Luccioni et al (2023) highlight the additional emissions of generative AI as compared to traditional "task-specific" systems.
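How do researchers arrive at such figures? The general shape of the estimate is: hardware power draw × training time × data-centre overhead × grid carbon intensity. The sketch below shows that arithmetic; every number in it is an illustrative assumption, not a value taken from the papers cited.

```python
# Back-of-envelope estimate of training emissions. All values are
# illustrative assumptions for the sake of the arithmetic.
num_gpus = 8               # GPUs used for training (assumption)
gpu_power_kw = 0.3         # average draw per GPU in kW (assumption)
training_hours = 24 * 14   # two weeks of training (assumption)
pue = 1.58                 # data-centre overhead factor (a commonly
                           # cited industry average for PUE)
grid_kg_co2_per_kwh = 0.4  # grid carbon intensity (assumption)

energy_kwh = num_gpus * gpu_power_kw * training_hours * pue
emissions_kg = energy_kwh * grid_kg_co2_per_kwh

print(f"Energy: {energy_kwh:,.0f} kWh, emissions: {emissions_kg:,.0f} kg CO2e")
# ~1,274 kWh and ~510 kg CO2e for this modest hypothetical run.
# And this covers a single training run only; every individual use of
# the deployed system (inference) adds further emissions on top.
```

Even under these modest assumptions the numbers add up quickly, and state-of-the-art models involve orders of magnitude more hardware and training time.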
Data Protection and IP
- Confidentiality of data processed by AI systems must be a priority.
- There are intellectual property considerations in terms of the source of data used in training AI systems and the copyright of its authors.
So where are we at Context on the hype curve that all new technology (all new products?) traverses? Perhaps more inclined to critically evaluate generative AI solutions, to discuss and pilot post-editing models with our linguists and clients, to embrace the creation of new specialised job profiles, and quite horrified at the environmental cost of AI.
Where do you sit on the curve?
References
Luccioni, A.S., Jernite, Y. and Strubell, E. (2023). ‘Power Hungry Processing: Watts Driving the Cost of AI Deployment?’ Available at: http://arxiv.org/abs/2311.16863
Strubell, E., Ganesh, A. and McCallum, A. (2019). ‘Energy and Policy Considerations for Deep Learning in NLP’. Available at: http://arxiv.org/abs/1906.02243