Evaluation of translation ability of language models for education-related communication
The NSW Department of Education (DET) oversees public schools across New South Wales. It is interested in AI-based products that provide administrative assistance while maintaining data security through department-owned, secure GenAI tools. NSWEduChat is a secure, department-owned generative AI tool designed specifically for the NSW education environment, ensuring privacy and equity for all users.
DET identified a use case for NSWEduChat: allowing teachers to translate communications to students. Reliable translations from NSWEduChat can help educators communicate effectively with multicultural communities.
The goal of this project is to assist DET in achieving robust multilingual communication within educational settings through the deployment of Large Language Models (LLMs). The project evaluated the performance of language models tasked with translating education-related communications across 22 languages.
Reference-based evaluation applies when both a reference translation and a generated (candidate) translation are available. Machine translation metrics were selected to compare candidate translations against post-edited references, which are assumed to be accurate.
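As a rough illustration of reference-based comparison, the sketch below scores a candidate translation against a reference using a simple token-overlap F1. This is an assumption made for illustration only: the function name `unigram_f1` is hypothetical, it is not one of the metrics used in this report, and whitespace tokenisation is unsuitable for languages that do not separate words with spaces.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between a candidate and a reference translation.

    Illustrative only: a hypothetical stand-in for a real MT metric.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection: a shared token counts at most as often
    # as it occurs in both the candidate and the reference.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)  # share of candidate tokens found in the reference
    recall = overlap / len(ref_tokens)      # share of reference tokens found in the candidate
    return 2 * precision * recall / (precision + recall)
```

A score of 1.0 means the candidate uses exactly the same tokens as the post-edited reference, and 0.0 means no tokens are shared. Because the reference is assumed to be accurate, any such score measures closeness to the reference rather than translation quality directly.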
The findings shed light on the expected performance of current LLMs for the specific use case of DET and provide a methodology for similar evaluations in the future.
Report structure
- Section 1 provides an introduction.
- Section 2 presents an exploratory data analysis of the dataset provided.
- Section 3 discusses the evaluation methodologies used for reference-based, reference-less and back-translation-based evaluation.
- Section 4 outlines the qualitative results.
- Section 5 discusses a two-tier error analysis.
- Section 6 presents a cost-accuracy analysis, concludes the report and provides pointers for future work.
The code for the evaluation accompanies the report.