Low-Resource Languages Jailbreak GPT-4

Yong, Zheng-Xin; Menghini, Cristina; Bach, Stephen H.

arXiv:2310.02446 (cs)

[Submitted on 3 Oct 2023 (v1), last revised 27 Jan 2024 (this version, v2)]

Abstract: AI safety training and red-teaming of large language models (LLMs) are measures to mitigate the generation of unsafe content. Our work exposes the inherent cross-lingual vulnerability of these safety mechanisms, resulting from the linguistic inequality of safety training data, by successfully circumventing GPT-4's safeguard through translating unsafe English inputs into low-resource languages. On the AdvBenchmark, GPT-4 engages with the unsafe translated inputs and provides actionable items that can get the users towards their harmful goals 79% of the time, which is on par with or even surpasses state-of-the-art jailbreaking attacks. Other high-/mid-resource languages have significantly lower attack success rates, which suggests that the cross-lingual vulnerability mainly applies to low-resource languages. Previously, limited training on low-resource languages primarily affected speakers of those languages, causing technological disparities. However, our work highlights a crucial shift: this deficiency now poses a risk to all LLM users. Publicly available translation APIs enable anyone to exploit LLMs' safety vulnerabilities. Therefore, our work calls for more holistic red-teaming efforts to develop robust multilingual safeguards with wide language coverage.

Comments:	NeurIPS Workshop on Socially Responsible Language Modelling Research (SoLaR) 2023. Best Paper Award
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as:	arXiv:2310.02446 [cs.CL]
	(or arXiv:2310.02446v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.02446

Submission history

From: Zheng-Xin Yong
[v1] Tue, 3 Oct 2023 21:30:56 UTC (121 KB)
[v2] Sat, 27 Jan 2024 22:54:52 UTC (123 KB)