AI hallucination: when ChatGPT makes things up | Eng4IT

📖 Level: Upper-Intermediate ⏱️ Reading time: ~8 min 📰 Topic: AI / LLM Reliability

📰 Reading passage (English)

A New York lawyer made headlines last month after it was discovered that he had submitted a legal brief containing six entirely fabricated court cases — all generated by ChatGPT. The incident has thrust the phenomenon known as AI hallucination into the public spotlight, raising urgent questions about the reliability of large language models.

Hallucination occurs when an AI model generates information that appears plausible but is entirely fictitious. Experts have explained that this happens because LLMs are fundamentally probabilistic systems — they predict the next most likely word in a sequence rather than reasoning from a database of verified facts. The models have been described as “eloquent parrots” that can produce remarkably fluent text without any genuine understanding.
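
A technical aside on the paragraph above: the toy Python sketch below illustrates next-word prediction with a simple bigram model. It is a deliberate simplification and not how ChatGPT is implemented; the corpus and the names `CORPUS`, `build_bigram_counts`, `next_word`, and `generate` are invented for illustration.

```python
import random
from collections import Counter, defaultdict

# Tiny made-up corpus (an assumption); a real LLM trains on vastly more text.
CORPUS = (
    "the court cited the case . "
    "the court dismissed the case . "
    "the lawyer cited the ruling ."
).split()


def build_bigram_counts(tokens):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def next_word(counts, word, rng):
    """Sample the next word in proportion to how often it followed `word`."""
    followers = counts[word]
    words = list(followers)
    weights = [followers[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


def generate(counts, start, max_words, rng):
    """Chain next-word samples until a full stop or the length limit."""
    output = [start]
    while len(output) < max_words and output[-1] != ".":
        output.append(next_word(counts, output[-1], rng))
    return " ".join(output)


counts = build_bigram_counts(CORPUS)
print(generate(counts, "the", 8, random.Random(0)))
```

Because each word is chosen only from what tends to follow the previous one, the sampler can produce a fluent sentence such as "the court dismissed the ruling ." that never occurs in the corpus. Nothing in the procedure checks whether the result is true, which is hallucination in miniature.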

The implications are particularly alarming in high-stakes domains. Medical professionals have warned that AI-generated health advice could be catastrophically wrong. In journalism, fabricated quotes and statistics risk undermining public trust. Legal scholars have argued that courts should establish clear guidelines for the use of AI-generated content in proceedings.

Several mitigation strategies have been put forward. OpenAI and Google have both invested heavily in retrieval-augmented generation (RAG), a technique that grounds model responses in verified external sources. Others have advocated for mandatory disclaimers warning users that AI output may contain errors.
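
A technical aside on RAG: the minimal Python sketch below shows the grounding idea under simplifying assumptions. The source list, the crude word-overlap scoring, and the names `SOURCES`, `tokenize`, `retrieve`, and `build_prompt` are all hypothetical; a production system would typically use a search index or vector database and would send the resulting prompt to a model, which this sketch does not do.

```python
import re

# Hypothetical "verified" sources; a real system would query a search index
# or vector database instead of a hard-coded list.
SOURCES = [
    "RAG grounds model responses in external sources retrieved at query time.",
    "Hallucination is output that appears plausible but is fictitious.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, sources, min_overlap=2):
    """Return the source sharing the most words with the question, or None."""
    q_words = tokenize(question)
    best = max(sources, key=lambda s: len(q_words & tokenize(s)))
    if len(q_words & tokenize(best)) < min_overlap:
        return None
    return best


def build_prompt(question, sources):
    """Build a prompt that restricts the model to the retrieved source."""
    context = retrieve(question, sources)
    if context is None:
        return None  # nothing relevant found: better to decline than to guess
    return (
        "Answer using only the context below.\n"
        f"Context: {context}\n"
        f"Question: {question}"
    )


print(build_prompt("What is hallucination in AI output?", SOURCES))
print(build_prompt("Who won the 1998 World Cup?", SOURCES))
```

The first call builds a prompt around the second source, while the second call returns `None` because no source overlaps with the question. Declining when retrieval finds nothing is one simple design choice for avoiding an ungrounded answer; the overlap score here also counts common words such as "is", so it is far cruder than real retrieval.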

Yet researchers have cautioned that hallucination may never be fully eliminated — it is, in many ways, a feature rather than a bug of how these models work. The onus, they argue, falls on users and organizations to verify AI-generated content before acting on it.

📚 Key vocabulary

English	IPA	Vietnamese	Part of speech
brief	/briːf/	bản tóm tắt pháp lý	noun
fabricated	/ˈfæb.rɪ.keɪ.tɪd/	bịa đặt, ngụy tạo	adj
incident	/ˈɪn.sɪ.dənt/	sự cố, vụ việc	noun
hallucination	/həˌluː.sɪˈneɪ.ʃən/	ảo giác (AI bịa thông tin)	noun
reliability	/rɪˌlaɪ.əˈbɪl.ə.ti/	độ tin cậy	noun
plausible	/ˈplɔː.zə.bəl/	có vẻ hợp lý	adj
fictitious	/fɪkˈtɪʃ.əs/	hư cấu, không có thật	adj
probabilistic	/ˌprɒb.ə.bɪˈlɪs.tɪk/	dựa trên xác suất	adj
eloquent	/ˈel.ə.kwənt/	hùng biện, lưu loát	adj
alarming	/əˈlɑːr.mɪŋ/	đáng báo động	adj
catastrophically	/ˌkæt.əˈstrɒf.ɪ.kəl.i/	thảm khốc	adv
undermining	/ˌʌn.dərˈmaɪ.nɪŋ/	làm suy yếu	verb
proceedings	/prəˈsiː.dɪŋz/	thủ tục tố tụng	noun
mitigation	/ˌmɪt.ɪˈɡeɪ.ʃən/	giảm thiểu	noun
retrieval-augmented generation	/rɪˈtriː.vəl ˌɔːɡ.men.tɪd ˌdʒen.əˈreɪ.ʃən/	sinh nội dung kết hợp truy xuất	noun
disclaimers	/dɪsˈkleɪ.mərz/	tuyên bố miễn trừ	noun
eliminated	/ɪˈlɪm.ɪ.neɪ.tɪd/	loại bỏ hoàn toàn	verb
verify	/ˈver.ɪ.faɪ/	xác minh	verb

🇻🇳 Vietnamese translation

Một luật sư ở New York trở thành tâm điểm chú ý tháng trước sau khi bị phát hiện đã nộp bản tóm tắt pháp lý chứa sáu vụ án hoàn toàn bịa đặt — tất cả do ChatGPT tạo ra. Vụ việc đã đẩy hiện tượng gọi là “ảo giác AI” vào tâm điểm dư luận, đặt ra những câu hỏi cấp bách về độ tin cậy của các mô hình ngôn ngữ lớn.

Ảo giác xảy ra khi mô hình AI tạo ra thông tin trông có vẻ hợp lý nhưng hoàn toàn hư cấu. Các chuyên gia giải thích rằng điều này xảy ra vì LLM về cơ bản là hệ thống dựa trên xác suất — chúng dự đoán từ tiếp theo có khả năng nhất trong chuỗi thay vì suy luận từ cơ sở dữ liệu thông tin đã được xác minh. Các mô hình được mô tả như “con vẹt hùng biện” có thể tạo ra văn bản lưu loát đáng kinh ngạc mà không hề hiểu thật sự.

Hệ quả đặc biệt đáng báo động trong các lĩnh vực quan trọng. Các chuyên gia y tế cảnh báo rằng lời khuyên sức khỏe do AI tạo ra có thể sai một cách thảm khốc. Trong báo chí, trích dẫn và thống kê bịa đặt có nguy cơ làm suy yếu niềm tin của công chúng. Các học giả pháp luật cho rằng tòa án nên thiết lập hướng dẫn rõ ràng cho việc sử dụng nội dung AI trong thủ tục tố tụng.

Một số chiến lược giảm thiểu đã được đề xuất. OpenAI và Google đều đầu tư mạnh vào RAG (sinh nội dung kết hợp truy xuất), một kỹ thuật neo phản hồi của mô hình vào nguồn bên ngoài đã được xác minh. Những người khác ủng hộ tuyên bố miễn trừ bắt buộc cảnh báo người dùng rằng đầu ra AI có thể chứa lỗi.

Tuy nhiên, các nhà nghiên cứu cảnh báo rằng ảo giác có thể không bao giờ được loại bỏ hoàn toàn — theo nhiều cách, nó là tính năng chứ không phải lỗi của cách các mô hình này hoạt động. Trách nhiệm, họ lập luận, thuộc về người dùng và tổ chức phải xác minh nội dung AI tạo ra trước khi hành động dựa trên nó.

📝 Grammar analysis

Sentence 1: “A New York lawyer made headlines last month after it was discovered that he had submitted a legal brief containing six entirely fabricated court cases — all generated by ChatGPT.”

Structure: S + V(past) + O + after + it was discovered + that + S + had + past participle + O + present participle phrase, then a dash + past participle phrase
Grammar: Impersonal passive clause + Past Perfect (an action completed before another action in the past) + reduced relative clause
Analysis: “it was discovered that” = impersonal passive; “had submitted” = Past Perfect; “containing” = present participle heading a reduced relative clause (= “which contained”); “all generated” = reduced passive clause (= “all of which were generated”)
Similar example: The student was expelled after it was discovered that she had submitted an essay containing plagiarized content — all copied from Wikipedia.

Sentence 2: “The models have been described as ‘eloquent parrots’ that can produce remarkably fluent text without any genuine understanding.”

Structure: S + have been + past participle + as + N + relative clause + without + N
Grammar: Present Perfect Passive with “describe as” + defining relative clause + “without” expressing absence
Analysis: “eloquent parrots” = metaphor; “without any genuine understanding” = prepositional phrase expressing the absence of something
Similar example: The system has been described as a “brilliant calculator” that can process millions of records without any human intervention.

Sentence 3: “Medical professionals have warned that AI-generated health advice could be catastrophically wrong.”

Structure: S + have warned + that + compound adj + N + could be + adv + adj
Grammar: Reported speech with “warn” + modal “could” expressing a possible negative outcome + intensifying adverb
Analysis: “AI-generated” = compound adjective (noun + past participle); “catastrophically wrong” = adverb intensifying an adjective
Similar example: Scientists have warned that self-driving algorithms could be dangerously unreliable in extreme weather.

Sentence 4: “Several mitigation strategies have been put forward.”

Structure: S + have been + phrasal verb (past participle)
Grammar: Present Perfect Passive with the phrasal verb “put forward” (to propose)
Analysis: “put forward” = phrasal verb meaning “to propose”; in the passive, the adverb particle “forward” stays directly after the past participle “put”
Similar example: Several proposals have been put forward to address the housing crisis.

Sentence 5: “The onus, they argue, falls on users and organizations to verify AI-generated content before acting on it.”

Structure: S + parenthetical reporting clause + V + on + N + to-V + O + before + V-ing + on + pronoun
Grammar: Parenthetical reporting clause (“they argue”) + fixed expression “the onus falls on” + to-infinitive stating what the responsibility is + gerund after “before”
Analysis: “the onus falls on” = fixed expression (the responsibility lies with); “acting on it” = “act on” (to take action based on something)
Similar example: The responsibility, experts insist, falls on developers to test their code thoroughly before releasing it.

✏️ Exercises

Comprehension (Đọc hiểu)

What happened with the New York lawyer and ChatGPT?
Why do LLMs hallucinate according to the article?
What is RAG and how does it help reduce hallucination?

Vocabulary (Từ vựng)

Fill in each blank with the appropriate word:

The AI-generated report contained ___ statistics that were completely made up.
LLMs are ___ systems that predict the next most likely token.
The article included a ___ stating that the content was AI-generated.
Always ___ information from AI before using it in official documents.
The ___ came to light only after the court reviewed the document.

✅ Answers

Comprehension:

He submitted a legal brief containing six fabricated court cases that were all generated by ChatGPT.
Because LLMs are probabilistic systems that predict the next most likely word rather than reasoning from verified facts.
RAG (Retrieval-Augmented Generation) grounds model responses in verified external sources, reducing the chance of fabricated information.

Vocabulary:

fabricated — bịa đặt, ngụy tạo
probabilistic — dựa trên xác suất
disclaimer — tuyên bố miễn trừ
verify — xác minh
incident — sự cố, vụ việc
