Paper Details

Abstract

The cosmetics industry lacks personalized, science-backed advisory systems, particularly for low-resource languages like Vietnamese. This paper addresses this gap by introducing and evaluating a novel two-stage Retrieval-Augmented Generation (RAG) system. Our approach leverages a state-of-the-art architecture combining a high-recall Bi-Encoder Retriever with a high-precision Cross-Encoder Re-ranker, forming a robust pipeline for specialized information retrieval. To enable this research, we created the Vietnamese Cosmetics E-commerce Dataset (VCED), a new, publicly available corpus of 9,173 canonical products derived from 11,609 raw e-commerce listings via a rigorous "Funnel Strategy" for data cleaning and entity resolution. The system's core components are language-specific models fine-tuned for the Vietnamese context. Experimental results demonstrate the decisive advantage of this specialization, with the Retriever achieving 99.92% triplet accuracy and the Re-ranker reaching 99.74% Average Precision. Most critically, end-to-end evaluation confirms that the re-ranking stage is indispensable; its inclusion more than doubled the Mean Reciprocal Rank (MRR) to 0.585 and improved the Hits@1 score from zero to 0.473. Successfully deployed and validated as a Facebook Messenger chatbot, this work not only establishes a new performance benchmark for domain-specific conversational AI in Vietnamese but also provides a production-ready blueprint for applying advanced RAG architectures in non-English, low-resource environments.
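For readers unfamiliar with the two end-to-end metrics quoted above, the following is a minimal sketch of how Mean Reciprocal Rank and Hits@k are conventionally computed. It is not the authors' evaluation code: the function names, the toy product IDs, and the assumption of exactly one relevant product per query are all illustrative.

```python
from typing import Sequence


def reciprocal_rank(ranked_ids: Sequence[str], relevant_id: str) -> float:
    """Return 1/rank of the relevant item (ranks start at 1), or 0.0 if it is absent."""
    for position, item_id in enumerate(ranked_ids, start=1):
        if item_id == relevant_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]],
                         relevant_ids: Sequence[str]) -> float:
    """Average the reciprocal rank over all queries."""
    scores = [reciprocal_rank(r, rel) for r, rel in zip(rankings, relevant_ids)]
    return sum(scores) / len(scores) if scores else 0.0


def hits_at_k(rankings: Sequence[Sequence[str]],
              relevant_ids: Sequence[str], k: int = 1) -> float:
    """Fraction of queries whose relevant item appears in the top k results."""
    hits = [1.0 if rel in list(r)[:k] else 0.0 for r, rel in zip(rankings, relevant_ids)]
    return sum(hits) / len(hits) if hits else 0.0


# Hypothetical toy data: three queries, one relevant product each (assumption).
toy_rankings = [["p2", "p1", "p3"], ["p5", "p4"], ["p7", "p8", "p9"]]
toy_relevant = ["p1", "p5", "p9"]

toy_mrr = mean_reciprocal_rank(toy_rankings, toy_relevant)
toy_hits1 = hits_at_k(toy_rankings, toy_relevant, k=1)
```

Under these definitions, a Hits@1 of zero means the relevant product never appeared in first place, while MRR can still be non-zero because it credits relevant products found at lower ranks.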

Keywords
Retrieval-Augmented Generation (RAG), Two-Stage Information Retrieval, Semantic Search, Conversational AI, Low-Resource Language
Contact Information
Le Anh Tien (Corresponding Author)
FPT University, Vietnam
0389081824

All Authors (5)

Vo Ngoc Minh Anh

Affiliation: University of Science, Ho Chi Minh City, Vietnam

Country: Vietnam

Email: vongocminhanh.vnma@gmail.com

Phone: 0389081824

Pham Le Duc Thinh

Affiliation: Vietnam National University, Ho Chi Minh City, Vietnam

Country: Vietnam

Email: pldthinh.ityu@gmail.com

Phone: 0389081824

Tan Le Duy

Affiliation: Vietnam National University, Ho Chi Minh City, Vietnam

Country: Vietnam

Email: ldtan@hcmiu.edu.vn

Phone: 0389081824

Le Anh Tien (Corresponding Author)

Affiliation: FPT University

Country: Vietnam

Email: tienla6@fe.edu.vn

Phone: 0389081824

Thuy Quang Dao

Affiliation: Ministry of Science and Technology

Country: Vietnam

Email: daoquangthuyukb@gmail.com

Phone: 0389081824