Paper Details
Abstract
Depression is a major mental health problem among university students and often goes undiagnosed, in part because traditional assessment tools rely solely on categorical responses. Recent developments in Natural Language Processing (NLP) offer an alternative approach that may improve diagnostic accuracy. In this study, we explored a hybrid architecture that combines Sentence-BERT (SBERT) for semantic sentence encoding with a Bidirectional Long Short-Term Memory (Bi-LSTM) network for sequence-based regression. The proposed approach predicts the total Patient Health Questionnaire-9 (PHQ-9) score from free-text responses obtained from 250 university students. We evaluated the model using regression metrics, namely Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), as well as classification metrics: accuracy, precision, recall, and F1-score. The model achieved an average MAE of 1.579 and RMSE of 1.935 on the 0 to 27 PHQ-9 scale, indicating high predictive accuracy and stability. In classification terms, it attained a weighted precision exceeding 95%, reflecting its reliability in categorizing depression severity levels. To further assess its effectiveness, we benchmarked the model against several baseline architectures. The hybrid SBERT + Bi-LSTM model consistently outperformed these alternatives, particularly in classification tasks, confirming its robustness and practical utility for automated depression screening based on free-text input.
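To make the two evaluation views concrete, the following minimal sketch shows how a predicted PHQ-9 total can be scored both as a regression output (MAE, RMSE) and as a severity class. It is an illustration under stated assumptions, not the evaluation code used in the study: the severity cut-offs are the commonly used PHQ-9 bands (0-4, 5-9, 10-14, 15-19, 20-27), which the abstract does not confirm, and the scores in `y_true` and `y_pred` are invented for the example.

```python
import math

# Commonly used PHQ-9 severity bands (upper bound, label).
# Assumption: the abstract does not state which cut-offs the study used.
SEVERITY_BANDS = [
    (4, "minimal"),
    (9, "mild"),
    (14, "moderate"),
    (19, "moderately severe"),
    (27, "severe"),
]


def severity(score):
    """Map a (possibly fractional) PHQ-9 total to a severity label."""
    # Round to the nearest integer and clip to the valid 0..27 range.
    clipped = min(27, max(0, round(score)))
    for upper, label in SEVERITY_BANDS:
        if clipped <= upper:
            return label
    raise ValueError(f"score out of range: {score}")


def mae(y_true, y_pred):
    """Mean Absolute Error between true and predicted totals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error between true and predicted totals."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def class_accuracy(y_true, y_pred):
    """Share of cases where the predicted severity band matches the true one."""
    hits = sum(severity(t) == severity(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical scores, invented for illustration only (not study data).
y_true = [3, 9, 12, 17, 22]
y_pred = [4.2, 10.4, 13.4, 15.8, 20.6]

print("MAE:", round(mae(y_true, y_pred), 3))
print("RMSE:", round(rmse(y_true, y_pred), 3))
print("Severity accuracy:", class_accuracy(y_true, y_pred))
```

In this toy data the second case has a true score of 9 (mild) and a prediction of 10.4 (moderate): an error of under 1.5 points still crosses a band boundary. This is why regression and classification metrics are reported side by side rather than inferred from one another.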