tartuNLP/reddit-anhedonia by huggingface-mirror (hf-mirror)
Focusing on the PRIMATE dataset, our study reveals concerns regarding annotation validity, particularly for the lack of interest or pleasure symptom. Through re-annotation by a mental health professional, we introduce finer labels and textual spans as evidence, identifying a notable number of false positives. Our refined annotations offer a higher-quality test set for anhedonia detection. This study underscores the necessity of addressing annotation quality issues in mental health datasets, advocating for improved methodologies to enhance NLP model reliability in mental health assessments. A mental health professional (MHP) read all the posts in the subset and labelled them for the presence of loss of interest or pleasure (anhedonia). The MHP assigned three labels to each post: a) 'mentioned' if the symptom is talked about in the text, but it is not possible to infer its duration or intensity; b) 'answerable' if there is clear evidence of anhedonia; c) 'writer's symptoms' which shows whether the author of the post discusses themselves or a third person. Additionally, the MHP selected the part of the text that supports the positive label.
The data is originally source from (Sun et al,2021). (Liu et al, 2023) processed the data to make it a dataset vis huggingface api with taining/validation/testing splitting
Psychology Wiki Datasetpsychology_wiki数据集的构建基于心理学领域的英文维基百科内容,通过系统化的数据采集与整理,确保了信息的广泛覆盖与深度挖掘。数据集中的每一篇文章均经过严格的筛选与标注,涵盖了标题、正文、相关性、受欢迎程度及排名等多个维度,为心理学研究提供了丰富的文本资源。
This dataset contains survey responses from individuals in the tech industry about their mental health, including questions about treatment, workplace resources, and attitudes towards discussing mental health in the workplace. By analyzing this dataset, we can better understand how prevalent mental health issues are among those who work in the tech sector—and what kinds of resources they rely upon to find help—so that more can be done to create a healthier working environment for all.