tartuNLP/Reddit Anhedonia Dataset - hf-mirror

tartuNLP/Reddit Anhedonia Dataset - hf-mirror

tartuNLP/reddit-anhedonia by huggingface-mirror (hf-mirror)

tartuNLP/Reddit Anhedonia Dataset - hf-mirror

Detailed Introduction

Focusing on the PRIMATE dataset, our study reveals concerns regarding annotation validity, particularly for the lack of interest or pleasure symptom. Through re-annotation by a mental health professional, we introduce finer labels and textual spans as evidence, identifying a notable number of false positives. Our refined annotations offer a higher-quality test set for anhedonia detection. This study underscores the necessity of addressing annotation quality issues in mental health datasets, advocating for improved methodologies to enhance NLP model reliability in mental health assessments. A mental health professional (MHP) read all the posts in the subset and labelled them for the presence of loss of interest or pleasure (anhedonia). The MHP assigned three labels to each post: a) 'mentioned' if the symptom is talked about in the text, but it is not possible to infer its duration or intensity; b) 'answerable' if there is clear evidence of anhedonia; c) 'writer's symptoms' which shows whether the author of the post discusses themselves or a third person. Additionally, the MHP selected the part of the text that supports the positive label.

More
Dataset

Hugging Face Dataset - NamelessAI Helply
View Details

Hugging Face Dataset - NamelessAI Helply

This paper discusses Helply - a synthesized ML training dataset focused on psychology and therapy, created by Alex Scott and published by NamelessAI. The dataset developed by Alex Scott is a comprehensive collection of synthesized data designed to train LLMs in understanding psychological and therapeutic contexts. This dataset aims to simulate real-world interactions between therapists and patients, enabling ML models to learn from a wide range of scenarios and therapeutic techniques.

Mental Health Large Model Lingxin (SoulChat)
View Details

Mental Health Large Model Lingxin (SoulChat)

Lingxin (SoulChat) is a psychological health large model fine-tuned with millions of Chinese long-text instructions and multi-turn empathetic dialogue data in the field of psychological counseling.

HuggingFaceFW/fineweb-2
View Details

HuggingFaceFW/fineweb-2

FineWeb-2 is a dataset of over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl. This is the second iteration of the popular 🍷 FineWeb dataset, bringing high quality pretraining data to over 1000 🗣️ languages.The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license and extensively validated through hundreds of ablation experiments.In particular, on the set of 9 diverse languages we used to guide our processing decisions, 🥂 FineWeb2 outperforms other popular pretraining datasets covering multiple languages (such as CC-100, mC4, CulturaX or HPLT, while being substantially larger) and, in some cases, even performs better than some datasets specifically curated for a single one of these languages, in our diverse set of carefully selected evaluation tasks: FineTasks.

More Categories

Keywords

Reddit AnhedoniaDatasetHugging FacePRIMATEtartuNLP

Share