tartuNLP/Reddit Anhedonia Dataset - hf-mirror

tartuNLP/reddit-anhedonia by huggingface-mirror (hf-mirror)

Detailed Introduction

Focusing on the PRIMATE dataset, our study reveals concerns about annotation validity, particularly for the "lack of interest or pleasure" symptom. Through re-annotation by a mental health professional, we introduce finer labels and textual spans as evidence, and identify a notable number of false positives. Our refined annotations offer a higher-quality test set for anhedonia detection. This study underscores the need to address annotation quality issues in mental health datasets and advocates improved methodologies to make NLP models more reliable in mental health assessments.

A mental health professional (MHP) read all the posts in the subset and labelled them for the presence of loss of interest or pleasure (anhedonia). The MHP assigned three labels to each post:

a) 'mentioned' if the symptom is talked about in the text, but it is not possible to infer its duration or intensity;
b) 'answerable' if there is clear evidence of anhedonia;
c) 'writer's symptoms', which shows whether the author of the post discusses themselves or a third person.

Additionally, the MHP selected the part of the text that supports the positive label.
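The annotation scheme described above could be modelled roughly as follows. This is only a minimal sketch: the class name, the field names, the choice to treat each label as an independent boolean, and the notion of a "strict positive" are all assumptions made for illustration and are not taken from the dataset's actual schema. The two example posts are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnhedoniaAnnotation:
    """One re-annotated post. All field names are hypothetical."""
    post_text: str
    mentioned: bool         # symptom is talked about; duration/intensity unclear
    answerable: bool        # clear evidence of anhedonia
    writer_symptoms: bool   # author discusses themselves, not a third person
    evidence_span: Optional[str] = None  # text the MHP selected as support


def is_strict_positive(a: AnhedoniaAnnotation) -> bool:
    """Assumed filter: clear evidence that concerns the author themselves."""
    return a.answerable and a.writer_symptoms


def span_is_grounded(a: AnhedoniaAnnotation) -> bool:
    """An evidence span, when present, should occur verbatim in the post."""
    return a.evidence_span is None or a.evidence_span in a.post_text


# Invented example posts, for illustration only.
posts = [
    AnhedoniaAnnotation(
        post_text="I used to love painting but for months nothing feels fun anymore.",
        mentioned=True,
        answerable=True,
        writer_symptoms=True,
        evidence_span="for months nothing feels fun anymore",
    ),
    AnhedoniaAnnotation(
        post_text="My brother says he has lost interest in everything.",
        mentioned=True,
        answerable=False,
        writer_symptoms=False,
        evidence_span="lost interest in everything",
    ),
]

strict_positives = [p for p in posts if is_strict_positive(p)]
```

Keeping the evidence span next to the labels makes it possible to check that every positive label points at text that really appears in the post, which is the kind of validity check the re-annotation is meant to support.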

More Datasets

Cam-CAN Data Access Portal | Cambridge Centre for Ageing and Neuroscience (Cam-CAN) | University of Cambridge

The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) uses epidemiological, behavioral, and neuroimaging data to understand how individuals can best retain cognitive abilities into old age. The Cam-CAN Data Access Portal provides access to datasets from the Cambridge Centre for Ageing and Neuroscience, including neuroimaging and cognitive data from participants aged 18-90.

HuggingFaceFW/fineweb-2

FineWeb-2 is the second iteration of the popular 🍷 FineWeb dataset (over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl), bringing high-quality pretraining data to over 1000 🗣️ languages. The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license, and extensively validated through hundreds of ablation experiments. In particular, on the set of 9 diverse languages we used to guide our processing decisions, 🥂 FineWeb2 outperforms other popular pretraining datasets covering multiple languages (such as CC-100, mC4, CulturaX or HPLT), while being substantially larger. In some cases it even performs better than datasets specifically curated for a single one of these languages, as measured on our diverse set of carefully selected evaluation tasks, FineTasks.

Ithaka 2006 Survey of US Higher Education Faculty Attitudes and Behaviors

This study surveys the attitudes and behaviors of US higher education faculty members regarding online resources, the library, and related topics. It covers a wide range of issues, including faculty dependence on electronic scholarly resources, the transition from print to electronic journals, publishing preferences, e-books, and the preservation of scholarly journals.

Keywords

Reddit Anhedonia, Dataset, Hugging Face, PRIMATE, tartuNLP
