An evolving list of electronic media datasets used to model mental health status. This repository curates a variety of datasets from different sources, including social media platforms, online forums, and academic studies, to support research in mental health modeling and AI applications.
The Mental Health Datasets repository is a curated list of datasets that can be used to model and analyze mental health status. It includes datasets from various sources such as Reddit, Twitter, and online support forums, covering a wide range of mental health conditions like depression, anxiety, and suicidal ideation. This resource is invaluable for researchers and developers working on AI models for mental health support and intervention. For an overview of existing datasets, please consider reading the paper 'On the State of Social Media Data for Mental Health Research'.
Every veteran knows and has had a 'Gunny': Semper Fidelis. This dataset is designed for conversational AI systems to assist veterans from various military branches, including U.S. and U.K. armed forces.
FineWeb-2 is the second iteration of the popular 🍷 FineWeb dataset, bringing high-quality pretraining data to over 1000 🗣️ languages. Whereas the original FineWeb consists of over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl, FineWeb-2 extends that work to multilingual data. The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license, and extensively validated through hundreds of ablation experiments. In particular, on the set of 9 diverse languages we used to guide our processing decisions, 🥂 FineWeb2 outperforms other popular pretraining datasets covering multiple languages (such as CC-100, mC4, CulturaX or HPLT) while being substantially larger and, in some cases, even performs better than some datasets specifically curated for a single one of these languages, on our diverse set of carefully selected evaluation tasks: FineTasks.
This dataset contains survey responses from individuals in the tech industry about their mental health, including questions about treatment, workplace resources, and attitudes towards discussing mental health in the workplace. By analyzing this dataset, we can better understand how prevalent mental health issues are among those who work in the tech sector—and what kinds of resources they rely upon to find help—so that more can be done to create a healthier working environment for all.