This dataset contains survey responses from individuals in the tech industry about their mental health, including questions about treatment, workplace resources, and attitudes towards discussing mental health in the workplace. By analyzing this dataset, we can better understand how prevalent mental health issues are among tech workers and what kinds of resources they rely on to find help, so that more can be done to create a healthier working environment for all.
The dataset tracks key measures such as age, gender, and country to determine overall prevalence, along with responses on employee access to care options; whether employers take mental health as seriously as physical health; whether anonymity is protected for those seeking help; and how coworkers may perceive colleagues struggling with mental health issues such as depression or anxiety. With the workplace landscape evolving as quickly as the technology itself, these statistics have never been more important to analyze if we hope to promote a healthy environment inside and outside our office walls.
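As a minimal sketch of the kind of prevalence analysis described above, the snippet below computes the share of respondents per country who report having sought treatment. The field names (`country`, `treatment`) and the sample rows are illustrative assumptions; the real dataset's column names may differ.

```python
from collections import defaultdict

# Hypothetical rows mirroring the survey's fields (names are assumptions).
responses = [
    {"country": "United States", "treatment": "Yes"},
    {"country": "United States", "treatment": "No"},
    {"country": "United Kingdom", "treatment": "Yes"},
    {"country": "United Kingdom", "treatment": "Yes"},
]

def treatment_rate_by_country(rows):
    """Share of respondents in each country who report seeking treatment."""
    sought = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        total[row["country"]] += 1
        if row["treatment"] == "Yes":
            sought[row["country"]] += 1
    return {country: sought[country] / total[country] for country in total}

rates = treatment_rate_by_country(responses)
print(rates)  # {'United States': 0.5, 'United Kingdom': 1.0}
```

The same grouping generalizes to any of the categorical fields mentioned above (gender, access to care options, anonymity protection).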
The Emotional First Aid Dataset is a comprehensive Chinese psychological counseling QA corpus, featuring 20,000 multi-turn dialogues. It is designed to support the development of AI applications in the field of psychological counseling and is available for research purposes.
This repository provides code and data for automatic depression detection using a GRU/BiLSTM-based model. It includes an emotional audio-textual corpus designed to support the diagnosis of psychological distress conditions such as anxiety, depression, and post-traumatic stress disorder.
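The model above is built from standard gated recurrent units. As a rough illustration of the recurrent building block (not the repository's actual code), a single GRU time step can be sketched in NumPy; all dimensions and parameter names here are chosen for the example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU time step.

    x: input vector (d_in,); h: previous hidden state (d_h,).
    W: (3, d_h, d_in), U: (3, d_h, d_h), b: (3, d_h) hold the
    update-gate, reset-gate, and candidate-state parameters.
    """
    z = sigmoid(W[0] @ x + U[0] @ h + b[0])              # update gate
    r = sigmoid(W[1] @ x + U[1] @ h + b[1])              # reset gate
    h_tilde = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])  # candidate state
    return (1 - z) * h + z * h_tilde                     # new hidden state

rng = np.random.default_rng(0)
d_in, d_h = 4, 3                       # toy sizes for illustration
x = rng.normal(size=d_in)              # e.g. one frame of audio/text features
h = np.zeros(d_h)                      # initial hidden state
W = rng.normal(size=(3, d_h, d_in))
U = rng.normal(size=(3, d_h, d_h))
b = np.zeros((3, d_h))
h_next = gru_step(x, h, W, U, b)
print(h_next.shape)  # (3,)
```

A bidirectional variant, as in the repository's BiLSTM branch, simply runs one such recurrence forward and another backward over the sequence and concatenates the two hidden states.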
🥂 FineWeb-2 is a large dataset of cleaned and deduplicated multilingual web data from CommonCrawl. It is the second iteration of the popular 🍷 FineWeb dataset (over 15 trillion tokens of English web data), bringing high-quality pretraining data to over 1,000 🗣️ languages. The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license, and extensively validated through hundreds of ablation experiments. In particular, on the set of 9 diverse languages we used to guide our processing decisions, 🥂 FineWeb2 outperforms other popular multilingual pretraining datasets (such as CC-100, mC4, CulturaX, and HPLT) while being substantially larger, and in some cases it even outperforms datasets curated specifically for a single one of these languages, on our diverse set of carefully selected evaluation tasks: FineTasks.