The Substance Abuse and Mental Health Data Archive (SAMHDA) provides a comprehensive collection of data sets related to mental health and substance use. It includes ongoing studies, population surveys, treatment facility surveys, and client-level data, offering valuable insights for researchers and policymakers.
SAMHDA is a valuable resource for researchers and professionals interested in mental health and substance use data. It provides a wide range of data sets, including the National Mental Health Services Survey (N-MHSS), Mental Health Client-Level Data (MH-CLD), and the National Survey on Drug Use and Health (NSDUH). These data sets cover various aspects of mental health and substance use, from treatment facilities to individual-level data, and are essential for understanding and addressing related issues.
FineWeb-2 is a dataset of over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl. This is the second iteration of the popular 🍷 FineWeb dataset, bringing high quality pretraining data to over 1000 🗣️ languages. The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license and extensively validated through hundreds of ablation experiments. In particular, on the set of 9 diverse languages we used to guide our processing decisions, 🥂 FineWeb2 outperforms other popular pretraining datasets covering multiple languages (such as CC-100, mC4, CulturaX or HPLT, while being substantially larger) and, in some cases, even performs better than some datasets specifically curated for a single one of these languages, in our diverse set of carefully selected evaluation tasks: FineTasks.
The data was originally sourced from Sun et al. (2021). Liu et al. (2023) processed it into a dataset available via the Hugging Face API, with training/validation/testing splits.
The WHO report on adolescent mental health describes actions undertaken by international development organizations to address adolescents’ mental health needs at the country level. It highlights the inadequacy of current efforts and the need for more coordinated and comprehensive interventions.