The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) is a research project at the University of Cambridge that uses epidemiological, behavioural, and neuroimaging data to understand how individuals can best retain cognitive abilities into old age.
The Cam-CAN Data Access Portal provides access to the project's datasets, including MRI and MEG scans, cognitive assessments, and demographic information from participants aged 18-90. Researchers can apply for access to these freely available datasets to study cognitive decline and healthy ageing.
The Substance Abuse and Mental Health Data Archive (SAMHDA) provides a comprehensive collection of datasets related to mental health and substance use. It includes ongoing studies, population surveys, treatment facility surveys, and client-level data, offering valuable insights for researchers and policymakers.
The Chinese Psychological QA DataSet is a collection of 102,845 community Q&A pairs related to psychological topics, providing a rich source of data for research and development in psychological counseling and AI applications. Each entry includes detailed question and answer information, making it a valuable resource for understanding user queries and generating appropriate responses.
FineWeb-2 is the second iteration of the popular 🍷 FineWeb dataset (over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl), bringing high-quality pretraining data to over 1000 🗣️ languages. The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license, and extensively validated through hundreds of ablation experiments. In particular, on the set of 9 diverse languages used to guide its processing decisions, 🥂 FineWeb2 outperforms other popular multilingual pretraining datasets (such as CC-100, mC4, CulturaX, or HPLT) while being substantially larger, and in some cases it even performs better than datasets specifically curated for a single one of these languages, as measured on a diverse set of carefully selected evaluation tasks: FineTasks.