The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) uses epidemiological, behavioral, and neuroimaging data to understand how individuals can best retain cognitive abilities into old age. The Cam-CAN Data Access Portal provides access to the centre's datasets, including neuroimaging and cognitive data from participants aged 18 to 90.
Cam-CAN is a research project at the University of Cambridge focused on cognitive decline and healthy ageing. The portal offers a range of data from the project, including MRI and MEG scans, cognitive assessments, and demographic information. Researchers can apply for access to these datasets, which are provided free of charge, to study cognitive ageing.
The ToM QA Dataset is designed to evaluate question-answering models' ability to reason about beliefs. It includes 3 task types and 4 question types, creating 12 total scenarios. The dataset is inspired by theory-of-mind experiments in developmental psychology and is used to test models' understanding of beliefs and inconsistent states of the world.
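The count of 12 scenarios follows from pairing each of the 3 task types with each of the 4 question types. A minimal sketch of that enumeration is shown below; the label strings are illustrative placeholders assumed for this example, since the description above gives only the counts, not the names.

```python
from itertools import product

# Placeholder labels (assumptions for illustration only); the description
# above states that there are 3 task types and 4 question types.
TASK_TYPES = ["true_belief", "false_belief", "second_order_false_belief"]
QUESTION_TYPES = ["first_order", "second_order", "reality", "memory"]


def enumerate_scenarios(task_types, question_types):
    """Return every (task type, question type) pair as a list of tuples."""
    return list(product(task_types, question_types))


scenarios = enumerate_scenarios(TASK_TYPES, QUESTION_TYPES)
print(len(scenarios))  # 3 task types x 4 question types
for task, question in scenarios:
    print(f"{task} / {question}")
```

Enumerating the pairs this way makes it straightforward to report a model's accuracy per scenario rather than as a single aggregate score.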
The Lothian Diary Project consists of 125+ audio and video recordings collected from residents of Edinburgh and the Lothian counties in Scotland. Participants discuss their experiences during different stages of the COVID-19 pandemic. The recordings are accompanied by transcriptions and demographic information.
FineWeb-2 is a dataset of cleaned and deduplicated web data from CommonCrawl. It is the second iteration of the popular 🍷 FineWeb dataset and brings high-quality pretraining data to over 1000 🗣️ languages. The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license, and extensively validated through hundreds of ablation experiments. On the set of 9 diverse languages used to guide processing decisions, 🥂 FineWeb2 outperforms other popular multilingual pretraining datasets (such as CC-100, mC4, CulturaX, and HPLT) while being substantially larger. In some cases it even performs better than datasets specifically curated for a single one of these languages, as measured on FineTasks, a diverse set of carefully selected evaluation tasks.