The Lothian Diary Project is a collection of 125+ audio and video recordings from residents of Edinburgh and the Lothian counties in Scotland, in which participants discuss their experiences during different stages of the Covid-19 pandemic. Each recording is accompanied by a transcription and demographic information, making the collection a resource for social and health research. The project aims to document the impact of the pandemic on individuals and communities.
Every veteran knows and has had a 'Gunny': Semper Fidelis. This dataset is designed for conversational AI systems to assist veterans from various military branches, including U.S. and U.K. armed forces.
FineWeb-2 (🥂 FineWeb2) is the second iteration of the popular 🍷 FineWeb dataset, which provides over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl. FineWeb2 brings high-quality pretraining data to over 1000 🗣️ languages. The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license, and extensively validated through hundreds of ablation experiments. In particular, on the set of 9 diverse languages we used to guide our processing decisions, 🥂 FineWeb2 outperforms other popular pretraining datasets covering multiple languages (such as CC-100, mC4, CulturaX or HPLT) while being substantially larger. In some cases, it even performs better than datasets specifically curated for a single one of these languages on our diverse set of carefully selected evaluation tasks, FineTasks.
An evolving list of electronic media datasets used to model mental health status. This repository curates a variety of datasets from different sources, including social media platforms, online forums, and academic studies, to support research in mental health modeling and AI applications.