The Lothian Diary Project is a collection of more than 125 audio and video recordings from residents of Edinburgh and the Lothian counties in Scotland, in which participants discuss their experiences during different stages of the Covid-19 pandemic. Each recording is accompanied by a transcription and demographic information, making the collection a resource for social and health research. The project aims to document the impact of the pandemic on individuals and communities.
The Weibo User Depression Detection Dataset is a large-scale dataset for detecting depression in Weibo users. It includes user profiles, posts (tweets), and a label for each user indicating whether that user is depressed. The dataset is intended for researchers working on mental health and social media analysis.
APA PsycInfo is the premier abstracting and indexing database covering the behavioral and social sciences. It contains over 5,000,000 peer-reviewed records and 144 million cited references, and spans 600 years of content. The database is updated twice weekly and includes research in 30 languages from 50 countries.
FineWeb-2 is the second iteration of the popular 🍷 FineWeb dataset, which consists of over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl. FineWeb-2 brings high-quality pretraining data to over 1000 🗣️ languages. The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license, and extensively validated through hundreds of ablation experiments. In particular, on the set of 9 diverse languages we used to guide our processing decisions, 🥂 FineWeb2 outperforms other popular pretraining datasets covering multiple languages (such as CC-100, mC4, CulturaX or HPLT) while being substantially larger. In some cases, it even performs better than datasets specifically curated for a single one of these languages, as measured on our diverse set of carefully selected evaluation tasks, FineTasks.