Every veteran knows and has had a 'Gunny': Semper Fidelis. This dataset is designed for conversational AI systems to assist veterans from various military branches, including U.S. and U.K. armed forces. The dataset uses nine personas drawn from different branches, each dedicated to providing support for veterans dealing with PTSD and transitioning to civilian life. The personas offer advice rooted in discipline, accountability, and mental resilience, while maintaining the appropriate tone and ethos of each military branch. Each persona emphasizes the importance of seeking professional help when necessary and is not a substitute for therapy, though this behavior is not guaranteed. All data was generated using Meta's Llama-3.2-3B-Instruct.
FineWeb is a dataset of over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl. It is optimized for LLM performance and processed using the datatrove library. The dataset aims to provide high-quality data for training large language models and outperforms other commonly used web datasets.
This dataset contains 20,000 labelled English tweets from depressed and non-depressed users. The data was collected using the Twitter API and includes features extracted with techniques such as topic modelling and emoji sentiment analysis. It is designed for mental health classification at the tweet level.
The Lothian Diary Project consists of 125+ audio/video recordings collected from residents of Edinburgh and the Lothian counties in Scotland. Participants discuss their experiences during different stages of the Covid-19 pandemic. The recordings are accompanied by transcriptions and demographic information.