The Emotional First Aid Raw Dataset is a collection of raw, unannotated psychological counseling Q&A data, designed to support research in AI applications for mental health. It contains over 172,000 topics with 2,381,273 messages, totaling 44,514,786 characters, providing a rich source of data for natural language processing and AI development.
This dataset is a valuable resource for researchers and developers working on AI-powered psychological counseling tools. It includes a wide range of topics and detailed messages, making it suitable for tasks such as data preprocessing, model training, and dialogue generation. The data is sourced from public websites and has been anonymized and desensitized for privacy protection.
This dataset contains 20,000 labelled English tweets of depressed and non-depressed users. The data is collected using the Twitter API and includes feature extraction techniques such as topic modelling and emoji sentiment analysis. It is designed for mental health classification at the tweet level.
FineWeb is a dataset of over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl. It is optimized for LLM performance and processed using the datatrove library. The dataset aims to provide high-quality data for training large language models and outperforms other commonly used web datasets.We’re on a journey to advance and democratize artificial intelligence through open source and open science.
MentalManip数据集是由Wang等人(2024b)引入的,专门用于检测和分类心理操纵的对话数据集。该数据集包含4000个多轮虚构对话,来源于在线电影剧本,并进行了多层次的标注,包括操纵的存在、操纵技巧和目标脆弱性。数据集的创建旨在通过高质量的标注确保数据的一致性和准确性,从而支持心理操纵检测的研究。