MentalManip数据集是由Wang等人(2024b)引入的,专门用于检测和分类心理操纵的对话数据集。该数据集包含4000个多轮虚构对话,来源于在线电影剧本,并进行了多层次的标注,包括操纵的存在、操纵技巧和目标脆弱性。数据集的创建旨在通过高质量的标注确保数据的一致性和准确性,从而支持心理操纵检测的研究。
MentalManip数据集是一个高质量的对话数据集,专门用于检测和分类心理操纵行为。该数据集包含4000个多轮虚构对话,来源于在线电影剧本,并进行了多层次的标注,包括操纵的存在、操纵技巧和目标脆弱性。该数据集主要应用于心理健康领域,旨在通过早期检测心理操纵行为,保护个体的心理健康。
tartuNLP/reddit-anhedonia by huggingface-mirror (hf-mirror)
FineWeb-2 is a dataset of over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl. This is the second iteration of the popular 🍷 FineWeb dataset, bringing high quality pretraining data to over 1000 🗣️ languages.The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license and extensively validated through hundreds of ablation experiments.In particular, on the set of 9 diverse languages we used to guide our processing decisions, 🥂 FineWeb2 outperforms other popular pretraining datasets covering multiple languages (such as CC-100, mC4, CulturaX or HPLT, while being substantially larger) and, in some cases, even performs better than some datasets specifically curated for a single one of these languages, in our diverse set of carefully selected evaluation tasks: FineTasks.
The ToM QA Dataset is designed to evaluate question-answering models' ability to reason about beliefs. It includes 3 task types and 4 question types, creating 12 total scenarios. The dataset is inspired by theory-of-mind experiments in developmental psychology and is used to test models' understanding of beliefs and inconsistent states of the world.