This project implements the conversion algorithm from the ToMi dataset to the T4D (Thinking is for Doing) dataset, as introduced in the paper https://arxiv.org/abs/2310.03051. It filters examples with Theory of Mind (ToM) questions and adapts the algorithm to account for second-order false beliefs.
The t4d project directly implements the conversion algorithm from the ToMi dataset to the T4D dataset. It filters and processes examples that involve Theory of Mind questions, which makes it useful to researchers working on cognitive and social AI models. The project is licensed under the Apache License, Version 2.0.
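The filtering step described above could look roughly like the sketch below. It is not the project's actual code: it assumes that "filters" means keeping the Theory of Mind examples, and the `Example` record, its `question_type` field, and the label strings are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical labels for belief questions; the real ToMi files may
# name their question types differently.
TOM_QUESTION_TYPES = {"first_order", "second_order"}


@dataclass
class Example:
    story: str
    question: str
    question_type: str  # assumed field, e.g. "first_order", "memory", "reality"


def filter_tom_examples(examples):
    """Keep only examples whose question asks about a character's belief."""
    return [ex for ex in examples if ex.question_type in TOM_QUESTION_TYPES]


examples = [
    Example("Sally puts the ball in the basket...", "Where will Sally look for the ball?", "first_order"),
    Example("Sally puts the ball in the basket...", "Where was the ball at the beginning?", "memory"),
    Example("Sally puts the ball in the basket...", "Where does Anne think Sally will look?", "second_order"),
]

tom_only = filter_tom_examples(examples)
```

Here `tom_only` holds the first-order and second-order belief questions and drops the memory question. A real conversion would then still need to rewrite each kept example into the T4D action-oriented format, which this sketch does not attempt.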
Dataset Card for Psychology Therapy Dataset: This dataset card aims to provide information about a dataset focused on psychology therapy conversations. Language(s) (NLP): Turkish (tr)
HeartLink is an empathetic psychological model that uses a large language model fine-tuned on a large empathetic Q&A dataset. It can perceive users' emotions and experiences during conversations and provide empathetic responses using rich psychological knowledge, aiming to understand, comfort, and support users. The responses include emoji expressions to bridge the gap with users, offering psychological support and help during consultations.
FineWeb-2 is the second iteration of the popular 🍷 FineWeb dataset (over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl), bringing high quality pretraining data to over 1000 🗣️ languages. The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license and extensively validated through hundreds of ablation experiments. In particular, on the set of 9 diverse languages we used to guide our processing decisions, 🥂 FineWeb2 outperforms other popular pretraining datasets covering multiple languages (such as CC-100, mC4, CulturaX or HPLT, while being substantially larger) and, in some cases, even performs better than some datasets specifically curated for a single one of these languages, in our diverse set of carefully selected evaluation tasks: FineTasks.