Dataset Card for Psychology Therapy Dataset: This dataset card provides information about a dataset of psychology therapy conversations. Language(s) (NLP): Turkish (tr)
This dataset contains 450 random entries within the scope of psychology therapy. Each entry consists of a user input describing a psychological issue and a therapist response aimed at providing support and guidance.
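As a minimal sketch of the entry structure described above, each record can be modeled as a pair of user input and therapist response. The field names `input` and `response` and the helper `parse_entries` are assumptions made for illustration; the card does not state the actual column names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TherapyEntry:
    """One dataset entry: a user's description of an issue and the reply."""
    user_input: str
    therapist_response: str


def parse_entries(rows):
    """Convert raw dict rows into TherapyEntry objects.

    The keys "input" and "response" are hypothetical; adjust them to the
    real column names. Rows missing either side are skipped.
    """
    entries = []
    for row in rows:
        user = (row.get("input") or "").strip()
        reply = (row.get("response") or "").strip()
        if user and reply:
            entries.append(TherapyEntry(user, reply))
    return entries


# Placeholder rows, not taken from the dataset.
sample_rows = [
    {"input": "example user message", "response": "example therapist reply"},
    {"input": "", "response": "reply without a user message"},
]
entries = parse_entries(sample_rows)
print(len(entries))
```

Skipping incomplete rows keeps every retained entry usable as an input/response pair, which is the shape the card describes.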
🍷 FineWeb is a dataset of over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl. 🥂 FineWeb2 is the second iteration of this popular dataset, bringing high-quality pretraining data to over 1000 🗣️ languages. The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license, and extensively validated through hundreds of ablation experiments. In particular, on the set of 9 diverse languages we used to guide our processing decisions, 🥂 FineWeb2 outperforms other popular multilingual pretraining datasets (such as CC-100, mC4, CulturaX or HPLT) while being substantially larger. In some cases, it even performs better than datasets specifically curated for a single one of these languages. These comparisons use our diverse set of carefully selected evaluation tasks: FineTasks.
The Emotional First Aid Dataset is a comprehensive Chinese psychological counseling QA corpus, featuring 20,000 multi-turn dialogues. It is designed to support the development of AI applications in the field of psychological counseling and is available for research purposes.
This project implements the conversion algorithm from the ToMi dataset to the T4D (Thinking is for Doing) dataset, as introduced in the paper https://arxiv.org/abs/2310.03051. It filters examples with Theory of Mind (ToM) questions and adapts the algorithm to account for second-order false beliefs.
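The filtering step mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the project's conversion algorithm: the `question_type` field and the label strings are hypothetical, and the real ToMi annotations may use different names.

```python
# Hypothetical labels for Theory of Mind questions; the actual ToMi
# question-type names may differ.
TOM_QUESTION_TYPES = {
    "first_order_false_belief",
    "second_order_false_belief",
}


def filter_tom_examples(examples):
    """Keep only examples whose question targets a character's belief.

    Each example is assumed to be a dict with a "question_type" key.
    """
    return [ex for ex in examples if ex.get("question_type") in TOM_QUESTION_TYPES]


# Placeholder examples, not taken from ToMi.
examples = [
    {"id": 1, "question_type": "first_order_false_belief"},
    {"id": 2, "question_type": "reality"},
    {"id": 3, "question_type": "second_order_false_belief"},
]
tom_examples = filter_tom_examples(examples)
print([ex["id"] for ex in tom_examples])
```

Keeping second-order labels in the same set reflects the text's note that second-order false beliefs are handled alongside first-order ones.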