This project implements the conversion algorithm from the ToMi dataset to the T4D (Thinking is for Doing) dataset introduced in the paper https://arxiv.org/abs/2310.03051. It filters ToMi examples that pose Theory of Mind (ToM) questions, adapts the algorithm to handle second-order false beliefs, and is intended as a resource for researchers working on cognitive and social AI models. The project is licensed under the Apache License, Version 2.0.
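As a rough sketch of the filter-and-convert flow, the snippet below keeps only false-belief questions and reframes each as an action-selection question. The ToMiExample schema, the question templates, and the T4D question wording are assumptions for illustration, not the project's actual code.

```python
from dataclasses import dataclass

@dataclass
class ToMiExample:
    story: list[str]      # story sentences, one per element (assumed schema)
    question: str         # e.g. "Where will Sally look for the ball?"
    answer: str
    question_type: str    # e.g. "first_order", "second_order", "reality"

def is_tom_question(ex: ToMiExample) -> bool:
    """Keep only false-belief probes; drop reality/memory control questions."""
    return ex.question_type in {"first_order", "second_order"}

def mistaken_agent(ex: ToMiExample) -> str:
    """Pull the believing agent's name out of the question template.

    Under the assumed templates, both "Where will <agent> look for ..."
    (first order) and "Where does <agent> think that ..." (second order)
    carry the agent's name in the third token.
    """
    return ex.question.split()[2]

def to_t4d(ex: ToMiExample) -> dict:
    """Reframe a belief question as an action-selection question."""
    return {
        "story": " ".join(ex.story),
        "question": ("Who among the characters would benefit from "
                     "receiving helpful information?"),
        "answer": mistaken_agent(ex),  # the agent holding the false belief
    }

examples = [
    ToMiExample(
        story=["Sally entered the kitchen.",
               "Anne moved the ball to the basket."],
        question="Where will Sally look for the ball?",
        answer="box",
        question_type="first_order",
    ),
]
t4d = [to_t4d(ex) for ex in examples if is_tom_question(ex)]
print(t4d[0]["answer"])  # -> "Sally"
```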
Psych-101 is a dataset of natural language transcripts from human psychological experiments, comprising trial-by-trial data from 160 experiments and 60,092 participants, making 10,681,650 choices. It provides valuable insights into human decision-making processes and is available under the Apache License 2.0.
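A quick way to inspect the transcripts is via the Hugging Face datasets library, assuming the dataset is published on the Hub; the repository id below is an assumption, so substitute the actual one.

```python
from datasets import load_dataset

# Repository id is an assumption; replace with the dataset's actual Hub id.
ds = load_dataset("marcelbinz/Psych-101", split="train", streaming=True)

# Each record is a natural-language transcript of one participant's
# trial-by-trial choices in a psychology experiment.
for record in ds.take(1):
    print(record)
```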
EmoLLM (SmartFlowAI/EmoLLM) is a large language model for mental health, fine-tuned from a range of base models including InternLM2, InternLM2.5, Qwen, Qwen2, ChatGLM, GLM4, Baichuan, DeepSeek, Mixtral, and Llama 3.
FineWeb-2 is the second iteration of the popular 🍷 FineWeb dataset (over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl), extending high-quality pretraining data to over 1000 🗣️ languages. The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license, and extensively validated through hundreds of ablation experiments. In particular, on the set of 9 diverse languages used to guide its processing decisions, 🥂 FineWeb2 outperforms other popular multilingual pretraining datasets (such as CC-100, mC4, CulturaX, and HPLT) while being substantially larger, and in some cases even performs better than datasets curated specifically for one of these languages, as measured on FineTasks, a diverse set of carefully selected evaluation tasks.
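Because each language is shipped as a separate configuration, a single subset can be streamed without downloading the full corpus. The config name and text field below are assumptions; FineWeb-2 configs are commonly referenced as ISO 639-3 language codes plus a script tag.

```python
from datasets import load_dataset

# "fra_Latn" (French, Latin script) is an assumed config name, and the
# "text" field name is likewise assumed. Streaming avoids a full download.
fw = load_dataset("HuggingFaceFW/fineweb-2", name="fra_Latn",
                  split="train", streaming=True)

for doc in fw.take(2):
    print(doc["text"][:200])
```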