Emotional First Aid Raw Dataset: Psychological Counseling QA Raw Corpus

The Emotional First Aid Raw Dataset is a collection of raw, unannotated psychological counseling Q&A data, designed to support research in AI applications for mental health. It contains over 172,000 topics with 2,381,273 messages, totaling 44,514,786 characters, providing a rich source of data for natural language processing and AI development.
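To give a feel for the scale quoted above, the aggregate counts can be turned into rough averages. This is a minimal sketch: the function name `corpus_averages` is hypothetical, and because the description says "over 172,000" topics, the topic count used here is a lower bound, so the messages-per-topic figure is only an approximate upper bound.

```python
def corpus_averages(topics: int, messages: int, characters: int) -> dict[str, float]:
    """Return rough per-topic and per-message averages from aggregate counts."""
    return {
        "messages_per_topic": messages / topics,
        "chars_per_message": characters / messages,
    }


# Counts as quoted in the dataset description. 172,000 is a lower bound
# ("over 172,000 topics"), so messages_per_topic is an upper-bound estimate.
stats = corpus_averages(topics=172_000, messages=2_381_273, characters=44_514_786)

print(f"messages per topic:     {stats['messages_per_topic']:.1f}")  # 13.8
print(f"characters per message: {stats['chars_per_message']:.1f}")   # 18.7
```

Under these assumptions a topic holds roughly 14 messages of about 19 characters each, which suggests short conversational turns rather than long-form answers.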

Detailed Introduction

This dataset is a valuable resource for researchers and developers working on AI-powered psychological counseling tools. It includes a wide range of topics and detailed messages, making it suitable for tasks such as data preprocessing, model training, and dialogue generation. The data is sourced from public websites and has been anonymized and desensitized for privacy protection.

More Datasets

Mental Health Large Model Lingxin (SoulChat)

Lingxin (SoulChat) is a large language model for mental health, fine-tuned on millions of Chinese long-text instructions and multi-turn empathetic dialogues from the field of psychological counseling.

ISSP: International Social Science Survey Program

The ISSP is a cross-national collaboration program conducting annual surveys on diverse topics relevant to social sciences. Established in 1984, it includes members from various cultures around the globe. Over one million respondents have participated in ISSP surveys, and all collected data and documentation are available free of charge.

HuggingFaceFW/fineweb-2

FineWeb-2 is the second iteration of the popular 🍷 FineWeb dataset, which contains over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl. 🥂 FineWeb2 brings high-quality pretraining data to over 1000 🗣️ languages. The dataset is fully reproducible, available under the permissive ODC-By 1.0 license, and extensively validated through hundreds of ablation experiments. In particular, on the set of 9 diverse languages used to guide the processing decisions, 🥂 FineWeb2 outperforms other popular pretraining datasets covering multiple languages (such as CC-100, mC4, CulturaX or HPLT, while being substantially larger). In some cases it even performs better than datasets specifically curated for a single one of these languages, as measured on FineTasks, a diverse set of carefully selected evaluation tasks.

Keywords

Emotional First Aid Raw Dataset, Psychological Counseling, QA Corpus, Raw Data, Research
