Hugging Face Dataset - lsy641/PsyQA

The data was originally sourced from (Sun et al., 2021). (Liu et al., 2023) processed it into a dataset accessible via the Hugging Face API, with training/validation/testing splits.
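Since the dataset is exposed through the Hugging Face API, it can be loaded with the `datasets` library. A minimal sketch, assuming the standard "train"/"validation"/"test" split names (verify these against the dataset card):

```python
# Sketch: load the PsyQA dataset via the Hugging Face `datasets` library.
# The "train"/"validation"/"test" split names are an assumption; check
# them against the dataset card before relying on them.

def summarize_splits(ds) -> dict:
    """Map each split name to its example count (works for a DatasetDict)."""
    return {name: split.num_rows for name, split in ds.items()}

if __name__ == "__main__":
    from datasets import load_dataset  # pip install datasets
    ds = load_dataset("lsy641/PsyQA")  # downloads on first call
    print(summarize_splits(ds))       # per-split example counts
```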


Psych-101: Human Psychological Experiment Transcripts Dataset

Psych-101 is a dataset of natural language transcripts from human psychological experiments, comprising trial-by-trial data from 160 experiments and 60,092 participants, making 10,681,650 choices. It provides valuable insights into human decision-making processes and is available under the Apache License 2.0.

Mental Health in Tech Survey

This dataset contains survey responses from individuals in the tech industry about their mental health, including questions about treatment, workplace resources, and attitudes towards discussing mental health in the workplace. By analyzing this dataset, we can better understand how prevalent mental health issues are among those who work in the tech sector—and what kinds of resources they rely upon to find help—so that more can be done to create a healthier working environment for all.
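One such analysis is estimating how many respondents sought treatment. A minimal sketch using only the standard library; the file name `survey.csv` and the `treatment` column with "Yes"/"No" values are assumptions based on the description above, so adjust them to the real schema:

```python
# Sketch: compute treatment-seeking prevalence from the survey CSV.
# The "survey.csv" file name and the "treatment" column (values
# "Yes"/"No") are assumed; adapt to the dataset's actual schema.
import csv
from collections import Counter

def treatment_rate(rows) -> float:
    """Fraction of respondents who reported seeking mental-health treatment."""
    counts = Counter(row["treatment"] for row in rows)
    total = sum(counts.values())
    return counts.get("Yes", 0) / total if total else 0.0

if __name__ == "__main__":
    with open("survey.csv", newline="", encoding="utf-8") as f:
        print(f"{treatment_rate(csv.DictReader(f)):.1%}")
```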

HuggingFaceFW/fineweb-2

FineWeb-2 is the second iteration of the popular 🍷 FineWeb dataset (over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl), bringing high-quality pretraining data to over 1,000 🗣️ languages. The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license, and extensively validated through hundreds of ablation experiments. In particular, on the set of 9 diverse languages used to guide its processing decisions, 🥂 FineWeb2 outperforms other popular multilingual pretraining datasets (such as CC-100, mC4, CulturaX, or HPLT) while being substantially larger, and in some cases even performs better than datasets curated specifically for a single one of these languages, on a diverse set of carefully selected evaluation tasks: FineTasks.
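At this scale, streaming lets you inspect the corpus without downloading it in full. A minimal sketch; the per-language config name `fra_Latn` and the `text` field are assumptions, so check the dataset card for the exact names:

```python
# Sketch: stream a few documents from FineWeb-2 instead of downloading
# the whole corpus. The config name "fra_Latn" and the "text" field are
# assumptions; verify them on the dataset card.
from itertools import islice

def take(stream, n):
    """Materialize the first n records from an iterable stream."""
    return list(islice(stream, n))

if __name__ == "__main__":
    from datasets import load_dataset  # pip install datasets
    stream = load_dataset("HuggingFaceFW/fineweb-2", name="fra_Latn",
                          split="train", streaming=True)
    for doc in take(stream, 3):
        print(doc["text"][:80])
```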

Keywords

Hugging Face, Dataset, lsy641/PsyQA