This paper discusses Helply, a synthesized ML training dataset focused on psychology and therapy, created by Alex Scott and published by NamelessAI. The dataset is a collection of synthesized data designed to train LLMs to understand psychological and therapeutic contexts. It aims to simulate real-world interactions between therapists and patients, so that ML models can learn from a wide range of scenarios and therapeutic techniques.
The Helply dataset is a synthetic ML training dataset created by Alex Scott and released by NamelessAI, focusing on psychology and therapy. It is designed to train large language models (LLMs) to understand and simulate human psychological processes. By combining existing psychology literature, therapy session records, and patient self-report data, the dataset covers a variety of treatment scenarios, such as cognitive behavioral therapy (CBT), internal family systems (IFS), and internet-based cognitive behavioral therapy (iCBT). It also emphasizes the dynamic interaction between patients and therapists, capturing the communication details that affect treatment outcomes. Despite challenges such as ethical considerations and model generalization, the Helply dataset has the potential to change how therapeutic practices are understood and applied in digital environments.
The DS4C dataset is a structured collection of COVID-19 data from South Korea, based on reports from the Korea Centers for Disease Control & Prevention (KCDC) and local governments. It includes information on infections, patient routes, and various analyses. The dataset has been used for multiple research and visualization projects.
FineWeb is a dataset of over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl. It is optimized for LLM performance and processed using the datatrove library. The dataset aims to provide high-quality data for training large language models and outperforms other commonly used web datasets.
The IC-AnnoMI repository contains source code and a synthetic dataset for mental health and therapeutic counselling. The project uses in-context zero-shot prompting of large language models (LLMs) to generate contextual motivational interviewing (MI) dialogues, aiming to address the data scarcity and inherent bias problems in this domain.