Data Science for COVID-19 (DS4C)

The DS4C dataset is a structured collection of COVID-19 data from South Korea, based on reports from the Korea Centers for Disease Control & Prevention (KCDC) and local governments. It includes information on infections, patient routes, and various analyses. The dataset has been used for multiple research and visualization projects.

Detailed Introduction

The Data Science for COVID-19 (DS4C) project provides a structured dataset for analyzing the COVID-19 pandemic in South Korea. It covers infections, patient routes, and related data, and has been used in research and visualization projects, including competitions and academic studies. The data is drawn from reports by the KCDC and local governments.

More Datasets

ISSP: International Social Survey Programme

The ISSP is a cross-national collaboration that conducts annual surveys on topics relevant to the social sciences. Established in 1984, it has members from diverse cultures around the globe. Over one million respondents have participated in ISSP surveys, and all collected data and documentation are available free of charge.

Mental Health Large Model Lingxin (SoulChat)

Lingxin (SoulChat) is a large language model for mental health, fine-tuned on millions of Chinese long-text instructions and multi-turn empathetic dialogues from the field of psychological counseling.

HuggingFaceFW/fineweb-2

FineWeb-2 is the second iteration of the popular 🍷 FineWeb dataset, which provides over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl. FineWeb-2 brings high-quality pretraining data to over 1000 🗣️ languages. The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license, and extensively validated through hundreds of ablation experiments. In particular, on the set of 9 diverse languages we used to guide our processing decisions, 🥂 FineWeb2 outperforms other popular pretraining datasets covering multiple languages (such as CC-100, mC4, CulturaX or HPLT, while being substantially larger). In some cases it even performs better than datasets specifically curated for a single one of these languages, as measured on our diverse set of carefully selected evaluation tasks: FineTasks.

Keywords

DS4C, COVID-19, South Korea, KCDC, Data Science, Visualization, Research, Public Health