What's in the RedPajama-Data-1T LLM training set

By A Mystery Man Writer

Description

RedPajama is “a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens”. It’s a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, …
