What's in the RedPajama-Data-1T LLM training set
By A Mystery Man Writer
Description
RedPajama is “a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens”. It’s a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, …
RedPajama 7B now available, instruct model outperforms all open
Ahead of AI #8: The Latest Open Source LLMs and Datasets
Supervised Fine-tuning: customizing LLMs
Data analysis with SQLite and Python - Tutorial
Releasing 3B and 7B RedPajama-INCITE family of models including
How we built better GenAI with programmatic data development
SlimPajama: A 627B token, cleaned and deduplicated version of
RedPajama Project: An Open-Source Initiative to Democratizing LLMs