Version code, models, and datasets together in GitHub

XetData integration for GitHub upgrades your repos to handle over 100 terabytes

Same Git. Different scale.

Clone your repo from GitHub

git clone git@github.com:spock/mistral.git

Branch a new version

git checkout -b spock/baseline

Stage your changes

git add .

Commit code, large files, etc

git commit -am "Added 7B model"

Push all changes to GitHub

git push origin spock/baseline

Make GitHub your single source of truth for ML

Review dataset and model diffs in GitHub


The PRs you love, now scaled to support ML. The XetData app upgrades your GitHub experience.

When you review models alongside your code and data in a pull request, the XetData app visualizes model architecture changes with Netron, so the differences are easier to understand.

Track your ML models and deployments with Git hashes


Stop saving models with complicated naming conventions that combine folder names, dates, and version numbers. Let Git do the heavy lifting and rely on Git hashes to tell you what was made when.

Instead of reinventing the wheel, we opted for the same approach software teams have been using for decades.
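As a rough illustration of what tracking a deployment by Git hash can look like, the sketch below reads the current commit with `git rev-parse HEAD` and writes it into a small deployment record. The `deployment_record` helper, the record fields, and the model path are hypothetical names chosen for this example; they are not part of the XetData app.

```python
import json
import subprocess


def current_commit(repo_dir: str = ".") -> str:
    """Return the full hash of the commit checked out in repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def deployment_record(commit: str, model_path: str, environment: str) -> str:
    """Build a JSON record tying a deployed model to one commit.

    The field names are assumptions made for illustration only.
    """
    record = {
        "commit": commit,
        "model_path": model_path,
        "environment": environment,
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical model path; run this from inside a Git repository.
    print(deployment_record(current_commit(), "models/mistral-7b.bin", "staging"))
```

With a record like this, the commit hash alone identifies the exact code, data, and model files behind a deployment, so the file name no longer has to carry dates or version numbers.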

Only upload changes to your large files


Our block-level deduplication algorithm minimizes time spent waiting for file uploads and downloads while saving on storage costs.

It's so novel that we wrote a paper on it (CIDR'23). The takeaway? We're fast. Read more about our performance against Git LFS, DVC, and lakeFS in our benchmark blog post.
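To give a feel for the general idea of block-level deduplication, here is a minimal sketch that uses fixed-size blocks and SHA-256 hashes. It is an assumption-laden toy, not the algorithm from the paper: the block size, the in-memory `store`, and the `upload` and `download` helpers are all hypothetical, and production systems often use content-defined chunking instead of fixed-size blocks.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into consecutive fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def upload(data: bytes, store: dict[str, bytes]) -> tuple[list[str], int]:
    """Add only unseen blocks to store.

    Returns the file's manifest (ordered block hashes) and the number of
    bytes that actually had to be uploaded.
    """
    manifest = []
    uploaded = 0
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block
            uploaded += len(block)
        manifest.append(digest)
    return manifest, uploaded


def download(manifest: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild a file from its manifest."""
    return b"".join(store[digest] for digest in manifest)


if __name__ == "__main__":
    store: dict[str, bytes] = {}
    # Version 1: four distinct 4 KiB blocks.
    v1 = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
    # Version 2: the same file with 100 bytes appended.
    v2 = v1 + b"x" * 100
    _, first = upload(v1, store)
    manifest, second = upload(v2, store)
    print(first, second)
    assert download(manifest, store) == v2
```

In this toy run the second version reuses every block from the first, so only the appended bytes are stored again. Fixed-size blocks stop matching once bytes are inserted in the middle of a file, which is one reason real systems tend to prefer content-defined boundaries.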

Frequently Asked Questions

Works with your existing data tools

Keep your existing file formats, libraries, ML frameworks, and IDEs.

Stop gitignoring your data

Install the XetData integration for GitHub today to version your data and models alongside your code.
Get started for free