Version code, models, and datasets together in GitHub
The XetData integration for GitHub upgrades your repos to handle over 100 terabytes
Same Git. Different scale.
Clone your repo from GitHub
git clone git@github.com:spock/mistral.git
Branch a new version
git checkout -b spock/baseline
Stage your changes
git add .
Commit code, large files, etc.
git commit -am "Added 7B model"
Push all changes to GitHub
git push origin spock/baseline
Make GitHub your single source of truth for ML
Review dataset and model diffs in GitHub
The PRs you love, now scaled to support ML. The XetData app upgrades your GitHub experience.
When you review models alongside your code and data in a pull request, the XetData app visualizes model architecture changes with Netron, so the differences are easier to understand.


Track your ML models and deployments with Git hashes
Stop saving models with complicated naming conventions that combine folder names, dates, and version numbers. Let Git do the heavy lifting and rely on Git hashes to tell you what was made when.
Instead of reinventing the wheel, we opted for the same approach software teams have been using for decades.
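As a rough illustration of tracking a model by Git hash, the sketch below reads the current commit with standard Git commands and stores it in a small deployment record. The helper names, the record fields, and the example values (`models/model.bin`, `staging`) are hypothetical and not part of XetData; only `git rev-parse` and `git show` are real Git commands.

```python
import json
import subprocess


def current_commit(repo_dir: str = ".") -> str:
    """Return the full hash of the commit checked out in repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def commit_date(commit: str, repo_dir: str = ".") -> str:
    """Return the committer date (ISO 8601) of the given commit."""
    result = subprocess.run(
        ["git", "show", "-s", "--format=%cI", commit],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def deployment_record(commit: str, model_path: str, endpoint: str) -> str:
    """Serialize a (hypothetical) deployment record keyed by the commit hash."""
    record = {"commit": commit, "model_path": model_path, "endpoint": endpoint}
    return json.dumps(record, sort_keys=True)
```

Inside a repository, `deployment_record(current_commit(), "models/model.bin", "staging")` would tie a deployment to an exact commit, and `commit_date` answers "what was made when" without encoding dates in file names.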
Only upload changes to your large files
Our block-level deduplication algorithm minimizes time spent waiting for file uploads and downloads while saving on storage costs.
It's so novel that we wrote a paper on it (CIDR'23). The takeaway? We're fast. Read more about our performance against Git LFS, DVC, and LakeFS in our benchmark blog post.
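To give a feel for why block-level deduplication uploads only what changed, here is a minimal sketch. It is not the XetData algorithm (the paper describes that); it assumes simple fixed-size blocks, an in-memory dictionary standing in for remote storage, and a tiny block size chosen so the example is easy to follow. All function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # deliberately tiny for illustration; an assumption, not a real setting


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut data into consecutive fixed-size blocks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def upload(data: bytes, store: Dict[str, bytes]) -> Tuple[List[str], int]:
    """Store only blocks not already present.

    Returns the file's manifest (ordered block hashes) and the number of
    bytes that actually had to be sent.
    """
    manifest = []
    bytes_sent = 0
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:  # unseen block: it must be uploaded
            store[digest] = block
            bytes_sent += len(block)
        manifest.append(digest)
    return manifest, bytes_sent


def download(manifest: List[str], store: Dict[str, bytes]) -> bytes:
    """Rebuild a file from its manifest."""
    return b"".join(store[digest] for digest in manifest)


store: Dict[str, bytes] = {}
v1 = b"AAAABBBBCCCCDDDD"
v2 = b"AAAABBBBXXXXDDDD"  # same file with one block changed

manifest1, sent1 = upload(v1, store)  # all four blocks are new
manifest2, sent2 = upload(v2, store)  # only the changed block is sent
print(sent1, sent2)
```

Fixed-size blocks keep the sketch short, but an insertion that shifts later bytes would change every following block, so this is only meant to show the idea of storing and transferring each unique block once.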

Frequently Asked Questions
Works with your existing data tools
Keep your existing file formats, libraries, ML frameworks, and IDEs.


