Creating a dataset
From Datasets → New, pick a source. Each source captures credentials once and remembers them for next time.
- Local upload
- Code / model hubs
- Object storage
- From a ticket
| Local upload option | Accepts |
|---|---|
| Upload ZIP | A zipped folder tree. Class name is taken from the parent folder. |
| Upload folder | Drag‑drop a directory from your machine. Same class convention. |
| Paste text | Plain‑text corpus, treated as a single text document. |
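To illustrate the class convention for ZIP uploads, here is a minimal sketch that maps each file in an archive to the name of its immediate parent folder. It assumes "parent folder" means the folder that directly contains the file and that root-level files get no class; `infer_labels` and `build_zip` are hypothetical helpers for checking an archive locally before uploading, not part of Antidote.

```python
import io
import zipfile
from pathlib import PurePosixPath


def infer_labels(zip_bytes: bytes) -> dict[str, str]:
    """Map each file in a ZIP archive to the name of its parent folder.

    Assumption: the class is the folder that directly contains the file.
    Files at the archive root have no parent folder and are skipped.
    """
    labels: dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            path = PurePosixPath(info.filename)
            if len(path.parts) < 2:
                continue  # root-level file: no parent folder to use as a class
            labels[info.filename] = path.parent.name
    return labels


def build_zip(files: dict[str, bytes]) -> bytes:
    """Build an in-memory ZIP from {archive_path: content}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


if __name__ == "__main__":
    archive = build_zip({
        "cat/001.jpg": b"...",
        "dog/001.jpg": b"...",
        "README.txt": b"not an image",
    })
    print(infer_labels(archive))
    # {'cat/001.jpg': 'cat', 'dog/001.jpg': 'dog'}
```

A quick local check like this catches stray root-level files before they end up unlabeled.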
Dataset detail page (/datasets/:id)
The dataset page is your home base for one piece of data.
| Tab / section | What it does |
|---|---|
| Preview grid | Lazy‑loaded thumbnails for image datasets, zoom on hover, class filter. |
| File tree | The dataset’s files with size, type, and class. Right‑click for actions (rename, move, exclude). |
| Action history | Every upload, scan, healing, branch, and edit, attributed to a user, timestamped, exportable. |
| Branches | Virtual versions of the dataset, each backed by a manifest CSV. |
| Lineage | Graph of parent / child datasets produced by healing, branching, or imports. |
| Scan history | Past scans against this dataset, with severity and duration. |
| Scheduler | Assign a cron schedule. See Automation. |
| Settings | Rename, move between projects, configure webhooks (including the webhook URL), delete. |
Branches
Branches let you maintain alternative versions of the same dataset without physically copying files. Each branch is a manifest CSV that declares which files are in or out, plus any label overrides. Typical uses:
- A clean branch after healing, so you can A/B train on cured vs raw data without losing the original.
- A balanced branch with a downsampled subset for fast iteration.
- An experimental branch with tweaked labels for an A/B scan.
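To show how a branch can be a "virtual version" without copying files, here is a minimal sketch that resolves a manifest into the set of files a branch sees. It assumes the manifest lists every file with the `path`, `split`, `label`, `include` columns from the "Define what's in it" step, that `true`/`1`/`yes` mean included, and that an empty label cell keeps the original label; `resolve_branch` and the `{path: label}` dataset model are hypothetical, not Antidote's internals.

```python
import csv
import io

# Values treated as "included"; the spellings Antidote accepts are an assumption.
TRUTHY = {"1", "true", "yes"}


def resolve_branch(
    base_labels: dict[str, str], manifest_csv: str
) -> dict[str, tuple[str, str]]:
    """Return {path: (split, label)} for the files a branch includes.

    Hypothetical model: the dataset is {path: label}; each manifest row is
    path,split,label,include. Rows with a falsy include flag are left out,
    and an empty label cell keeps the file's original label. Nothing is
    copied: the branch is only this mapping over files that already exist.
    """
    view: dict[str, tuple[str, str]] = {}
    for row in csv.DictReader(io.StringIO(manifest_csv)):
        path = row["path"].strip()
        if path not in base_labels:
            continue  # a manifest can only point at files the dataset already has
        if row["include"].strip().lower() not in TRUTHY:
            continue  # flipped out of this branch
        label = row["label"].strip() or base_labels[path]
        view[path] = (row["split"].strip(), label)
    return view


if __name__ == "__main__":
    base = {"cat/1.jpg": "cat", "cat/2.jpg": "cat", "dog/1.jpg": "dog"}
    manifest = (
        "path,split,label,include\n"
        "cat/1.jpg,train,,true\n"
        "cat/2.jpg,train,,false\n"
        "dog/1.jpg,val,wolf,true\n"
    )
    print(resolve_branch(base, manifest))
    # {'cat/1.jpg': ('train', 'cat'), 'dog/1.jpg': ('val', 'wolf')}
    print(base)  # the main branch's labels are untouched
```

Because the branch is just a mapping over existing paths, several branches can share the same underlying files.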
Create a branch
From the Branches tab, click New branch, give it a name, and optionally describe how it differs from the main branch.
Define what's in it
Either upload a manifest CSV (columns `path`, `split`, `label`, `include`) or use the inline editor to flip the `include` flag on individual files.
Run a scan against it
The branch appears as an option in the branch selector on the New scan screen. Pick it, and Antidote scans only the files included in the branch's manifest.
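A common reason to upload a manifest rather than flip flags by hand is the balanced, downsampled branch mentioned earlier. The sketch below writes such a manifest, keeping at most `per_class` randomly chosen files per class. It is an illustration under stated assumptions: `labels` is a hypothetical `{path: class}` listing of the dataset, the `split` column is filled with the placeholder value `train`, and an empty `label` cell is assumed to mean "keep the original label".

```python
import csv
import io
import random
from collections import defaultdict


def balanced_manifest(labels: dict[str, str], per_class: int, seed: int = 0) -> str:
    """Return path,split,label,include CSV text keeping <= per_class files per class.

    Every file is listed; only the chosen ones get include=true. The "train"
    split and the empty label cells are placeholders (assumptions).
    """
    by_class: dict[str, list[str]] = defaultdict(list)
    for path, label in sorted(labels.items()):
        by_class[label].append(path)

    rng = random.Random(seed)  # seeded so the same branch can be rebuilt later
    keep: set[str] = set()
    for paths in by_class.values():
        keep.update(rng.sample(paths, min(per_class, len(paths))))

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["path", "split", "label", "include"])
    for path in sorted(labels):
        writer.writerow([path, "train", "", "true" if path in keep else "false"])
    return out.getvalue()


if __name__ == "__main__":
    labels = {f"cat/{i}.jpg": "cat" for i in range(5)} | {"dog/0.jpg": "dog"}
    print(balanced_manifest(labels, per_class=2))
```

Seeding the sampler keeps the branch reproducible: regenerating the manifest with the same seed selects the same files.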
Lineage
The lineage view is a directed graph showing where a dataset came from and what was produced from it.
- Healing produces a child dataset with a `healed_from` edge.
- Branching produces a child with a `branched_from` edge.
- Playground publishes produce a child with a `playground_session` edge.
- Imports keep a reference to the original `source_uri`.
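The lineage graph can be read as a set of child → parent edges labeled with the edge kinds listed above. As a minimal sketch, assuming a hypothetical `Edge` record and made-up dataset IDs (not Antidote's data model), tracing a dataset back to its roots is a breadth-first walk over parent edges. Imports are left out because they carry a `source_uri` rather than a parent edge.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    child: str   # dataset that was produced
    parent: str  # dataset it was produced from
    kind: str    # "healed_from", "branched_from", or "playground_session"


def ancestry(dataset_id: str, edges: list[Edge]) -> list[Edge]:
    """Return every edge on the path(s) from dataset_id back to its roots.

    Breadth-first over child -> parent edges; the `seen` set keeps the walk
    finite even if the input accidentally contains a cycle.
    """
    parents: dict[str, list[Edge]] = defaultdict(list)
    for edge in edges:
        parents[edge.child].append(edge)

    found: list[Edge] = []
    seen = {dataset_id}
    queue = deque([dataset_id])
    while queue:
        node = queue.popleft()
        for edge in parents[node]:
            found.append(edge)
            if edge.parent not in seen:
                seen.add(edge.parent)
                queue.append(edge.parent)
    return found


if __name__ == "__main__":
    graph = [
        Edge("ds-healed", "ds-raw", "healed_from"),
        Edge("ds-balanced", "ds-healed", "branched_from"),
        Edge("ds-playground", "ds-raw", "playground_session"),
    ]
    for edge in ancestry("ds-balanced", graph):
        print(f"{edge.child} --{edge.kind}--> {edge.parent}")
    # ds-balanced --branched_from--> ds-healed
    # ds-healed --healed_from--> ds-raw
```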
What to expect
- Dataset uploads under ~5 GB usually appear within seconds.
- Larger uploads stream progress via the WebSocket and resume cleanly if the network drops.
- Statistics (`valid_images`, `invalid_images`, `classes`, `class_count`) populate as soon as ingestion finishes; you don't have to wait for a scan to see them.
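Since the statistics are available right after ingestion, they can serve as a cheap sanity check before you queue a scan or a training run. The sketch below assumes the four fields arrive as a plain dict with `classes` being a list of class names; how you fetch them is up to you, and `check_stats` and the 1% threshold are illustrative choices, not Antidote defaults.

```python
def check_stats(stats: dict, max_invalid_ratio: float = 0.01) -> list[str]:
    """Return human-readable problems found in a dataset's ingestion statistics.

    Assumes `stats` holds valid_images, invalid_images, classes (a list of
    class names) and class_count. An empty list means nothing looked wrong.
    """
    problems: list[str] = []
    valid = stats["valid_images"]
    invalid = stats["invalid_images"]

    # class_count should agree with the number of classes listed.
    if stats["class_count"] != len(stats["classes"]):
        problems.append(
            f"class_count={stats['class_count']} but {len(stats['classes'])} classes listed"
        )

    # Flag empty datasets and datasets with too many unreadable images.
    total = valid + invalid
    if total == 0:
        problems.append("no images were ingested")
    elif invalid / total > max_invalid_ratio:
        problems.append(f"{invalid} of {total} images are invalid")
    return problems


if __name__ == "__main__":
    stats = {"valid_images": 980, "invalid_images": 20, "classes": ["cat", "dog"], "class_count": 2}
    print(check_stats(stats))  # ['20 of 1000 images are invalid']
```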

