When to heal
- After a `mislabel` or `mislabel_broad` scan, to relabel suspects and drop OOD samples.
- After a `poisoning` scan, to drop tampered samples before training.
- After a `text_analysis` scan, to redact secrets and remove injection paragraphs from a fine‑tune corpus.
- After a `bias_shortcut` scan, to apply the recommended mitigations (cropping, dropping, rebalancing).
What the toggles do
| Toggle | Effect |
|---|---|
| Fix mislabeled | Rewrite labels according to the engine’s predictions. |
| Remove outliers | Drop OOD or outlier samples entirely. |
| Remove poisoned | Drop samples flagged by the poisoning engine. |
| Remove low quality | Drop borderline low‑confidence cases. |
| Confidence floor | Only act on findings above the chosen confidence threshold. |
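To make the interaction between the toggles and the confidence floor concrete, here is a minimal Python sketch. The `Finding` record, its `kind` values, and the `plan_heal` function are hypothetical names invented for illustration, not the product's API. The sketch also assumes the floor is inclusive and applies to every toggle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; field names are assumptions."""
    sample_id: str
    kind: str  # "mislabel", "outlier", "poisoned", or "low_quality"
    confidence: float
    predicted_label: Optional[str] = None


def plan_heal(findings, *, fix_mislabeled=False, remove_outliers=False,
              remove_poisoned=False, remove_low_quality=False,
              confidence_floor=0.0):
    """Return (relabels, drops) for the findings the enabled toggles cover."""
    drop_kinds = set()
    if remove_outliers:
        drop_kinds.add("outlier")
    if remove_poisoned:
        drop_kinds.add("poisoned")
    if remove_low_quality:
        drop_kinds.add("low_quality")

    relabels = {}
    drops = set()
    for f in findings:
        # Findings below the floor are ignored, whatever the toggles say.
        if f.confidence < confidence_floor:
            continue
        if f.kind in drop_kinds:
            drops.add(f.sample_id)
        elif f.kind == "mislabel" and fix_mislabeled and f.predicted_label is not None:
            relabels[f.sample_id] = f.predicted_label

    # A sample that is being dropped does not need a new label.
    for sample_id in drops:
        relabels.pop(sample_id, None)
    return relabels, drops
```

Separating the plan (what to relabel, what to drop) from its execution keeps the original data untouched until an output target is chosen.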
How to heal
Open the source
Start from either the dataset detail page (Heal) or the scan
detail page (Heal from this scan). Healing from a scan
pre‑fills the dialog with that scan’s findings.
Pick what to apply
Flip the toggles you want. Set a confidence floor if you only
want to act on the highest‑confidence findings.
Pick the output
Healing always writes a result. You can choose:
- New child dataset: the original stays untouched. A lineage edge `healed_from` links the child to the source.
- New branch on the source: the cured contents live as a branch you can switch between.
(Optional) Auto‑rescan
Toggle Auto‑rescan after healing to immediately run the same
engine on the cured output. Useful for verifying the heal moved
the dataset out of `CRITICAL`.
Healing for text
Text healing is different from image healing because the unit of repair is the snippet, not the file.
| Action | What it does |
|---|---|
| Redact secrets | Replace every detected secret span with [REDACTED_<type>] in place. The rest of the document is left unchanged. |
| Strip injections | Remove paragraphs flagged as injection attempts. |
| Drop topic outliers | Remove paragraphs flagged as off‑topic. |
Each action records the `doc_id` and offsets so the
audit trail can reconstruct what was changed.
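The Redact secrets action can be sketched in Python as follows. This is an illustration under stated assumptions, not the engine's implementation: the function name `redact_secrets`, the `(start, end, secret_type)` span format with an exclusive end offset, and the shape of the audit records are all hypothetical, and spans are assumed not to overlap.

```python
def redact_secrets(doc_id, text, spans):
    """Replace each (start, end, secret_type) span with [REDACTED_<type>].

    Returns the redacted text plus one audit record per span holding
    the doc_id and the offsets in the ORIGINAL text.
    """
    redacted = text
    audit = []
    # Work from the last span to the first so that earlier offsets
    # stay valid while the string changes length.
    for start, end, secret_type in sorted(spans, reverse=True):
        placeholder = f"[REDACTED_{secret_type}]"
        redacted = redacted[:start] + placeholder + redacted[end:]
        audit.append({"doc_id": doc_id, "start": start,
                      "end": end, "type": secret_type})
    audit.reverse()  # report records in document order
    return redacted, audit


text = "key=AKIA1234 pass=hunter2"
spans = [(4, 12, "AWS_KEY"), (18, 25, "PASSWORD")]
cured, audit = redact_secrets("doc-1", text, spans)
print(cured)  # key=[REDACTED_AWS_KEY] pass=[REDACTED_PASSWORD]
```

Storing offsets relative to the original text, rather than the redacted one, is what lets an audit trail point back at exactly what was changed.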
What you get back
Every healing run produces:
- A new dataset or branch with the cured contents.
- A zip download of the result.
- An action record in the audit trail (who triggered it, what toggles were set, which scan it descended from).
- A lineage edge linking the cured output back to the source dataset.
- (Optional) a follow‑up scan if you enabled auto‑rescan.
Common workflows
Clean up a public dataset before training
- Run `mislabel_broad` and `poisoning`.
- Heal with Remove outliers, Remove poisoned, and Fix mislabeled at confidence floor 0.85.
- Enable Auto‑rescan and confirm the result is `HEALTHY` or `UNHEALTHY−`.
- Train on the cured branch.
Sanitize a fine‑tune corpus
- Run `text_analysis`.
- Heal with Redact secrets and Strip injections.
- Re‑run `text_analysis` to confirm zero `CRITICAL` findings remain.
- Export the cured corpus.
Iterate on a noisy labelling effort
- Run `mislabel`.
- Heal only the top 10% of suspects (high confidence floor) into a new branch.
- Have a reviewer compare the original and cured branches side by side.
- Lower the confidence floor for the next pass once the reviewer signs off.
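One way to turn "the top 10% of suspects" into a concrete confidence floor is to take the confidence of the last suspect inside that fraction. The helper below is a hypothetical sketch, assuming you can export the suspects' confidence scores as a list of floats; `floor_for_top_fraction` is not a product function.

```python
import math


def floor_for_top_fraction(confidences, fraction=0.10):
    """Return the confidence floor that admits the top `fraction` of suspects.

    With tied scores at the cut-off, the floor may admit slightly more
    than the requested fraction.
    """
    if not confidences:
        raise ValueError("no suspects to rank")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(confidences, reverse=True)
    # Round before ceil so float noise (e.g. 3.0000000000000004) does
    # not push the count up by one; always keep at least one suspect.
    k = max(1, math.ceil(round(len(ranked) * fraction, 9)))
    return ranked[k - 1]
```

For the next pass, calling the helper again with a larger `fraction` yields a lower floor, which matches the last step of the workflow.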
What healing does not do
- It does not modify the original dataset unless you explicitly pick “main branch” as the output target.
- It does not run a fresh scan automatically (you have to opt in via Auto‑rescan).
- It does not delete the source scan, so you can re‑heal with different toggles at any time.

