Healing takes a scan’s findings and produces a cured copy of the dataset with the bad samples either fixed or removed. You choose which fixes to apply; every action is captured in the audit trail.

When to heal

  • After a mislabel or mislabel_broad scan, to relabel suspects and drop out-of-distribution (OOD) samples.
  • After a poisoning scan, to drop tampered samples before training.
  • After a text_analysis scan, to redact secrets and remove injection paragraphs from a fine‑tune corpus.
  • After a bias_shortcut scan, to apply the recommended mitigations (cropping, dropping, rebalancing).

What the toggles do

| Toggle | Effect |
| --- | --- |
| Fix mislabeled | Rewrite labels according to the engine’s predictions. |
| Remove outliers | Drop OOD or outlier samples entirely. |
| Remove poisoned | Drop samples flagged by the poisoning engine. |
| Remove low quality | Drop borderline low‑confidence cases. |
| Confidence floor | Only act on findings above the chosen confidence threshold. |

Toggles are independent. You can drop poisoned samples without touching mislabels, or relabel mislabels without dropping anything.
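
As a rough illustration of how independent toggles and a confidence floor could combine, here is a minimal Python sketch. The `Finding` and `HealOptions` types, the `kind` strings, and `plan_actions` are hypothetical names invented for this example, not the product's API. Treating the floor as inclusive, and letting a drop win over a relabel for the same sample, are also assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    sample_id: str
    kind: str  # hypothetical values: "mislabeled", "outlier", "poisoned", "low_quality"
    confidence: float
    predicted_label: Optional[str] = None


@dataclass(frozen=True)
class HealOptions:
    fix_mislabeled: bool = False
    remove_outliers: bool = False
    remove_poisoned: bool = False
    remove_low_quality: bool = False
    confidence_floor: float = 0.0  # assumption: findings at or above the floor are acted on


def plan_actions(findings, options):
    """Map sample_id -> ("drop", None) or ("relabel", new_label)."""
    # Each toggle independently enables one kind of drop.
    drop_kinds = set()
    if options.remove_outliers:
        drop_kinds.add("outlier")
    if options.remove_poisoned:
        drop_kinds.add("poisoned")
    if options.remove_low_quality:
        drop_kinds.add("low_quality")

    plan = {}
    for f in findings:
        if f.confidence < options.confidence_floor:
            continue  # below the floor: leave the sample alone
        if f.kind in drop_kinds:
            plan[f.sample_id] = ("drop", None)  # assumption: drop overrides relabel
        elif (
            f.kind == "mislabeled"
            and options.fix_mislabeled
            and f.predicted_label is not None
        ):
            plan.setdefault(f.sample_id, ("relabel", f.predicted_label))
    return plan


findings = [
    Finding("a", "mislabeled", 0.95, "cat"),
    Finding("b", "poisoned", 0.99),
    Finding("c", "mislabeled", 0.50, "dog"),
]
# Only the poisoning toggle is on, so mislabels are left untouched.
print(plan_actions(findings, HealOptions(remove_poisoned=True, confidence_floor=0.9)))
```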

How to heal

1. Open the source

Start from either the dataset detail page (Heal) or the scan detail page (Heal from this scan). Healing from a scan pre‑fills the dialog with that scan’s findings.

2. Pick what to apply

Flip the toggles you want. Set a confidence floor if you only want to act on the highest‑confidence findings.

3. Pick the output

Healing always writes a result. You can choose:
  • New child dataset: the original stays untouched. A lineage edge healed_from links the child to the source.
  • New branch on the source: the cured contents live as a branch you can switch between.
A zip download is always offered alongside.

4. (Optional) Auto‑rescan

Toggle Auto‑rescan after healing to immediately run the same engine on the cured output. Useful for verifying the heal moved the dataset out of CRITICAL.

5. Submit and watch

Healing runs as a background job with its own progress stream. The result appears in the dataset’s lineage graph.

When healing for the first time on a noisy dataset, set the confidence floor high (e.g. 0.9) and inspect the cured branch before relabeling everything. Lower the floor incrementally once you trust the engine’s calls.
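
To get a feel for what lowering the floor would do before committing to it, you could count how many findings each candidate floor would act on. The sketch below assumes you have exported the findings' confidence scores as a plain list of floats; `actions_per_floor` is a hypothetical helper, and the inclusive comparison is an assumption.

```python
def actions_per_floor(confidences, floors=(0.9, 0.8, 0.7)):
    """Count the findings at or above each candidate confidence floor."""
    return {
        floor: sum(1 for c in confidences if c >= floor)
        for floor in floors
    }


# Made-up scores, purely for illustration.
scores = [0.95, 0.91, 0.85, 0.72, 0.40]
for floor, count in actions_per_floor(scores).items():
    print(f"floor {floor}: {count} findings would be acted on")
```

A large jump in the count between two adjacent floors is a reasonable cue to review the cured branch again before going lower.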

Healing for text

Text healing is different from image healing because the unit of repair is the snippet, not the file.
| Action | What it does |
| --- | --- |
| Redact secrets | Replace every detected secret span with [REDACTED_<type>] in place. The rest of the document is left unchanged. |
| Strip injections | Remove paragraphs flagged as injection attempts. |
| Drop topic outliers | Remove paragraphs flagged as off‑topic. |

Redacted documents keep their original doc_id and offsets so the audit trail can reconstruct what was changed.
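
The in‑place redaction described above can be sketched as follows. This is a minimal illustration, not the engine's implementation: `SecretSpan`, `redact`, and the type names are hypothetical, and the spans are assumed not to overlap. Replacements are applied right to left so that the original offsets stay valid while the text is edited, and those original offsets are what gets recorded for the audit trail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecretSpan:
    start: int        # inclusive character offset in the original text
    end: int          # exclusive character offset in the original text
    secret_type: str  # hypothetical type name, e.g. "API_KEY"


def redact(doc_id, text, spans):
    """Return (redacted_text, audit_entries) for non-overlapping spans."""
    redacted = text
    audit = []
    # Work from the end of the document backwards: replacing a later span
    # never shifts the offsets of an earlier one.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        placeholder = f"[REDACTED_{span.secret_type}]"
        redacted = redacted[:span.start] + placeholder + redacted[span.end:]
        audit.append({
            "doc_id": doc_id,     # unchanged by redaction
            "start": span.start,  # offsets refer to the original text
            "end": span.end,
            "type": span.secret_type,
        })
    audit.reverse()  # report entries in document order
    return redacted, audit


text = "key=abc123 and pw=hunter2"
key_at = text.index("abc123")
pw_at = text.index("hunter2")
spans = [
    SecretSpan(key_at, key_at + len("abc123"), "API_KEY"),
    SecretSpan(pw_at, pw_at + len("hunter2"), "PASSWORD"),
]
cured, entries = redact("doc-1", text, spans)
print(cured)
```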

What you get back

Every healing run produces:
  • A new dataset or branch with the cured contents.
  • A zip download of the result.
  • An action record in the audit trail (who triggered it, what toggles were set, which scan it descended from).
  • A lineage edge linking the cured output back to the source dataset.
  • (Optional) a follow‑up scan if you enabled auto‑rescan.
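
For readers who want to picture what such an action record might hold, here is a hedged sketch of the listed items as a data structure. Every field name, identifier, and the JSON shape are assumptions made for illustration; only the healed_from relation and the kinds of information recorded come from the description above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class LineageEdge:
    child: str                     # the cured dataset or branch
    parent: str                    # the source dataset
    relation: str = "healed_from"  # edge name used in the lineage graph


@dataclass
class HealRecord:
    triggered_by: str    # who started the heal
    toggles: dict        # which toggles were set, including the confidence floor
    source_scan_id: str  # the scan the heal descended from
    lineage: LineageEdge
    auto_rescan: bool = False


def to_json(record):
    """Serialize a record, including the nested lineage edge, to JSON."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)


# All identifiers below are made up for the example.
record = HealRecord(
    triggered_by="reviewer@example.com",
    toggles={"remove_poisoned": True, "confidence_floor": 0.85},
    source_scan_id="scan-123",
    lineage=LineageEdge(child="dataset-healed", parent="dataset-source"),
    auto_rescan=True,
)
print(to_json(record))
```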

Common workflows

After mislabel_broad and poisoning scans:

  1. Run mislabel_broad and poisoning.
  2. Heal with Remove outliers, Remove poisoned, and Fix mislabeled at confidence floor 0.85.
  3. Enable Auto‑rescan and confirm the result is HEALTHY or UNHEALTHY−.
  4. Train on the cured branch.

After a text_analysis scan:

  1. Run text_analysis.
  2. Heal with Redact secrets and Strip injections.
  3. Re‑run text_analysis to confirm zero CRITICAL findings remain.
  4. Export the cured corpus.

Cautious relabeling with a reviewer in the loop:

  1. Run mislabel.
  2. Heal only the top 10% of suspects (high confidence floor) into a new branch.
  3. Have a reviewer compare the original and cured branches side by side.
  4. Lower the confidence floor for the next pass once the reviewer signs off.
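
One way to turn "the top 10% of suspects" into a concrete confidence floor is to rank the suspects' confidence scores and read off the value at the cutoff. The helper below is a hypothetical sketch that assumes the scores are available as a list of floats; it is not a feature of the healing dialog.

```python
import math


def floor_for_top_fraction(confidences, fraction=0.10):
    """Return the confidence value that keeps roughly the top `fraction` of findings."""
    if not confidences:
        raise ValueError("need at least one confidence score")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(confidences, reverse=True)
    # Always keep at least one finding, even for very small datasets.
    keep = max(1, math.floor(len(ranked) * fraction))
    return ranked[keep - 1]


# Made-up scores: 0.05, 0.10, ..., 1.00
scores = [i / 20 for i in range(1, 21)]
print(floor_for_top_fraction(scores))  # use this value as the confidence floor
```

If several suspects share the cutoff score, a floor at that value selects all of them, so the healed set can be slightly larger than the requested fraction.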

What healing does not do

  • It does not modify the original dataset unless you explicitly pick “main branch” as the output target.
  • It does not run a fresh scan automatically (you have to opt in via Auto‑rescan).
  • It does not delete the source scan, so you can re‑heal with different toggles at any time.