Antidote Data Integrity scans your datasets for the problems that silently break ML training: wrong labels, outliers, poisoned samples, hidden biases, leaked secrets, prompt injections, and compliance gaps. It then helps you triage, fix, report on, and audit the results. Use this page to learn the vocabulary; every other page in this section assumes you know it.

Getting started

1. Sign in

Open the workspace URL in your welcome email and sign in with the owner credentials your account manager sent you. Invited teammates receive an Accept Invite email that sets their password and drops them straight into the workspace.

2. Set workspace defaults

Open Settings > General to set your workspace name and timezone. The timezone controls how schedules and audit timestamps are displayed across the app.

3. Connect a data source

Open Settings > Integrations and connect at least one source you plan to scan. The Integrations page lists the supported sources.

4. Create a project, upload a dataset, scan it

Projects organize everything, so create one first. Inside the project, upload a dataset and launch a scan from the dataset detail page. The Quickstart walks through the full first run.

Core concepts

| Concept | What it means |
| --- | --- |
| Project | Container within the workspace. Holds datasets, scan history, report templates, integration settings, and schedules. |
| Dataset | The data being analyzed, such as an image folder, a NIfTI volume, a text corpus, or a detection set. |
| Scan | A single run of one engine against one dataset version. Produces results, artifacts, and a severity. |
| Result | A per-item finding produced by a scan (one image, one paragraph, one case). |
| Healing | The remediation phase. Produces a cured copy of the dataset with the bad samples fixed or removed. |
| Branch | A virtual version of a dataset, driven by a manifest CSV. Lets you compare and iterate without copying data. |
| Report | A generated document (PDF, HTML, or JSON) describing findings, optionally in a compliance format. |
| Severity | A four-level rollup: HEALTHY, UNHEALTHY−, UNHEALTHY+, CRITICAL. |
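
To make the relationships between these concepts concrete, here is a minimal Python sketch of how they nest: a project holds datasets and scan history, a scan pairs one engine with one dataset version, and a branch points at a manifest CSV instead of copying data. It is an illustration only, not Antidote's API or storage schema. The class and field names (`Project`, `Scan`, `item_id`, `manifest_csv`, and so on) and the engine name in the usage lines are hypothetical, and Healing and Report are left out for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical model for illustration; not the Antidote API or schema.


class Severity(Enum):
    """The four-level rollup shared by every engine."""
    HEALTHY = "HEALTHY"
    UNHEALTHY_MINUS = "UNHEALTHY−"
    UNHEALTHY_PLUS = "UNHEALTHY+"
    CRITICAL = "CRITICAL"


@dataclass
class Result:
    """One per-item finding: one image, one paragraph, or one case."""
    item_id: str
    issue: str  # hypothetical free-text description of the finding


@dataclass
class Scan:
    """One run of one engine against one dataset version."""
    engine: str
    dataset_version: str
    results: list[Result] = field(default_factory=list)
    artifacts: list[str] = field(default_factory=list)
    severity: Severity = Severity.HEALTHY


@dataclass
class Branch:
    """A virtual dataset version: a manifest CSV selects items, nothing is copied."""
    name: str
    manifest_csv: str  # path to the manifest CSV that defines the branch


@dataclass
class Dataset:
    name: str
    branches: list[Branch] = field(default_factory=list)


@dataclass
class Project:
    """Container for datasets and scan history (templates, schedules omitted)."""
    name: str
    datasets: list[Dataset] = field(default_factory=list)
    scan_history: list[Scan] = field(default_factory=list)


# Usage: one project, one dataset with a branch, one scan with a single finding.
project = Project("fraud-detection")
dataset = Dataset("transactions", branches=[Branch("no-outliers", "manifests/no_outliers.csv")])
project.datasets.append(dataset)
project.scan_history.append(
    Scan(
        engine="label-check",  # hypothetical engine name
        dataset_version="v1",
        results=[Result("img_0042", "wrong label")],
        severity=Severity.UNHEALTHY_MINUS,
    )
)
```

Keeping `severity` on the scan rather than on each result mirrors the table: results are per item, while severity is the rollup for the whole run.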

The severity scale

Every engine rolls up to the same four‑level scale so dashboards, reports, and the audit trail stay consistent.

| Label | When you'll see it |
| --- | --- |
| HEALTHY | No issues found. The dataset is safe to use as-is. |
| UNHEALTHY− | A small number of issues (typically < 15% of samples). Review before training. |
| UNHEALTHY+ | Significant problems (15–50% of samples). Training on this data is risky. |
| CRITICAL | The majority of samples are flagged (≥ 50%), or there is severe leakage or poisoning. Do not train. |

The same severity badge shows up everywhere a dataset appears: the dataset table, the dashboard widgets, the compliance report cover page, and webhook payloads. Treat it as the one-label summary of a dataset's health.
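
The thresholds in the severity table can be read as a simple rollup rule. The Python sketch below applies them literally: zero flagged samples is HEALTHY, under 15% is UNHEALTHY−, 15% up to 50% is UNHEALTHY+, and 50% or more, or any severe leakage or poisoning, is CRITICAL. It is an assumption-based illustration, not the engines' actual logic; the table says "typically", and real engines may weigh findings differently. The function name `roll_up` and its `severe_leak_or_poison` flag are hypothetical.

```python
from enum import Enum


class Severity(Enum):
    HEALTHY = "HEALTHY"
    UNHEALTHY_MINUS = "UNHEALTHY−"
    UNHEALTHY_PLUS = "UNHEALTHY+"
    CRITICAL = "CRITICAL"


def roll_up(flagged: int, total: int, severe_leak_or_poison: bool = False) -> Severity:
    """Hypothetical rollup of a flagged-sample count onto the four-level scale.

    Integer comparisons (flagged * 100 vs. threshold * total) avoid
    floating-point rounding at the 15% and 50% boundaries.
    """
    if total <= 0 or not 0 <= flagged <= total:
        raise ValueError("need 0 <= flagged <= total and total > 0")
    if severe_leak_or_poison:
        return Severity.CRITICAL  # severe leakage or poisoning overrides the count
    if flagged == 0:
        return Severity.HEALTHY
    if flagged * 100 < 15 * total:
        return Severity.UNHEALTHY_MINUS  # under 15% flagged
    if flagged * 100 < 50 * total:
        return Severity.UNHEALTHY_PLUS  # 15% up to (not including) 50%
    return Severity.CRITICAL  # 50% or more flagged


for flagged in (0, 120, 150, 499, 500):
    print(flagged, roll_up(flagged, 1000).value)
```

Putting the leakage and poisoning check first reflects the CRITICAL row: a handful of leaked secrets can make a dataset unusable even when almost every other sample is clean.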

Where to next

Projects

Organize datasets, schedules, and report templates by team or initiative.

Datasets

Upload, branch, version, and explore your data.

Running scans

Launch a scan, monitor progress, and triage results.

Engines

The seven scanning engines and what each one finds.