Browser-faithful parse
A WHATWG/HTML5 tokenizer at 100% html5lib conformance. The tree you sanitize is the tree a browser builds — which closes parser-differential and mutation-XSS gaps by construction.
Zero dependencies. A WHATWG parser that matches the browser, deny-by-default behind an inviolable safe baseline — ~2.3× faster than sanitize-html and ~3 KB in the browser. Plus a byte-identical drop-in for sanitize-html.
The API is class-only by design — you build a Sanitizer with an explicit policy and reuse it. There is deliberately no one-shot sanitize(html): forcing an explicit policy means no implicit global default to misconfigure, and the policy compiles once so calls stay cheap.
import { Sanitizer } from 'neosanitize';
import * as presets from 'neosanitize/presets';
// Build once (compiles the policy), reuse everywhere.
const sanitizer = Sanitizer.builder(presets.ugc).allow('img', ['src', 'alt']).build();
sanitizer.sanitize('<p>hi <img src=x onerror=alert(1)> <script>bad()</script></p>');
// → '<p>hi <img src="x"> </p>'
// The onerror handler is stripped; <script> is dropped along with its content.

Even if your allow-list permits them, the baseline always strips known-dangerous constructs, mirroring the browser's native setHTML(). An allow-list can never re-introduce them; only the explicit sanitizeUnsafe() opts out.
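As a rough mental model of that layering, the sketch below runs a baseline check before the allow-list is consulted. It is a toy illustration, not neosanitize's implementation: the names `Attr`, `isBaselineDangerous`, and `filterAttrs` are hypothetical, and the real baseline presumably covers far more than event handlers and `javascript:` URLs.

```typescript
// Toy model of "baseline first, allow-list second".
// NOT neosanitize's code: every name below is hypothetical.
type Attr = { name: string; value: string };

// Attributes whose values are treated as URLs in this sketch (assumption).
const URL_ATTRS = new Set(['href', 'src']);

// Baseline: event handlers and javascript: URLs are always rejected.
function isBaselineDangerous(attr: Attr): boolean {
  const name = attr.name.toLowerCase();
  if (name.startsWith('on')) return true;
  if (URL_ATTRS.has(name)) {
    // Conservatively drop whitespace and control characters before the scheme check.
    const compact = attr.value.replace(/[\u0000-\u0020]/g, '').toLowerCase();
    return compact.startsWith('javascript:');
  }
  return false;
}

// An attribute survives only if it passes the baseline AND is allow-listed.
function filterAttrs(allowed: readonly string[], attrs: readonly Attr[]): Attr[] {
  const allow = new Set(allowed.map((n) => n.toLowerCase()));
  return attrs.filter((a) => !isBaselineDangerous(a) && allow.has(a.name.toLowerCase()));
}

const kept = filterAttrs(
  ['href', 'onclick', 'title'],
  [
    { name: 'href', value: 'javascript:alert(1)' },
    { name: 'onclick', value: 'x()' },
    { name: 'title', value: 'hello' },
  ],
);
console.log(kept); // only the title attribute remains
```

Because the baseline is evaluated independently of the allow-list, listing `onclick` or `href` cannot bring a dangerous value back. With the library itself, the same guarantee looks like this: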
const s = Sanitizer.builder({ tags: ['a'], attrs: { a: ['href', 'onclick'] } }).build();
s.sanitize('<a href="javascript:alert(1)" onclick="x()">click</a>');
// → '<a>click</a>' ← javascript: URL and on* handler stripped despite being allow-listed

Want to know exactly what was removed and why? Use report mode:
const { html, removed } = s.sanitizeWithReport('<a href=javascript:alert(1) onclick=x>y</a>');
// html → '<a>y</a>'
// removed → [
// { kind: 'url', name: 'href', reason: 'dangerous-url' },
// { kind: 'attr', name: 'onclick', reason: 'event-handler' },
// ]

// The new engine (browser-faithful WHATWG parser, deny-by-default):
import { Sanitizer } from 'neosanitize';
// A byte-identical drop-in for sanitize-html 2.x — same API, same output:
import sanitize from 'neosanitize/legacy';
sanitize('<img src=x onerror=alert(1) />', { allowedTags: ['img'], allowedAttributes: { img: ['src'] } });
// → '<img src="x" />' (exactly what sanitize-html produces)

Both engines are faster than the original sanitize-html: legacy (a byte-identical streaming drop-in) and modern (the default, browser-faithful WHATWG engine). Throughput in ops/sec, higher is better (pnpm bench:3way):

| Scenario | sanitize-html | legacy | modern |
|---|---|---|---|
| xss-attack | 2,022 | 4,745 | 5,095 |
| entity-heavy | 973 | 5,149 | 3,600 |
| attribute-filtering | 1,492 | 2,906 | 3,695 |
| style-filtering | 1,956 | 3,330 | 4,400 |
| large-document (113 KB) | 274 | 834 | 561 |
| geomean (13 scenarios) | 1.00× | 2.75× | 2.28× |
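
By geomean, legacy is the fastest (a lean htmlparser2-style port with the same output as sanitize-html), although modern wins several individual scenarios (xss-attack, attribute-filtering, style-filtering). modern is ~2.3× the original while doing a full WHATWG parse plus tree construction; the gap to legacy is the price of browser-faithfulness. See the full performance page.
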
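
To make the ratios in the table easier to check, here is a small sketch that derives per-scenario speedups and a geometric mean from the five rows shown. The `Row` type and `geomean` helper are illustrative names, not part of the benchmark suite, and because only 5 of the 13 scenarios are listed, the geomean it prints will not match the 13-scenario figures in the last row.

```typescript
// Throughput (ops/sec) copied from the five scenario rows of the table.
type Row = { scenario: string; original: number; legacy: number; modern: number };

const rows: Row[] = [
  { scenario: 'xss-attack', original: 2022, legacy: 4745, modern: 5095 },
  { scenario: 'entity-heavy', original: 973, legacy: 5149, modern: 3600 },
  { scenario: 'attribute-filtering', original: 1492, legacy: 2906, modern: 3695 },
  { scenario: 'style-filtering', original: 1956, legacy: 3330, modern: 4400 },
  { scenario: 'large-document (113 KB)', original: 274, legacy: 834, modern: 561 },
];

// Geometric mean, computed in log space so long lists cannot overflow.
function geomean(values: number[]): number {
  if (values.length === 0) throw new Error('geomean of an empty list is undefined');
  const logSum = values.reduce((sum, v) => sum + Math.log(v), 0);
  return Math.exp(logSum / values.length);
}

// Speedup of each engine relative to the original sanitize-html.
const legacySpeedups = rows.map((r) => r.legacy / r.original);
const modernSpeedups = rows.map((r) => r.modern / r.original);

rows.forEach((r, i) => {
  console.log(
    `${r.scenario}: legacy ${legacySpeedups[i].toFixed(2)}x, modern ${modernSpeedups[i].toFixed(2)}x`,
  );
});
console.log(
  `geomean over these 5 rows: legacy ${geomean(legacySpeedups).toFixed(2)}x, ` +
    `modern ${geomean(modernSpeedups).toFixed(2)}x`,
);
```

A geometric mean is the usual choice for averaging ratios like these, since it keeps a single outlier scenario (entity-heavy, for instance) from dominating the summary.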
| Build | Min+gzip | Min+brotli | Notes |
|---|---|---|---|
| modern, browser | ~3.2 KB | ~2.9 KB | native DOMParser, no bundled parser |
| modern, Node / default | ~27 KB | ~23 KB | bundled WHATWG parser + full entity table |
| legacy | ~21 KB | ~18 KB | single-file sanitize-html port |

Zero runtime dependencies. ESM.
sideEffects: false, with subpath exports, so you ship only what you import.