
neosanitize

The browser-faithful HTML sanitizer

Zero dependencies. A WHATWG parser that matches the browser, deny-by-default behind an inviolable safe baseline — ~2.3× faster than sanitize-html and ~3 KB in the browser. Plus a byte-identical drop-in for sanitize-html.

2.3× faster than sanitize-html (geomean, 13 scenarios)

~3 KB browser build (brotli), zero parser bytes

100% html5lib tokenizer conformance (6946/6946)

0 mXSS holes across a 20,000-case adversarial fuzz

Build a sanitizer, then sanitize

The API is class-only by design — you build a Sanitizer with an explicit policy and reuse it. There is deliberately no one-shot sanitize(html): forcing an explicit policy means no implicit global default to misconfigure, and the policy compiles once so calls stay cheap.

ts
import { Sanitizer } from 'neosanitize';
import * as presets from 'neosanitize/presets';

// Build once (compiles the policy), reuse everywhere.
const sanitizer = Sanitizer.builder(presets.ugc).allow('img', ['src', 'alt']).build();

sanitizer.sanitize('<p>hi <img src=x onerror=alert(1)> <script>bad()</script></p>');
// → '<p>hi <img src="x"> </p>'
//   the onerror handler is stripped, <script> is dropped with its content.
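
The "build once, reuse everywhere" advice can be sketched as a small cache that hands back one built instance per policy name. This is a hedged illustration, not part of neosanitize: `createSanitizerCache`, the `HtmlSanitizer` interface, and the stand-in builder are hypothetical names introduced here so the snippet runs without the library.

```ts
// Hypothetical helper, NOT a neosanitize API: caches one built sanitizer per key.
interface HtmlSanitizer {
  sanitize(html: string): string;
}

function createSanitizerCache<K>(
  build: (key: K) => HtmlSanitizer,
): (key: K) => HtmlSanitizer {
  const cache = new Map<K, HtmlSanitizer>();
  return (key) => {
    let sanitizer = cache.get(key);
    if (sanitizer === undefined) {
      sanitizer = build(key); // the (assumed expensive) policy compile runs only here
      cache.set(key, sanitizer);
    }
    return sanitizer;
  };
}

// Stand-in builder for the demo. In real code this callback would wrap a
// Sanitizer.builder(...).build() call like the one in the snippet above.
let builds = 0;
const getSanitizer = createSanitizerCache((name: string) => {
  builds += 1;
  return { sanitize: (html: string) => `[${name}] ${html}` };
});

getSanitizer('comments').sanitize('<p>a</p>');
getSanitizer('comments').sanitize('<p>b</p>');
console.log(builds); // 1: the second lookup reused the cached instance
```

Keeping the cache behind a function means callers never construct a sanitizer on a hot path by accident.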

The inviolable safe baseline

Even if your allow-list permits them, the baseline always strips known-dangerous constructs — mirroring the browser's native setHTML(). An allow-list can never re-introduce them; only the explicit sanitizeUnsafe() opts out.

ts
const s = Sanitizer.builder({ tags: ['a'], attrs: { a: ['href', 'onclick'] } }).build();

s.sanitize('<a href="javascript:alert(1)" onclick="x()">click</a>');
// → '<a>click</a>'   ← javascript: URL and on* handler stripped despite being allow-listed
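
One way to picture "an allow-list can never re-introduce them" is a check order in which the baseline runs before the allow-list is consulted. The sketch below is an assumption-laden illustration, not neosanitize's implementation and not a complete sanitizer: `violatesBaseline`, `keepAttribute`, and `URL_ATTRS` are hypothetical names, and only the two constructs from the example above (on* handlers and javascript: URLs) are covered.

```ts
// Illustrative only: NOT neosanitize's code and not safe to use as a real sanitizer.
type AllowList = { [tag: string]: string[] | undefined }; // tag -> allowed attribute names

// Attributes assumed, for this sketch only, to carry URLs.
const URL_ATTRS = new Set(['href', 'src']);

function violatesBaseline(name: string, value: string): boolean {
  const attr = name.toLowerCase();
  // Treat every on* attribute as an event handler (deliberately over-strict).
  if (attr.startsWith('on')) return true;
  if (URL_ATTRS.has(attr)) {
    // Remove whitespace and control characters before testing the scheme,
    // because browsers ignore some of them when parsing URLs.
    // Removing all of them is stricter than a browser, which is fine for a deny rule.
    const compact = value.replace(/[\u0000-\u0020]/g, '').toLowerCase();
    return compact.startsWith('javascript:');
  }
  return false;
}

function keepAttribute(allow: AllowList, tag: string, name: string, value: string): boolean {
  // The baseline is checked first, so no allow-list entry can override it.
  if (violatesBaseline(name, value)) return false;
  return (allow[tag] ?? []).includes(name);
}

const allow: AllowList = { a: ['href', 'onclick'] };
console.log(keepAttribute(allow, 'a', 'href', 'https://example.com')); // true
console.log(keepAttribute(allow, 'a', 'href', 'JaVa\tScript:alert(1)')); // false
console.log(keepAttribute(allow, 'a', 'onclick', 'x()')); // false, despite being allow-listed
```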

Want to know exactly what was removed and why? Use report mode:

ts
const { html, removed } = s.sanitizeWithReport('<a href=javascript:alert(1) onclick=x>y</a>');
// html    → '<a>y</a>'
// removed → [
//   { kind: 'url',  name: 'href',    reason: 'dangerous-url'  },
//   { kind: 'attr', name: 'onclick', reason: 'event-handler'  },
// ]
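
A report like this is easy to aggregate, for example to log how often each rule fires. The sketch below assumes only the entry shape visible in the output above (`kind`, `name`, `reason`); `Removal` and `countByReason` are hypothetical names, and the sample array mirrors the documented output so the snippet runs without the library.

```ts
// Entry shape copied from the report output shown above; any other fields are unknown here.
interface Removal {
  kind: string;
  name: string;
  reason: string;
}

// Hypothetical helper: tallies removals per reason.
function countByReason(removed: Removal[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const { reason } of removed) {
    counts.set(reason, (counts.get(reason) ?? 0) + 1);
  }
  return counts;
}

// Sample data mirroring the documented report, used in place of a real call.
const sample: Removal[] = [
  { kind: 'url', name: 'href', reason: 'dangerous-url' },
  { kind: 'attr', name: 'onclick', reason: 'event-handler' },
];

countByReason(sample).forEach((count, reason) => {
  console.log(`${reason}: ${count}`);
});
```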

Two engines, one package

ts
// The new engine — browser-faithful WHATWG parser, deny-by-default:
import { Sanitizer } from 'neosanitize';

// A byte-identical drop-in for sanitize-html 2.x — same API, same output:
import sanitize from 'neosanitize/legacy';
sanitize('<img src=x onerror=alert(1) />', { allowedTags: ['img'], allowedAttributes: { img: ['src'] } });
// → '<img src="x" />'   (exactly what sanitize-html produces)

Performance

Both engines are faster than the original sanitize-html: legacy (a byte-identical streaming drop-in) and modern (the default, browser-faithful WHATWG engine). Throughput in ops/sec, higher is better (pnpm bench:3way):

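| Scenario | sanitize-html | legacy | modern |
| --- | --- | --- | --- |
| xss-attack | 2,022 | 4,745 | 5,095 |
| entity-heavy | 973 | 5,149 | 3,600 |
| attribute-filtering | 1,492 | 2,906 | 3,695 |
| style-filtering | 1,956 | 3,330 | 4,400 |
| large-document (113 KB) | 274 | 834 | 561 |
| geomean (13 scenarios) | 1.00× | 2.75× | 2.28× |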

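By geomean, legacy is the fastest (a lean htmlparser2-style port with the same output as sanitize-html), although modern leads on the xss-attack, attribute-filtering, and style-filtering rows. modern is ~2.3× as fast as the original while doing a full WHATWG parse plus tree construction; its gap to legacy is the price of browser-faithfulness. See the full performance page.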
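
For readers unfamiliar with the geomean row, the sketch below shows how a geometric-mean speedup is computed from per-scenario throughput. It uses only the five rows listed in the table, so it is an illustration of the arithmetic, not a reproduction of the published 13-scenario figures; `geomean` and `rows` are names introduced here.

```ts
// Throughput (ops/sec) copied from the five rows shown in the table above.
const rows = [
  { scenario: 'xss-attack', original: 2022, legacy: 4745, modern: 5095 },
  { scenario: 'entity-heavy', original: 973, legacy: 5149, modern: 3600 },
  { scenario: 'attribute-filtering', original: 1492, legacy: 2906, modern: 3695 },
  { scenario: 'style-filtering', original: 1956, legacy: 3330, modern: 4400 },
  { scenario: 'large-document', original: 274, legacy: 834, modern: 561 },
];

// Geometric mean computed as exp(mean(ln x)), which avoids multiplying many ratios together.
function geomean(values: number[]): number {
  if (values.length === 0) throw new Error('geomean of an empty list is undefined');
  const logSum = values.reduce((sum, v) => sum + Math.log(v), 0);
  return Math.exp(logSum / values.length);
}

// Speedup per scenario is engine throughput divided by sanitize-html throughput.
const legacySpeedup = geomean(rows.map((r) => r.legacy / r.original));
const modernSpeedup = geomean(rows.map((r) => r.modern / r.original));
console.log(legacySpeedup.toFixed(2), modernSpeedup.toFixed(2));
```

Over these five rows the result comes out at roughly 2.6× for legacy and 2.5× for modern; the table's 2.75× and 2.28× differ because they cover all 13 scenarios. A geometric mean is the usual choice for averaging ratios because a single outlier scenario, such as entity-heavy, cannot dominate it the way it would an arithmetic mean.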

Bundle size

| Build | Min+gzip | Min+brotli | Notes |
| --- | --- | --- | --- |
| modern, browser | ~3.2 KB | ~2.9 KB | native DOMParser, no bundled parser |
| modern, Node / default | ~27 KB | ~23 KB | bundled WHATWG parser + full entity table |
| legacy | ~21 KB | ~18 KB | single-file sanitize-html port |

Zero runtime dependencies. ESM. sideEffects: false with subpath exports — you ship only what you import.

Released under the MIT License.