Code Paste Cleaner - Remove Zero-Width Characters from Source Code

Zero-width spaces in string literals

A JSON key that looks identical to production yet fails parse. A test assertion where expected and actual "match" visually. A regex that never fires. Developers burn hours on these before discovering U+200B in a pasted literal.

Paste the suspect string here before committing fixtures, .env samples, or API mocks. The cleaner preserves visible code while stripping watermark-class bytes.

String equality in JavaScript, Python, Go, and Rust compares the underlying code units or bytes, not rendered glyphs. Two literals that render identically in your editor fail strict equality when one carries U+200B. The bug surfaces in config loaders, i18n files, and API contract tests, and syntax highlighting gives no hint of it.
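A minimal Python sketch of the mismatch; the variable names and the sample key are illustrative assumptions, not values from any real config:

```python
# Two literals that render the same but differ by one invisible code point.
clean = "api_key"
dirty = "api\u200b_key"  # U+200B ZERO WIDTH SPACE after "api"

print(clean == dirty)          # False
print(len(clean), len(dirty))  # 7 8
print(ascii(dirty))            # 'api\u200b_key'
```

Printing with ascii() is a quick way to make the hidden code point visible in a terminal or test log.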

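JSON.parse and YAML loaders are unforgiving. A zero-width space between tokens, for example just before a key's opening quote, is not valid JSON whitespace and produces “Unexpected token” errors that are hard to read because the parser sees a character your eyes do not. A zero-width space inside a quoted key name is worse: the document parses, but the key no longer matches the name your code looks up. Cleaning the pasted payload before it enters your repo prevents those red herrings.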
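Where the zero-width space sits decides how the failure looks. The sketch below uses Python's json module as a stand-in for any strict JSON parser; the payloads are made-up examples:

```python
import json

# Case 1: U+200B inside a quoted key. This is still valid JSON, so parsing
# succeeds, but the resulting key is not the one the code looks up.
inside = '{"user\u200bname": "ada"}'
parsed = json.loads(inside)
print("username" in parsed)  # False

# Case 2: U+200B between tokens. JSON whitespace is limited to space, tab,
# carriage return, and line feed, so the parser rejects the document.
outside = '{\u200b"username": "ada"}'
parse_error = None
try:
    json.loads(outside)
except json.JSONDecodeError as err:
    parse_error = err
    print("parse failed at character offset", err.pos)
```

The first case is the more dangerous one in practice, because nothing fails until a lookup silently misses.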

Regular expressions fail silently too. A pattern that should match a URL or UUID will not when the haystack includes invisible joiners. Paste the failing string here, copy the cleaned output into your test fixture, and rerun. If the regex fires, you have confirmed a clipboard issue rather than a logic bug.
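The confirmation step can also be reproduced in a few lines. This is a sketch with an assumed UUID pattern and a sample value, not the pattern from any particular codebase:

```python
import re

# Assumed pattern: lowercase hex UUID in 8-4-4-4-12 form.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)

clean = "123e4567-e89b-12d3-a456-426614174000"
dirty = "123e4567-\u200de89b-12d3-a456-426614174000"  # U+200D after the first hyphen

print(bool(UUID_RE.match(clean)))                        # True
print(bool(UUID_RE.match(dirty)))                        # False
print(bool(UUID_RE.match(dirty.replace("\u200d", ""))))  # True
```

If the match succeeds only after the invisible character is removed, the pattern was fine and the input was contaminated.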

AI-assisted coding workflows

Pastes from Copilot, ChatGPT, and Claude into IDE buffers can carry the same invisible markers as prose. Clean before saving when the snippet will become a constant, test fixture, or localization string.

For whole files, scan suspicious regions rather than entire repositories — focus on strings that failed equality or parsing.

AI coding assistants optimize for plausible-looking output, not byte-accurate literals. A generated .env.example, Docker Compose snippet, or GitHub Actions YAML block can include zero-width spaces in hostnames, image tags, or secret placeholders. Clean before commit so CI pipelines do not fail on invisible differences.

Test-driven workflows amplify the pain. You paste an expected API response from a chat into a Jest or pytest fixture; the test fails with a diff that shows identical strings. Developers blame flaky tests or async timing when the real culprit is U+200B in the expected value.
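When a diff shows two identical-looking strings, naming the first differing code point usually settles the question. The helper below is a hypothetical debugging aid, not part of Jest or pytest:

```python
import unicodedata


def describe_first_difference(expected: str, actual: str) -> str:
    """Name the first code point at which two strings diverge."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return (
                f"index {i}: expected U+{ord(e):04X} {unicodedata.name(e, 'UNNAMED')}, "
                f"got U+{ord(a):04X} {unicodedata.name(a, 'UNNAMED')}"
            )
    if len(expected) != len(actual):
        return f"one string is a prefix of the other ({len(expected)} vs {len(actual)} characters)"
    return "strings are identical"


expected = '{"status":\u200b "ok"}'  # fixture pasted from a chat, with a hidden U+200B
actual = '{"status": "ok"}'          # what the API really returned
print(describe_first_difference(expected, actual))
# index 10: expected U+200B ZERO WIDTH SPACE, got U+0020 SPACE
```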

Localization and CMS string tables are long-lived. A contaminated translation key copied from AI-assisted docs can break production builds months later when a new locale imports the same bad bytes. Clean at paste time when the string first enters your codebase.

Focus scans on high-risk regions: quoted strings, .env lines, JSON keys, SQL WHERE clauses, and regex patterns. Pasting an entire thousand-line file works within word limits, but targeted scans make it easier to see which literal caused the failure.
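A targeted scan can report the line and column of each suspect character. The sketch below swaps in a different technique from an explicit character list: it flags every character in Unicode general category Cf (format), which covers zero-width spaces, joiners, directional marks, soft hyphens, and U+FEFF, but not variation selectors. The function name and sample text are assumptions:

```python
import unicodedata


def find_format_chars(text: str):
    """Return (line, column, code point, name) for each Cf-category character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                hits.append(
                    (line_no, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"))
                )
    return hits


# Made-up .env fragment with a zero-width space hidden inside the image tag.
sample = "HOST=db.internal\nIMAGE=app:1.\u200b4\n"
print(find_format_chars(sample))
# [(2, 13, 'U+200B', 'ZERO WIDTH SPACE')]
```

Line and column numbers are 1-based so they line up with what most editors display.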

Prevention in team standards

Add clipboard hygiene to code review checklists for config changes copied from tickets. Pair with pre-commit hooks for BOM detection on new files.

Code review should flag config-only diffs that introduce new string literals from external sources. Ask one question: did this value pass through a clipboard cleaner? Thirty seconds of reviewer habit prevents merge of contaminated fixtures that break staging environments.

Pre-commit hooks that detect UTF-8 BOMs catch one class of invisible bytes; this cleaner catches the zero-width spaces and soft hyphens that such hooks miss. Use both: hooks at commit time, the cleaner on paste from Slack, email, or AI chat.
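A BOM check can also be extended to report a few other invisible characters without cleaning anything. The following is a minimal sketch, assuming UTF-8 text files and a hook runner that passes file paths as arguments (as the pre-commit framework does); the script and its function names are hypothetical:

```python
import sys

BOM = b"\xef\xbb\xbf"
# UTF-8 byte sequences for two characters a BOM-only check would miss.
SUSPECTS = {
    "U+200B zero-width space": "\u200b".encode("utf-8"),
    "U+00AD soft hyphen": "\u00ad".encode("utf-8"),
}


def check_bytes(data: bytes) -> list[str]:
    """List the invisible-byte problems found in one file's contents."""
    problems = []
    if data.startswith(BOM):
        problems.append("UTF-8 BOM at start of file")
    for label, needle in SUSPECTS.items():
        count = data.count(needle)
        if count:
            problems.append(f"{count} x {label}")
    return problems


def main(paths: list[str]) -> int:
    """Print findings per file; return 1 if any file has a problem, else 0."""
    failed = False
    for path in paths:
        with open(path, "rb") as fh:
            for problem in check_bytes(fh.read()):
                print(f"{path}: {problem}")
                failed = True
    return 1 if failed else 0


# Only act when file paths were actually passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

A non-zero exit status blocks the commit, which leaves the author to clean the offending literal at its source.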

On-call runbooks benefit from a clean step when reproducing customer payloads. Paste the reported JSON or curl command through this tool before adding it to a repro script. Incidents close faster when the repro matches production bytes exactly.

Document the workflow in your engineering wiki: AI snippet → code paste cleaner → IDE → test → commit. New hires learn once and avoid the recurring “mystery string” debugging sessions that waste senior engineer time across every team.

How to clean pasted code for invisible Unicode

Checking a piece of AI-generated text for invisible watermarks takes less than a minute:

Copy your AI-generated text. Copy the suspect string, config snippet, or code block from docs, Stack Overflow, or an AI chat.
Paste into the checker. Paste the text into the input box on this page.
Run the check. Click Check for watermarks. The tool scans for invisible Unicode characters and hidden formatting markers in seconds.
Copy the cleaned output. Review the detection report, then copy the cleaned, watermark-free version of your text.

Invisible characters that break code

AI systems can hide two broadly different kinds of signal in their output. Our checker is specifically built to detect and remove the first kind — invisible Unicode characters. The second kind, statistical watermarks, requires rewriting to neutralise.

Invisible Unicode watermarks

These are real characters inserted between visible letters that don't render on screen. They travel with copy-paste, get carried into Word documents, Google Docs and CMS fields, and can fingerprint text back to the model that produced it. The checker scans for:

Zero-width space (U+200B)
Zero-width non-joiner (U+200C) and zero-width joiner (U+200D)
Word joiner (U+2060)
Soft hyphen (U+00AD)
Variation selectors (U+FE00 - U+FE0F)
Left-to-right and right-to-left marks (U+200E / U+200F)
Byte order mark / ZWNBSP (U+FEFF)
Other non-printing formatting characters commonly used as covert channels
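For a rough local equivalent, the named code points above can be removed with a translation table. This is a sketch under stated assumptions, not the checker's actual implementation, and the function name is hypothetical:

```python
# Code points named in the list above; the "other" catch-all entry is not covered.
INVISIBLE_CODE_POINTS = (
    [0x200B, 0x200C, 0x200D, 0x2060, 0x00AD, 0x200E, 0x200F, 0xFEFF]
    + list(range(0xFE00, 0xFE10))  # variation selectors U+FE00..U+FE0F
)
# str.translate deletes any character whose code point maps to None.
_DELETE = dict.fromkeys(INVISIBLE_CODE_POINTS)


def strip_invisible(text: str) -> str:
    """Remove the listed invisible characters; tabs, spaces, and newlines stay."""
    return text.translate(_DELETE)


dirty = "\tif ok:\u200b\n    return\u00ad 1\ufeff\n"
print(repr(strip_invisible(dirty)))
# '\tif ok:\n    return 1\n'
```

Zero-width joiners, non-joiners, and variation selectors are legitimate in emoji sequences and in some scripts, so blanket removal suits identifiers, keys, and config values better than user-facing text.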

Statistical (cryptographic) watermarks

These are patterns in which words the model chooses. They are imperceptible in any one sentence and only emerge over many words. A Unicode scan cannot remove them — to neutralise a statistical watermark you typically need to lightly rewrite the text. Our guide to natural AI writing techniques covers how to do this without losing meaning.

Frequently asked questions

Will indentation be preserved?

Visible whitespace including tabs and spaces remains. Only invisible watermark bytes are removed.

Can I clean JSON and YAML?

Yes. Paste the payload text, then copy the cleaned output back into your editor.

Does this fix all syntax errors?

No. It fixes invisible-character issues. Typos and logic bugs need normal debugging.

Is it safe to paste production secrets?

Where possible, process secrets locally for detection, and follow your security policy on pasting secrets into web tools.

Is this watermark checker free?

Yes. You can scan up to 500 words without an account. Sign in for longer documents, full cleaned text, and a character-level breakdown of every hidden marker removed.

Is my text stored when I use the checker?

We process your text only to return a detection report and cleaned output. We do not retain the content of your pasted text for any other purpose.
