trail-mate/docs/specs/text-encoding-integrity.md

Text Encoding Integrity Specification

Status: baseline
Updated: 2026-05-11

This specification defines a repository-wide text integrity rule for Trail Mate. It exists because AI/tool edits have repeatedly corrupted text into mojibake, the garbled characters that appear when bytes are decoded or re-encoded with the wrong character encoding. The failure is not limited to Chinese localization files. It can affect any text file: C++ source, Python scripts, shell scripts, CMake files, packaging metadata, Markdown docs, templates, .wolf memory, release notes, and generated text assets.

1. Baseline Rule

All repository text files are UTF-8 unless a file explicitly documents another encoding. Editing a file must preserve valid UTF-8 and must not introduce mojibake, replacement characters, or corrupted punctuation.

Encoding integrity is a correctness requirement. A change that compiles but corrupts text is not complete.
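
As an illustration of the baseline rule, a strict decode is enough to tell whether a file's bytes are valid UTF-8. The helper below is a minimal sketch, not project tooling; the name `first_invalid_utf8_offset` is hypothetical.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    Hypothetical helper for illustration; it relies only on Python's
    built-in strict UTF-8 decoder.
    """
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as exc:
        return exc.start
    return None
```

Valid UTF-8 only proves the bytes decode. It does not prove the text is intact, because mojibake that has already been re-encoded is itself valid UTF-8.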

2. Current Confusions

  • "There is no Chinese in this file" does not make encoding risk disappear.
  • "The compiler still accepts the file" does not prove the text is intact.
  • "Only comments/docs changed" does not make corruption harmless.
  • "Only punctuation changed" is still a defect when punctuation carries meaning, as in version-policy prose, arrows, ranges, or command examples.
  • "Generated by an agent" is not an excuse for bypassing the UTF-8 contract.

3. Invalid Editing Paths

The following edit paths are invalid unless the engineer has verified the encoding behavior:

  • rewriting an entire existing text file to change a few lines;
  • using shell redirection or ad hoc scripts that depend on platform default encodings;
  • copying text through a terminal or toolchain that silently changes Unicode characters;
  • normalizing line endings or file contents as a side effect of an unrelated change;
  • accepting mojibake inside .wolf memory, specs, or generated docs because the file is "only for agents".
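
One way to avoid the first two paths is to patch at the byte level, so that no platform default encoding or newline translation is involved. The sketch below assumes the edit can be expressed as replacing one unique snippet; `patch_bytes` is a hypothetical name, not an existing project script.

```python
from pathlib import Path


def patch_bytes(path: Path, old: str, new: str) -> None:
    """Replace exactly one occurrence of `old` with `new` in a UTF-8 file.

    The file is read and written as raw bytes, so line endings and every
    byte outside the matched snippet stay exactly as they were.
    """
    data = path.read_bytes()
    old_bytes = old.encode("utf-8")
    new_bytes = new.encode("utf-8")
    count = data.count(old_bytes)
    if count != 1:
        # Refuse ambiguous or missing matches instead of guessing.
        raise ValueError(f"expected exactly one match, found {count}")
    path.write_bytes(data.replace(old_bytes, new_bytes, 1))
```

Refusing to act on zero or multiple matches keeps the edit targeted: the caller has to supply enough context to identify a single location.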

4. Required Workflow

Before editing:

  • identify whether the file already contains non-ASCII text, special punctuation, or existing mojibake;
  • choose a targeted patch instead of a whole-file rewrite whenever practical;
  • preserve unrelated bytes outside the intended edit region.
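
The first pre-edit step can be approximated by listing the lines that contain non-ASCII characters, so they can be compared again after the edit. This is a minimal sketch; `non_ascii_lines` is a hypothetical helper name.

```python
def non_ascii_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines with non-ASCII characters."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if not line.isascii()
    ]
```

Running it on the decoded file before and after an edit and comparing the two results shows whether any non-ASCII line outside the intended region changed.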

After editing:

  • scan all touched text files for replacement characters and common mojibake artifacts;
  • inspect any file that had non-ASCII content before the edit;
  • fix encoding corruption before running broader implementation work;
  • mention unresolved pre-existing mojibake separately instead of silently treating it as part of the current change.
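
The post-edit scan could look roughly like the sketch below. The patterns are an assumption: they cover U+FFFD and a few common signatures of UTF-8 bytes misread as Latin-1 or Windows-1252. They are heuristics, so a hit calls for inspection rather than proving corruption, and a clean result does not prove the text is intact.

```python
import re

REPLACEMENT_CHAR = "\ufffd"

# Heuristic signatures (assumed, not exhaustive) of UTF-8 misread as a
# single-byte encoding: "Ã" plus a continuation-range character, "â€"
# (start of mangled curly quotes, dashes, bullets), and "Â" plus a
# Latin-1 symbol.
MOJIBAKE_PATTERN = re.compile(
    "\u00c3[\u0080-\u00bf]|\u00e2\u20ac|\u00c2[\u00a0-\u00bf]"
)


def find_suspects(text: str) -> list[tuple[int, str]]:
    """Return (line_number, reason) pairs for lines that look corrupted."""
    suspects = []
    for number, line in enumerate(text.splitlines(), start=1):
        if REPLACEMENT_CHAR in line:
            suspects.append((number, "replacement character"))
        if MOJIBAKE_PATTERN.search(line):
            suspects.append((number, "possible mojibake"))
    return suspects
```

Running the same scan on the pre-edit content separates pre-existing hits from newly introduced ones, which supports reporting unresolved pre-existing mojibake separately.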

5. Acceptance Checks

A text edit is acceptable only when:

  • the intended content change is present;
  • untouched text remains readable;
  • no new replacement characters or mojibake artifacts were introduced;
  • UTF-8 content still round-trips through the local tools used by the project;
  • the diff does not contain unrelated line-ending or whole-file churn.
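
The last check, no unrelated line-ending churn, can be approximated by comparing the line-ending styles of a file before and after the edit. The sketch below is one possible heuristic with hypothetical function names. It only flags a style that the file did not use before, so it will not catch every form of churn.

```python
def line_ending_profile(data: bytes) -> dict[str, int]:
    """Count CRLF, bare LF, and bare CR line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,
        "cr": data.count(b"\r") - crlf,
    }


def introduces_new_line_ending(before: bytes, after: bytes) -> bool:
    """True if `after` uses a line-ending style absent from `before`."""
    old = line_ending_profile(before)
    new = line_ending_profile(after)
    if not any(old.values()):
        # A file with no line endings yet has no style to preserve.
        return False
    return any(new[kind] > 0 and old[kind] == 0 for kind in old)
```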

6. Relationship to Version and Release Specs

Version and release policy text is especially sensitive. A rule such as "the version must flow outward from CMakeLists.txt through automated mechanisms" must not be duplicated into independent version constants, and its wording must not be altered by encoding damage. Corrupting the punctuation or wording of such policy text is a specification failure because future agents may misread the release contract.

No C++ source, Python script, shell script, packaging metadata, documentation template, or .wolf memory file may define release semantics in a way that is both independent from the canonical specification and vulnerable to silent text corruption.

7. Future Automation

This specification should eventually be backed by automated checks:

  • repository scan for invalid UTF-8;
  • scan for replacement characters and common mojibake sequences;
  • pre-commit or CI guard for touched text files;
  • targeted allowlist for third-party files where pre-existing encoding artifacts are intentionally left untouched.

Until that automation exists, every agent must perform the manual acceptance checks above before declaring a text-editing task complete.
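
A repository scan with an allowlist, as listed above, might start from something like the following sketch. The suffix set, the function name `scan_tree`, and the allowlist format (POSIX-style paths relative to the root) are all assumptions for illustration, not an existing Trail Mate tool.

```python
from pathlib import Path

# Assumed set of text-file suffixes; a real check would tune this list.
TEXT_SUFFIXES = {".md", ".py", ".sh", ".cpp", ".h", ".txt", ".cmake"}


def scan_tree(root: Path, allowlist: frozenset[str] = frozenset()) -> list[str]:
    """Report text files under `root` that are invalid UTF-8 or contain U+FFFD.

    Files whose root-relative POSIX path is in `allowlist` are skipped.
    """
    problems = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in TEXT_SUFFIXES:
            continue
        relative = path.relative_to(root).as_posix()
        if relative in allowlist:
            continue
        try:
            text = path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            problems.append(f"{relative}: invalid UTF-8")
            continue
        if "\ufffd" in text:
            problems.append(f"{relative}: contains U+FFFD")
    return problems
```

A pre-commit or CI guard could call the same function on touched files only and fail when the returned list is non-empty.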