A lookup table, a bracket counter, and a markdown converter. The unglamorous fixes that make AI output actually usable.
Most conversations about AI reliability focus on the model. Better prompts, smarter RAG systems, bigger context windows, fine-tuning. It's way too easy to overlook deterministic, hacky fixes. They feel inelegant and boring.

So often, AI output is good enough to demo and too unreliable for real work. But a surprising number of problems can be fixed with a few dozen lines of deterministic code that runs after the AI call. No model call, no extra tokens, no added latency.
I had an LLM returning JSON. The content was correct. The structure was correct. But it kept dropping trailing closing brackets. Valid JSON minus the last one or two } or ] characters. The model got 99% of the way there, but couldn't reliably finish the last mile.
The prompt engineering approach: retry the call, add more instructions about bracket matching, burn more tokens, add latency, hope. Each retry costs money and time and has no guarantee of working.
The deterministic approach: count the open brackets, count the close brackets, append the difference, validate. If the result parses and matches the expected schema, you're done. Microseconds and no API call.
You get back something like this:
{
"name": "Mark",
"taxonomy": [
"Artificial Entity",
"Autonomous Humanoid Construct",
"ZUCK-BOT Series"

The fix appends the missing brackets:
{
"name": "Mark",
"taxonomy": [
"Artificial Entity",
"Autonomous Humanoid Construct",
"ZUCK-BOT Series"
]
}

I don't know for sure, but I'd wager this is part of how OpenRouter's Response Healing feature works.
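A minimal sketch of that repair in Python (my own illustration, not OpenRouter's code; the string-literal tracking is there so braces inside values don't throw off the count):

```python
import json

def repair_trailing_brackets(text: str, max_missing: int = 4) -> str:
    """Append closers for any brackets left open at the end of the input."""
    stack = []           # closers we still owe, innermost last
    in_string = escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack and stack[-1] == ch:
            stack.pop()
    if len(stack) > max_missing:
        raise ValueError("too many unclosed brackets to be a truncation")
    candidate = text + "".join(reversed(stack))
    json.loads(candidate)  # validate; raises if still malformed
    return candidate
```

If the result parses, return it; if `json.loads` raises, fall back to a retry. The whole check runs in microseconds.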
In the last few months, transcription has become much more important for my workflow than my keyboard. I hit a hot key, audio goes in, text comes out. The transcription providers — Deepgram, OpenAI Whisper, others — are impressive at general speech. They fall apart the moment you say something specific to your workflow.
Project names. Filenames. Domain jargon. The name of a tool you built last week. The API has no basis to get these right, and it doesn't. A transcription model trained on the entire internet still has no idea what you named your side project.
The fix is a list of deterministic corrections. A lookup table that maps what the model heard to what I actually said. Nothing clever. Just “this string becomes that string,” repeated a few dozen times.
Here's what some of those corrections actually look like:
# Literal replacements
Quinn => Qwen
..files => dotfiles
.files => dotfiles
Sveld => svelte
Laura => LoRA
Lauras => LoRAs
# Regex replacements
# "item underscore a" -> "item_a"
s/\b([[:alnum:]_]+)\s+underscore\s+([[:alnum:]_]+)\b/$1_$2/g
# "some slash folder" -> "some/folder"
s/\b([[:alnum:]_.-]+)\s+slash\s+([[:alnum:]_.-]+)\b/$1\/$2/g
# "some dash name" -> "some-name"
s/\b([[:alnum:]_.-]+)\s+dash\s+([[:alnum:]_.-]+)\b/$1-$2/g
# ^ These are applied repeatedly, so longer forms like
# "some underscore rule underscore name" become "some_rule_name".

This is the difference between a system I demo and a system I use. I can dictate filenames. I can say project names that no transcription service would ever resolve correctly. I can ramble on and mumble and still get voice input that's actually faster than typing.
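Here's a sketch of how such rules could be applied in Python. The rule data mirrors the list above; the fixed-point loop is what makes "applied repeatedly" work, so chained forms collapse fully. The function and variable names are my own, not from any library:

```python
import re

# Literal corrections: what the model heard -> what was said.
LITERAL = {
    "Quinn": "Qwen",
    "Laura": "LoRA",
}

# Regex corrections: spoken punctuation -> symbols.
REGEX = [
    (re.compile(r"\b(\w+)\s+underscore\s+(\w+)\b"), r"\1_\2"),
    (re.compile(r"\b([\w.-]+)\s+slash\s+([\w.-]+)\b"), r"\1/\2"),
    (re.compile(r"\b([\w.-]+)\s+dash\s+([\w.-]+)\b"), r"\1-\2"),
]

def correct(text: str) -> str:
    for wrong, right in LITERAL.items():
        text = text.replace(wrong, right)
    # Reapply the regex rules until the text stops changing, so
    # "some underscore rule underscore name" becomes "some_rule_name".
    changed = True
    while changed:
        changed = False
        for pattern, repl in REGEX:
            new = pattern.sub(repl, text)
            if new != text:
                text, changed = new, True
    return text
```

In practice the rules live in a file and get parsed at startup, so adding a correction never means touching code.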
When I run into a new bad transcription, I add it to the rules file. That's pretty rare now.
I needed Slack-compatible markdown (confusingly named "mrkdwn") from an LLM. Slack's markdown dialect differs from standard markdown: different bold syntax, different link format, different escaping rules. Telling the model "format this for Slack" sort of works, depending on the model. Grok 4.1 Fast in particular steadfastly refused to change how it spoke, and for some reason I find that endearing.
The fix: let the LLM write standard markdown. It's good at that. Then convert deterministically. The conversion from one markdown dialect to another is a deterministic problem. Treat it like one.
Standard Markdown         Slack mrkdwn
────────────────────      ───────────────
**bold** or __bold__  →   *bold*
*italic* or _italic_  →   _italic_
[text](url)           →   <url|text>
~~strikethrough~~     →   ~strikethrough~

When a model consistently ignores an instruction, stop instructing and write the code.
Deterministic post-processing has a few properties that compound over time.
It's fast. No round-trip, no token cost, no waiting on an API that might be slow, rate-limited, or down.
It's debuggable. When a deterministic fix breaks, you read the code and see exactly what happened. Add a regression test case and you'll never see that specific failure again.
It composes. Stack corrections in a pipeline. Each one handles a specific failure mode. Keep them commutative and they don't interfere with each other. You can add one, remove one, reorder them. The pipeline is transparent.
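That pipeline can be sketched as a plain list of string-to-string functions applied in order (names are illustrative, not from any library):

```python
from typing import Callable

# Each correction is a pure function from text to text.
Pipeline = list[Callable[[str], str]]

def run_pipeline(steps: Pipeline, text: str) -> str:
    for step in steps:
        text = step(text)
    return text

# Adding, removing, or reordering a correction is a one-line change.
pipeline: Pipeline = [
    lambda s: s.replace("Quinn", "Qwen"),
    str.strip,
]
```

Each entry handles one failure mode, and the whole list reads as documentation of every known fix.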
Prompt engineering has diminishing returns. Your first round of prompt improvements buys a lot. The tenth round buys almost nothing. And all your work can be wiped out when that specific model release is deprecated.
Each deterministic correction rule has constant returns. Every rule works every time, on every call, forever.

None of this is glamorous. A pipeline of find-and-replace rules doesn't make it into the architecture diagram. A blog post on a bracket-counting function won't go viral. But boring makes the difference between a system that works and a system that almost works. Almost working isn't a product. It's a demo.