
December 30, 2025

Year in Review 2025

How I spent 2025 building AI systems that remove humans from repetitive workflows — and what I learned doing it.

systems · ai · llm · automation · devops · workflows


2025 was the year I stopped using AI as a shortcut and started using it as architecture.

The question driving every project:

Where is a human doing something a system could own permanently?

The answer kept showing up in the same places: unstructured data, inconsistent records, internal knowledge that never reaches customers. Three production systems later, the pattern is clear — generative AI earns its place when it sits inside a pipeline, not on top of one.


AI Systems Built

Automated Service Ticket Analysis

Hundreds of free-form maintenance tickets per month. No standard format. No way to run trend analysis or identify recurring failures.

I built a two-stage extraction pipeline using Llama 3.2 that ingests raw technician notes and outputs a structured JSON record: issue date, affected equipment, fault classification, severity. The two-stage design — extract first, then structure — dramatically reduced hallucination compared to asking for JSON directly.
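The extract-then-structure split can be sketched as follows. This is a minimal illustration, not the production code: `call_model` stands in for the actual Llama 3.2 call, and the prompt wording and field names are hypothetical.

```python
import json

# Stage 1 asks for plain-text facts; stage 2 is a narrow
# transcription task over facts that already exist.
EXTRACT_PROMPT = (
    "List the facts in this maintenance note: date, equipment, "
    "fault, severity. Plain text, one per line.\n\nNote:\n{note}"
)
STRUCTURE_PROMPT = (
    "Convert these extracted facts to JSON with keys issue_date, "
    "equipment, fault_class, severity.\n\nFacts:\n{facts}"
)

REQUIRED_KEYS = {"issue_date", "equipment", "fault_class", "severity"}


def analyze_ticket(note: str, call_model) -> dict:
    """Two-stage pipeline: extract facts in prose first, then
    structure them. `call_model` is whatever function sends a prompt
    to the model and returns its text response (stubbed here)."""
    # Stage 1: free-text extraction -- letting the model reason in
    # prose tends to hallucinate less than demanding JSON directly.
    facts = call_model(EXTRACT_PROMPT.format(note=note))

    # Stage 2: structure the already-extracted facts.
    raw = call_model(STRUCTURE_PROMPT.format(facts=facts))

    record = json.loads(raw)
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"model omitted keys: {missing}")
    return record
```

Because stage 2 only transcribes what stage 1 produced, a missing field fails loudly at the key check instead of being silently invented.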

The outcome that mattered: a recurring door controller failure that had been written off as isolated incidents was surfaced automatically within weeks of deployment. Manual review would have caught it in a quarter, maybe two.

Read the technical breakdown →


LLM-Enhanced Customer Data Matching

Two internal databases describing the same customers differently. "ABC Security LLC" in one system, "A.B.C. Security, Lancaster" in the other. Exact-match queries fail. Rule-based fuzzy matching gets most of the easy cases, but breaks on subsidiaries, rebrandings, and address variations.

I built a three-layer matching system:

  1. Fuzzy pre-filter (rapidfuzz) narrows candidate pairs — no LLM budget burned on obvious non-matches
  2. Llama 3.2 decision engine evaluates ambiguous pairs with a structured seven-step analysis prompt
  3. Post-processing layer flags low-confidence results for human review instead of auto-applying them
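The three layers compose into a short loop. As a sketch only: the stdlib's `difflib` stands in for rapidfuzz, `llm_judge` stands in for the Llama 3.2 decision engine, and the thresholds are illustrative rather than the production values.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Cheap fuzzy score; difflib stands in for rapidfuzz here."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_accounts(pairs, llm_judge,
                   auto_reject=0.40, auto_accept=0.95,
                   review_below=0.80):
    """Three-layer match. `llm_judge(a, b)` returns
    (is_match: bool, confidence: float) -- stubbed here."""
    results = []
    for a, b in pairs:
        score = similarity(a, b)
        if score >= auto_accept:        # near-identical: no LLM spend
            results.append((a, b, "MATCH", "prefilter"))
        elif score < auto_reject:       # obvious non-match: no LLM spend
            results.append((a, b, "NO_MATCH", "prefilter"))
        else:                           # ambiguous: ask the model
            is_match, conf = llm_judge(a, b)
            if conf < review_below:     # low confidence -> human queue
                results.append((a, b, "REVIEW", "llm"))
            else:
                verdict = "MATCH" if is_match else "NO_MATCH"
                results.append((a, b, verdict, "llm"))
    return results
```

The pre-filter does double duty: it caps LLM cost and it guarantees that identical records never depend on a non-deterministic judgment.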

Precision above 94% on the validation set. No false positives in the HIGH confidence bucket. Manual reconciliation that previously took weeks per quarter now runs automatically on new account creation.

Read the technical breakdown →


Automated Service Note Transformation

Field technicians write for other technicians. "VOX devices offline, running remote diagnostics" is accurate — it's also alarming if a customer receives it verbatim.

This system sits between the internal ticketing workflow and the customer communication layer. It polls a MySQL table for untransformed notes, runs each through Llama 3.2 with alert-type-specific prompting, and writes back a professional customer message — no human in the loop.

The key design insight was that different alert types require fundamentally different messaging strategies. The branching logic lives in Python, not in the model — cheap, transparent, editable by anyone on the team.
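That dispatch can be as simple as a dictionary of prompt templates. The alert names and template wording below are hypothetical, not the production set:

```python
# Each alert type maps to its own messaging strategy before the note
# ever reaches the model. Anyone on the team can edit this table.
PROMPTS = {
    "device_offline": (
        "Rewrite for the customer. Reassure them: monitoring continues "
        "and a technician is investigating.\n\nInternal note:\n{note}"
    ),
    "alarm_event": (
        "Rewrite for the customer. Be factual and calm; state what was "
        "detected and what happens next.\n\nInternal note:\n{note}"
    ),
}
DEFAULT_PROMPT = "Rewrite this internal note professionally:\n\n{note}"


def build_prompt(alert_type: str, note: str) -> str:
    """The branch lives in Python, not inside the model: cheap,
    transparent, and diffable in code review."""
    template = PROMPTS.get(alert_type, DEFAULT_PROMPT)
    return template.format(note=note)
```

Adding a new alert type is a one-line dictionary entry rather than a prompt-engineering session against an opaque mega-prompt.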

Eliminated a daily manual rewrite task. Customer confusion calls related to alarm events dropped measurably in the months after deployment.

Read the technical breakdown →


What These Projects Have in Common

All three share the same underlying structure:

  • Unstructured input that humans were processing manually
  • An LLM as the decision layer, not a magic wand
  • Deterministic scaffolding (pre-filters, output parsers, retry logic, validation) around the non-deterministic core
  • A measurable outcome that justifies the operational cost

This is what production AI integration actually looks like. The model is 20% of the problem. The other 80% is pipeline design, prompt engineering, error handling, and making sure the system degrades gracefully when the model produces garbage.
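The "other 80%" largely reduces to one reusable wrapper shape. A minimal sketch, with all names illustrative: parse, validate, retry with exponential backoff, and fall back to a known value instead of propagating garbage downstream.

```python
import json
import time


def call_with_scaffolding(prompt, call_model, validate,
                          retries=3, base_delay=1.0, fallback=None):
    """Deterministic scaffolding around a non-deterministic model
    call. `call_model` sends the prompt and returns text; `validate`
    is a predicate over the parsed record."""
    for attempt in range(retries):
        try:
            raw = call_model(prompt)
            record = json.loads(raw)      # malformed JSON -> retry
            if validate(record):          # schema failure -> retry
                return record
        except (json.JSONDecodeError, KeyError):
            pass
        time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    return fallback                        # degrade gracefully
```

The caller always receives either a validated record or an explicit fallback — never a half-parsed model response.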


Supporting Infrastructure

AI systems don't run in a vacuum. Two other projects this year built the foundation they run on.

Enterprise Access Control System

A cloud-connected door management interface using a persistent SSH reverse tunnel — no VPN, no exposed ports. The door controllers never touch the internet. This same tunnel pattern has since been reused for two other on-premises systems.

Read the technical breakdown →

Observability Stack

A containerized monitoring and logging pipeline (Docker, Zabbix, Elasticsearch, Kibana, Filebeat) deployed on a resource-constrained Ubuntu server. The goal wasn't just metrics — it was making systems explain their own behavior. Faster root-cause analysis, less guesswork during outages.


Patterns That Held Across Everything

The model is not the product

Every LLM project this year succeeded because of what surrounded the model: pre-filters that kept costs down, structured prompts that constrained output, parsers that handled malformed responses, confidence thresholds that kept humans in the loop for edge cases. Strip away the scaffolding and the model is unreliable. The scaffolding is the engineering.

Systems fail at the edges

Not inside components — between them. API boundaries, human input, assumptions between tools. This showed up everywhere: in CRM data that didn't survive a handoff between systems, in UDP discovery packets that got dropped, in ticket formats that no two technicians wrote the same way.

Determinism around non-determinism

The most reliable AI systems I built this year were the ones where the non-deterministic part (the model) was tightly bounded by deterministic infrastructure. Known inputs, explicit schemas, inspectable behavior. Auto-discovery and implicit assumptions compound fast in production.
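"Explicit schemas" in practice means nothing downstream trusts a model output until it matches a declared shape. A small sketch with hypothetical field names:

```python
# Declared shape for a model-produced record: exact fields, exact
# types, and a closed set of allowed values. Inspectable at a glance.
SCHEMA = {
    "issue_date": str,
    "equipment": str,
    "severity": str,
}
ALLOWED_SEVERITY = {"low", "medium", "high"}


def conforms(record: dict) -> bool:
    """True only if the record has exactly the declared fields, the
    declared types, and an allowed severity value."""
    if set(record) != set(SCHEMA):
        return False
    if not all(isinstance(record[k], t) for k, t in SCHEMA.items()):
        return False
    return record["severity"] in ALLOWED_SEVERITY
```

A check this strict rejects extra fields too — the failure mode where the model helpfully invents a column is caught at the boundary, not in a report three systems later.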


Skill Evolution

Deepened this year

  • LLM prompt engineering for structured output extraction
  • Multi-stage pipeline design for AI workflows
  • Integrating generative AI with existing databases and operational systems
  • Exponential backoff, rate limiting, and resilience patterns for LLM APIs
  • System architecture across infra, application, and network layers

Expanded into

  • Agent-based workflow design
  • Real-time audio processing pipelines (Whisper, Azure Speech)
  • Feedback-driven architectures
  • Designing for human-in-the-loop vs. fully automated decisions

Direction: 2026

The next problem is not more AI features — it's AI systems that maintain themselves.

Focus areas:

  • Agent orchestration — systems that delegate tasks to specialized sub-agents and reconcile their outputs
  • Real-time pipelines — audio, logs, events processed as they happen, not in overnight batches
  • Self-observing systems — infrastructure that generates its own diagnostic context
  • Interactive interfaces over AI backends — giving non-technical users visibility into what the system is doing and why

If you're building something in this space and want an engineer who thinks about the full system — not just the model call — let's talk.


The quality of a system is defined by how little it demands from the people using it.