Agentic Coding Analysis: orientman-blog
Date: 2026-03-09
Scope: All work performed with OpenCode on the orientman-blog repository
Data sources: OpenCode session database (~/.local/share/opencode/opencode.db), git history, spec-kit specs (001--006), OpenSpec change archive
1. Project Overview
A WordPress-to-static-site blog migration built entirely with an AI coding agent (OpenCode + Claude) over 12 calendar days.
| Metric | Value |
|---|---|
| Total sessions | 230 |
| Productive sessions (code) | 101 (44%) |
| Exploration/planning sessions | 129 (56%) |
| Total messages exchanged | 6,633 |
| Git commits (non-merge) | 156 |
| Spec-kit specs (001--006) | 6 |
| Archived OpenSpec changes | 29 |
| Total tracked changes | 35 |
| Date range | Feb 26 -- Mar 9, 2026 |
| Source code (TS/TSX) | ~4,012 lines |
| Content files (MDX) | 63 posts |
2. Work Classification
All 29 archived OpenSpec changes plus spec-kit-era work and additional fixes -- 38 items in total -- classified by type and by whether human correction was required.
2.1 Content Migration & Transformation
Agent effectiveness: EXCELLENT
| Change | Corrections needed? |
|---|---|
| WordPress blog migration (001) | Minor -- gist embeds, code formatting in ~5 MDX files |
| LinkedIn content migration | None -- clean 3-post import |
| LibraryThing reviews migration | Minor -- one book title was wrong language |
| Book cover images | None -- bulk 25-post update |
| Word-wrap long prose lines | None -- mechanical reformatting |
| Fix typos/grammar across 19 posts | None |
| Normalize/enrich post tags | None |
7 changes; 2 needed minor correction. The agent excels at bulk mechanical transforms across hundreds of files.
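The "word-wrap long prose lines" change is typical of the category. As an illustration only (the repo's actual script and wrap width are unknown), a minimal sketch of such a transform, which hard-wraps prose while leaving fenced code blocks untouched:

```typescript
// Hard-wrap prose lines longer than `width`; leave fenced code blocks alone.
// Hypothetical sketch -- not the project's actual tooling.
function wrapProse(text: string, width = 80): string {
  const out: string[] = [];
  let inFence = false;
  for (const line of text.split("\n")) {
    if (line.trimStart().startsWith("```")) inFence = !inFence;
    if (inFence || line.trimStart().startsWith("```") || line.length <= width) {
      out.push(line); // code, fence markers, and short lines pass through as-is
      continue;
    }
    let rest = line;
    while (rest.length > width) {
      const cut = rest.lastIndexOf(" ", width);
      if (cut <= 0) break; // no space to break at; keep the long line intact
      out.push(rest.slice(0, cut));
      rest = rest.slice(cut + 1);
    }
    out.push(rest);
  }
  return out.join("\n");
}
```

The appeal for an agent is that the transform is total and mechanical: every file gets the same deterministic pass, so there is no aesthetic judgment to get wrong.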
2.2 Small Feature Addition
Agent effectiveness: VERY GOOD
| Change | Corrections needed? |
|---|---|
| Add Giscus comments | None -- drop-in widget |
| Add social links | None |
| Add social share buttons | None |
| GoatCounter analytics | None -- simple script injection |
| Search link in topbar | None |
| Search post titles (Pagefind) | None |
| Goodreads reading list widget | None |
| Add Goodreads external links | Yes -- links pointed to book page, not review page |
| Add review cover images | None |
| Show star rating | None |
| Longer post summaries | None |
| Rich post excerpts | None |
| Add Gravatar avatar/favicon | Yes -- 4+ sessions; user finally suggested the simple solution |
13 changes, 2 needed correction. Agent is reliable for well-scoped, clearly specified integrations.
2.3 Bug Fixes
Agent effectiveness: GOOD at diagnosis, MIXED at fixing
| Change | Corrections needed? |
|---|---|
| Fix old blog URLs | None |
| Fix quote post nested blockquotes | Yes -- first fix partial; needed "remove also outer blockquote" |
| Fix comments undefined NaN | Yes -- required follow-up for datetime format |
| Dark mode search visibility | Yes -- "Still does not work - see screenshot" |
| Blog title click -> first page | Yes -- over-engineered; user said "Maybe just point to 1st page?" |
5 changes, 4 needed correction. Agent diagnoses well but often over-engineers fixes or makes partial fixes.
2.4 Infrastructure & Tooling
Agent effectiveness: GOOD
| Change | Corrections needed? |
|---|---|
| ESLint + Prettier setup | Yes -- applied too broadly to openspec dirs, needed revert |
| Upgrade Next.js v16 | None -- clean framework upgrade |
| Changelog setup | None |
| GitHub Pages deploy (002) | Yes -- CSS/paths broken on first deploy |
4 changes, 2 needed correction. Config scoping and deploy verification are weak spots.
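The scoping weakness is partly addressable in configuration itself. As a hedged sketch (directory names are assumptions, not the repo's actual layout), an ESLint flat config with an explicit ignore list keeps bulk lint/format passes away from spec directories:

```typescript
// eslint.config.ts -- hypothetical scoped config; directory names are assumptions.
import tseslint from "typescript-eslint";

export default tseslint.config(
  {
    // Keep spec/generated directories out of lint and format passes entirely,
    // so bulk "fix everything" runs cannot touch them.
    ignores: ["openspec/**", ".specify/**", "specs/**", "out/**"],
  },
  ...tseslint.configs.recommended,
  {
    files: ["src/**/*.{ts,tsx}"],
    rules: {
      // project-specific rules go here
    },
  },
);
```

Declaring the exclusion up front, rather than relying on the agent to scope each run, removes one whole class of collateral-damage reverts.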
2.5 Visual Design & Styling
Agent effectiveness: MIXED -- requires heavy iteration
| Change | Corrections needed? |
|---|---|
| Personal visual style | Yes -- multiple rounds of color/styling negotiation |
| Visual style improvements | Yes -- link order inconsistency, readability issues |
| Chateau visual style | Yes -- match-from-screenshot required 82 messages |
| CV update | Minor -- role formatting preferences |
| Remove tag mapping | None |
| Tags index page | None |
| Weighted tag cloud | Yes -- "tag text should be centered inside clouds" |
| URL-aware pagination | None |
| AI badge in header | Yes -- 4+ sessions to get exact styling right |
9 changes, 6 needed correction. Visual/aesthetic work is the agent's weakest area.
3. Failure Mode Taxonomy
Seven distinct failure patterns identified from user message analysis:
| # | Failure Mode | Count | Examples |
|---|---|---|---|
| 1 | Collateral damage | 3 | "revert mdx changes not related to comments datetime"; "Exclude openspec from eslint-prettier-setup and revert" |
| 2 | Over-engineering | 4 | Pagination fix -> user said "just point to 1st page?"; Favicon -> user suggested Gravatar after 4 sessions |
| 3 | Partial fix | 3 | "remove also outer blockquote"; "Still does not work - see screenshot" |
| 4 | Data accuracy | 3 | Goodreads links to book not review; wrong book title language; HTML entities not decoded |
| 5 | Visual judgment | 6 | Header styling iterations; link ordering; tag centering; CRT scanline visibility |
| 6 | Self-verification | 3 | "There are some lint errors. Fix them"; CSS broken on deploy; dark mode not tested |
| 7 | Session multiplication | 4 | Favicon (4 sessions); worktree creation (7+ attempts); AI badge (multiple sessions) |
3.1 Collateral Damage
The agent modifies files outside the requested scope, most often during search-and-replace or linting operations. The fix is always a revert, which wastes a full round-trip.
3.2 Over-engineering
The agent builds a complex, "complete" solution when a simple one exists. This is arguably the signature agent failure mode -- humans naturally reach for the simplest solution; agents reach for the most architecturally thorough one.
The favicon saga is the canonical example: four sessions of complex favicon generation approaches before the user said "Can't you just use Gravatar links like before?"
3.3 Partial Fix
The agent addresses the visible symptom but misses the root cause, which typically requires a second round in which the user points out the remaining issue. This is common in CSS/styling fixes, where multiple DOM elements contribute to the visual problem.
3.4 Data Accuracy
The agent gets URLs, titles, or factual content wrong. This is particularly dangerous in content migration because errors propagate to published content and may not be caught by automated checks.
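Checks of this kind are cheap to automate. A minimal sketch of a content check that would have flagged the undecoded entities (the entity list is an assumption, not the project's actual tooling):

```typescript
// Flag common undecoded HTML entities left over from a WordPress export.
// The entity set here is illustrative, not exhaustive.
const SUSPECT_ENTITIES = /&(amp|quot|nbsp|lt|gt|#8211|#8217);/g;

function findUndecodedEntities(mdx: string): string[] {
  return [...mdx.matchAll(SUSPECT_ENTITIES)].map((m) => m[0]);
}

// Example: a migrated line that should have been decoded before publishing.
const sample = "Tom &amp; Jerry &#8211; a review";
console.log(findUndecodedEntities(sample)); // ["&amp;", "&#8211;"]
```

Running a check like this over every migrated MDX file turns a silent data-accuracy failure into a loud, mechanical one.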
3.5 Visual Judgment
The agent cannot evaluate whether something "looks right." Every aesthetic decision requires human review, and often 2-3 iterations. Sessions involving visual work average 2-3x more messages than functional work.
3.6 Self-verification Gap
The agent did not reliably verify its own output before presenting it. Three times the user had to ask the agent to run lint or check deployed results. This was partially addressed by adding a lint requirement to AGENTS.md mid-project. A broader lesson: agents need explicit verification gates in their workflow, not just generation capabilities.
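Such a gate can be as small as a wrapper that refuses to declare success until every check passes. A minimal sketch (the commands in the usage comment are placeholders, not the repo's actual scripts):

```typescript
import { execSync } from "node:child_process";

// Run each check; collect every failure instead of stopping at the first,
// so the agent sees the full picture before reporting back.
function runGates(gates: string[]): string[] {
  const failures: string[] = [];
  for (const cmd of gates) {
    try {
      execSync(cmd, { stdio: "pipe" });
    } catch {
      failures.push(cmd);
    }
  }
  return failures;
}

// Hypothetical usage: lint and build must both pass before "done".
// const failed = runGates(["npm run lint", "npm run build"]);
// if (failed.length > 0) throw new Error(`gates failed: ${failed.join(", ")}`);
```

The point is structural: verification becomes a step the workflow cannot skip, rather than something the agent remembers to do.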
3.7 Session Multiplication
Some tasks require repeated restarts because the agent gets stuck or the environment (worktree, git) enters an unrecoverable state. Tool/environment integration is the weakest link in the agent workflow.
4. Correction Rate by Category
| Category | Total | Clean | Corrected | Clean Rate |
|---|---|---|---|---|
| Content migration | 7 | 5 | 2 | 71% |
| Small features | 13 | 11 | 2 | 85% |
| Bug fixes | 5 | 1 | 4 | 20% |
| Infrastructure | 4 | 2 | 2 | 50% |
| Visual/design | 9 | 3 | 6 | 33% |
| Total | 38 | 22 | 16 | 58% |
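The clean rates above are plain ratios rounded to whole percentages; a quick sketch reproduces them from the table's figures:

```typescript
// Clean rate = clean changes / total changes, rounded to a whole percentage.
function cleanRate(clean: number, total: number): number {
  return Math.round((clean / total) * 100);
}

const categories: [string, number, number][] = [
  ["Content migration", 7, 5],
  ["Small features", 13, 11],
  ["Bug fixes", 5, 1],
  ["Infrastructure", 4, 2],
  ["Visual/design", 9, 3],
];

for (const [name, total, clean] of categories) {
  console.log(`${name}: ${cleanRate(clean, total)}%`);
}
console.log(`Total: ${cleanRate(22, 38)}%`); // 58%
```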
5. Effectiveness Spectrum
EXCELLENT ████████████████████████ Bulk content transforms, mechanical refactoring
VERY GOOD ██████████████████████ Drop-in integrations, well-scoped features
GOOD ████████████████ Framework upgrades, CI setup, search/replace
MIXED ██████████████ Bug diagnosis (good) -> fix (sometimes partial)
WEAK ████████████ Config scoping, deploy verification
POOR ██████████ Visual design, aesthetic judgment
POOR ████████ Tool/environment issues (worktree, git state)
6. Key Insights
6.1 Exploration Dominance
56% of sessions produced no code. The agent's role as a thinking partner -- exploring ideas, writing specs, planning changes -- was its most-used function. This is underappreciated: spec-driven development with an agent may deliver more value through structured thinking than through code output.
6.2 Small Feature Reliability
The 85% clean rate for well-scoped features is remarkable. For clearly specified, self-contained additions (drop-in widget, script injection, new UI component), the agent is nearly as reliable as a senior developer. The key predictor of success is specification clarity, not task complexity.
6.3 Bug Fix Paradox
Bug fixes have the worst clean rate (20%), despite debugging being a perceived AI strength. The issue is not diagnosis -- the agent consistently identified root causes correctly. The problem is in the fix: agents tend to over-engineer solutions or address symptoms rather than causes. Human course-correction was needed for 4 out of 5 bug fix changes.
6.4 Visual Iteration Cost
Visual/styling work required 2-3x more messages per change. The "Chateau visual style" session had 82 messages -- the most of any productive session. Aesthetic judgment cannot be delegated. The most efficient pattern was the user providing a screenshot and iterating on specifics, rather than describing the desired look in words.
6.5 Self-verification Gap
As Section 3.6 documents, the agent did not reliably verify its own output: the user had to prompt lint runs or deployment checks three times, and the lint requirement added to AGENTS.md mid-project only partially closed the gap. The broader lesson is that agents need explicit verification gates in their workflow, not just generation capability.
6.6 Over-engineering as Signature Failure
When the agent fails, it almost never fails by doing too little. It fails by doing too much -- building elaborate solutions when simple ones exist. The human's most common correction was simplification, not addition. This inverts the common assumption that AI coding assistants are "lazy" or produce minimal solutions.
7. Raw Data
7.1 Top 10 Sessions by Message Count
| Title | Messages | Adds | Dels | Files |
|---|---|---|---|---|
| Next.js static site from repo (no DB) | 346 | 1,673 | 0 | 8 |
| Add syntax highlighting for code blocks | 285 | 173 | 0 | 2 |
| Clarification workflow for spec verification | 260 | 1,099 | 1 | 7 |
| Strikethrough formatting not working | 192 | 1,978 | 758 | 526 |
| OpenSpec implementation workflow | 158 | 9,789 | 5,665 | 290 |
| Fix comments showing undefined NaN | 140 | 909 | 0 | 8 |
| Fix gist embeds in WordPress migration | 133 | 264 | 0 | 3 |
| OpenSpec implementation workflow | 131 | 11,457 | 2,095 | 110 |
| Implement OpenSpec change tasks | 106 | 830 | 68 | 38 |
| OpenSpec task generation | 104 | 4,269 | 1,050 | 537 |
7.2 Session Productivity Split
- 101 sessions (44%) -- produced code changes
- 129 sessions (56%) -- exploration, planning, spec writing only
- Average messages per productive session: ~40
- Average messages per exploration session: ~20
7.3 All 29 Archived OpenSpec Changes (Chronological)
- add-gravatar-avatar (2026-03-03)
- fix-old-blog-urls (2026-03-04)
- remove-tag-mapping (2026-03-04)
- tags-index-page (2026-03-04)
- weighted-tag-cloud (2026-03-04)
- chateau-visual-style (2026-03-05)
- eslint-prettier-setup (2026-03-05)
- fix-quote-post-nested-blockquotes (2026-03-05)
- longer-post-summaries (2026-03-05)
- related-posts (2026-03-05)
- rich-post-excerpts (2026-03-05)
- url-aware-pagination (2026-03-05)
- book-cover-images (2026-03-06)
- librarything-reviews-migration (2026-03-06)
- linkedin-content-migration (2026-03-06)
- show-star-rating (2026-03-06)
- upgrade-nextjs-v16 (2026-03-06)
- add-giscus-comments (2026-03-07)
- add-goodreads-external-links (2026-03-07)
- add-review-cover-images (2026-03-07)
- add-social-links (2026-03-07)
- add-social-share-buttons (2026-03-07)
- goodreads-reading-list (2026-03-07)
- search-link-in-topbar (2026-03-07)
- search-post-titles (2026-03-07)
- changelog (2026-03-08)
- cv-update (2026-03-08)
- personal-visual-style (2026-03-08)
- visual-style-improvements (2026-03-08)
7.4 All 6 Spec-kit Specs (Chronological)
- 001-wordpress-blog-migration (Feb 26 -- Mar 1) -- 11 artifacts, 55 tasks (53 checked)
- 002-gh-pages-deploy (Feb 28 -- Mar 1) -- 7 artifacts, 8 tasks (7 checked)
- 003-fix-gist-embeds (Mar 1 -- Mar 2) -- 6 artifacts, 10 tasks (0 checked)
- 004-fix-gfm-strikethrough (Mar 1 -- Mar 2) -- 6 artifacts, 10 tasks (0 checked)
- 005-syntax-highlighting (Mar 1 -- Mar 2) -- 8 artifacts, 30 tasks (0 checked)
- 006-fix-comments-undefined-nan (Mar 2) -- 7 artifacts, 13 tasks (0 checked)
8. Spec-kit Era (Feb 26 -- Mar 2)
Before the project adopted OpenSpec, the first 5 days used the spec-kit workflow -- a heavier specification system driven by .specify/ templates and /speckit.* slash commands.
8.1 Overview
Spec-kit produced a deep artifact pipeline for every change:
spec.md -> clarifications.md -> plan.md -> research.md -> data-model.md -> contracts/ -> quickstart.md -> tasks.md -> checklists/
Each spec could generate up to 9 distinct artifact types plus subdirectories for contracts and checklists. A total of 45 files were produced across 6 specs in 5 days.
8.2 Spec Inventory
| # | Spec | Files | Tasks | Checked | Artifact types |
|---|---|---|---|---|---|
| 001 | WordPress blog migration | 11 | 55 | 53 | spec, plan, research, data-model, quickstart, tasks, contracts/2, checklists, migration-audit, data/XML |
| 002 | GitHub Pages deploy | 7 | 8 | 7 | spec, plan, research, data-model, quickstart, tasks, checklists |
| 003 | Fix gist embeds | 6 | 10 | 0 | spec, plan, research, data-model, tasks, checklists |
| 004 | Fix GFM strikethrough | 6 | 10 | 0 | spec, plan, research, quickstart, tasks, checklists |
| 005 | Syntax highlighting | 8 | 30 | 0 | spec, plan, research, data-model, quickstart, tasks, contracts/1, checklists |
| 006 | Fix comments undefined NaN | 7 | 13 | 0 | spec, plan, research, data-model, quickstart, tasks, checklists |
8.3 Observations
Task tracking broke after spec 002. Specs 001 and 002 had diligent task tracking -- 53/55 and 7/8 tasks checked off respectively. Specs 003--006 had zero tasks checked despite all work being completed (the features shipped). The ceremony of updating checkboxes in tasks.md was abandoned once the overhead exceeded its value.
Artifact volume scaled inversely with problem complexity. Spec 005 (syntax highlighting) generated 30 tasks across 7 phases, including a contracts/ directory, for what ultimately required installing rehype-pretty-code and configuring Shiki. Spec 004 (GFM strikethrough) produced 6 artifacts, including a full plan.md and research.md, for a fix that amounted to adding remark-gfm to the MDX pipeline.
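For scale, both fixes plausibly fit in a few lines of MDX configuration. A sketch assuming the site wires its pipeline through @next/mdx (the theme choice and file names are illustrative, not taken from the repo):

```typescript
// next.config.ts -- hypothetical MDX pipeline; assumes @next/mdx is in use.
import createMDX from "@next/mdx";
import remarkGfm from "remark-gfm";
import rehypePrettyCode from "rehype-pretty-code";

const withMDX = createMDX({
  options: {
    // remark-gfm enables GitHub-flavored extensions such as ~~strikethrough~~.
    remarkPlugins: [remarkGfm],
    // rehype-pretty-code drives Shiki-based syntax highlighting of code blocks.
    rehypePlugins: [[rehypePrettyCode, { theme: "github-dark" }]],
  },
});

export default withMDX({
  pageExtensions: ["ts", "tsx", "md", "mdx"],
});
```

Measured against a change of this size, a multi-artifact specification pipeline is almost all overhead.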
Early specs were higher fidelity. Spec 001 (WordPress migration) was the most elaborate: 11 files including a migration-audit.md, the raw WordPress XML export, and two contract documents defining component and route interfaces. This level of specification was justified -- the migration was genuinely complex, touching 63 posts across multiple content types. The same level of specification for a 1-line fix (spec 004) was not.
The agent generated the specifications enthusiastically. Spec-kit used /speckit.clarify, /speckit.plan, /speckit.research, and similar commands, and the agent never pushed back on generating artifacts -- it produced full research.md files and data-model.md documents even when the problem was trivially understood. This is a form of over-engineering (failure mode #2) applied to the workflow itself rather than to code.
9. Workflow Comparison: Spec-kit vs OpenSpec
The migration from spec-kit to OpenSpec (commit fb2c279, Mar 2) was the most significant process change in the project.
9.1 Side-by-side
| Dimension | Spec-kit (days 1--5) | OpenSpec (days 5--12) |
|---|---|---|
| Changes completed | 6 | 29 |
| Calendar days | 5 | 7 |
| Velocity (changes/day) | 1.2 | 4.1 |
| Artifacts per change | ~7.5 (45 files / 6 specs) | ~3 (proposal + design + tasks) |
| Total artifact files | 45 | ~87 |
| Artifact pipeline | spec -> clarify -> plan -> research -> data-model -> contracts -> quickstart -> tasks -> checklists | proposal -> design -> tasks |
| Task tracking fidelity | Abandoned after spec 002 | Consistently maintained |
| Max tasks in a single spec | 55 (001) | ~8--12 typical |
| Lightest change possible | 6 files minimum | 3 files |
9.2 Why Spec-kit Was Abandoned
Three factors drove the migration:
1. Over-specification overhead. Generating 6--11 artifact files for every change -- including bug fixes that needed one line of code -- created a spec-to-code ratio that was unsustainable. The agent was spending more sessions writing specifications than writing code.
2. Tracking fidelity collapsed. The checkbox-based task tracking in tasks.md was only maintained for the first two specs. Once the user and agent stopped updating checklists, the tracking artifacts became dead weight -- files that existed but conveyed no accurate status information.
3. Artifact types didn't match the work. data-model.md was generated for bug fixes that had no data model. contracts/ directories were created for features that had no API surface. The spec-kit pipeline was designed for greenfield service development, not for iterative blog feature work.
9.3 What OpenSpec Changed
OpenSpec addressed all three issues:
- Fewer artifacts: 3 per change instead of ~8, with each one serving a distinct purpose (what -> how -> do).
- Lighter tasks: Task lists averaged 8--12 items instead of 30--55, making them practical to track.
- Flexible depth: Simple changes got thin proposals; complex ones got detailed designs. The artifact weight scaled with problem complexity instead of being fixed.
9.4 Impact on Agent Effectiveness
The velocity improvement (1.2 -> 4.1 changes/day) is striking but partially misleading -- OpenSpec changes were on average smaller than spec-kit specs. A more meaningful comparison: spec-kit's 6 specs represent roughly the same functional scope as OpenSpec's first 10 changes (the migration, deploy, and initial bug fixes).
The real impact was on feedback loop speed. Under spec-kit, the agent spent multiple sessions generating artifacts before implementation began. Under OpenSpec, the proposal-to-implementation cycle was typically completed within a single session. Faster feedback loops meant corrections were caught earlier and cost less to fix.
9.5 Meta-lesson: The Agent Over-engineered Its Own Process
The spec-kit workflow itself was an instance of the agent's signature failure mode -- over-engineering (Section 3.2). When asked to help design a specification workflow, the agent produced the most thorough, artifact-heavy system it could conceive. It took the human 5 days to recognise that the specification overhead exceeded its value and migrate to something lighter.
This mirrors the code-level pattern exactly: the agent builds the most architecturally complete solution, and the human's primary role is simplification. The difference is that a workflow affects every subsequent change, so the cost of over-engineering compounds. Migrating from spec-kit to OpenSpec was, in effect, the same corrective action as the user saying "Maybe just point to the 1st page?" -- but applied to the development process instead of a single feature.