The New SDLC With Vibe Coding — Mind Map Notes
Paper 1 of 5 · Google / Kaggle Agentic Engineering Course - Intensive Vibe Coding Course With Google
These are AI-generated mind-map notes from the white paper shared for the Day 1 learnings.
Documenting them for my own future review and reference.
0. Core Shift: Syntax → Intent
Old model: programming = translation (problem → abstract → syntax). Every step adds friction.
New model: developer expresses what to build; the machine handles how.
"The most profound shift since high-level languages" — not a new language, framework, or cloud.
2026 data: 85% of devs use AI coding agents · 51% daily · 41% of new code is AI-generated.
1. Evolution: Autocomplete → Autonomy
~2021 Autocomplete — token prediction
~2022 Inline suggestions — whole functions, patterns not tokens
~2023 Chat-based generation — natural language → working code; conversation becomes the interface
~2024–25 Coding agents — multi-file edits, tool calling, self-correction loop
~2025–26 Autonomous agents — clone repo, plan, sandbox, test, raise PR (no keystrokes)
Each generation preserved what came before and raised the ceiling on what one engineer could do.
2. AI Agent = A Self-Running Loop
Loop: Perceive goal → Plan → Act (tools) → Observe → iterate. Unsatisfactory result → re-plan. Termination → output delivered.
vs chatbot: a chatbot waits for the next prompt; an agent runs its own loop.
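A minimal sketch of that loop, to make the chatbot/agent difference concrete. Everything here is a hypothetical stand-in: `toy_plan` plays the model, `toy_tools` plays the tool set, and `max_steps` is an assumed safety cap. It is not the white paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentState:
    goal: str
    observations: list = field(default_factory=list)
    done: bool = False
    output: Optional[str] = None


def run_agent(goal, plan, tools, max_steps=5):
    """Perceive goal -> plan -> act -> observe, until the planner terminates."""
    state = AgentState(goal=goal)
    for _ in range(max_steps):             # assumed cap so the loop always ends
        action, arg = plan(state)          # Plan: choose the next tool, or finish
        if action == "finish":             # Termination: deliver the output
            state.done, state.output = True, arg
            break
        result = tools[action](arg)        # Act: call the chosen tool
        state.observations.append(result)  # Observe: the next plan() sees this
    return state


def toy_plan(state):
    """Hypothetical planner standing in for the model."""
    if not state.observations:
        return ("run_tests", "v1")
    if "FAIL" in state.observations[-1]:
        return ("run_tests", "v2")         # unsatisfactory result -> re-plan
    return ("finish", "patch v2 passes")


# Hypothetical tool: pretends only version v2 of the patch passes the tests.
toy_tools = {"run_tests": lambda version: "PASS" if version == "v2" else "FAIL"}

final = run_agent("make tests pass", toy_plan, toy_tools)
print(final.done, final.output, final.observations)
# prints: True patch v2 passes ['FAIL', 'PASS']
```

Nothing outside `run_agent` prompts the second attempt: the failed observation feeds back into the planner, which is the part a chatbot lacks.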
Five parts of every agent (Nov 2025 "Intro to Agents" whitepaper):
Model — reasoning engine (next thought / tool / message)
Tools — APIs, code, databases, other agents
Memory — past interactions, project rules, cross-session state
Orchestration — runs the loop, assembles context, dispatches tools
Deployment — hosting, identity, observability (turns prototype into a service)
3. What Is Vibe Coding
Karpathy (Feb 2025): "fully give in to the vibes… forget the code exists."
Mode: describe in natural language, accept output, paste errors back, ask AI to fix.
Went viral → over-applied → lost meaning.
Early 2026: Karpathy coins "agentic engineering" for the disciplined end.
CTO test: "We're vibe coding the payment system" = alarm bells. "We practice agentic engineering under constraints with test coverage" = a different conversation.
4. The Spectrum (Not a Binary)
Key insight: the differentiator is not whether you use AI — it's how outputs get verified.
Vibe Coding
Intent spec: casual natural language
Verification: "Seems to work?"
Codebase understanding: minimal
Error handling: paste error to AI
Scope: prototypes, scripts
Risk: high (disposable)
Structured AI-Assisted
Intent spec: detailed prompts + constraints
Verification: manual / spot-check
Codebase understanding: selective review
Error handling: dev diagnoses, AI fixes
Scope: features in established code
Risk: moderate
Agentic Engineering
Intent spec: formal specs, architecture docs, memory files
Verification: automated tests, CI/CD gates, LM judges
Codebase understanding: comprehensive
Error handling: agents self-diagnose within bounds
Scope: production, team-scale
Risk: low (systematic verification)
Right position depends on the stakes. The skill is knowing where to draw the line per task.
Two verification mechanisms:
Tests → deterministic parts (input X → output Y). Checked by code.
Evals → non-deterministic parts (trajectory, tool choice, output quality). Checked by labelled datasets, rubrics, LM judges.
Without both, it's always vibe coding, no matter how clever the prompts.
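The two mechanisms can be sketched side by side. This is an illustration under stated assumptions: `slugify`, `fake_agent`, the four-case dataset, and the 0.7 threshold are all made up, and `keyword_judge` is a crude keyword rubric standing in for an LM judge.

```python
def slugify(title: str) -> str:
    """Deterministic code: the same input always gives the same output."""
    return "-".join(title.lower().split())


# Test: input X -> output Y, checked by code.
assert slugify("Hello Vibe Coding") == "hello-vibe-coding"


def keyword_judge(answer: str, required: list) -> bool:
    """Hypothetical rubric: the answer must mention every required phrase."""
    return all(phrase in answer.lower() for phrase in required)


def run_eval(agent, dataset, threshold):
    """Eval: pass rate over a labelled dataset, compared with a threshold."""
    passed = sum(keyword_judge(agent(c["prompt"]), c["must_mention"]) for c in dataset)
    rate = passed / len(dataset)
    return rate, rate >= threshold


# Hypothetical agent with canned answers (a real one would call a model).
CANNED = {
    "How do I undo a commit?": "Use git revert to create an inverse commit.",
    "How do I list branches?": "Run git branch to list local branches.",
    "How do I stash work?": "Use git stash to shelve your changes.",
    "How do I tag a release?": "Push your code and announce it.",
}


def fake_agent(prompt: str) -> str:
    return CANNED[prompt]


dataset = [
    {"prompt": "How do I undo a commit?", "must_mention": ["git revert"]},
    {"prompt": "How do I list branches?", "must_mention": ["git branch"]},
    {"prompt": "How do I stash work?", "must_mention": ["git stash"]},
    {"prompt": "How do I tag a release?", "must_mention": ["git tag"]},
]

rate, ok = run_eval(fake_agent, dataset, threshold=0.7)
print(rate, ok)  # prints: 0.75 True
```

The test is binary and exact. The eval tolerates individual misses but enforces a floor on the aggregate, which suits outputs that vary from run to run.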
5. Context Engineering — The Real Skill
Quality depends less on clever prompts, more on the quality of context.
Six types of context:
Instructions — role, goals, boundaries
Knowledge — docs, architecture diagrams, domain data
Memory — short-term (session) + long-term (persistent project state)
Examples — few-shot demos, codebase reference patterns
Tools — precise API/script/service definitions
Guardrails — hard constraints, formatting, safety validations
Static vs Dynamic (a first-class architectural decision):
Static — always loaded: system instructions, rule files (AGENTS/CLAUDE/GEMINI.md), global memory, core guardrails. Token cost high; never forgotten.
Dynamic — on demand: skills (task-matched), tool results, RAG docs, windowed history. Token cost low per turn; pay only for what's used.
Too much static = token waste + signal dilution. Too little = forgets critical rules.
Agent Skills = best pattern for dynamic context:
Structured, portable packages of procedural knowledge, loaded on task match (progressive disclosure).
Agent stays a lightweight generalist, flexes into specialist; carries dozens, pays for the active one.
Solves: context rot, no procedural memory for LLMs, multi-agent overhead, vendor lock-in.
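A rough sketch of static versus dynamic context with skill loading. The assumptions are loud ones: the `Skill` fields, the keyword `triggers`, and the 4-characters-per-token estimate are all invented for illustration. Real skill systems typically let the model choose a skill from its description rather than matching keywords.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    triggers: tuple      # hypothetical keywords that activate the skill
    description: str     # short line, always visible to the agent
    body: str            # full procedure, loaded only on a task match


def estimate_tokens(text: str) -> int:
    """Crude assumption: roughly 4 characters per token."""
    return max(1, len(text) // 4)


# Static context: always loaded, so it must stay small.
STATIC_RULES = "You are a coding agent for this repo. Never commit secrets. Run tests first."

SKILLS = [
    Skill("db-migration", ("migration", "schema"),
          "Write reversible database migrations.",
          "Detailed migration procedure. " * 40),
    Skill("release-notes", ("release", "changelog"),
          "Draft release notes from merged PRs.",
          "Detailed release-notes procedure. " * 40),
]


def assemble_context(task: str):
    """Static rules + one-line skill index + bodies of matching skills only."""
    index = "\n".join(f"- {s.name}: {s.description}" for s in SKILLS)
    parts = [STATIC_RULES, "Available skills:\n" + index]
    loaded = []
    for s in SKILLS:
        if any(t in task.lower() for t in s.triggers):
            parts.append(f"[skill:{s.name}]\n{s.body}")   # dynamic, on demand
            loaded.append(s.name)
    return "\n\n".join(parts), loaded


ctx, loaded = assemble_context("Add a schema migration for the orders table")
everything = "\n\n".join([STATIC_RULES] + [s.body for s in SKILLS])
print(loaded)                                      # prints: ['db-migration']
print(estimate_tokens(ctx) < estimate_tokens(everything))  # prints: True
```

The agent carries the one-line description of every skill on each turn but pays for a full body only when the task calls for it.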
Reframe "prompt engineering" → "context engineering": give the AI what a new team member would need to know.
6. The New SDLC
Traditional iterative cycle = weeks. AI-driven iteration = minutes to hours.
AI compresses unevenly: implementation weeks → hours; requirements, architecture, verification stay human-paced.
New bottleneck = specification quality. Developer shifts from implementor → system designer + quality arbiter.
Per phase:
Requirements — AI: user stories, edge cases, API schemas, prototypes in minutes. Human: business trade-offs. Becomes a conversation, not a handoff doc.
Architecture — most human-centric. Trade-offs need business/org context AI can't grasp. AI excels at implementing decisions once made.
Implementation — 25–39% productivity gains. But METR study: experienced devs 19% slower on some tasks (verify/debug overhead). Work shifts writing → reviewing/guiding/verifying.
Testing/QA — output eval (final artifact) + trajectory eval (full tool-call sequence). A fluent output that skipped verification is more dangerous than a visible error. Quality flywheel compounds.
Code Review / Deploy — AI = first-pass reviewer (bugs, style, security, perf). Humans keep design/strategy. Deploy: AI monitors health, auto-rollback, predicts risk.
Maintenance — most underestimated. "Too risky to touch" legacy code now refactorable/migratable.
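The output-versus-trajectory distinction from the Testing/QA phase can be sketched as two separate checks. The tool names (`read_file`, `edit_file`, `run_tests`) and the required order are hypothetical. The point is only that two runs with the same final answer can differ in whether verification actually happened.

```python
def output_ok(final_answer: str) -> bool:
    """Output eval: judge only the final artifact (toy check)."""
    return "fixed" in final_answer.lower()


def trajectory_ok(tool_calls, required_order) -> bool:
    """Trajectory eval: required tools must appear in this order.

    Other calls may be interleaved; this is a subsequence check.
    """
    remaining = iter(tool_calls)
    # `step in remaining` consumes the iterator up to the match,
    # so later steps can only match later calls.
    return all(step in remaining for step in required_order)


REQUIRED = ["edit_file", "run_tests"]   # hypothetical policy: edit, then verify

runs = {
    "verified": {"answer": "Fixed the bug.",
                 "calls": ["read_file", "edit_file", "run_tests"]},
    "skipped":  {"answer": "Fixed the bug.",
                 "calls": ["read_file", "edit_file"]},
}

for name, run in runs.items():
    print(name, output_ok(run["answer"]), trajectory_ok(run["calls"], REQUIRED))
# prints:
# verified True True
# skipped True False
```

Both runs pass the output check. Only the trajectory check exposes the run that claimed success without running the tests.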
Pace note: this is a mid-2026 snapshot. Boundaries may shift in 12 months. The constant is human judgment, taste, and verification skill.
7. The Factory Model
The developer's output is not code — it's the system that produces code.
Developer Zone: Define specs → Design guardrails → Review & approve.
Factory Floor: Specs + context → Planning agent → Coding agent → Tests/verification → Verified output (failures loop back).
Guardrails: token limits, security policies, style rules, architectural constraints.
Developer = factory manager (designs the line + QC), not widget-assembler. Give success criteria, not steps.
8. Harness Engineering — What Surrounds the Model
Agent = Model + Harness. Model ≈ 10%. Harness ≈ 90%. The model is the engine. The harness is the car, the road, and the traffic laws.
What's in the harness:
Instructions / rule files (AGENTS/CLAUDE/GEMINI.md, skills, sub-agent prompts) — who the agent is + what it's forbidden from doing
Tools — functions, MCP servers, APIs + prose on when/how to call them
Sandboxes / execution envs — what it can and can't reach
Orchestration — sub-agent spawning, model routing, hand-offs
Guardrails / Hooks — deterministic code at lifecycle points (pre-tool-call, post-edit, pre-commit)
Observability — logs, traces, evals, cost/latency
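As one illustration of "deterministic code at lifecycle points", here is a sketch of a pre-tool-call hook. The hook signature, the `HookDenied` exception, and the blocked patterns are assumptions made for this example, not any particular agent framework's API. The `shell` tool is a stub that runs nothing.

```python
import re

# Hypothetical deny-list; a real policy would be broader and reviewed as code.
BLOCKED_PATTERNS = [
    r"\brm\s+-rf\s+/",               # recursive delete from the filesystem root
    r"\bgit\s+push\b.*--force\b",    # force-push
]


class HookDenied(Exception):
    """Raised when a guardrail hook refuses a tool call."""


def pre_tool_call(tool: str, args: dict) -> None:
    """Runs before every tool call; plain code, no model involved."""
    if tool == "shell":
        for pattern in BLOCKED_PATTERNS:
            if re.search(pattern, args["command"]):
                raise HookDenied(f"command matches blocked pattern {pattern!r}")


def call_tool(tool: str, args: dict, registry: dict):
    pre_tool_call(tool, args)        # the hook fires at a fixed lifecycle point
    return registry[tool](args)


# Stub tool: echoes the command instead of executing it.
registry = {"shell": lambda args: f"ran: {args['command']}"}

print(call_tool("shell", {"command": "pytest -q"}, registry))  # prints: ran: pytest -q
try:
    call_tool("shell", {"command": "rm -rf / --no-preserve-root"}, registry)
except HookDenied as err:
    print("blocked:", err)
```

Because the check is ordinary code, it behaves the same on every call regardless of how the model was prompted, which is what separates a guardrail from an instruction.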
Proof the harness matters:
Terminal Bench 2.0: outside Top 30 → Top 5 by changing only the harness (no model change).
LangChain: +13.7 points on the same benchmark by tweaking only system prompt/tools/middleware (fixed model).
Most agent failures, examined honestly, are configuration failures — a missing tool, a vague rule, an absent guardrail, a noisy context window.
Harness across the SDLC:
Requirements/Plan/Arch → configure the harness
Implementation → run the harness (sandboxes, tools = boundary)
Testing/QA → feedback loop (auto self-correction: think → act → observe)
Review/Deploy/Maintenance → observe (hooks block bad behavior; observability tracks why)
9. The Developer's Evolving Role — Conductor vs Orchestrator
Not either/or — both, depending on the task.
Conductor (hands-on, real-time, in-IDE):
Keystroke-level control, single-file, developer always in the loop
Tools: GitHub Copilot, Gemini Code Assist, Cursor, Windsurf
Best: exploratory coding, debugging, unfamiliar codebases
Risk: becomes a bottleneck — directing every keystroke caps throughput
Orchestrator (async, high-level, multi-agent):
Goal-level control, multi-file, reviews outcomes not keystrokes
Tools: Google Jules, Copilot agent mode, Cursor background agents, Claude Code
Best: bug fixes, feature implementation, migrations, test generation
Needs: specification, decomposition, evaluation, system design
The 80% Problem:
AI generates ~80% fast. The last 20% — edge cases, error handling, integration, subtle correctness — needs deep context models lack.
Errors evolved: syntax typos → conceptual failures (wrong business-logic assumptions, missed edge cases, architecture debt). Harder to catch — code looks right, passes basic tests.
Best posture: AI for well-specified tasks; reserve human attention for ambiguity, trade-offs, correctness.
10. Coding Agents in Practice
Three places agents show up (devs use all three in a day):
In the editor — inline completion, chat panels, whole-codebase awareness (Copilot, Cursor, Windsurf, JetBrains AI). Stays in flow.
In the terminal — goal in plain language, full FS access, multi-file, run tools/tests (Claude Code, Codex CLI, Antigravity CLI, OpenCode, Cline).
In the background — autonomous, cloud sandboxes, runs hours, outputs a PR (Google Jules, Copilot agent mode, Cursor bg agents, AlphaEvolve).
Building agents as products:
Support bots, research assistants, compliance monitors need their own tools/memory/eval/deploy.
Google Agents CLI — works with any coding agent; one install → 7 skills for the full ADK lifecycle (scaffold, write, eval, deploy, observe). No new SDK.
Prototype on a laptop yesterday → production agent today, no rewrite.
Scales 1 → many: ADK graph + multi-agent workflows; coordination via shared session state, MCP (tools), A2A (cross-agent delegation).
Anthropic experiment (early 2026): agent teams built a working C compiler in Rust over two weeks — humans set direction, didn't write the implementation.
11. The Economics of AI Development
Think Total Cost of Ownership, not velocity. OpEx is dictated by the token economy.
Vibe coding = Low CapEx, High OpEx (hidden debt):
Token burn — unstructured prompting loops, low first-pass success
Prompting tax — re-feeding the same context
Maintenance tax — AI spaghetti takes days to reverse-engineer later
Security remediation — fast code gen = fast vuln gen; prod fixes cost exponentially more
Crossover point: past it, vibe coding costs 3–10x more per feature.
Agentic engineering = High CapEx, Low OpEx:
Upfront: API schemas, deterministic test suites, structured context
Marginal cost per feature drops sharply at scale
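The CapEx/OpEx trade-off can be made concrete with a simple linear cost model, total = CapEx + OpEx per feature × features. The figures below are invented purely for illustration, in arbitrary cost units. They are not from the white paper and are not meant to reproduce its multiples.

```python
import math


def total_cost(capex: float, opex_per_feature: float, features: int) -> float:
    """Linear model: one-off setup cost plus a constant cost per feature."""
    return capex + opex_per_feature * features


def crossover_features(capex_a, opex_a, capex_b, opex_b) -> int:
    """Smallest whole feature count at which B is strictly cheaper than A.

    Assumes A is low-CapEx/high-OpEx and B is high-CapEx/low-OpEx.
    """
    if opex_a <= opex_b:
        raise ValueError("B never catches up unless its per-feature cost is lower")
    return math.floor((capex_b - capex_a) / (opex_a - opex_b)) + 1


# Hypothetical numbers, arbitrary units.
VIBE = {"capex": 1, "opex": 5}       # almost no setup, expensive per feature
AGENTIC = {"capex": 40, "opex": 1}   # specs, tests, context up front

n = crossover_features(VIBE["capex"], VIBE["opex"], AGENTIC["capex"], AGENTIC["opex"])
print(n)  # prints: 10
print(total_cost(VIBE["capex"], VIBE["opex"], n),
      total_cost(AGENTIC["capex"], AGENTIC["opex"], n))  # prints: 51 50
```

Under these made-up numbers the structured approach is cheaper from the tenth feature on. Below that, the lighter setup wins, which matches the earlier point that the right position depends on the stakes and scope.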
Levers:
Context engineering as a financial lever — dense AGENTS.md → better first-pass → fewer retry loops
Intelligent model routing — frontier models for complex work, cheaper models for deterministic tasks (test gen, review, CI/CD)
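Model routing can be sketched as a lookup from task type to model tier. The tier names, task types, and relative costs here are hypothetical placeholders. Real prices and model names vary by provider and are deliberately left out.

```python
# Hypothetical routing table: cheaper tier for well-bounded tasks.
ROUTES = {
    "test_generation": "small",
    "code_review": "small",
    "ci_triage": "small",
    "architecture_refactor": "frontier",
}

# Hypothetical relative cost per call, arbitrary units.
COST_PER_CALL = {"small": 1, "frontier": 15}


def route(task_type: str) -> str:
    """Unknown task types default to the frontier tier (the safe choice)."""
    return ROUTES.get(task_type, "frontier")


def workload_cost(tasks, router) -> int:
    return sum(COST_PER_CALL[router(t)] for t in tasks)


tasks = ["test_generation"] * 6 + ["code_review"] * 3 + ["architecture_refactor"]

routed = workload_cost(tasks, route)
all_frontier = workload_cost(tasks, lambda _task: "frontier")
print(routed, all_frontier)  # prints: 24 150
```

Defaulting unknown work to the stronger tier trades some cost for safety. A team could reverse that default once evals show the cheaper tier is good enough for a task type.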
12. Where to Start
Individual developers:
Set up an AGENTS.md — 10 lines: stack, conventions, hard rules, workflow. Add a rule each time the agent misbehaves.
Install skills (e.g. Agents CLI) to build, eval, deploy, optimize.
Pick one repetitive workflow → make it your first agent, end-to-end.
Write tests + evals before code — the contract with the AI.
Review every line going to production — check imports, verify error handling.
Maintain your own skills — AI amplifies expertise, doesn't replace it.
Engineering leaders:
Make context engineering first-class — rule files, prompts, evals, skills as code (reviewed, versioned, owned).
Set the bar at the eval, not the demo — rubric: task success, tool-use quality, trajectory compliance, hallucination, response quality.
Re-shape code review for AI code — hallucinated deps, weak error handling, subtle correctness gaps.
Distinguish prototype vs production in team norms — or prototypes ship by accident.
Invest in the harness as a shared team asset — build once, refine many.
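"Set the bar at the eval, not the demo" could look like a release gate over the five rubric dimensions. The thresholds and scores below are invented. Hallucination is expressed as a `hallucination_free` rate so that higher is better on every dimension, which is an assumption of this sketch.

```python
# Hypothetical minimum score per rubric dimension (0.0 to 1.0).
RUBRIC_FLOORS = {
    "task_success": 0.90,
    "tool_use_quality": 0.80,
    "trajectory_compliance": 0.90,
    "hallucination_free": 0.95,
    "response_quality": 0.80,
}


def release_gate(scores: dict):
    """Pass only if every dimension meets its floor; missing scores count as 0."""
    failures = sorted(
        dim for dim, floor in RUBRIC_FLOORS.items() if scores.get(dim, 0.0) < floor
    )
    return (not failures), failures


# A run that demos well but hallucinates too often (made-up scores).
demo_scores = {
    "task_success": 1.00,
    "tool_use_quality": 0.85,
    "trajectory_compliance": 0.92,
    "hallucination_free": 0.80,
    "response_quality": 0.90,
}

print(release_gate(demo_scores))  # prints: (False, ['hallucination_free'])
```

A gate like this belongs in CI next to the unit tests, so an impressive demo cannot ship past a failing dimension.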
Organizations:
Treat AI dev as an engineering investment, not a productivity feature.
Invest in the production substrate before scale — evals in CI, traces, scoped permissions, security review.
Adopt open standards — MCP (tools), A2A (cross-agent).
Plan hybrid teams — humans set direction, agents implement, clear handoff.
Reframe hiring around judgment, not implementation.
13. Conclusion — Intent as the New Interface
Three durable principles:
Structure scales, vibes don't. The gap between "seems to work" and "works under all conditions" is where outages, vulnerabilities, and maintenance nightmares live.
AI amplifies your engineering culture. A force multiplier of both strengths and weaknesses.
The human role is evolving, not diminishing. Implementation → judgment; writing code → designing the systems that produce code.
Generation is solved. Verification, judgment, and direction are the new craft.
Most Quotable Lines
"Agent = Model + Harness." Model ~10%, harness ~90%.
"The model is the engine. The harness is the car, the road, and the traffic laws."
"Most agent failures are configuration failures."
"The developer's output is not code — it's the system that produces code."
"Generation is solved. Verification, judgment, and direction are the new craft."