The New SDLC With Vibe Coding — Mind Map Notes Narveer Rathore


Paper 1 of 5 · Google / Kaggle Agentic Engineering Course - Intensive Vibe Coding Course With Google

These are AI-generated mind maps / notes from the whitepaper shared for the Day 1 learnings.

Documenting these for myself for future review and reference.


0. Core Shift: Syntax → Intent

  • Old model: programming = translation (problem → abstract → syntax). Every step adds friction.

  • New model: developer expresses what to build; the machine handles how.

  • "The most profound shift since high-level languages" — not a new language, framework, or cloud.

  • 2026 data: 85% of devs use AI coding agents · 51% daily · 41% of new code is AI-generated.


1. Evolution: Autocomplete → Autonomy

  • ~2021 Autocomplete — token prediction

  • ~2022 Inline suggestions — whole functions, patterns not tokens

  • ~2023 Chat-based generation — natural language → working code; conversation becomes the interface

  • ~2024–25 Coding agents — multi-file edits, tool calling, self-correction loop

  • ~2025–26 Autonomous agents — clone repo, plan, sandbox, test, raise PR (no keystrokes)

Each generation preserved what came before and raised the ceiling on what one engineer could do.


2. AI Agent = A Self-Running Loop

  • Loop: Perceive goal → Plan → Act (tools) → Observe → iterate. Unsatisfactory result → re-plan. Termination → output delivered.

  • vs chatbot: a chatbot waits for the next prompt; an agent runs its own loop.
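The loop above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not any framework's API: the rule-based `plan` function stands in for the model, `run_tests` / `apply_fix` are made-up tools, and `max_steps` is an assumed safety bound so the loop always terminates.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

Tool = Callable[[str], str]  # hypothetical tool signature: argument in, observation out


@dataclass
class AgentState:
    goal: str
    # Each entry is (tool name, argument, observation).
    history: list[tuple[str, str, str]] = field(default_factory=list)


def plan(state: AgentState) -> Optional[tuple[str, str]]:
    """Stand-in for the model: choose the next tool call, or None to stop.

    A real agent would ask an LLM here; these fixed rules are an assumption.
    """
    if not state.history:
        return ("run_tests", state.goal)      # first, perceive the current state
    last_tool, _, last_obs = state.history[-1]
    if last_obs == "FAIL":
        return ("apply_fix", state.goal)      # unsatisfactory result: re-plan
    if last_tool == "apply_fix":
        return ("run_tests", state.goal)      # re-verify after acting
    return None                               # tests pass: terminate


def run_agent(goal: str, tools: dict[str, Tool], max_steps: int = 10) -> AgentState:
    state = AgentState(goal)
    for _ in range(max_steps):                # bounded so the loop always ends
        step = plan(state)                    # plan
        if step is None:
            break                             # termination: output delivered
        tool_name, arg = step
        observation = tools[tool_name](arg)   # act through a tool
        state.history.append((tool_name, arg, observation))  # observe
    return state


# Hypothetical tools: the tests fail until a fix has been applied.
_fixed = {"done": False}

def run_tests(_: str) -> str:
    return "PASS" if _fixed["done"] else "FAIL"

def apply_fix(_: str) -> str:
    _fixed["done"] = True
    return "patched"


state = run_agent("fix failing test", {"run_tests": run_tests, "apply_fix": apply_fix})
print([tool for tool, _, _ in state.history])  # ['run_tests', 'apply_fix', 'run_tests']
```

Nothing in the loop waits for a new prompt between steps: it keeps planning, acting, and observing until `plan` returns `None` or the step budget runs out.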

Five parts of every agent (Nov 2025 "Intro to Agents" whitepaper):

  • Model — reasoning engine (next thought / tool / message)

  • Tools — APIs, code, databases, other agents

  • Memory — past interactions, project rules, cross-session state

  • Orchestration — runs the loop, assembles context, dispatches tools

  • Deployment — hosting, identity, observability (turns prototype into a service)


3. What Is Vibe Coding

  • Karpathy (Feb 2025): "fully give in to the vibes… forget the code exists."

  • Mode: describe in natural language, accept output, paste errors back, ask AI to fix.

  • Went viral → over-applied → lost meaning.

  • Early 2026: Karpathy coins "agentic engineering" for the disciplined end.

CTO test: "We're vibe coding the payment system" = alarm bells. "We practice agentic engineering under constraints with test coverage" = a different conversation.


4. The Spectrum (Not a Binary)

Key insight: the differentiator is not whether you use AI — it's how outputs get verified.

Vibe Coding

  • Intent spec: casual natural language

  • Verification: “Seems to work?”

  • Codebase understanding: minimal

  • Error handling: paste error to AI

  • Scope: prototypes, scripts

  • Risk: high (disposable)

Structured AI-Assisted

  • Intent spec: detailed prompts + constraints

  • Verification: manual / spot-check

  • Codebase understanding: selective review

  • Error handling: dev diagnoses, AI fixes

  • Scope: features in established code

  • Risk: moderate

Agentic Engineering

  • Intent spec: formal specs, architecture docs, memory files

  • Verification: automated tests, CI/CD gates, LM judges

  • Codebase understanding: comprehensive

  • Error handling: agents self-diagnose within bounds

  • Scope: production, team-scale

  • Risk: low (systematic verification)

  • Right position depends on the stakes. The skill is knowing where to draw the line per task.

  • Two verification mechanisms:

    • Tests → deterministic parts (input X → output Y). Checked by code.

    • Evals → non-deterministic parts (trajectory, tool choice, output quality). Checked by labelled datasets, rubrics, LM judges.

  • Without both, it’s always vibe coding — no matter how clever the prompts.
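To make the tests/evals split concrete, here is a hedged Python sketch. `apply_discount` is a hypothetical deterministic function checked by an ordinary test; the eval side scores a hypothetical agent's tool-call trajectory against a small labelled dataset. The greedy in-order matching rubric is one assumed choice among many; an LM judge would typically replace or complement it for fuzzier qualities such as output quality.

```python
from __future__ import annotations

from typing import Callable


# --- Tests: deterministic parts (input X -> output Y), checked by code ---
def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical deterministic function an agent might have written."""
    return price_cents * (100 - percent) // 100


def test_apply_discount() -> None:
    assert apply_discount(1000, 10) == 900
    assert apply_discount(999, 0) == 999


# --- Evals: non-deterministic parts, scored against labelled cases ---
# Each case pairs a goal with the tool trajectory a reviewer labelled as correct.
LABELLED_CASES = [
    {"goal": "refund order 42", "expected": ["lookup_order", "check_policy", "issue_refund"]},
    {"goal": "status of order 7", "expected": ["lookup_order"]},
]


def trajectory_score(actual: list[str], expected: list[str]) -> float:
    """Fraction of expected steps found in order in the actual trajectory (greedy match)."""
    if not expected:
        return 1.0
    pos, matched = 0, 0
    for step in expected:
        try:
            idx = actual.index(step, pos)
        except ValueError:
            continue                  # missing step earns no credit
        matched += 1
        pos = idx + 1                 # later steps must come after this one
    return matched / len(expected)


def run_eval(agent: Callable[[str], list[str]], cases: list[dict]) -> float:
    """Mean trajectory score over the labelled cases."""
    scores = [trajectory_score(agent(c["goal"]), c["expected"]) for c in cases]
    return sum(scores) / len(scores)


def fake_agent(goal: str) -> list[str]:
    # Hypothetical agent that skips the policy check on refunds:
    # the final answer may look fine, but the trajectory is wrong.
    return ["lookup_order", "issue_refund"] if "refund" in goal else ["lookup_order"]


test_apply_discount()                          # tests: pass or fail, nothing in between
print(run_eval(fake_agent, LABELLED_CASES))    # evals: a graded score
```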


5. Context Engineering — The Real Skill

  • Quality depends less on clever prompts, more on the quality of context.

Six types of context:

  1. Instructions — role, goals, boundaries

  2. Knowledge — docs, architecture diagrams, domain data

  3. Memory — short-term (session) + long-term (persistent project state)

  4. Examples — few-shot demos, codebase reference patterns

  5. Tools — precise API/script/service definitions

  6. Guardrails — hard constraints, formatting, safety validations

Static vs Dynamic (a first-class architectural decision):

  • Static — always loaded: system instructions, rule files (AGENTS/CLAUDE/GEMINI.md), global memory, core guardrails. Token cost high; never forgotten.

  • Dynamic — on demand: skills (task-matched), tool results, RAG docs, windowed history. Token cost low per turn; pay only for what's used.

  • Too much static = token waste + signal dilution. Too little = forgets critical rules.
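A hedged sketch of treating static vs dynamic context as an explicit decision. Everything here is illustrative: `ContextItem`, the word-count `count_tokens` stand-in (real systems use the model's tokenizer), and the greedy priority-order budget are assumptions, not any tool's actual context assembler.

```python
from __future__ import annotations

from dataclasses import dataclass


def count_tokens(text: str) -> int:
    # Crude stand-in: a real system would use the model's own tokenizer.
    return len(text.split())


@dataclass
class ContextItem:
    kind: str     # e.g. "rules", "skill", "rag_doc", "history"
    text: str
    static: bool  # True: always loaded; False: loaded on demand if the budget allows


def assemble_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Load every static item, then add dynamic items in priority order while they fit."""
    chosen = [item for item in items if item.static]
    used = sum(count_tokens(item.text) for item in chosen)
    if used > budget:
        # Static context alone exceeds the budget: too much is always loaded.
        raise ValueError(f"static context uses {used} tokens, budget is {budget}")
    for item in items:
        if item.static:
            continue
        cost = count_tokens(item.text)
        if used + cost <= budget:       # pay only for dynamic items that fit
            chosen.append(item)
            used += cost
    return chosen


items = [
    ContextItem("rules", "Use Python 3.12. Never commit secrets.", static=True),
    ContextItem("skill", "How to write a database migration safely", static=False),
    ContextItem("rag_doc", "Very long architecture document " * 50, static=False),
    ContextItem("history", "Last user turn: add a users table", static=False),
]
print([item.kind for item in assemble_context(items, budget=30)])
# ['rules', 'skill', 'history']  (the 200-token document does not fit)
```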

Agent Skills = best pattern for dynamic context:

  • Structured, portable packages of procedural knowledge, loaded on task match (progressive disclosure).

  • Agent stays a lightweight generalist, flexes into specialist; carries dozens, pays for the active one.

  • Addresses: context rot, LLMs' lack of procedural memory, multi-agent overhead, vendor lock-in.
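Progressive disclosure can be sketched as a registry where only one-line descriptions are always loaded and a skill's full procedure is fetched on first match. The `Skill` / `SkillRegistry` names and the keyword-overlap matcher are hypothetical; in real agents the model itself typically decides which skill fits, based on the descriptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Skill:
    name: str
    description: str              # one line, always visible to the agent
    load_body: Callable[[], str]  # full procedure, fetched only when activated


class SkillRegistry:
    def __init__(self, skills: list[Skill]) -> None:
        self.skills = skills
        self.loaded: dict[str, str] = {}   # bodies actually pulled into context

    def index(self) -> str:
        """The cheap, always-loaded part: one line per installed skill."""
        return "\n".join(f"{s.name}: {s.description}" for s in self.skills)

    def activate_for(self, task: str) -> list[str]:
        """Load full bodies only for skills that match the task.

        Matching on shared words longer than three letters is a hypothetical
        stand-in for letting the model pick from the index.
        """
        task_words = {w for w in task.lower().split() if len(w) > 3}
        active = []
        for skill in self.skills:
            if task_words & set(skill.description.lower().split()):
                if skill.name not in self.loaded:
                    self.loaded[skill.name] = skill.load_body()
                active.append(skill.name)
        return active


registry = SkillRegistry([
    Skill("db-migration", "write reversible database migrations",
          lambda: "1. Generate the migration. 2. Write the down step. 3. Test both."),
    Skill("release-notes", "draft release notes from merged pull requests",
          lambda: "1. Collect merged PRs. 2. Group by type. 3. Summarize."),
])
print(registry.index())
print(registry.activate_for("add a database index"))  # ['db-migration']
print(list(registry.loaded))                          # ['db-migration']
```

Only the index lines cost tokens on every turn; the `release-notes` body is never loaded for this task, which is the "carries dozens, pays for the active one" property.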

Reframe "prompt engineering" → "context engineering": give the AI what a new team member would need to know.


6. The New SDLC

  • Traditional iterative cycle = weeks. AI-driven iteration = minutes → hours.

  • AI compresses unevenly: implementation weeks → hours; requirements, architecture, verification stay human-paced.

  • New bottleneck = specification quality. Developer shifts from implementor → system designer + quality arbiter.

Per phase:

  • Requirements — AI: user stories, edge cases, API schemas, prototypes in minutes. Human: business trade-offs. Becomes a conversation, not a handoff doc.

  • Architecture — most human-centric. Trade-offs need business/org context AI can't grasp. AI excels at implementing decisions once made.

  • Implementation — 25–39% productivity gains. But METR study: experienced devs 19% slower on some tasks (verify/debug overhead). Work shifts writing → reviewing/guiding/verifying.

  • Testing/QA — output eval (final artifact) + trajectory eval (full tool-call sequence). A fluent output that skipped verification is more dangerous than a visible error. Quality flywheel compounds.

  • Code Review / Deploy — AI = first-pass reviewer (bugs, style, security, perf). Humans keep design/strategy. Deploy: AI monitors health, auto-rollback, predicts risk.

  • Maintenance — most underestimated. "Too risky to touch" legacy code now refactorable/migratable.

Pace note: this is a mid-2026 snapshot; boundaries may shift within 12 months. The constant is human judgment, taste, and verification skill.


7. The Factory Model

The developer's output is not code — it's the system that produces code.

  • Developer Zone: Define specs → Design guardrails → Review & approve.

  • Factory Floor: Specs + context → Planning agent → Coding agent → Tests/verification → Verified output (failures loop back).

  • Guardrails: token limits, security policies, style rules, architectural constraints.

  • Developer = factory manager (designs the line + QC), not widget-assembler. Give success criteria, not steps.
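The factory floor can be sketched as a bounded pipeline. Hypothetical names throughout: `Spec` carries success criteria rather than steps, `plan` and `code` stand in for the planning and coding agents, and `max_attempts` is an assumed guardrail playing the role of a token or cost limit. Failures loop back to the coding step as feedback; whatever comes out still goes to the developer for review and approval.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Spec:
    goal: str
    # Success criteria are checks on the artifact, not instructions on how to build it.
    success_criteria: list[Callable[[str], bool]]


def factory(spec: Spec,
            plan: Callable[[str], list[str]],
            code: Callable[[list[str], list[str]], str],
            max_attempts: int = 3) -> tuple[str, bool]:
    """Planning agent -> coding agent -> verification, with failures fed back."""
    steps = plan(spec.goal)
    feedback: list[str] = []
    artifact = ""
    for _ in range(max_attempts):           # guardrail standing in for a token/cost limit
        artifact = code(steps, feedback)
        feedback = [f"criterion {i} failed"
                    for i, check in enumerate(spec.success_criteria)
                    if not check(artifact)]
        if not feedback:
            return artifact, True           # verified: hand to the developer to review and approve
    return artifact, False                  # limit reached: escalate to the developer


# Hypothetical agents: the first draft omits the docstring the spec requires.
def fake_plan(goal: str) -> list[str]:
    return ["write function", "add docstring"]

def fake_code(steps: list[str], feedback: list[str]) -> str:
    if feedback:
        return 'def add(a, b):\n    """Add two numbers."""\n    return a + b\n'
    return "def add(a, b):\n    return a + b\n"


spec = Spec("add two numbers", [lambda src: "def add" in src, lambda src: '"""' in src])
artifact, verified = factory(spec, fake_plan, fake_code)
print(verified)  # True (after one feedback loop)
```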


8. Harness Engineering — What Surrounds the Model

Agent = Model + Harness. Model ≈ 10%. Harness ≈ 90%. The model is the engine. The harness is the car, the road, and the traffic laws.

What's in the harness:

  • Instructions / rule files (AGENTS/CLAUDE/GEMINI.md, skills, sub-agent prompts) — who the agent is + what it's forbidden from doing

  • Tools — functions, MCP servers, APIs + prose on when/how to call them

  • Sandboxes / execution envs — what it can and can't reach

  • Orchestration — sub-agent spawning, model routing, hand-offs

  • Guardrails / Hooks — deterministic code at lifecycle points (pre-tool-call, post-edit, pre-commit)

  • Observability — logs, traces, evals, cost/latency
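Guardrails/hooks are plain deterministic code, so they are easy to sketch. The policy below (forbidden command patterns, a `/workspace` sandbox root, the `ToolCall` shape and the `Blocked` exception) is entirely hypothetical and is not any specific agent's hook API; the point is that the check runs at a fixed lifecycle point, before every tool call, regardless of what the model decides.

```python
from __future__ import annotations

import posixpath
import re
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable


@dataclass
class ToolCall:
    tool: str
    args: dict[str, str]


class Blocked(Exception):
    """Raised by a hook to stop a tool call before it runs."""


# Hypothetical policy: commands the agent may never run, and the only tree it may write to.
FORBIDDEN = [re.compile(r"\brm\s+-rf\b"), re.compile(r"\bgit\s+push\s+--force\b")]
SANDBOX = PurePosixPath("/workspace")


def pre_tool_call(call: ToolCall) -> None:
    """Deterministic guardrail checked before every tool call."""
    if call.tool == "shell":
        for pattern in FORBIDDEN:
            if pattern.search(call.args["command"]):
                raise Blocked(f"forbidden command: {call.args['command']!r}")
    elif call.tool == "write_file":
        # Normalize first so "/workspace/../etc" cannot slip past the parent check.
        path = PurePosixPath(posixpath.normpath(call.args["path"]))
        if SANDBOX not in path.parents:
            raise Blocked(f"write outside sandbox: {path}")


def run_with_hooks(call: ToolCall, execute: Callable[[ToolCall], str]) -> str:
    pre_tool_call(call)     # runs at a fixed lifecycle point; the model cannot skip it
    return execute(call)


print(run_with_hooks(ToolCall("shell", {"command": "pytest -q"}), lambda c: "ran"))
try:
    run_with_hooks(ToolCall("shell", {"command": "rm -rf /"}), lambda c: "ran")
except Blocked as err:
    print("blocked:", err)
```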

Proof the harness matters:

  • Terminal Bench 2.0: outside Top 30 → Top 5 by changing only the harness (no model change).

  • LangChain: +13.7 points on the same benchmark by tweaking only system prompt/tools/middleware (fixed model).

Most agent failures, examined honestly, are configuration failures — a missing tool, a vague rule, an absent guardrail, a noisy context window.

Harness across the SDLC:

  1. Requirements/Plan/Arch → configure the harness

  2. Implementation → run the harness (sandboxes, tools = boundary)

  3. Testing/QA → feedback loop (auto self-correction: think → act → observe)

  4. Review/Deploy/Maintenance → observe (hooks block bad behavior; observability tracks why)


9. The Developer's Evolving Role — Conductor vs Orchestrator

Not either/or — both, depending on the task.

Conductor (hands-on, real-time, in-IDE):

  • Keystroke-level control, single-file, developer always in the loop

  • Tools: GitHub Copilot, Gemini Code Assist, Cursor, Windsurf

  • Best: exploratory coding, debugging, unfamiliar codebases

  • Risk: becomes a bottleneck — directing every keystroke caps throughput

Orchestrator (async, high-level, multi-agent):

  • Goal-level control, multi-file, reviews outcomes not keystrokes

  • Tools: Google Jules, Copilot agent mode, Cursor background agents, Claude Code

  • Best: bug fixes, feature implementation, migrations, test generation

  • Needs: specification, decomposition, evaluation, system design

The 80% Problem:

  • AI generates ~80% fast. The last 20% — edge cases, error handling, integration, subtle correctness — needs deep context models lack.

  • Errors evolved: syntax typos → conceptual failures (wrong business-logic assumptions, missed edge cases, architecture debt). Harder to catch — code looks right, passes basic tests.

  • Best posture: AI for well-specified tasks; reserve human attention for ambiguity, trade-offs, correctness.


10. Coding Agents in Practice

Three places agents show up (devs use all three in a day):

  • In the editor — inline completion, chat panels, whole-codebase awareness (Copilot, Cursor, Windsurf, JetBrains AI). Stays in flow.

  • In the terminal — goal in plain language, full FS access, multi-file, run tools/tests (Claude Code, Codex CLI, Antigravity CLI, OpenCode, Cline).

  • In the background — autonomous, cloud sandboxes, runs hours, outputs a PR (Google Jules, Copilot agent mode, Cursor bg agents, AlphaEvolve).

Building agents as products:

  • Support bots, research assistants, compliance monitors need their own tools/memory/eval/deploy.

  • Google Agents CLI — works with any coding agent; one install → 7 skills for the full ADK lifecycle (scaffold, write, eval, deploy, observe). No new SDK.

  • Prototype on a laptop yesterday → production agent today, no rewrite.

  • Scales 1 → many: ADK graph + multi-agent workflows; coordination via shared session state, MCP (tools), A2A (cross-agent delegation).

  • Anthropic experiment (early 2026): agent teams built a working C compiler in Rust over two weeks — humans set direction, didn't write the implementation.


11. The Economics of AI Development

Think Total Cost of Ownership, not velocity. OpEx is dictated by the token economy.

Vibe coding = Low CapEx, High OpEx (hidden debt):

  • Token burn — unstructured prompting loops, low first-pass success

  • Prompting tax — re-feeding the same context

  • Maintenance tax — AI spaghetti takes days to reverse-engineer later

  • Security remediation — fast code gen = fast vuln gen; prod fixes cost exponentially more

Crossover point: vibe coding costs 3–10x more per feature over time.

Agentic engineering = High CapEx, Low OpEx:

  • Upfront: API schemas, deterministic test suites, structured context

  • Marginal cost per feature drops sharply at scale
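The crossover claim is simple arithmetic once costs are split into upfront (CapEx) and per-feature (OpEx) parts. The sketch below uses made-up numbers purely for illustration (vibe coding: no upfront cost, 30 units per feature; agentic engineering: 200 units upfront, 6 per feature); they are not figures from the whitepaper.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class CostModel:
    capex: float             # upfront: API schemas, test suites, structured context
    opex_per_feature: float  # tokens, retry loops, rework, remediation per feature

    def total(self, features: int) -> float:
        return self.capex + self.opex_per_feature * features


def crossover(low_capex: CostModel, low_opex: CostModel) -> int:
    """Smallest feature count at which low_opex is strictly cheaper in total."""
    saving = low_capex.opex_per_feature - low_opex.opex_per_feature
    if saving <= 0:
        raise ValueError("no crossover: per-feature cost is not lower")
    extra_upfront = low_opex.capex - low_capex.capex
    return max(math.floor(extra_upfront / saving) + 1, 0)


# Hypothetical numbers, for illustration only.
vibe = CostModel(capex=0, opex_per_feature=30)
agentic = CostModel(capex=200, opex_per_feature=6)

n = crossover(vibe, agentic)
print(n, vibe.total(n), agentic.total(n))  # 9 270 254
```

Below the crossover the vibe-coded path is cheaper, which fits its place at the prototype end of the spectrum; above it, the upfront investment pays for itself on every additional feature.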

Levers:

  • Context engineering as a financial lever — dense AGENTS.md → better first-pass → fewer retry loops

  • Intelligent model routing — frontier models for complex work, cheaper models for deterministic tasks (test gen, review, CI/CD)
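A minimal routing sketch. The model identifiers are placeholders, the task-type names mirror the examples in the note above, and `call_model` / `is_valid` are hypothetical callables you would wire to a real provider SDK and validator. The fallback to the frontier model when a cheap model's output fails validation is an assumed extension, not something the notes prescribe.

```python
from __future__ import annotations

from typing import Callable

# Placeholder model identifiers; substitute whatever your provider offers.
FRONTIER = "frontier-model"
SMALL = "small-fast-model"

# Task types the notes treat as deterministic enough for a cheaper model.
CHEAP_TASKS = {"test_gen", "review", "ci_cd"}


def route(task_type: str) -> str:
    """Pick a model tier from the task type."""
    return SMALL if task_type in CHEAP_TASKS else FRONTIER


def run_routed(task_type: str, prompt: str,
               call_model: Callable[[str, str], str],
               is_valid: Callable[[str], bool]) -> tuple[str, str]:
    """Call the routed model; escalate to the frontier model if validation fails."""
    model = route(task_type)
    output = call_model(model, prompt)
    if model != FRONTIER and not is_valid(output):
        model = FRONTIER                      # assumed fallback, not from the notes
        output = call_model(model, prompt)
    return model, output


# Hypothetical stand-in for a provider call: the small model returns nothing useful.
def fake_call(model: str, prompt: str) -> str:
    return "" if model == SMALL else f"{model} handled: {prompt}"


print(run_routed("review", "check this diff", fake_call, is_valid=bool))
```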


12. Where to Start

Individual developers:

  1. Set up an AGENTS.md — 10 lines: stack, conventions, hard rules, workflow. Add a rule each time the agent misbehaves.

  2. Install skills (e.g. Agents CLI) to build, eval, deploy, optimize.

  3. Pick one repetitive workflow → make it your first agent, end-to-end.

  4. Write tests + evals before code — the contract with the AI.

  5. Review every line going to production — check imports, verify error handling.

  6. Maintain your own skills — AI amplifies expertise, doesn't replace it.

Engineering leaders:

  1. Make context engineering first-class — rule files, prompts, evals, skills as code (reviewed, versioned, owned).

  2. Set the bar at the eval, not the demo — rubric: task success, tool-use quality, trajectory compliance, hallucination, response quality.

  3. Re-shape code review for AI code — hallucinated deps, weak error handling, subtle correctness gaps.

  4. Distinguish prototype vs production in team norms — or prototypes ship by accident.

  5. Invest in the harness as a shared team asset — build once, refine many.
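Setting the bar at the eval can be made mechanical with a gate that averages rubric scores and fails if any dimension falls below its threshold. The dimension names follow the rubric in item 2 above (with hallucination scored as `hallucination_free`, 1.0 = none detected, so higher is always better); the scores and thresholds are hypothetical, and producing the per-case scores (labelled data, rubrics, or an LM judge) is out of scope here.

```python
from __future__ import annotations

from statistics import mean

RUBRIC = [
    "task_success",
    "tool_use_quality",
    "trajectory_compliance",
    "hallucination_free",   # 1.0 = no hallucination detected
    "response_quality",
]


def eval_gate(results: list[dict[str, float]],
              thresholds: dict[str, float]) -> tuple[bool, dict[str, float]]:
    """Average each rubric dimension over the eval set; pass only if all meet their bar."""
    averages = {dim: mean(r[dim] for r in results) for dim in RUBRIC}
    passed = all(averages[dim] >= thresholds[dim] for dim in RUBRIC)
    return passed, averages


# Hypothetical per-case scores in [0, 1].
results = [
    {"task_success": 1.0, "tool_use_quality": 1.0, "trajectory_compliance": 1.0,
     "hallucination_free": 1.0, "response_quality": 1.0},
    {"task_success": 1.0, "tool_use_quality": 0.5, "trajectory_compliance": 1.0,
     "hallucination_free": 1.0, "response_quality": 0.9},
]
thresholds = {dim: 0.8 for dim in RUBRIC} | {"hallucination_free": 1.0}

passed, averages = eval_gate(results, thresholds)
print(passed, averages["tool_use_quality"])  # False 0.75
```

Wired into CI (for example, exiting non-zero when `passed` is False), a convincing demo is no longer enough on its own: a change ships only if the eval set agrees.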

Organizations:

  1. Treat AI dev as an engineering investment, not a productivity feature.

  2. Invest in the production substrate before scale — evals in CI, traces, scoped permissions, security review.

  3. Adopt open standards — MCP (tools), A2A (cross-agent).

  4. Plan hybrid teams — humans set direction, agents implement, clear handoff.

  5. Reframe hiring around judgment, not implementation.


13. Conclusion — Intent as the New Interface

Three durable principles:

  1. Structure scales, vibes don't. The gap between "seems to work" and "works under all conditions" is where outages, vulnerabilities, and maintenance nightmares live.

  2. AI amplifies your engineering culture. A force multiplier of both strengths and weaknesses.

  3. The human role is evolving, not diminishing. Implementation → judgment; writing code → designing the systems that produce code.

Generation is solved. Verification, judgment, and direction are the new craft.


Most Quotable Lines

  • "Agent = Model + Harness." Model ~10%, harness ~90%.

  • "The model is the engine. The harness is the car, the road, and the traffic laws."

  • "Most agent failures are configuration failures."

  • "The developer's output is not code — it's the system that produces code."

  • "Generation is solved. Verification, judgment, and direction are the new craft."

Whitepaper - https://www.kaggle.com/whitepaper-the-new-SDLC-with-vibe-coding?utm_medium=email&utm_source=gamma&utm_campaign=learn-intensive-assignment1-june-2026

Course - https://www.kaggle.com/competitions/5-day-ai-agents-intensive-vibecoding-course-with-google/discussion?sort=hotness