Advanced · 4-20 hours · 5 steps

Refactor Legacy Code with AI

Legacy code refactoring is one of the highest-leverage things you can do for a codebase — and one of the riskiest if done wrong. AI accelerates every phase of the process: auditing what's there, identifying refactoring targets, writing characterization tests, executing changes in small increments, and verifying that behavior is preserved.

What You'll Build

- Steps: 5
- Time: 4-20 hours
- Tools: 3
- Prompts: 5
- Difficulty: Advanced
- Best for: refactoring, legacy code, software architecture, testing

Step-by-Step Guide

Follow this 5-step workflow; expect it to take about 4-20 hours in total.

1

Assess the Codebase

Before refactoring anything, you need an honest picture of what you're dealing with. AI can analyze a codebase and produce a structured health assessment — but you need to give it real code, not descriptions. The output tells you where the real problems are, not where you think they are.

Prompt Template
Analyze this legacy codebase and produce a structured health assessment.

**About this codebase:**
- Language/Framework: [e.g., Python 3.8 / Node.js 12 / PHP 7 / Java 8 + Spring]
- Age: [e.g., '6 years old, grown from a side project']
- Size: [e.g., '~15,000 lines across 80 files']
- Current state: [Honest assessment — e.g., 'Works but nobody wants to touch it,' 'Frequent production bugs,' 'Adding features takes 3x longer than it should']
- Known problems: [What you already know is bad — e.g., 'God class with 2,000 lines,' 'No tests,' 'Everything reads from global state']
- Team knowledge: [Who understands this code — e.g., 'Original author left 2 years ago, limited documentation']

**Most important files/modules (paste the most problematic ones):**

[File 1 — e.g., the main god class or entry point]
```[language]
[Code]
```

[File 2 — paste another problematic file]
```[language]
[Code]
```

For additional context, here is the full file listing:
```
[Output of `find . -name '*.py' | head -60` or equivalent]
```

Produce a structured assessment covering:
1. **Structural Problems** (rank by severity): God classes/modules, circular dependencies, hidden coupling, inappropriate responsibilities
2. **Code Quality Metrics** (estimate based on what you see): Average function length, max nesting depth, duplicated logic, unclear naming patterns
3. **Test Coverage**: Estimate based on any test files visible. What percentage of critical behavior is likely untested?
4. **Technical Debt Hotspots**: Which 3-5 specific files or functions are causing the most pain? Why?
5. **Risk Areas**: Where is a bug most likely to hide? Where would a change most likely break something unexpected?
6. **Dependencies**: Any outdated or deprecated dependencies visible? Any security-relevant packages?
7. **Effort Estimate**: Rough estimate of refactoring effort by category: Quick wins (< 1 day each), Medium tasks (1-3 days), Major overhauls (1+ week)
8. **What NOT to refactor**: Any code that's old and ugly but works and shouldn't be touched without strong justification? Be specific and blunt.

Name the actual files and line numbers where problems are worst.
Tip: Read the full assessment before touching a single line. The biggest mistake in legacy refactoring is starting with whatever you personally find most annoying rather than whatever poses the most risk or creates the most day-to-day friction. The assessment maps the terrain; let it inform your priority list before instinct takes over.
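Before involving AI at all, a short standard-library script can produce a rough first-pass hotspot list to decide which files to paste into the prompt. This is a sketch under stated assumptions: Python sources, and arbitrary thresholds (`max_len` and `max_depth` are invented here, not from any established tool):

```python
import ast

# Statements that add a level of nesting.
CONTROL = (ast.If, ast.For, ast.While, ast.Try, ast.With)

def hotspots(source, max_len=50, max_depth=4):
    """Flag functions that are suspiciously long or deeply nested."""

    def nesting(node, d=0):
        # Deepest chain of control-flow statements under this node.
        levels = [nesting(c, d + 1 if isinstance(c, CONTROL) else d)
                  for c in ast.iter_child_nodes(node)]
        return max(levels, default=d)

    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_len:
                findings.append((node.name, f"{length} lines"))
            if nesting(node) > max_depth:
                findings.append((node.name, "deep nesting"))
    return findings
```

Running this over each file gives you a crude, objective ranking to sanity-check the AI's assessment against, rather than a replacement for it.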
2

Identify Refactoring Targets and Sequence

Refactoring everything at once is how projects go off the rails. You need a sequenced plan that starts with the changes that enable other changes, not the ones that look most impressive. AI can help you think through the dependency graph and sequence your work correctly.

Prompt Template
Based on the codebase assessment, help me create a prioritized, sequenced refactoring plan.

**Assessment summary:**
[Paste the key findings from Step 1 — the hotspots, structural problems, and effort estimates]

**My constraints:**
- Team size working on this: [e.g., '1 person / 2 engineers part-time']
- Time available: [e.g., '2-3 hours per week alongside feature work' or '2 dedicated weeks']
- Can I freeze feature development? [Yes for X weeks / No, must run in parallel]
- Tolerance for risk: [e.g., 'This is internal tooling, some breakage is ok' or 'This is customer-facing, zero breakage tolerance']
- Must-not-touch areas: [e.g., 'The payment processing module — works fine and is compliance-audited']

**My goals for this refactoring:**
[What specific outcome do you want? e.g., 'I want to be able to add a new feature type in 2 hours instead of 2 days' / 'I want to be able to onboard a new engineer without a week of knowledge transfer' / 'I want to stop having production incidents caused by this module']

Create a refactoring plan with:
1. **Phase 0: Safety Net** — What tests must I write BEFORE touching any code? (These prove current behavior so I can verify nothing breaks after changes.)
2. **Phase 1: Quick Wins** — Changes that improve things immediately with minimal risk. Criteria: can be done in < 4 hours each, easily reversible, don't change external interfaces. List 5-10 specific tasks.
3. **Phase 2: Structural Changes** — The bigger moves. For each one:
   - What specifically to do
   - What it enables (why this sequence position, not earlier or later)
   - Dependencies (must X be done before this?)
   - Estimated effort
   - Risk level and how to mitigate it
4. **Phase 3: Architecture Improvements** — Only if Phases 1-2 are complete and stable. What longer-term architectural goals should be on the roadmap?
5. **Stop Conditions**: When is 'good enough'? What's the point at which continuing to refactor provides diminishing returns and I should stop?
6. **Rollback Strategy**: For each phase, how do I roll back if something goes wrong in production?

For each task, give me a one-line description I can put in a ticket. Be specific enough that someone else could do the task from the description.
Tip: Sequencing matters more than scope. Extracting a utility function before breaking up the god class that calls it means you refactor the god class against a cleaner interface. Extracting interfaces before implementing new modules means those modules are testable from day one. Ask AI specifically: 'What has to be done first to make the next step easier?'
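The dependency thinking behind that question can be made mechanical. As a sketch with hypothetical task names, Python's standard `graphlib` will order tasks so that every prerequisite comes before the task that needs it:

```python
from graphlib import TopologicalSorter

# Hypothetical refactoring tasks; each key lists its prerequisites.
tasks = {
    "add characterization tests": set(),
    "extract TaxCalculator": {"add characterization tests"},
    "split process_order": {"add characterization tests", "extract TaxCalculator"},
    "remove global config": {"split process_order"},
}

# A valid execution order: prerequisites always appear first.
order = list(TopologicalSorter(tasks).static_order())
print(order)
```

If `graphlib` reports a cycle, that itself is useful signal: two refactoring tasks that each depend on the other usually need to be merged into one, or broken apart differently.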
3

Write Characterization Tests Before Touching Code

You cannot safely refactor code that has no tests. Characterization tests — also called golden master tests — capture the current behavior of your code, whatever it is. Even if that behavior has bugs. They give you a safety net: if your refactoring accidentally changes behavior, a test fails. AI can generate these tests from your code.

Prompt Template
Generate characterization tests for this code I'm about to refactor. These tests must capture existing behavior — including edge cases and even behaviors that might be bugs — so I can verify nothing changes during refactoring.

**Code to test:**
```[language]
[Paste the function, class, or module you're about to refactor]
```

**Testing framework available:** [pytest / Jest / JUnit / RSpec / Go test]

**What this code is supposed to do** (based on my understanding):
[Describe the intended behavior — e.g., 'Calculates a user's invoice total, applying discounts and taxes']

**Known inputs and outputs** (any examples I already know):
[List any examples you know — e.g., 'empty cart should return 0' / 'our QA engineer confirmed that passing X returns Y']

**External dependencies this code has:**
[List what it calls — e.g., 'Reads from database via db.query(),' 'Calls external payment API,' 'Reads from global config object']

Generate:
1. **Characterization tests that call this code with varied inputs and record the outputs.** For each test:
   - A descriptive name of what scenario it covers
   - The input
   - The expected output (derive from the code's actual logic — NOT from what you think is correct)
   - A comment flagging if this output looks like it might be a bug I should be aware of
2. **Coverage of these input categories:**
   - Normal/happy path inputs (3-5 tests)
   - Boundary values (empty, null, zero, max)
   - Edge cases visible in the code (anything that triggers a specific branch or condition)
   - Error cases (what happens when dependencies fail)
3. **Mocks/stubs for external dependencies** so tests run in isolation and don't hit the real database or external services
4. **A test that captures the full integration behavior** if the function is part of a larger flow (not just unit testing individual branches)

After generating the tests, run them and make sure they all pass BEFORE I start refactoring. If any fail, tell me — that indicates a bug I need to understand before proceeding.

**Important**: I want tests that describe what the code DOES, not what it SHOULD do. We'll add 'correct behavior' tests after the refactoring is complete.
Tip: If you can't write a characterization test because the code is too deeply entangled with a database, file system, or external service, that's important signal: the code is untestable, and making it testable IS the first refactoring. The Michael Feathers book 'Working Effectively with Legacy Code' calls this a 'seam' — the place where you can inject a fake dependency. Ask AI to help you find seams in your specific code.
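As a minimal illustration of what such tests look like, here is a hypothetical legacy function in Python with one visible seam (the `today` parameter, instead of a hidden call to the system clock) and pytest-style characterization tests, including one that pins down a probable bug:

```python
from datetime import date

# Hypothetical legacy function. The `today` parameter is the seam:
# passing the date in, rather than reading the clock, makes it testable.
def late_fee(due, today, daily_rate=0.5):
    days = (today - due).days
    if days <= 0:
        return 0
    return days * daily_rate  # note: no cap on the fee, possibly a bug

# Characterization tests: they record what the code DOES today.
def test_on_time_is_free():
    assert late_fee(date(2024, 1, 10), date(2024, 1, 10)) == 0

def test_ten_days_late():
    assert late_fee(date(2024, 1, 1), date(2024, 1, 11)) == 5.0

def test_fee_grows_without_bound():
    # Looks like a bug, but we preserve it until the refactoring is done.
    assert late_fee(date(2020, 1, 1), date(2024, 1, 1)) == 730.5
```

The last test is the important one: its name and comment flag a suspicious behavior without fixing it, exactly the discipline the prompt above asks the AI to follow.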
4

Execute the Refactoring

With a plan and tests in place, you can refactor with confidence. AI generates the refactored code; you review it carefully before committing. Work in small increments — one function, one class, one module at a time — running tests after each change.

Prompt Template
Refactor this code. I have characterization tests in place that will tell me if behavior changes.

**Code to refactor:**
```[language]
[Paste the specific function or class to refactor]
```

**Refactoring goal for this task:**
[Be specific — e.g., 'Extract the tax calculation logic into a separate TaxCalculator class' / 'Replace the 300-line process_order function with smaller single-responsibility functions' / 'Remove global state by passing configuration as a parameter']

**Constraints:**
- **Do not change external interfaces**: The function signatures, return types, and any behavior that callers depend on must stay identical. I'll update callers separately if needed.
- **Do not fix bugs** (unless explicitly listed): Even if you see logic errors, don't fix them — changing behavior (even to fix bugs) will break my characterization tests and make it hard to tell if refactoring caused a problem.
- **Known bugs I'm intentionally preserving for now:** [List any bugs you're aware of that characterization tests encode]
- **Style guide:** [e.g., 'Follow PEP 8 / Airbnb JS style / Google Java style']

**Specific refactoring patterns to apply:** [Choose from these or specify your own]
- [ ] Extract Method: Pull [specific logic] into its own function named [suggested name]
- [ ] Extract Class: Move [specific responsibilities] into a new class named [suggested name]
- [ ] Replace Magic Numbers: [list the magic numbers] should become named constants
- [ ] Replace Conditional with Polymorphism
- [ ] Introduce Parameter Object: [list parameters] should be a [ClassName] object
- [ ] Remove Dead Code: [identify dead code to remove]
- [ ] Rename for Clarity: [old names] → [better names]

**Provide:**
1. The refactored code
2. A list of every change made (one line per change)
3. Any new files or classes created and where they should live
4. Any callers of the original code that now need updating
5. Notes on anything you explicitly chose NOT to change and why

After I apply this, I'll run the characterization tests. If any fail, I'll paste the failure and we'll debug together.
Tip: Commit after each successful small refactoring — not at the end. This means: refactor one thing, run tests, green, commit with a message like 'refactor: extract tax calculation into TaxCalculator class.' If you commit in large batches, a test failure means debugging an entire batch of changes instead of one. The commit history also becomes a readable story of how the code improved.
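A minimal Extract Method sketch, using a hypothetical order-total function, shows the shape of one safe increment: the public signature is untouched, the tax logic gains a name, and the old and new versions are checked against each other before the old one is deleted:

```python
# BEFORE (hypothetical): tax logic buried inline in a larger function.
def order_total_v1(items, tax_rate=0.08):
    subtotal = sum(price * qty for price, qty in items)
    tax = round(subtotal * tax_rate, 2)  # inline tax logic
    return subtotal + tax

# AFTER: Extract Method. Same signature, same behavior, logic named.
def calculate_tax(subtotal, tax_rate):
    return round(subtotal * tax_rate, 2)

def order_total_v2(items, tax_rate=0.08):
    subtotal = sum(price * qty for price, qty in items)
    return subtotal + calculate_tax(subtotal, tax_rate)

# The characterization check: both versions must agree on every input.
cart = [(10.0, 2), (3.5, 1)]
assert order_total_v1(cart) == order_total_v2(cart)
```

Once the assertion holds across your characterization suite, delete `order_total_v1`, rename `order_total_v2`, run the tests again, and commit that single step.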
5

Verify Behavior and Update Tests

After refactoring, the characterization tests confirm you didn't break anything. But now you need a second pass: replace or supplement the characterization tests with properly written tests that describe intended behavior, and fix any real bugs that surfaced during the refactoring process.

Prompt Template
The refactoring is complete and all characterization tests pass. Now help me improve the test suite.

**Refactored code:**
```[language]
[Paste the final refactored code]
```

**Current characterization tests:**
```[language]
[Paste the characterization tests from Step 3]
```

**Bugs I found during refactoring that I now want to fix:**
1. [Bug 1 — e.g., 'Tax is incorrectly applied to items marked as tax-exempt']
2. [Bug 2 — e.g., 'Function returns None instead of empty list when no results found']

**Part 1: Upgrade Characterization Tests to Specification Tests**
Review my characterization tests. For each test:
- If it captures correct behavior: convert it to a proper specification test with a name that describes the INTENDED behavior (not just 'this is what the code does')
- If it captures a bug I'm now fixing: update the expected output to the correct behavior
- If it's redundant with another test: mark it for deletion
- If there's a test gap — important behavior that isn't covered: add it
Return an updated, clean test file.

**Part 2: Add Tests for the Bugs I Fixed**
For each bug listed above, write a regression test that:
- Would have caught this bug BEFORE the fix (and fails against the old code)
- Passes against the fixed code
- Has a name that makes the bug scenario clear (so if this test ever fails again, the name tells you exactly what broke)

**Part 3: Documentation**
For the refactored module, generate:
1. A module-level docstring that explains: what this module does, key design decisions, anything non-obvious about the implementation
2. Docstrings for any public functions/methods that were unclear before
3. Inline comments for any logic that is still complex after refactoring

**Part 4: Health Check**
Looking at the final refactored code and test suite: what risks remain? What should be the next refactoring task if I were to continue? Is there anything I should monitor in production after deploying this change?
Tip: After a major refactoring, do a silent deployment: deploy to production but keep both the old and new code paths active, routing a small percentage of traffic to the new path while monitoring for errors or behavioral differences. This is called a canary release or dark launch. It catches bugs that tests missed because they depend on real production data patterns. Ask AI to help you add the feature flag that makes this possible.
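A rough sketch of such a flag in Python (the function names and rollout mechanics here are invented for illustration, not from any specific feature-flag library): hash the user ID into a stable bucket, then route a fixed percentage of users to the new code path:

```python
import hashlib

def use_new_path(user_id, rollout_percent):
    """Deterministically route a stable slice of users to the new path."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_percent

def invoice_total(user_id, order, old_impl, new_impl, rollout_percent=5):
    # Hypothetical wiring: both implementations take the same inputs,
    # so swapping between them is a one-line routing decision.
    impl = new_impl if use_new_path(user_id, rollout_percent) else old_impl
    return impl(order)
```

Hashing (rather than random choice) matters: a given user always lands on the same path, so behavioral differences are reproducible, and rolling back is just setting `rollout_percent` to 0.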


Frequently Asked Questions

How do I refactor code when I don't understand what it does?
Start by making it understandable before trying to improve its structure. Paste the code into Claude and ask: 'Explain what this code does, step by step. Identify every external dependency and side effect. Describe what inputs it accepts and what outputs or side effects it produces.' Then ask it to add inline comments explaining non-obvious logic. Rename cryptic variables to descriptive names. These are low-risk changes (renaming and commenting) that build your mental model of the code. Once you understand it, the structural problems usually become obvious. Never refactor code you don't understand — you'll break behavior you didn't know existed.
Can AI refactor code automatically without me reviewing it?
Technically yes, but you shouldn't let it. AI-generated refactoring can introduce subtle behavioral changes — particularly around error handling, null/undefined behavior, and concurrency — that look structurally correct but are semantically different. The risk scales with code complexity: straightforward function extraction is usually safe to apply directly, while anything involving class hierarchies, state management, or asynchronous behavior needs careful review. The non-negotiable rule: always run tests after every AI-suggested change. If you have good test coverage, you can move fast. If you don't, slow down.
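A contrived Python example of how small such a semantic drift can look in a diff:

```python
# Original: a missing key quietly falls back to None.
def lookup_v1(config, key):
    return config.get(key)

# "Refactored" version that looks equivalent but is semantically
# different: a missing key now raises KeyError instead of returning None.
def lookup_v2(config, key):
    return config[key]

assert lookup_v1({}, "timeout") is None
try:
    lookup_v2({}, "timeout")
except KeyError:
    pass  # behavior changed; only a test on the missing-key case catches it
```

Nothing in the happy path distinguishes the two versions, which is exactly why running the full test suite after every AI-suggested change is non-negotiable.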
My manager won't give me time to refactor. How do I make the case?
Use concrete evidence, not abstract quality arguments. Track for two weeks: every time you add a feature, how long does it actually take vs. how long it would take in clean code? Every time there's a production bug, trace it back to the legacy code patterns that caused it. Then build a case around specific numbers: 'Feature X took 3 days; in a refactored codebase it would take half a day. We had 4 production incidents last month; 3 of them came from the authentication module's global state.' Ask AI to help you write a brief technical memo making this case — quantified pain is much more persuasive than 'the code is messy.'
Is AI better at refactoring some languages than others?
Yes, significantly. AI is strongest with JavaScript/TypeScript, Python, Java, and Go — these are the languages with the most training data and the most established refactoring pattern literature. It's reliable but slightly less thorough with Rust, Swift, and Kotlin. It can struggle with older languages like COBOL, Fortran, or legacy PHP 5, and with highly domain-specific languages or proprietary frameworks with little public documentation. For Python and JavaScript, Claude and GitHub Copilot are both excellent at structural refactoring. For Java, Claude tends to give better architectural guidance while Copilot is faster for mechanical transformations. For any language, always test AI output — it occasionally generates syntactically valid code with wrong semantics.

