If you write English as a second language and an AI detector has flagged your work as machine-generated when you wrote every word yourself, you are not alone and you are probably not the problem.
I write this from Taipei. My first language is Mandarin. I write about AI tools for a living now, but I still remember submitting a graduate-school paper in 2019 — carefully revised, every citation checked twice — and having my advisor ask, quietly, whether I had used a "writing assistant" that was against policy. There was no AI writing assistant in 2019 that could have produced that paper. The tells she was reading were not AI tells. They were ESL tells. The same tells that, a few years later, would cause every major AI detector to flag my work at roughly three times the rate it flagged my native-English colleagues.
This guide exists because I've watched too many international students, non-native academics, and ESL journalists get accused of cheating for writing in their own voice. The problem is real and it is well-documented. It also has practical fixes.
The False-Positive Problem Is Worst for Non-Native English Writers
The foundational research here is a 2023 Stanford study by Weixin Liang and collaborators, titled "GPT detectors are biased against non-native English writers." The team tested seven widely used AI detectors (GPTZero, Originality.ai, Crossplag, ZeroGPT, OpenAI's own classifier, Quil.org, and Sapling) against a set of TOEFL essays written by human students and a set of 8th-grade essays written by US native-English students.
The numbers:
- The detectors misclassified 61.3% of TOEFL essays as AI-generated on average.
- They misclassified 5.1% of the US 8th-grade essays.
- Roughly one in five TOEFL essays was unanimously flagged as AI by all seven detectors, and nearly all were flagged by at least one.
Same humans, same honesty, wildly different detector behavior. The only thing that changed was which first language the writer learned as a child.
I've re-run similar tests in 2026 against the current generation of detectors (Coda One, Originality.ai, GPTZero Pro, Turnitin AI, Copyleaks). The gap has narrowed — detectors now advertise ESL-aware models — but false-positive rates on non-native English writing still sit at roughly 15-40%, depending on the tool. For native-English academic writing, the equivalent rate is 3-8%. The ESL rate remains several times the native rate, and several times higher than it should be.
If you are an ESL writer and you've been flagged, believe your own memory of what you wrote. The tools are wrong often enough that your word is better evidence than their score.
Why Detectors Flag Formulaic English
The Stanford paper explains the mechanism cleanly, but I want to unpack it in plain language.
AI detectors measure two main features:
- Perplexity — how surprising each word is, given the previous words. Low perplexity (predictable prose) signals AI.
- Burstiness — how much sentence length and structure vary. Low burstiness (uniform prose) signals AI.
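For readers who like to see the mechanics, here is a minimal sketch of the burstiness side of the measurement, assuming nothing beyond the Python standard library. It is a crude proxy I wrote for illustration, not any vendor's actual scoring model; true perplexity also requires a language model, which I don't attempt here.

```python
# Rough burstiness proxy: how much do sentence lengths vary?
# Illustrative only -- not any detector's real scoring model.
import re
import statistics

def sentence_lengths(text):
    # Naive split on ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [len(s.split()) for s in sentences if s.strip()]

def burstiness_report(text):
    lengths = sentence_lengths(text)
    mean = statistics.mean(lengths)
    sd = statistics.stdev(lengths) if len(lengths) > 1 else 0.0
    print(f"{len(lengths)} sentences, mean {mean:.1f} words, SD {sd:.1f}")
    if sd < 5:
        print("Low variation: the uniform rhythm detectors read as AI-like.")

burstiness_report(
    "I checked the data. The result was clear. The model failed. "
    "Each sentence here has nearly the same short, declarative shape."
)
```

The threshold of 5 is my own rough cutoff for the demo, not an industry constant; the point is only that a column of near-identical sentence lengths produces a small standard deviation.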
Read my longer explainers at /glossary/false-positive and /glossary/stylometry for the underlying stylometric theory.
Here is the catch. Non-native English writing often has low perplexity and low burstiness for entirely legitimate reasons:
1. You learned English from textbooks, which use regular sentence structures by design. The "right" construction feels safer than a creative one, especially in formal settings.
2. Your vocabulary leans toward the common, expected word. Not because you don't know synonyms, but because the common word is the one you're confident about. A native speaker might throw in an unusual word to show range. An ESL writer tends not to risk it.
3. Your sentences are uniform in length because long sentences are where grammar goes wrong. You stay in the lane where you know the rules.
4. Your transitions are explicit and predictable — "Furthermore," "In conclusion," "However," — because you were taught that clarity beats subtlety, and you are right. But it also means your prose looks like AI prose, which was trained on the same textbook-adjacent corpus.
These are not errors. This is a rational adaptation to writing in a second language. It also happens to be exactly what AI models optimized for clarity produce. You and the machine converge on the same statistical surface for opposite reasons.
A Practical Workflow: AI to Draft, Detector to Self-Check, Humanizer if Needed
Here is the workflow I recommend to ESL writers I coach. It treats AI as a tool that can help with your language learning, not as a shortcut that replaces your thinking.
Step 1: Draft in Your Own Words First
Before you touch any AI tool, write a rough draft in whatever English you have. Even if it is uneven, even if the grammar is imperfect, even if you're translating from your first language in your head. The ideas are yours. The English is yours. This matters for the rest of the workflow.
Step 2: Use AI for Targeted Improvements, Not Wholesale Rewriting
Paste your draft into Claude or ChatGPT. Ask for specific, bounded help:
- "Fix grammar mistakes in this paragraph without changing my argument."
- "Is there a more precise word for X that a native speaker would use here?"
- "This sentence is awkward. What are three ways to rephrase it?"
Notice: you are not asking "write my essay." You are asking an assistant to improve the English of your essay, one decision at a time, while you retain editorial control.
The output is still substantially yours. You are the author. AI is acting as a grammar tutor.
Step 3: Run Through a Grammar Checker for Residual Errors
Use Coda One's Grammar Checker or a similar tool to catch subject-verb disagreement, article misuse, and other errors that even a careful ESL writer will miss in a second language. This is also an appropriate use of AI at any competence level.
Step 4: Run Through an AI Detector as a Self-Check
This is the step most ESL writers skip. Run your final draft through Coda One's AI Detector before you submit.
Why? Because you want to know the score your teacher or editor is about to see. If your honestly-written draft scores 60% AI, you need to address that before submission — not because you did anything wrong, but because you are in a world where false-positive risk is real.
My recommendation: test against two or three detectors, not one. If all three agree on a high score, it's worth a second pass. If they disagree wildly, trust the lowest score (since false positives are the error mode that actually harms you).
Step 5: If the Score Is High, Use a Humanizer to Restore Variety
If your genuine writing is being flagged, a humanizer can increase linguistic variety — the variety you naturally suppress for the reasons described earlier (safe vocabulary, uniform sentence length, predictable transitions). Use an Academic mode if you're writing for school. The goal is not to "bypass" detection. The goal is to look more like native-speaker prose on the same dimensions detectors measure.
Before you submit the humanized version, read it carefully. A humanizer that changed your claim, swapped a term of art, or dropped a citation has broken your paper. Re-edit as needed.
Step 6: Save Evidence of Your Process
Keep your original draft, your grammar-checked version, and your final version. Use Google Docs so version history is automatic. If you are ever accused, this is better evidence than any counter-detection test.
Case Studies: Three Real Workflows
These are composites based on people I've worked with. Names changed.
Case 1: Ming, Chinese Graduate Student in the US
Ming is a second-year master's student in urban planning. English is her third language. She writes her own essays but always gets flagged.
For a seminar paper on housing policy, Ming drafted 4,000 words entirely in her own English. Detection scores on the raw draft: Originality.ai 58%, GPTZero 44%, Coda One 52%.
She worked through the draft with Claude for grammar and diction — not asking for rewrites, just asking "is there a more idiomatic phrasing for X?" She accepted some suggestions, rejected others. The revised draft scored: Originality.ai 31%, GPTZero 22%, Coda One 28%. Better, but still elevated.
She ran one pass through the Coda One Humanizer in Academic mode. Final scores: Originality.ai 9%, GPTZero 7%, Coda One 11%. She read the humanized output carefully, re-inserted one citation the humanizer had softened, and submitted.
Her draft history showed every stage of revision. Her thinking was entirely her own. She slept that night.
Case 2: Diego, Spanish-Speaking PhD Student Writing a Dissertation Chapter
Diego is in his fourth year of a literature PhD in the UK. He is fluent in academic English but his prose has the formal, slightly Latinate cadence of someone who reads more than he talks.
His first dissertation chapter triggered a 67% AI flag in his department's Turnitin scan. He had written every sentence. He was genuinely shocked.
Our review found: his sentences averaged 27 words with a standard deviation of 4. Native-speaker academic prose typically has a standard deviation of 9-12 words. His transitions were nearly all explicit connectives (however, moreover, furthermore). His vocabulary was formal but not unusual.
Fix: We did not use a humanizer. We restructured the chapter in his voice. We broke long sentences in half at natural argumentative pivots. We replaced three of every five howevers with em-dashes, semicolons, or no transition at all. We encouraged one unusual word per page — the kind of lexical risk he was avoiding. The final revised chapter scored 14% AI without any automated intervention.
The lesson: for experienced ESL writers, structural editing is often more effective than humanizing. But both are legitimate.
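If you want to run a Diego-style diagnostic on your own draft, the sentence-length sketch from earlier covers the burstiness side; below is a companion sketch for connective density. The word list and the output format are my own, not a standard inventory, and real detectors do not work this way. It is only a self-editing aid.

```python
# Connective density: explicit transition words per 1,000 words.
# The word list is my own shortlist, not a standard inventory.
import re
from collections import Counter

CONNECTIVES = [
    "however", "moreover", "furthermore", "therefore",
    "nevertheless", "thus", "in addition", "in conclusion",
]

def connective_density(text):
    words = re.findall(r"[a-z']+", text.lower())
    joined = " " + " ".join(words) + " "
    counts = Counter({c: joined.count(f" {c} ") for c in CONNECTIVES})
    total = sum(counts.values())
    per_k = 1000 * total / max(len(words), 1)
    print(f"{total} connectives in {len(words)} words ({per_k:.0f} per 1,000)")
    for word, n in counts.most_common(3):
        if n:
            print(f"  {word}: {n}")

connective_density(
    "However, the sample was small. Moreover, the data was noisy. "
    "Furthermore, the method was, however, sound."
)
```

A draft like Diego's original chapter, where nearly every paragraph pivots on an explicit connective, scores high on a check like this; his restructured chapter would not.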
Case 3: Amira, Arabic-Speaking Journalist Writing for English-Language Outlets
Amira reports from Cairo for several English-language publications. Her editors use Originality.ai as a first-pass screen before sending copy to production. She was getting flagged on roughly 30% of her pieces despite writing them herself.
Her workflow now: draft in English, run through Coda One's Grammar Checker, run through Coda One's AI Detector as a self-check. If the score is above 40%, she rewrites in place — varying sentence length, adding specific reporting details that an AI couldn't have (the exact street, the interviewee's phrasing, the weather that day). Only if the score is still high does she consider a humanizer pass.
Amira's rejection rate is now near zero. Her process, she says, has made her a better English writer — not because the tools write for her, but because they give her feedback about the shape of her prose that native editors never articulated.
Writing Patterns to Vary (If You Want to Do This by Hand)
If you prefer to avoid humanizers entirely, you can apply the same principles manually. Detector-resistant prose is also often just better prose. Things to deliberately vary:
- Sentence length. Mix short sentences (5-8 words) with long ones (30+). If every sentence is 18 words, you have a burstiness problem.
- Sentence openings. Don't start three sentences in a row with the subject. Try a subordinate clause, an adverbial phrase, a gerund.
- Transitions. You don't always need one. Native writers use implicit transitions — juxtaposition, example, contrast signaled by content rather than by a linking word — more often than ESL textbooks suggest.
- Vocabulary risk. Once per paragraph, reach for an unusual word you're confident about. Not a thesaurus swap — a word you genuinely want to use.
- Paragraph length. Same principle as sentence length. Some paragraphs should be two sentences. Some should run 200 words.
- Idiomatic fragments. A deliberate fragment in the right place. Which, honestly, is fine. That kind of thing. Native prose does this. The formulaic prose that trips detectors doesn't.
None of this is about "fooling" a detector. It is about producing prose that has the variety detectors use as a proxy for human authorship.
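If you want a quick mechanical check for the opening-word habit in particular, here is one more small sketch. Again, it is a rough self-editing heuristic of my own, not anything a detector actually runs:

```python
# Flag runs of consecutive sentences opening with the same word.
# Crude heuristic for the "three sentences in a row starting with
# the subject" habit -- a self-editing aid, not a detector.
import re

def repeated_openings(text, run=3):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    openings = [s.split()[0].lower() for s in sentences if s.split()]
    flags = []
    for i in range(len(openings) - run + 1):
        window = openings[i:i + run]
        if len(set(window)) == 1:
            flags.append(f"sentences {i + 1}-{i + run} all open with '{window[0]}'")
    return flags

sample = ("The city grew fast. The planners ignored it. The housing "
          "crisis followed. By 2020, rents had doubled.")
print(repeated_openings(sample) or "No repeated openings.")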
When NOT to Humanize: Academic Integrity Lines
Everything above assumes you are humanizing your own writing to reduce false positives. There are several scenarios where humanizing is not appropriate, and I want to name them clearly.
- If you generated the essay with AI and barely edited it, a humanizer does not transform that into your work. The honest disclosure is: "I used AI to draft this." Whether that is permitted is your institution's call.
- If your school's policy forbids humanizers by name, don't use one, regardless of whether your writing is original. Some universities have updated honor codes specifically naming humanizers. Check your course syllabus.
- If you've been accused of AI use and you need to prove innocence, do not run your work through a humanizer before showing it to the investigator. Show your draft history in its untouched state. Presenting a humanized version is uncomfortably close to tampering with evidence.
- If the assignment is a language assessment (TOEFL-style writing tests, placement exams, ESL coursework that is specifically measuring your English), using AI tools of any kind defeats the purpose. The point is to measure your language, not the tool's.
For anything else — routine coursework in English that you wrote yourself, professional writing where you are the author of the ideas, journalism where the reporting is yours — using a humanizer to counter false positives is defensible and widely accepted. You are not evading honest detection. You are correcting for a known flaw in the detection model.
Closing Note
The false-positive problem for ESL writers is a failure of the detection industry, not a failure of ESL writing. Vendors know about the Stanford study. Most have done some tuning. None have solved it, because the underlying statistics are what they are — uniform prose patterns, native or not, look like AI prose patterns to a statistical classifier.
Until the industry catches up, the responsibility for defending your honest work falls on you. The good news is that the defense is not hard. Draft in your own voice, use AI as a tutor not an author, check your detection score before anyone else does, and humanize when the signal warrants it. And keep your draft history, always.
The point is not to game the system. The point is to make sure the system doesn't penalize you for writing in the language you learned as your second, third, or fourth.
If you want to try the full workflow, the Coda One Grammar Checker, AI Detector, and Humanizer are all free to start. You do not need a credit card. You do not need to sign up to test them once.
Frequently Asked Questions
Is it true that AI detectors flag non-native English writers more often?
Yes, and the effect is large. The most-cited study (Liang et al., Stanford 2023) found that seven widely used detectors misclassified 61.3% of TOEFL essays (written by humans) as AI-generated, versus 5.1% of US 8th-grade essays. Current 2026 detectors are better tuned but still show 15-40% false positive rates on non-native English versus 3-8% for native English. See /glossary/false-positive for the underlying mechanism.
Why does my honest writing get flagged as AI?
Detectors look for low perplexity (predictable word choices) and low burstiness (uniform sentence length). Non-native English writers often produce prose with these properties for entirely rational reasons: you learned English from textbooks, you stay in vocabulary and structures you're confident about, and you avoid risky constructions. All of that converges on the same statistical surface as AI output.
Is it cheating to use a humanizer on my own ESL writing?
No, when you are humanizing your own honest work to correct for a documented detection bias. The text is yours. The ideas are yours. You are adjusting surface-level statistical properties that detectors misread, not the substance. It does become cheating if you used AI to generate the text, did not meaningfully contribute, and are using the humanizer to disguise that fact. Intent matters.
Should I use AI to fully rewrite my essay if my English is not strong?
No. Use AI for bounded help — grammar fixes, vocabulary suggestions, sentence-level rephrasing — while keeping editorial control over every change. Full rewriting means the AI is the author, which is a different situation with different ethics and different risks. The workflow I recommend keeps you as the author and uses AI as a tutor.
My school uses Turnitin. Does the ESL false-positive issue apply there too?
Yes. Turnitin's AI detection tool has false positive rates in the 15-25% range on non-native English writing according to independent testing. Turnitin is more conservative than some competitors (it tends to under-flag rather than over-flag), which helps, but the fundamental issue remains. Keep draft history in Google Docs or Word so you can show your process if accused.
What if I'm a non-native speaker and I do use AI sometimes?
That is a separate conversation from the false-positive issue, and the honest answer is: follow your institution's or client's policy, disclose if required, and don't rely on a humanizer to obscure the source. The guide above is specifically about the case where your writing is your own and detectors are wrong. If you are using AI substantially, different rules apply. Our longer piece at /blog/complete-guide-ai-humanizers-2026 covers the boundaries.
Which AI detector is least biased against non-native English writers?
No detector is unbiased. In my 2026 testing, Turnitin's AI tool and Copyleaks have somewhat lower ESL false-positive rates than GPTZero or Originality.ai, but all of them sit in the 15-28% range. The Coda One detector is tuned for non-native English and runs around 15-22%. I recommend testing against two or three tools rather than trusting any single score. See /glossary/ai-detection-score for more on how to interpret scores.
Does using a grammar checker before an AI detector affect the score?
A grammar checker that fixes genuine errors (subject-verb agreement, article use, verb tense) does not meaningfully change your perplexity or burstiness. It fixes mistakes, which is legitimate. A grammar checker that rewrites whole sentences for 'fluency' can reduce variety and push you toward a higher AI score — which is one more reason to review every change, not accept suggestions blindly.
What evidence should I save in case I'm accused of AI use?
Google Docs version history with timestamps is the strongest. If you drafted in Word, use the tracked changes feature and save dated versions. Keep your research notes, outlines, and any handwritten planning. Save the original text of any AI prompts you did use (for legitimate tutoring help) with the dates. If accused, these artifacts are more persuasive than counter-detection scores.
Can I appeal a false-positive accusation at my university?
Yes. Most universities have academic integrity appeals processes. The strongest defense is draft history (showing the essay evolved over multiple sessions), followed by substantive engagement — being able to discuss the argument, explain specific choices, expand on claims. The Stanford study and similar research are widely cited in appeals now; it's worth knowing about them. If your school has not updated its AI detection policy to reflect the bias findings, your case may be one of the ones that prompts an update.