Email A/B Testing: What to Test and How to Read Results (2026)
Most email A/B tests are run incorrectly. Not because the testers do not care — but because the way most email platforms present A/B testing makes it deceptively easy to draw the wrong conclusions from real data.
You split your list 50/50. Variant A gets a 24% open rate. Variant B gets a 27% open rate. Variant B wins, so you apply it to future campaigns. But your list was 800 subscribers: 400 per variant, meaning the gap was roughly 12 opens, on a test that needed around 2,400 subscribers per variant to be statistically meaningful. You have just made a confident decision based on noise.
This guide covers email A/B testing the right way: what to test and in what order, how to structure tests so results are actually trustworthy, how to interpret statistical significance without a statistics degree, and how to build a compounding testing programme that consistently improves every performance metric over time.
What Is Email A/B Testing?
Email A/B testing — also called split testing — is the practice of sending two or more versions of an email to different portions of your list, measuring the response to each version, and using the results to identify which performs better.
The logic is straightforward: instead of guessing which subject line, send time, or CTA will perform better, you test both simultaneously with real subscribers and let the data decide.
The key distinction: A/B testing is not the same as sending two different campaigns and comparing them. A valid A/B test changes only one variable between the two versions and sends them simultaneously to randomly divided segments of the same list. Any other design introduces confounding variables — differences in send time, audience composition, or seasonal context — that make the results uninterpretable.
Why Most Email Tests Fail to Produce Useful Results
Before covering what to test, it is worth understanding why so many A/B tests produce misleading data — because the mistakes are systematic and avoidable.
Mistake 1: Testing With Too Small a Sample
The most common and most consequential mistake. Every A/B test produces a result — but not every result is meaningful. The question is not "which variant got a higher open rate" but "is the difference large enough to be confident it reflects a real preference rather than random chance?"
Statistical significance requires a minimum sample size that depends on your baseline conversion rate and the size of improvement you are trying to detect. For email open rates:
| Baseline Open Rate | Detectable Lift | Minimum List Size Per Variant |
|---|---|---|
| 20% | 5% relative (20% → 21%) | ~16,000 subscribers |
| 20% | 10% relative (20% → 22%) | ~4,000 subscribers |
| 20% | 20% relative (20% → 24%) | ~1,000 subscribers |
| 30% | 10% relative (30% → 33%) | ~4,500 subscribers |
Practical implication: Most businesses cannot run statistically valid tests for small improvements. Instead, test larger changes — dramatic differences in subject line approach, not minor word tweaks — so the signal-to-noise ratio is high enough to be detectable with realistic list sizes.
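To make the noise problem concrete, here is a minimal simulation sketch in Python of the scenario from the introduction: two identical variants, 400 subscribers each, both with a true 24% open rate and no real difference between them. The trial count and structure are illustrative choices, not taken from any particular platform.

```python
# A/A simulation: no real difference between variants, 400 subscribers each.
# How often does pure chance produce a gap of 3 percentage points or more?
import random

random.seed(1)
trials, n_per_variant, true_rate = 10_000, 400, 0.24
big_gaps = 0
for _ in range(trials):
    opens_a = sum(random.random() < true_rate for _ in range(n_per_variant))
    opens_b = sum(random.random() < true_rate for _ in range(n_per_variant))
    if abs(opens_a - opens_b) / n_per_variant >= 0.03:
        big_gaps += 1

print(f"Gap of 3+ points from chance alone: {big_gaps / trials:.0%} of simulated tests")
```

Run as written, this typically reports a gap that large in roughly a third of the simulated tests, which is why a 3-point "win" on a 400-per-variant split is not evidence of anything.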
Mistake 2: Testing Too Many Variables Simultaneously
Testing subject line AND from name AND send time in the same test makes it impossible to know which change drove the result. This is multivariate testing — which requires exponentially larger sample sizes to produce valid results. A/B testing means one variable at a time.
Mistake 3: Stopping the Test Early
When one variant pulls ahead quickly, the temptation is to stop the test and declare a winner. This leads to a well-documented statistical phenomenon called "peeking" — the early leader is often just the beneficiary of early variance, and longer-running tests frequently reverse or narrow the gap. Let the test run to its predetermined sample size before reading results.
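A small simulation also illustrates why peeking inflates false positives. The sketch below assumes two identical variants and a significance check every 200 subscribers; all figures are illustrative rather than drawn from real campaign data.

```python
# Peeking simulation: two identical variants, checked every 200 subscribers.
# Declaring a winner at the first check that crosses 95% confidence produces
# far more false positives than a single check at the predetermined sample size.
import random
from math import sqrt
from statistics import NormalDist

random.seed(7)
trials, n_final, true_rate, check_every = 2_000, 2_000, 0.24, 200
z_crit = NormalDist().inv_cdf(0.975)  # two-sided 95% threshold

peeking_false_wins = final_false_wins = 0
for _ in range(trials):
    opens_a = opens_b = 0
    called_early = False
    for i in range(1, n_final + 1):
        opens_a += random.random() < true_rate
        opens_b += random.random() < true_rate
        if i % check_every == 0:
            pooled = (opens_a + opens_b) / (2 * i)
            se = sqrt(pooled * (1 - pooled) * 2 / i)
            significant = abs(opens_a - opens_b) / i / se > z_crit
            called_early = called_early or significant
            if significant and i == n_final:
                final_false_wins += 1
    peeking_false_wins += called_early

print(f"False winners when stopping at the first significant peek: {peeking_false_wins / trials:.0%}")
print(f"False winners when testing once at the full sample size:   {final_false_wins / trials:.0%}")
```

The repeated checks typically flag a false winner several times more often than the single end-of-test check, even though neither variant is actually better.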
Mistake 4: Not Testing the Same Audience Over Time
Running a test on Monday to engaged subscribers and comparing it to Tuesday's test on your full list is not a valid A/B test — the audiences are different. Always test within the same send, to randomly divided segments of the same list, at the same time.
Mistake 5: Treating Open Rate as a Reliable Metric Since Apple MPP
Since Apple Mail Privacy Protection launched in 2021, open rates have been inflated for senders with significant Apple Mail audiences. MPP pre-fetches email content and registers an "open" regardless of whether the email was actually viewed. For subject line tests — where open rate is the primary metric — Apple MPP introduces noise. Supplement open rate testing with click rate testing on campaigns where the audience mix is uncertain.
What to Test and in What Order
Not all email variables are equally testable or equally impactful. This priority order reflects both the size of the potential improvement and the sample size required to detect it reliably.
Priority 1: Subject Line (Highest Impact, Easily Detectable)
Subject line is the most impactful variable in email marketing because it determines whether the email gets opened at all — and a 20–30% relative improvement in open rate from a better subject line has a compounding effect on every downstream metric.
Subject line tests are also the easiest to run because the metric (open rate) is available within hours and the sample size requirement is lower than for click or conversion tests: open rates are much higher than click rates, so the same relative lift is a larger absolute difference and fewer subscribers are needed to detect it.
High-value subject line variables to test:
| Variable | Option A | Option B |
|---|---|---|
| Format | Question: "Are you making this email mistake?" | Statement: "The email mistake costing you 30% of opens" |
| Length | Short (under 40 chars): "Your account needs attention" | Long (50+ chars): "Three things to check before your next email campaign" |
| Personalisation | With first name: "Sarah, your open rate dropped" | Without: "Your open rate dropped — here's why" |
| Urgency | Time-limited: "Offer ends tonight at midnight" | Value-first: "The deliverability fix most marketers miss" |
| Specificity | Specific number: "7 email tests worth running in 2026" | General: "Email tests worth running this year" |
| Tone | Formal: "Quarterly performance review available" | Conversational: "Quick question about your Q1 sends" |
What makes a valid subject line test: The email body, send time, and from name must be identical. Only the subject line changes.
Priority 2: From Name (High Impact, Underused)
The from name is the second thing subscribers read after checking which tab the email landed in. Testing from name can reveal significant open rate differences — particularly the difference between a brand name ("Migomail") and a person name ("Hemant from Migomail").
Common from name test patterns:
| Variant A | Variant B | Expected Finding |
|---|---|---|
| Brand name only: "Migomail" | Person + brand: "Hemant at Migomail" | Person name typically wins for newsletters and smaller brands |
| Full name: "Hemant Verma" | First name only: "Hemant" | First name only often feels more personal |
| Role-based: "Migomail Deliverability Team" | Personal: "Aisha from Migomail" | Personal typically outperforms role-based |
Note: From name is tied to your From email address in most platforms. Changing the display name without changing the email address is possible — test the display name independently first.
Priority 3: Send Time and Day
Send time tests are operationally simple — same email, different dispatch times — but require careful execution to ensure the audience segments are randomly divided and not systematically biased by time zone.
Common patterns in US sender data from our email deliverability benchmarks:
- B2B: Tuesday–Thursday, 10am–12pm local time tends to outperform Monday and Friday
- B2C ecommerce: Saturday morning (8–10am) often outperforms weekday sends for promotional campaigns
- Newsletters: Sunday evening (7–9pm) performs well for content-focused newsletters
These are averages. Your specific audience may behave differently. Run a 4-week send time test — split your list 50/50, send the same campaign at two different times across four consecutive sends, then aggregate the results. Four sends per variant reduces single-send noise significantly.
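A minimal sketch of the aggregation step, assuming you have the open and send counts from each of the four campaigns; every count below is invented for illustration.

```python
# Pool opens and sends per variant across a four-send send-time test.
sends_10am = [(312, 1400), (298, 1400), (325, 1400), (303, 1400)]  # (opens, sent)
sends_7pm = [(271, 1400), (289, 1400), (266, 1400), (280, 1400)]

def pooled(results):
    opens = sum(o for o, _ in results)
    sent = sum(n for _, n in results)
    return opens, sent, opens / sent

opens_a, sent_a, rate_a = pooled(sends_10am)
opens_b, sent_b, rate_b = pooled(sends_7pm)
print(f"10am: {rate_a:.1%} of {sent_a}   7pm: {rate_b:.1%} of {sent_b}")
# Judge significance on these pooled counts, not on any single send.
```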
Priority 4: Email Length and Format
Long-form email vs short email, HTML-heavy vs plain text, image-led vs text-led — these tests measure engagement quality (click rate, replies) rather than just opens.
Format tests to run:
| Variable | Option A | Option B |
|---|---|---|
| Length | Short (under 150 words) | Long (400+ words) |
| HTML vs plain text | Designed HTML email | Plain text with minimal formatting |
| Image usage | Image in the hero section | No images, text only |
| Single vs multi-column | Single column | Two-column layout |
Plain text emails frequently outperform HTML emails for newsletters, re-engagement campaigns, and any sequence where personal connection is the goal. HTML emails outperform for product catalogues, promotional emails, and content where visual hierarchy matters. Test this for your specific use case rather than assuming one format always wins.
Priority 5: Call to Action (CTA)
CTA testing measures click rate — which is a more reliable metric than open rate in the Apple MPP era because clicks cannot be pre-fetched. However, click rates are lower than open rates (2–5% vs 20–30%), which means you need a larger sample size to detect meaningful differences.
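To put rough numbers on that trade-off, here is a sketch using the statsmodels library, one of several ways to run this calculation. The 20% open rate, 3% click rate, 20% relative lift, 95% confidence, and 80% power are all illustrative assumptions rather than figures from this guide.

```python
# Per-variant sample needed for an open rate test vs a click rate test,
# both targeting the same 20% relative lift at 95% confidence and 80% power.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

def per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    effect = proportion_effectsize(baseline * (1 + relative_lift), baseline)
    return NormalIndPower().solve_power(effect_size=effect, alpha=alpha, power=power)

print(round(per_variant(0.20, 0.20)))  # open rate test: on the order of 1,700 per variant
print(round(per_variant(0.03, 0.20)))  # click rate test: on the order of 14,000 per variant
# Exact figures shift with the power assumption; the point is the ~8x gap between the two tests.
```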
CTA variables worth testing:
| Variable | Option A | Option B |
|---|---|---|
| Button text | "Start your free trial" | "Try Migomail free for 14 days" |
| Button vs link | Designed button | Plain text hyperlink |
| CTA position | Above the fold | Below main content |
| Number of CTAs | Single CTA | Two CTAs (primary + secondary) |
| Colour / design | Blue button | Orange button |
| First person | "Start my free trial" | "Start your free trial" |
First-person CTA text ("Start my free trial" vs "Start your free trial") is one of the most consistently replicated findings in email CTA testing — first person often outperforms second person by 5–15%, likely because it forces the reader to mentally inhabit the action rather than receiving an instruction.
Priority 6: Email Content and Body Copy
Content tests — testing different value propositions, different lead paragraphs, different proof points — are the most complex and most valuable tests in a mature email programme. They are also the hardest to isolate and require the largest sample sizes.
Content tests to run once the higher-priority variables are settled:
- Opening line: question vs statement vs fact
- Value proposition angle: benefit-led vs feature-led vs story-led
- Social proof type: customer quote vs data/statistics vs customer story
- Offer framing: percentage discount vs dollar amount discount vs free shipping
- Urgency mechanism: countdown timer vs limited stock vs expiry date
How to Structure a Valid A/B Test
Step 1: Define Your Hypothesis
Every test should begin with a specific hypothesis: "I believe [Variable A] will outperform [Variable B] because [reason]." This disciplines you to test for a reason rather than testing randomly, and it provides a framework for interpreting the results even when they surprise you.
Example hypothesis: "I believe a subject line with a specific number ('7 email tests worth running') will outperform a general subject line ('Email tests worth running this year') because our audience has shown stronger click rates on specific, numbered content in the past."
Step 2: Choose Your Metric Before Sending
Decide the success metric before the test runs — not after you see the results. The most common choices:
- Open rate: For subject line and from name tests
- Click rate (CTR): For CTA, content, and format tests
- Click-to-open rate (CTOR): For content quality tests — measures clicks per opener, removing open rate variance
- Revenue per email: For offer and pricing tests — requires ecommerce integration
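For clarity on how these metrics relate, here is a minimal sketch computing each of them from one campaign's counts; every number is illustrative.

```python
# Candidate success metrics computed from a single campaign's counts.
sent, opens, clicks, revenue = 10_000, 2_200, 310, 4_700.00

open_rate = opens / sent            # 22.0%, for subject line and from name tests
click_rate = clicks / sent          # 3.1%, for CTA, content, and format tests
ctor = clicks / opens               # 14.1%, clicks per opener, strips out open rate variance
revenue_per_email = revenue / sent  # $0.47, for offer and pricing tests

print(f"{open_rate:.1%}  {click_rate:.1%}  {ctor:.1%}  ${revenue_per_email:.2f}")
```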
Choosing your metric after seeing results (also called HARKing — Hypothesising After Results are Known) systematically produces false positives.
Step 3: Calculate Required Sample Size
Use a sample size calculator (several free online tools exist, including tools from Evan Miller and AB Testguide) before running the test. Input:
- Your current baseline metric (e.g., 22% open rate)
- The minimum improvement you want to detect (e.g., a 15% relative lift, taking 22% to 25.3%)
- Your desired confidence level (95% is the standard for business decisions)
The calculator returns the minimum number of subscribers needed per variant. If your list is smaller than 2× this number, the test will not produce reliable results — consider testing larger changes to make the signal detectable with your available sample.
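If you prefer to run the calculation yourself, the standard two-proportion approximation is a few lines of Python. The sketch below uses the Step 3 example inputs (22% baseline, 15% relative lift, 95% confidence) and assumes 80% statistical power, which the steps above do not specify.

```python
# Approximate minimum subscribers per variant for a two-proportion A/B test.
from statistics import NormalDist

def subscribers_per_variant(baseline, lifted, confidence=0.95, power=0.80):
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)                      # 0.84 for 80% power
    variance = baseline * (1 - baseline) + lifted * (1 - lifted)
    return (z_alpha + z_beta) ** 2 * variance / (lifted - baseline) ** 2

# 22% baseline, 15% relative lift (22% -> 25.3%), 95% confidence
print(round(subscribers_per_variant(0.22, 0.253)))  # roughly 2,600 per variant
```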
Step 4: Split Randomly
Most email platforms have built-in A/B testing that randomly assigns subscribers to variants. Use this rather than manually dividing your list — manual division introduces systematic biases (e.g., alphabetical order by name correlates with geographic and demographic patterns).
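If you ever need to split a list outside your platform, a shuffle-based split is a reasonable sketch; the addresses and seed below are placeholders.

```python
# Random 50/50 split of a subscriber list into two variants.
import random

subscribers = ["a@example.com", "b@example.com", "c@example.com", "d@example.com"]
random.seed(42)  # seeded only so the example is reproducible
shuffled = subscribers[:]
random.shuffle(shuffled)
midpoint = len(shuffled) // 2
variant_a, variant_b = shuffled[:midpoint], shuffled[midpoint:]
print(variant_a, variant_b)
```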
Step 5: Run the Test to Completion
Set a predetermined end point — typically when the required sample size is reached — and do not check results until then. If your platform sends the winning variant automatically after a test period, ensure the test period is long enough to accumulate the required sample before the winner is declared.
Step 6: Interpret Results With Appropriate Confidence
After the test completes, a result is valid if:
- The sample size per variant met or exceeded your pre-calculated requirement
- The difference between variants is larger than the margin of error at your chosen confidence level
- The test ran simultaneously (not sequentially) and to a randomly divided audience
A result that meets all three criteria is a directional finding you can act on. A result that does not meet these criteria is interesting data — but not a basis for confident action.
Reading Results: Statistical Significance in Plain Language
Statistical significance sounds intimidating but the concept is simple: how confident are you that the observed difference is real rather than random chance?
A test result at 95% confidence means: if there were genuinely no difference between the variants, a gap as large as the one you observed would appear by chance in fewer than 5 out of 100 runs of the test. In practical terms, you can be about 95% confident the difference is real; the remaining 5% of cases are false positives.
Most A/B testing tools display a confidence level or a p-value:
- p < 0.05 = 95% confidence = statistically significant at the standard business threshold
- p < 0.01 = 99% confidence = higher confidence, requires more data
- p > 0.05 = not statistically significant = do not act on this result
The practical translation: If your email platform says "Variant B is the winner" but the confidence is 78%, the result is not reliable. A "winner" at 78% confidence means you have a 22% chance of being wrong — which is too high to make permanent changes to your programme. Wait for more data or test a bigger difference.
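For readers who want to check a platform's numbers, the two-proportion z-test behind most A/B calculators is short. The open counts below are illustrative, and reporting confidence as one minus the p-value is a simplification that mirrors how many tools display it.

```python
# Two-proportion z-test: p-value and the "confidence" figure most tools display.
from math import sqrt
from statistics import NormalDist

def ab_confidence(opens_a, n_a, opens_b, n_b):
    p_a, p_b = opens_a / n_a, opens_b / n_b
    pooled = (opens_a + opens_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(z))  # two-sided
    return p_value, 1 - p_value

p_value, confidence = ab_confidence(opens_a=528, n_a=2400, opens_b=576, n_b=2400)
print(f"p = {p_value:.3f}, confidence = {confidence:.0%}")
# Roughly p = 0.10 and 90% confidence here: not enough to declare a winner at 95%.
```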
What to Do When Tests Are Inconclusive
An inconclusive test — where neither variant clearly wins — is not a failed test. It is meaningful information: the difference between the two variants is not large enough to matter to your audience. Options:
- Test a more extreme version: If "Get started" vs "Start your free trial" was inconclusive, test "Get started free today" vs "See results in 14 days" — a larger, more conceptually different change
- Accept the null hypothesis: Some variables genuinely do not move the needle for your specific audience. Document this and redirect testing effort to higher-impact variables
- Aggregate across multiple tests: Run the same test across 3–4 campaigns and aggregate the results. The combined sample may produce statistical significance that a single test could not
Building a Compounding Testing Programme
Individual A/B tests are useful. A systematic testing programme — where every test builds on the last and creates documented institutional knowledge about your audience — is transformational.
The Testing Calendar
Commit to one meaningful test per campaign send. After 12 months of consistent testing, you will have:
- A documented subject line playbook specific to your audience
- A clear winner on from name format
- Optimal send time and day confirmed by multiple rounds of data
- A CTA format that consistently outperforms alternatives
- Content direction validated by engagement data
This is the difference between an email programme that incrementally improves and one that plateaus.
Document Every Test
Maintain a simple testing log:
| Date | Variable Tested | Variant A | Variant B | Sample Size Each | Winner | Confidence | Learning |
|---|---|---|---|---|---|---|---|
| 2026-01-15 | Subject line format | Question | Specific number | 2,400 | Specific number | 97% | Numbers outperform questions for this list |
| 2026-01-29 | Send time | Tuesday 10am | Thursday 10am | 2,400 | Tuesday | 91% | Directional — re-test |
| 2026-02-12 | From name | "Migomail" | "Hemant at Migomail" | 2,400 | "Hemant at Migomail" | 98% | Person name wins — apply permanently |
After six months, patterns emerge that are specific to your list and your audience. These patterns are more valuable than any generic best practice guide — including this one.
Segment-Level Testing
Once your overall list testing matures, test within segments rather than across your full list. Champions (your most engaged subscribers) may respond differently to subject line styles than Cooling subscribers. Your email list segmentation guide covers how to set up the engagement tiers that make segment-level testing possible.
Testing within the Champions segment produces results faster (higher engagement rates mean smaller required sample sizes) and more reliably (less noise from disengaged subscribers who open inconsistently).
A/B Testing for Automated Drip Sequences
A/B testing is not limited to broadcast campaigns — it is equally valuable for drip campaign sequences and automation workflows.
Testing within automation sequences:
- Welcome series: test Email 1 subject line variations with new subscribers (every new subscriber is a new test participant)
- Abandoned cart: test the 30-minute vs 60-minute send delay for Email 1
- Re-engagement: test "Still want to hear from us?" vs "We saved something for you" as the Email 1 subject
Automation A/B tests accumulate sample over time as subscribers continuously trigger the sequence: a welcome series test running for 60 days with 50 new subscribers per day accumulates 3,000 subscribers in total, roughly 1,500 per variant, which is enough for subject line tests aimed at detecting larger lifts.
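A quick way to estimate how long such a test must run, assuming a steady flow of new subscribers, is sketched below; both input numbers are placeholders.

```python
# Estimate the run length of an automation A/B test from the daily subscriber flow.
from math import ceil

required_per_variant = 1_500    # from your sample size calculation
new_subscribers_per_day = 50    # daily entries into the welcome series
variants = 2

days_needed = ceil(required_per_variant * variants / new_subscribers_per_day)
print(f"Run the welcome-series test for roughly {days_needed} days")  # 60 days
```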
The advantage of automation testing:
Unlike broadcast campaign tests that run once, automation tests run continuously and accumulate statistical significance over weeks or months. They also test a consistent audience type: every participant in a welcome series test is a brand-new subscriber, which eliminates the audience composition variance that affects broadcast tests.
A/B Testing Checklist
Before the test
- Hypothesis documented — specific prediction and reasoning
- Success metric chosen before test runs — not after
- Sample size calculated using a significance calculator
- List size is at least 2× the required sample size per variant
- Only one variable changes between variants
- Both variants will send simultaneously to randomly divided segments
During the test
- Test is not being checked until the required sample is reached
- No other campaign changes happening to the same audience segment
- Platform is splitting randomly (not manually)
After the test
- Sample size requirement was met
- Confidence level is 95% or above before declaring a winner
- Result is documented in the testing log with learning noted
- Winner applied to future sends (if conclusive result)
- If inconclusive: test a more extreme variant or accept the null hypothesis
Ongoing programme
- One test per campaign send scheduled consistently
- Testing log reviewed quarterly for patterns
- Segment-level tests running in parallel with full-list tests
- Automation sequences have active A/B tests running
Frequently Asked Questions
What should I A/B test first in email marketing?
Start with subject lines. Subject line tests produce the largest and most easily detectable improvements, require the smallest sample sizes (because open rates are higher than click rates), and deliver results within hours of sending. A 15–20% relative improvement in open rate from a better subject line approach — for example, numbered lists vs general statements — compounds across every campaign you send going forward. After subject lines, test from name format (brand name vs person name), then send time, then CTA text and placement. Test one variable at a time, in this order, before moving to more complex content tests.
How many subscribers do I need to run a valid email A/B test?
It depends on your current open or click rate and the size of improvement you want to detect. For a subject line test with a 20% baseline open rate and a minimum detectable lift of 20% relative (20% → 24%), you need approximately 1,000 subscribers per variant — 2,000 total. For a smaller lift of 10% relative (20% → 22%), you need approximately 4,000 per variant — 8,000 total. Most businesses with lists under 2,000 subscribers cannot run statistically valid tests for small improvements. The solution: test larger, more dramatically different variants so the effect size is large enough to be detectable with your available list size.
How long should I run an email A/B test?
Run the test until both variants have accumulated the required sample size — not until a specific time has elapsed. For a broadcast campaign send, both variants receive their traffic simultaneously, so the test ends when the send completes. For automation sequence tests, the test runs continuously until the accumulated sample meets the threshold. The common mistake is stopping early when one variant pulls ahead — early leaders often have their lead narrow or reverse as more data accumulates. Set a predetermined sample size requirement and do not read results until it is reached.
What is the difference between A/B testing and multivariate testing in email?
A/B testing changes one variable between two versions of an email — subject line A vs subject line B, with everything else identical. Multivariate testing changes multiple variables simultaneously — subject line, from name, and CTA all tested at once across multiple combinations. Multivariate testing requires exponentially larger sample sizes (typically 50,000+ subscribers per variant combination) and is impractical for most email senders. A/B testing, with one variable and two variants per test and different variables worked through across successive campaigns, produces clear, actionable results and is appropriate for any list size above approximately 2,000 subscribers.
Does A/B testing improve email deliverability?
Not directly — A/B testing improves the engagement signals that influence inbox placement over time. When A/B testing leads to consistently higher open rates, click rates, and lower complaint rates (through better relevance), inbox providers see sustained positive engagement from your domain. Over months, this stronger engagement signal translates to better inbox placement. The most direct deliverability lever is authentication and list hygiene — covered in our email deliverability best practices guide. A/B testing is an engagement optimisation tool that, compounded over time, contributes positively to deliverability as a secondary effect.
Summary
Email A/B testing works — but only when done correctly. The failures that make most tests useless are systematic and avoidable: too small a sample, too many variables changed at once, stopping tests early, and reading results at inadequate confidence levels.
The right approach:
- Test one variable at a time — subject line first, then from name, send time, CTA, format, content
- Calculate required sample size before testing — not after observing results
- Choose your metric before sending — not based on which metric happened to show a difference
- Run to completion — do not peek at results until the required sample is reached
- Require 95% confidence before declaring a winner
- Document everything — the compounding value of a testing programme is in the patterns it reveals over months, not individual test results
A consistent one-test-per-campaign programme applied for 12 months produces a subject line playbook, a from name format, an optimal send time, and a CTA approach — all validated by your specific audience rather than generic industry averages. That documented knowledge is genuinely difficult to replicate and compounds with every campaign you send.
Start your free trial to access Migomail's built-in A/B testing — subject line, from name, send time, and content variant testing with random audience splitting, real-time significance tracking, and automatic winner application all built into the campaign builder.