---
For two weeks (run from 2026-04-{XX} to 2026-04-{XX}; dates to be locked at send), we tested Karbon's AI features on a representative firm workflow. Test profile: a small-firm-equivalent setup using Karbon's free trial, with Karbon's published documentation of feature behaviour as the spec we tested against.
Test scope: email summarisation, task creation from email threads, status-update drafting, client-comm drafting on disputed positions, document extraction, and auto-archive classification.
Out of scope (no test data available): cross-firm benchmarking, multi-month trend analysis, integration with non-Karbon billing systems.
Sources for everything below: Karbon's public product documentation (retrieved 2026-04-{XX}), our own test runs across 12+ scenarios, and three publicly cited case studies on Karbon's site.
---
Email summarisation: solid. Karbon's AI consistently summarised long client email threads accurately: picking out action items, extracting deadlines, identifying tone shifts. Across 14 test threads (ranging from 4 to 22 messages), the summaries surfaced action items with ~95% accuracy by our judgement. One summary missed an implicit deadline that a human reader would catch from context, but it didn't fabricate one; the failure mode is "miss," not "make up." That's the better failure mode to have.
Verdict: useful for triage. We'd trust it for "what does this thread require of me" — not for "ship to client." Source: Karbon documentation as of 2026-04-{XX}, [URL].
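For anyone who wants to replicate the action-item check, this is roughly the shape of our scoring pass. A minimal sketch in Python; the thread items and the exact-match rule are illustrative stand-ins, not Karbon output or Karbon's API:

```python
# Sketch of the per-thread scoring behind the ~95% figure above.
# Items are human-labelled ground truth vs. what the AI summary surfaced;
# a real scoring pass also needs fuzzy matching, which exact set ops ignore.

def score_thread(human_items: set[str], ai_items: set[str]) -> dict:
    """Compare AI-surfaced action items against a human-labelled set."""
    missed = human_items - ai_items        # the "miss" failure mode
    fabricated = ai_items - human_items    # the "make up" failure mode
    recall = len(human_items & ai_items) / len(human_items) if human_items else 1.0
    return {"recall": recall, "missed": sorted(missed), "fabricated": sorted(fabricated)}

# Example with made-up items from one hypothetical thread:
print(score_thread(
    human_items={"send revised engagement letter", "confirm 15 April filing date"},
    ai_items={"send revised engagement letter"},
))  # recall 0.5, one miss, no fabrications
```

An empty `fabricated` bucket is the property the "miss, not make up" framing rests on.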
Task creation from threads: 70% useful. When a client email asked us to do something, Karbon's AI suggested a task ~70% of the time. The misses were typically edge cases: implicit asks ("can you check on...") or culturally coded requests. A 70% hit rate is a meaningful productivity gain at scale; an operator going through 50 client threads a day would see maybe 30 task suggestions, of which roughly 10 they'd actually need.
Verdict: useful as a "second pair of eyes" pass; not yet useful as the only review pass. Source: our test runs (n=22 emails); Karbon's [feature documentation].
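The arithmetic behind that verdict, taking the figures above at face value (nothing below is Karbon data):

```python
# Back-of-envelope numbers for the 50-threads-a-day operator above.

threads_per_day = 50
suggestions_per_day = 30   # tasks the AI would surface, per our estimate
genuinely_needed = 10      # suggestions that survive human review
hit_rate = 0.70            # share of real asks the AI catches

precision = genuinely_needed / suggestions_per_day
print(f"implied precision: {precision:.0%}")  # ~33%: two of three suggestions are noise

# Why it's a second pass, not the only pass: at a 70% hit rate,
# reviewing only AI-flagged threads silently drops ~30% of real asks.
print(f"asks missed if you skip unflagged threads: {1 - hit_rate:.0%}")
```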
Status-update drafting: usable but bland. Karbon's AI drafts client status updates that are factually accurate but read as generic. We sent 8 drafts to a sample of beta readers (anonymised), who could identify "AI-drafted" vs "human-drafted" with about 65% accuracy, meaningfully better than chance; the AI signal is detectable. For internal-facing or low-context comms, fine. For client-facing comms where tone matters, the drafts need a human edit.
Verdict: useful for first drafts; needs a human edit before send. Source: 8-message double-blind test of our own drafts.
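One caveat worth making explicit: whether 65% is meaningfully better than chance depends on the total number of classifications, and we haven't stated the panel size here, so treat the reader counts below as assumptions. A quick pure-stdlib check showing the claim holds at plausible panel sizes:

```python
# One-sided binomial test: how likely is >=65% accuracy from pure guessing?
from math import comb

def p_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

for readers in (5, 10, 20):          # assumed panel sizes
    n = readers * 8                  # 8 messages classified per reader
    k = round(0.65 * n)              # ~65% identified correctly
    print(f"{readers} readers: n={n}, k={k}, p={p_at_least(k, n):.3f}")
# with 5 readers p is roughly 0.04; with 10, roughly 0.005 -- better
# than chance either way, just with different margins
```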
---
AI client-comm drafting on disputed assessments. We ran three tests on a synthetic "client disputes their tax position" scenario. In two of the three, the AI's draft summarised the client's objection in ways that softened the disputed amount or introduced ambiguity. Both drafts were factually correct; they just framed the position less precisely than the original client message warranted. For routine status updates, that's fine. For client comms where precision of position matters (disputed assessments, audit findings, regulatory nuance), the AI's tendency to soften creates risk.
Our verdict: don't ship AI-drafted comms unsupervised in disputed-position contexts. Use the draft as a starting point, plus human review. Karbon's documentation doesn't claim it should be unsupervised, but the marketing language can suggest a level of trust that our test runs don't yet justify.
Source: 3 synthetic dispute-scenario test runs + comparative analysis. [Karbon's product documentation].
Document AI on non-standard formats. Karbon's document AI extraction worked well for standard 1099, W-2, and K-1 layouts. On a client-uploaded handwritten note (yes, in 2026 some clients still send those) and on a multi-page non-standard PDF that wasn't a known IRS form, extraction quality dropped. Not surprising, and not a Karbon-specific weakness, but worth knowing.
Verdict: rely on the AI for standard form extraction; manually review non-standard uploads.
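If you want to operationalise that split, the routing rule is simple enough to sketch. The form list, confidence field, and threshold below are our assumptions for illustration, not Karbon's API:

```python
# Triage rule implied by the verdict: auto-accept AI extraction only for
# known standard forms, queue everything else for human review.

STANDARD_FORMS = {"1099", "W-2", "K-1"}

def route(doc_type: str, extraction_confidence: float) -> str:
    if doc_type in STANDARD_FORMS and extraction_confidence >= 0.9:
        return "auto-accept extraction"
    return "manual review queue"

print(route("W-2", 0.97))                # auto-accept extraction
print(route("handwritten note", 0.42))   # manual review queue
```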
---
The "auto-archive non-actionable emails" feature. Karbon's AI offers to auto-archive emails it classifies as non-actionable. We tested 50 emails, and the AI's "non-actionable" bucket included 4 messages that, on human review, were informational but important (industry compliance updates, vendor pricing changes affecting our software stack). That's an 8% false-positive rate, and the risk of missing an industry signal because the AI classified it as junk is real. Until the false-positive rate drops below ~2%, we'd manually triage rather than auto-archive.
Verdict: skip the auto-archive setting; keep AI-suggested archive but don't auto-execute. Source: our test runs (n=50).
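The statistics behind holding the ~2% line: 4 misses in 50 emails is an 8% point estimate, and even the optimistic end of a 95% confidence interval sits well above 2%. A pure-stdlib sketch (the interval method is our choice; the 2% bar is the threshold named above):

```python
# Wilson score interval for the observed auto-archive false-positive rate.
from math import sqrt

def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a binomial proportion."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

lo, hi = wilson_interval(4, 50)
print(f"observed 8.0%, 95% CI roughly {lo:.1%} to {hi:.1%}")  # ~3.2% to ~18.8%
```

Even the lower bound is above 2%, which is why we'd keep archiving as a suggestion, not an automatic action.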
---
If your firm is already on Karbon: yes, evaluate the AI features in your free trial period. Email summarisation alone earns back 3–5 hours per week at moderate firm scale. Use it for triage and first drafts; don't ship it unsupervised on client-facing, high-context comms.
If your firm is NOT on Karbon: the AI features alone don't justify a platform switch. Karbon is a strong practice-management platform overall — evaluate it on practice-management merits (workflow, tasks, client portal, billing) and treat the AI features as a meaningful but not category-defining bonus.
If you're considering Karbon vs. Canopy vs. TaxDome: the practice-management comparison is coming in Issue 3 next week — including how each handles AI features.
---
This was our first original-testing-protocol piece. We're targeting one per month — actual hands-on time with a tool, not synthesis from vendor docs.
Reply with the AI tool you'd most want us to test next. We're prioritising tools where the marketing claims a lot and where firm operators are evaluating whether to commit budget. Strong candidates we're considering: Black Ore Tax Autopilot, FloQast's AI close features, Numeric, Trullion, Datasnipper. Vote with your reply.
---
Spotted an error? Reply to this email and we'll publish the correction at the top of the next issue. Last week's corrections: none reported.
— Dan
Editor & Publisher, Materiality
---
[Forward this to a colleague who'd find it useful.]
Sponsor an issue · Editorial standards · Privacy · [Unsubscribe]
Materiality is published from Auckland, New Zealand by Dan Ibbotson (Sole Trader). Built using Claude (Anthropic) under human editorial supervision. Materiality is editorial; not legal, financial, tax, or accounting advice.
[POSTAL ADDRESS]
---
Free weekly newsletter for accounting firms — vendor-agnostic reviews of AI tools and practice tech.
Subscribe