Methodology

Five phases. Inherited from ERP. Re-tuned for AI.

Most AI projects fail because they're run like software pilots. We run them like ERP implementations (discovery, fit-gap, configure, train, hypercare) because that playbook is already proven in the room AI has to fit into. The differences are in what gets measured.

Phase 01 · Week 1 of Assessment

Discovery. The week where we stop assuming.

A round of 1:1 interviews with the people who actually do the work — finance, ops, IT, and whoever owns the chart of accounts. The output is a system inventory, a process map for every candidate workflow, and a list of the disagreements between what your software is supposed to do and what your team actually does.

Inputs
Org chart · ERP access · prior pilot post-mortems
Activities
Interviews · Inventory · Process maps
Output
A landscape document. Not a deck — a working artifact the Build team references for the entire engagement.
Owned by
Firmcraft principal · with your CFO or COO as the lead sponsor
Discovery schedule · wk 1 · 5 of 7 scheduled
CFO · J. Reyes: Org thesis · funding · audit posture (Done)
Controller · R. Tanaka: Close cadence · AP/AR pain · COA hygiene (Done)
Dir. Ops · M. Brennan: Work order flow · field-to-ERP gaps (Done)
IT lead · S. Park: Stack · access · sovereignty constraints (Thu)
AP team · A. Nguyen +2: Invoice intake · coding patterns (Fri)
Phase 02 · Week 2 of Assessment

Fit-gap. Where ERP discipline meets AI scope.

For every candidate workflow, three questions: does the existing system already do this (Fit), does it almost-but-not-quite (Gap), or is it a true greenfield (Missing). Only Gaps and Missings go forward into the scorecard. The Fits get a one-line "don't build this" — that answer alone usually pays for the Assessment.

Inputs
Discovery landscape · ERP capability matrix · sovereignty rules
Activities
Capability matrix · Scoring · Sovereignty pass
Output
The scorecard. Every candidate use case, ranked, with a feasibility × ROI × sovereign-fit score and a Build / Defer / Don't decision.
Owned by
Firmcraft principal · review with CFO + IT lead
Fit-gap scorecard · wk 2 · 18 candidates · 7 greenlit
Columns: Use case · Fit · Gap · Missing
AP invoice triage: High ROI · on-prem
Work-order draft: High ROI · on-prem
Vendor master mgmt.: BC already covers
Voice agent (sched.): Hybrid sovereign-fit
Month-end commentary: Defer · low feasibility
3-way match: BC handles natively
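The fit-gap pass above can be sketched in a few lines of Python. This is a minimal illustration, not Firmcraft's actual scoring model: the 0 to 1 scales, the `BUILD_THRESHOLD` cut-off, and the names `Coverage`, `Candidate`, `score`, `decide`, and `scorecard` are all assumptions made for the example. It assumes Fits map straight to "Don't" and that everything else is ranked by the product of feasibility, ROI, and sovereign-fit.

```python
from dataclasses import dataclass
from enum import Enum


class Coverage(Enum):
    FIT = "fit"          # the existing system already does this
    GAP = "gap"          # it almost does, but not quite
    MISSING = "missing"  # true greenfield


@dataclass
class Candidate:
    name: str
    coverage: Coverage
    feasibility: float    # assumed 0..1 scale
    roi: float            # assumed 0..1 scale
    sovereign_fit: float  # assumed 0..1 scale


# Hypothetical cut-off; the source does not state one.
BUILD_THRESHOLD = 0.4


def score(c: Candidate) -> float:
    """Multiplicative score: one weak dimension drags the whole score down."""
    return c.feasibility * c.roi * c.sovereign_fit


def decide(c: Candidate) -> str:
    # Fits are not scored: the existing system already covers them.
    if c.coverage is Coverage.FIT:
        return "Don't"
    return "Build" if score(c) >= BUILD_THRESHOLD else "Defer"


def scorecard(candidates):
    """Return (name, score, decision) rows, best score first, Fits last."""
    rows = [
        (
            c.name,
            None if c.coverage is Coverage.FIT else round(score(c), 3),
            decide(c),
        )
        for c in candidates
    ]
    # Unscored Fits (score None) sort after every scored candidate.
    return sorted(rows, key=lambda r: (r[1] is None, -(r[1] or 0.0)))
```

Calling `scorecard([...])` with a list of `Candidate` objects returns the ranked rows. A multiplicative score is used here so that a workflow with near-zero sovereign-fit cannot be rescued by a high ROI; a weighted sum would behave differently.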
Phase 03 · Weeks 1–10 of Build

Configure. The Build, run like an implementation.

The Foundation goes in first — Hermes deployed, retrieval indexed, the messaging gateway wired, Langfuse observing. Then the vertical workflows stack on top, one at a time, each shipped to production behind a feature flag, each with its eval suite in place before traffic. The discipline is straight out of ERP go-lives.

Inputs
Scorecard · roadmap · vendor matrix · TCO model
Activities
Foundation install · RAG indexing · Workflow build · Integration
Output
A running system, behind feature flags, with the eval suite green on each workflow before it sees a real user.
Owned by
Same principal as Discovery · with the Build team behind the scenes
Workflow · ap.triage.v3 · flag · canary · 5%
TRIGGER: Email inbound
HERMES: Classify
RAG: Vendor lookup
TOOL: BC · code
REVIEW: if > $5k
TOOL: BC · post
EMIT: audit + notify
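As a rough sketch of how the ap.triage steps and the 5% canary flag could fit together, the Python below walks an invoice through the same sequence. Only the $5k review threshold and the 5% canary share come from the workflow above; the `Invoice` type, the `in_canary` and `run_triage` functions, the hash-based bucketing, and the "held" outcome for unapproved reviews are assumptions for illustration. Each step is a stand-in that records its name rather than calling Hermes, retrieval, or BC.

```python
import hashlib
from dataclasses import dataclass, field

REVIEW_THRESHOLD_USD = 5_000  # from the "if > $5k" review step
CANARY_PERCENT = 5            # from the "canary · 5%" flag


@dataclass
class Invoice:
    message_id: str
    vendor: str
    amount_usd: float
    trail: list = field(default_factory=list)  # ordered audit trail


def in_canary(message_id: str, percent: int = CANARY_PERCENT) -> bool:
    """Deterministic bucketing: the same message always gets the same answer."""
    digest = hashlib.sha256(message_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # 0..99
    return bucket < percent


def run_triage(inv: Invoice, approve=lambda inv: False) -> str:
    """Walk the steps in order; stop before posting if review is not approved."""
    inv.trail.append("trigger:email_inbound")
    inv.trail.append("classify")       # stand-in for the model call
    inv.trail.append("vendor_lookup")  # stand-in for retrieval
    inv.trail.append("code")           # stand-in for the ERP coding tool
    if inv.amount_usd > REVIEW_THRESHOLD_USD:
        inv.trail.append("review")
        if not approve(inv):
            # Assumed behavior: unapproved invoices are held, never posted.
            inv.trail.append("emit:held_for_review")
            return "held"
    inv.trail.append("post")           # stand-in for the ERP posting tool
    inv.trail.append("emit:audit+notify")
    return "posted"
```

A caller would check `in_canary(message_id)` first and send only those messages through `run_triage`, leaving the rest on the existing manual path. Hashing the message ID rather than sampling at random keeps retries of the same email on the same side of the flag.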
Phase 04 · Weeks 8–12 of Build

Train & eval. People and models, in that order.

Two parallel tracks. Your team learns the operator — what it can and can't do, how to handle reviews, where the audit log lives. Meanwhile the eval suite is locked in: a regression set for every workflow, scored on accuracy, latency, and cost, gating each release. We don't ship a workflow without it.

Inputs
Production workflows · golden test set from Discovery · audit requirements
Activities
Team training · Eval harness · Regression set · Cost budget
Output
A scored, regression-gated system and a team that can run it without us being on every standup.
Owned by
Firmcraft principal + your AP or Ops lead as champion
Eval run · ap.triage · v3.2 · passed · 18 / 20
Accuracy: 94%
p95 latency: 410ms
$ / run: $0.018
vendor.match.exact · 312ms · pass
coa.code.lookup · 288ms · pass
3way.match.partial · 510ms · warn
duplicate.detect · 204ms · pass
currency.convert · 198ms · pass
approval.threshold · 410ms · pass
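A release gate scored on accuracy, latency, and cost might look something like the Python below. It is a simplified sketch: the `Budget` thresholds are hypothetical, accuracy is reduced to the share of regression cases that pass, and `CaseResult`, `p95`, and `gate` are names invented for the example rather than anything from the actual eval harness.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    name: str
    passed: bool
    latency_ms: float
    cost_usd: float


@dataclass(frozen=True)
class Budget:
    # Hypothetical thresholds; the source does not state the real ones.
    min_accuracy: float = 0.90
    max_p95_ms: float = 600.0
    max_cost_per_run: float = 0.02


def p95(values):
    """Nearest-rank 95th percentile, using integer math for the rank."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def gate(results, budget=Budget()):
    """Return (ok, metrics); ok is True only if every metric is in budget."""
    if not results:
        raise ValueError("regression set is empty")
    accuracy = sum(r.passed for r in results) / len(results)
    latency = p95([r.latency_ms for r in results])
    cost = sum(r.cost_usd for r in results) / len(results)
    metrics = {"accuracy": accuracy, "p95_ms": latency, "cost_per_run": cost}
    ok = (
        accuracy >= budget.min_accuracy
        and latency <= budget.max_p95_ms
        and cost <= budget.max_cost_per_run
    )
    return ok, metrics
```

In a release pipeline, a `False` from `gate` would block the deploy. All three metrics must pass together, so a cheaper model that drops accuracy, or a more accurate one that blows the cost budget, fails the same way.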
Phase 05 · First 90 days post-launch

Hypercare. The first 90 days, run by us.

After cutover, we're on every standup for 30 days, on the channel for 90, and named owners on the runbook for the duration. Hypercare ends when the system has cleared an eval regression cycle without our intervention. After that, you're on an Operate retainer — or off our books entirely.

Inputs
Production system · runbook · eval & cost dashboards
Activities
Daily standup · Runbook drills · Eval regressions · Incident review
Output
A handoff packet, a passing eval cycle, and a named on-call rotation — yours or ours, depending on which Operate tier you signed.
Owned by
Firmcraft principal · then handoff to retained AI lead
Runbook · day 14 · on-track
Daily eval regression run: scheduled · 06:00 CT (live)
Cost budget review: weekly · finance lead (Mon)
Audit-log spot-check: 5 random invoices · controller (d 12)
Incident drill · vendor outage: runbook §3.2 (d 10)
Retrospective with CFO + ops lead: end of week 2 (Fri)
Eval regression · full suite: unattended · gates handoff (d 28)
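The hypercare rules above (standups for 30 days, channel for 90, exit on an unattended passing eval cycle) can be written down as a small check. This Python sketch is one reading of those rules, not a contractual definition: `EvalCycle`, its `vendor_intervened` flag, and the two function names are hypothetical.

```python
from dataclasses import dataclass

STANDUP_DAYS = 30  # "on every standup for 30 days"
CHANNEL_DAYS = 90  # "on the channel for 90"


@dataclass
class EvalCycle:
    day: int                 # days since cutover
    passed: bool             # full regression suite came back green
    vendor_intervened: bool  # hypothetical flag: did we have to step in?


def support_level(day: int) -> str:
    """Which support applies on a given day after cutover (day 1 = cutover)."""
    if day <= STANDUP_DAYS:
        return "standup + channel"
    if day <= CHANNEL_DAYS:
        return "channel"
    return "post-hypercare"


def hypercare_complete(cycles) -> bool:
    """True once any eval cycle has passed without vendor intervention."""
    return any(c.passed and not c.vendor_intervened for c in cycles)
```

On this reading, the day-28 unattended full-suite run in the runbook is the first cycle that could satisfy `hypercare_complete`.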
06 · Negative space

The methodology is also what we won't do.

Most of what makes an AI engagement go off the rails happens in the gaps between the steps above. So we name the disciplines we don't carry over from the rest of the consulting industry.

No 90-day "discoveries."

The Assessment is two to three weeks. If we need longer, scope is wrong — and the wrong scope kills a project faster than the wrong model.

No demo-driven sales.

We won't build a pretty PoC to win the room. The Assessment is the sales conversation, and it's billed.

No quiet routing to frontier models.

If a workflow leaves your walls for a frontier API, you'll know — it'll be in the architecture diagram, the contract, and the audit log.

No account handoffs.

The principal who scoped your engagement is the one who's on the hypercare standup. We're sized around that constraint — not around scaling headcount.

Start here

Five phases, run by the same person, from week one.

Start with the Assessment. Three weeks, fixed-fee, refundable against Build.

Book the call