Kensink Labs
Claude · LLM Models · 8-week engagement
ANTHROPIC CLAUDE · DIRECT INTEGRATION

Direct Claude integration. Eval-gated, vendor-neutral, full source ownership.

Anthropic's Claude is strong at long-context reasoning, careful instruction-following, and tool use. We integrate it directly against the API, with evals and a thin abstraction so you are never locked in.

LLM API · Eval pipelines · TypeScript · Prompt governance
Cycle: 8 weeks · fixed price
Stack: Claude API, direct
Output: Production code + eval suite
Handoff: Full source ownership
[THE SHORT VERSION]

A frontier model with a careful temperament.

Claude is particularly good at long documents, structured reasoning, and following nuanced instructions, with strong tool-use support. As with every model, the engineering that matters is what surrounds it: prompt design, evals, retries, cost control, and a vendor-neutral abstraction. We integrate directly, with no LangChain in the path.

When it fits
  • Long-context tasks: documents, transcripts, large codebases
  • Agentic tool use and structured, careful reasoning
  • Workloads where instruction-following quality matters
When it does not
  • Cases where an open-weight model on-prem is mandated
  • Tasks a much cheaper or smaller model handles just as well
[HOW WE BUILD IT]

How we build with Claude.

01

Direct API, thin abstraction

We call the Claude API directly behind a small provider interface. Swapping to another model is a config change, not a rewrite.
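A minimal sketch of what a thin provider interface might look like. Every name here (`ModelProvider`, `CompletionRequest`, `providerFromConfig`, the `registry` keys) is hypothetical, and the provider is a stub rather than a real API client; the point is only that application code depends on the interface while the config names the vendor and model.

```typescript
// Hypothetical provider interface: names and shapes are illustrative only.
interface CompletionRequest {
  system?: string;
  prompt: string;
  maxTokens: number;
}

interface CompletionResult {
  text: string;
  model: string;
}

interface ModelProvider {
  readonly name: string;
  complete(req: CompletionRequest): Promise<CompletionResult>;
}

// Stub standing in for a real vendor-backed implementation. A production
// version would call the vendor API inside complete().
function makeStubProvider(name: string, model: string): ModelProvider {
  return {
    name,
    async complete(req) {
      return { text: `[${model}] ${req.prompt}`, model };
    },
  };
}

// The config is the only place that names a vendor or a model.
interface ProviderConfig {
  provider: string;
  model: string;
}

const registry: Record<string, (model: string) => ModelProvider> = {
  claude: (model) => makeStubProvider("claude", model),
  other: (model) => makeStubProvider("other", model),
};

function providerFromConfig(cfg: ProviderConfig): ModelProvider {
  const factory = registry[cfg.provider];
  if (!factory) throw new Error(`Unknown provider: ${cfg.provider}`);
  return factory(cfg.model);
}
```

Under this sketch, switching vendors means editing the `ProviderConfig` value; callers of `complete()` do not change.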

02

Prompts as versioned artifacts

Prompts are code: version-controlled, reviewed, and tied to the eval suite that measures them.
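As an illustration, a prompt could live in the repository as a typed artifact like the one below. The `PromptArtifact` shape, the `{{name}}` placeholder syntax, and the example prompt are assumptions made for this sketch, not a fixed format.

```typescript
// Hypothetical shape for a prompt stored in the repo alongside code.
interface PromptArtifact {
  id: string;
  version: string;   // bumped in review, like any other code change
  evalSuite: string; // the eval set that measures this prompt
  template: string;  // uses {{name}} placeholders
}

const summarizeContract: PromptArtifact = {
  id: "summarize-contract",
  version: "2.1.0",
  evalSuite: "evals/summarize-contract",
  template:
    "Summarize the following contract in {{maxBullets}} bullets:\n\n{{document}}",
};

// Fill placeholders. Fail loudly on a missing variable rather than
// sending a half-rendered prompt to the model.
function renderPrompt(p: PromptArtifact, vars: Record<string, string>): string {
  return p.template.replace(/\{\{(\w+)\}\}/g, (_match, key: string) => {
    const value = vars[key];
    if (value === undefined) {
      throw new Error(`Prompt ${p.id}@${p.version}: missing variable "${key}"`);
    }
    return value;
  });
}
```

Because the artifact carries its own `version` and `evalSuite`, a log line or eval report can name exactly which prompt produced a given output.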

03

Evals before you trust it

We build an eval set that reflects your real tasks, then measure quality and regressions on every prompt or model change.
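One way to picture an eval gate, as a sketch under simplifying assumptions: each case has a programmatic `check`, the model is represented by a plain synchronous function, and a change ships only if the pass rate does not drop below the baseline by more than a tolerance. The names `EvalCase`, `runEvals`, and `gate` are illustrative.

```typescript
interface EvalCase {
  input: string;
  check: (output: string) => boolean; // true when the output is acceptable
}

interface EvalReport {
  passed: number;
  total: number;
  passRate: number;
}

// Run every case through the model under test and count the passes.
// The model is a synchronous function here purely to keep the sketch small.
function runEvals(cases: EvalCase[], model: (input: string) => string): EvalReport {
  const passed = cases.filter((c) => c.check(model(c.input))).length;
  const total = cases.length;
  return { passed, total, passRate: total === 0 ? 0 : passed / total };
}

// A candidate passes the gate if it does not regress beyond the tolerance.
function gate(candidate: EvalReport, baseline: EvalReport, tolerance = 0.01): boolean {
  return candidate.passRate >= baseline.passRate - tolerance;
}
```

In practice many checks are fuzzier than a boolean (rubrics, graded comparisons), but the gating logic keeps the same shape: compare against a baseline, block on regression.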

04

Cost, latency, and fallback

Token budgets, caching, streaming, and a fallback model path. Observability on every call.
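The budget, retry, and fallback ideas can be sketched together. This is a simplified illustration: the budget is a character count standing in for a real token count, retries are immediate with no backoff, and `callWithFallback`, `ModelCall`, and `CallOptions` are hypothetical names.

```typescript
type ModelCall = (prompt: string) => Promise<string>;

interface CallOptions {
  maxPromptChars: number;  // crude stand-in for a token budget
  retriesPerModel: number; // extra attempts after the first failure
}

// Try each model in order. Retry a failing model up to retriesPerModel
// times, then move on to the next one in the list.
async function callWithFallback(
  prompt: string,
  models: { id: string; call: ModelCall }[],
  opts: CallOptions,
): Promise<{ modelId: string; text: string; attempts: number }> {
  if (prompt.length > opts.maxPromptChars) {
    throw new Error("Prompt exceeds budget");
  }
  let attempts = 0;
  let lastError: unknown;
  for (const m of models) {
    for (let i = 0; i <= opts.retriesPerModel; i++) {
      attempts++;
      try {
        const text = await m.call(prompt);
        return { modelId: m.id, text, attempts };
      } catch (err) {
        lastError = err; // remember the failure and keep going
      }
    }
  }
  throw new Error(`All models failed: ${String(lastError)}`);
}
```

A production version would typically add backoff between retries and record `modelId` and `attempts` on every call, so the fallback rate is visible rather than silent.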

[WHAT YOU GET]

What the engagement leaves behind.

Direct: No orchestration framework
Eval-gated: Quality measured, not assumed
1 swap: Vendor change is config
Observed: Every call, cost and latency
[TIERS + VERSIONS]

Pick the tier that fits.

Fable, Opus, Sonnet, Haiku. We integrate Claude directly behind a vendor-neutral abstraction, then route by task difficulty. Swapping tiers or versions is a config change, not a rewrite. Eval-gated, either way.
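Routing by task difficulty can be as small as a lookup table. The sketch below uses the model identifiers listed on this page; the four difficulty labels and the `modelFor` helper are assumptions made for illustration, and how a task gets classified is left out.

```typescript
// Hypothetical difficulty labels. The classification step is out of scope here.
type Difficulty = "simple" | "routine" | "hard" | "hardest";

// Default route table. Changing a tier or version means editing one string.
const routes: Record<Difficulty, string> = {
  simple: "claude-haiku-4-5",
  routine: "claude-sonnet-4-6",
  hard: "claude-opus-4-8",
  hardest: "claude-fable-5",
};

// Per-environment overrides win over the defaults, for example to pin a
// deployment to the previous Opus version while evals run on the new one.
function modelFor(
  difficulty: Difficulty,
  overrides: Partial<Record<Difficulty, string>> = {},
): string {
  return overrides[difficulty] ?? routes[difficulty];
}
```

For example, `modelFor("hard", { hard: "claude-opus-4-7" })` returns `claude-opus-4-7`, while `modelFor("hard")` returns `claude-opus-4-8`.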

Flagship · 4 Jun 2026

Claude Fable 5

claude-fable-5
Input: $10 / 1M tokens
Output: $50 / 1M tokens
  • Anthropic's most powerful model. A new tier that sits above Opus
  • 1M context, 128K max output. Frontier reasoning and agentic depth
  • Premium pricing. Route to it only for the hardest, highest-value work
Read the technical brief
Current · 28 May 2026

Claude Opus 4.8

claude-opus-4-8
Input: $5 / 1M tokens
Output: $25 / 1M tokens
  • Around 4× less likely than 4.7 to let flaws pass in code it writes
  • Dynamic workflows: hundreds of parallel subagents in one Claude Code session
  • Same standard pricing as 4.7. Fast mode is cheaper than before
Read the technical brief
Previous · Q1 2026

Claude Opus 4.7

claude-opus-4-7
Input: $5 / 1M tokens
Output: $25 / 1M tokens
  • Solid production baseline. Still supported in the API
  • Strong long-context reasoning and tool use
  • Moving to 4.8 is a config change behind our vendor-neutral abstraction
Still supported · brief not yet published
Current · Q1 2026

Claude Sonnet 4.6

claude-sonnet-4-6
Input: $3 / 1M tokens
Output: $15 / 1M tokens
  • The best balance of speed and intelligence in the family
  • Near-frontier on routine work at a fraction of Opus cost
  • Our default for high-volume steps. We route the hard ones to Opus
Still supported · brief not yet published
Current · Oct 2025

Claude Haiku 4.5

claude-haiku-4-5
Input: $1 / 1M tokens
Output: $5 / 1M tokens
  • Fastest and most cost-effective tier for simple, high-volume tasks
  • 200K context. Right for classification, extraction, and routing
  • We use it for the cheap steps inside a larger agentic workflow
Still supported · brief not yet published
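To make the tier pricing concrete, here is a small worked calculation using the per-million-token prices listed above. The `costUsd` helper and the token counts in the usage comment are illustrative assumptions; the sketch ignores caching discounts and any other pricing modifiers.

```typescript
interface Price {
  inputPerMTok: number;  // USD per 1M input tokens
  outputPerMTok: number; // USD per 1M output tokens
}

// Prices copied from the tier list on this page.
const prices: Record<string, Price> = {
  "claude-fable-5": { inputPerMTok: 10, outputPerMTok: 50 },
  "claude-opus-4-8": { inputPerMTok: 5, outputPerMTok: 25 },
  "claude-opus-4-7": { inputPerMTok: 5, outputPerMTok: 25 },
  "claude-sonnet-4-6": { inputPerMTok: 3, outputPerMTok: 15 },
  "claude-haiku-4-5": { inputPerMTok: 1, outputPerMTok: 5 },
};

// Cost of a single call in USD, from its token counts.
function costUsd(model: string, inputTokens: number, outputTokens: number): number {
  const p = prices[model];
  if (!p) throw new Error(`No price for ${model}`);
  return (inputTokens * p.inputPerMTok + outputTokens * p.outputPerMTok) / 1_000_000;
}

// Example: a call with 20,000 input tokens and 1,000 output tokens.
//   claude-haiku-4-5  -> (20,000 * 1  + 1,000 * 5)  / 1,000,000 = $0.025
//   claude-sonnet-4-6 -> (20,000 * 3  + 1,000 * 15) / 1,000,000 = $0.075
//   claude-fable-5    -> (20,000 * 10 + 1,000 * 50) / 1,000,000 = $0.25
```

The same call costs ten times more on the flagship tier than on Haiku, which is the arithmetic behind routing only the hardest work upward.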
[METHODOLOGY · K-FRAMEWORK]

Integrated through the K-Framework.

Every model we integrate runs through the same operating system. Three pillars, sixteen layers, one Compound Growth Loop. The methodology that keeps AI work from rotting after the first ship.

Read the K-Framework
01

Foundations

Direct API integration with the model. No LangChain, no orchestration vendor, no agent framework built on quicksand. Typed contracts, the same way we wire up Postgres.

02

Amplification

An eval suite built from your real tasks gates every prompt and model change. Quality is measured before it ships, not vibed in a demo.

03

Judgment

Governance, audit, and oversight wired in from day one. Who called what, with which prompt version, at what cost. Your auditors get answers, not screenshots.

[OBSERVABILITY]

Observability your team can read.

A model in production without observability is roulette. We instrument every integration so engineering and finance can see the same numbers, and so a regression at 3am surfaces before a customer opens a ticket.

Instrumented

Cost per call

Tokens in, tokens out, dollars spent. Sliced by feature, tenant, and route. Budgets enforced where it matters.
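Slicing spend by a dimension is a plain group-and-sum over call records. The record shape, `sumBy`, and `overBudget` below are hypothetical names for this sketch; a real pipeline would likely run the same aggregation in a metrics store rather than in application code.

```typescript
interface CallRecord {
  feature: string;
  tenant: string;
  route: string;
  costUsd: number;
}

// Total spend grouped by one dimension of the call record.
function sumBy(
  records: CallRecord[],
  key: "feature" | "tenant" | "route",
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    totals.set(r[key], (totals.get(r[key]) ?? 0) + r.costUsd);
  }
  return totals;
}

// Names of groups whose spend exceeds their budget. Groups with no
// configured budget are not reported.
function overBudget(totals: Map<string, number>, budgets: Map<string, number>): string[] {
  return Array.from(totals.entries())
    .filter(([name, spent]) => {
      const limit = budgets.get(name);
      return limit !== undefined && spent > limit;
    })
    .map(([name]) => name);
}
```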

Instrumented

Latency p50 / p95 / p99

Real distributions, not averages. We know which routes are slow, and why.
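A short example of why percentiles say more than the mean. This sketch uses the nearest-rank method on an in-memory sample; the `percentile` helper and the latency values are made up for illustration, and production systems usually compute percentiles from histograms instead.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("No samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Ten hypothetical call latencies in milliseconds, with two slow outliers.
const latenciesMs = [120, 80, 95, 400, 110, 105, 90, 2000, 100, 115];

const mean = latenciesMs.reduce((sum, x) => sum + x, 0) / latenciesMs.length;
const p50 = percentile(latenciesMs, 50);
const p95 = percentile(latenciesMs, 95);
// mean is 321.5 ms, p50 is 105 ms, p95 is 2000 ms
```

The mean of 321.5 ms describes no actual request: the typical call takes about 105 ms, and the tail takes two seconds.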

Instrumented

Eval pass rates

The same eval suite that gates a release runs continuously in production. A regression on real traffic surfaces fast.

Instrumented

Prompt + completion logs

PII scrubbed at the proxy, shipped to your SIEM. Retention controls match your compliance window.
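As a rough illustration of scrubbing before logs leave the proxy, the sketch below redacts two patterns with regular expressions. The patterns and the `scrub` helper are assumptions for this example only; regexes catch well-formed emails and US-style SSNs and little else, so real deployments generally pair them with a dedicated PII detector.

```typescript
// Illustrative patterns only. They will miss many real-world PII formats.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const US_SSN = /\b\d{3}-\d{2}-\d{4}\b/g;

// Replace matches with fixed placeholders before the text is logged.
function scrub(text: string): string {
  return text.replace(EMAIL, "[EMAIL]").replace(US_SSN, "[SSN]");
}

// scrub("Contact jane.doe@example.com, SSN 123-45-6789.")
//   returns "Contact [EMAIL], SSN [SSN]."
```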

Dashboards your team owns, not ours. At handoff you get the queries, the alerts, and the runbook. We are not in the path to read your metrics.

[COMMON QUESTIONS]

Questions we get asked.

Claude or GPT?
It depends on the task. We pick per workload using an eval set built from your real inputs, and the abstraction lets you run both or switch later. Long-context and careful tool use often favor Claude; we still measure rather than assume.
Do you use LangChain?
No. We integrate against the model API directly, the same way we integrate against Postgres. Frameworks add layers of abstraction and new points of breakage that production reliability does not need.
APPLIED K-FRAMEWORK

Bring the problem.
We’ll bring the build.

Senior engineers, eval suite at handoff, full source ownership. Sprint, program, or ongoing. We shape the engagement to the work.