DocuExtract API

Send a document, get JSON back. No templates. No training. Works in 5 minutes.

DocuExtract converts unstructured documents — invoices, receipts, contracts, resumes, bank statements — into clean, validated JSON using Claude AI. You send a document (image, PDF, or URL), specify what you want extracted, and receive structured data in seconds.

Base URLhttps://docuextract.dev/v1
FeatureDetail
AuthenticationBearer token (API key)
Request formatJSON (Content-Type: application/json)
Response formatJSON
Max file size10 MB
Supported formatsPDF, PNG, JPG, WEBP (base64 or URL)
Default modelClaude Haiku 4.5 (fast)
Accurate modelClaude Sonnet 4.6 (complex documents)

Quick Start

Extract structured data from a document in 3 steps.

1

Get your API key

Sign up at your dashboard to get your free API key. It looks like dk_live_xxxxxxxxxxxxxxxx.

2

Make your first extraction

Send a document URL (or base64) to /v1/extract:

bash
curl https://docuextract.dev/v1/extract \
  -H "Authorization: Bearer dk_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": "https://example.com/invoice.pdf",
    "type": "invoice"
  }'
javascript
const response = await fetch('https://docuextract.dev/v1/extract', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer dk_live_YOUR_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    document: 'https://example.com/invoice.pdf',
    type: 'invoice',
  }),
});

const result = await response.json();
console.log(result.data);
// { vendor_name: "Acme Corp", total_amount: 1250.00, ... }
python
import requests

response = requests.post(
    'https://docuextract.dev/v1/extract',
    headers={
        'Authorization': 'Bearer dk_live_YOUR_KEY',
        'Content-Type': 'application/json',
    },
    json={
        'document': 'https://example.com/invoice.pdf',
        'type': 'invoice',
    }
)

result = response.json()
print(result['data'])
# {'vendor_name': 'Acme Corp', 'total_amount': 1250.0, ...}
3

Use the structured data

The response contains the extracted fields, confidence score, and processing metadata:

json
{
  "data": {
    "vendor_name": "Acme Corp",
    "invoice_number": "INV-2024-0847",
    "invoice_date": "2024-03-15",
    "due_date": "2024-04-15",
    "subtotal": 1000.00,
    "tax_amount": 250.00,
    "total_amount": 1250.00,
    "currency": "USD",
    "line_items": [
      { "description": "Consulting services", "quantity": 10, "unit_price": 100.00, "total": 1000.00 }
    ]
  },
  "metadata": {
    "type": "invoice",
    "confidence": 0.96,
    "model": "claude-haiku-4-5-20251001",
    "processing_time_ms": 1847,
    "page_count": 1
  }
}

Authentication

All API endpoints (except GET /v1/health) require authentication via a Bearer token in the Authorization header.

http
Authorization: Bearer dk_live_xxxxxxxxxxxxxxxxxxxxxxxx

API Key Format

API keys start with dk_live_ followed by 32 random characters. Keys are generated when you sign up and can be regenerated from your dashboard.

Keep your API key secretNever expose your API key in client-side code or public repositories. Use environment variables to store it securely.

Rate Limit Headers

Every authenticated response includes rate limit information:

HeaderDescription
X-RateLimit-Limit-MinuteMaximum requests per minute for your plan
X-RateLimit-Remaining-MinuteRemaining requests this minute
X-RateLimit-Limit-MonthMaximum extractions per month for your plan
X-RateLimit-Remaining-MonthRemaining extractions this month

POST /v1/extract

Extract structured data from a document. This is the core endpoint.

POSThttps://docuextract.dev/v1/extract

Request Body

FieldTypeRequiredDescription
documentstringYesDocument as base64-encoded string or a publicly accessible URL
typestringNoDocument type hint. One of: invoice, receipt, bank_statement, resume, contract, form, id_document. Auto-detected if omitted.
modelstringNo"fast" (default) or "accurate". Fast uses Claude Haiku; accurate uses Claude Sonnet for complex/multi-page documents.
schemaobjectNoCustom JSON schema describing the fields to extract. When provided, the extraction is guided by your schema.

Response

json
{
  "data": { /* extracted fields */ },
  "metadata": {
    "type": "invoice",
    "confidence": 0.96,
    "model": "claude-haiku-4-5-20251001",
    "processing_time_ms": 1847,
    "page_count": 1
  }
}

POST /v1/detect

Detect the type of a document without extracting its data.

POSThttps://docuextract.dev/v1/detect

Response

json
{
  "type": "invoice",
  "confidence": 0.98
}

GET /v1/usage

Retrieve your current usage statistics for the billing period.

GEThttps://docuextract.dev/v1/usage

Response

json
{
  "used": 847,
  "limit": 10000,
  "plan": "pro",
  "period_end": "2024-04-24",
  "breakdown": [
    { "date": "2024-03-24", "count": 42 }
  ]
}

GET /v1/health

Health check endpoint. No authentication required.

GEThttps://docuextract.dev/v1/health
json
{ "status": "ok", "version": "1.0.0" }

POST /v1/billing/checkout

Create a Stripe Checkout session to subscribe to a paid plan. Returns a URL to redirect your user to for payment.

POSThttps://docuextract.dev/v1/billing/checkout

Request Body

FieldTypeRequiredDescription
planstringYesPlan to subscribe to. One of: starter, pro, scale

Response

json
{ "url": "https://checkout.stripe.com/c/pay/cs_live_..." }

Redirect the user to this URL. After payment, Stripe redirects back to your dashboard.


POST /v1/billing/portal

Create a Stripe Billing Portal session for subscription management.

POSThttps://docuextract.dev/v1/billing/portal
Requires an active Stripe subscription. Free plan users will receive a 400 error.

Response

json
{ "url": "https://billing.stripe.com/p/session/..." }

Document Types

DocuExtract automatically detects document types, or you can specify one explicitly.

TypeDescriptionKey Fields Extracted
invoiceVendor invoices and billing statementsvendor name, invoice number, dates, line items, totals, payment terms
receiptPurchase receipts from retail, restaurants, etc.merchant name, date, items purchased, subtotal, tax, total, payment method
bank_statementBank and credit card statementsaccount number, period, opening/closing balance, transactions
resumeCVs and resumesname, contact info, work experience, education, skills
contractLegal agreements and contractsparties, effective date, termination date, key obligations, governing law
formFilled forms (applications, surveys, intake forms)all labeled fields and their values
id_documentID cards, passports, driver's licensesname, date of birth, expiry, document number, issuing authority
unknownFallback for unrecognized typesbest-effort extraction of all visible structured data

Error Codes

All errors return a JSON response with an error object containing code and message fields.

json
{
  "error": {
    "code": "unauthorized",
    "message": "Invalid or missing API key"
  }
}

4xx Client Errors

401 — unauthorized
The API key is missing, malformed, or revoked.
400 — invalid_request
A required field is missing or a field value is invalid (includes inaccessible document URLs).
413 — file_too_large
The document exceeds the 10 MB size limit.
415 — unsupported_format
The file format is not supported. Use PDF, PNG, JPG, or WEBP.
422 — extraction_failed
The AI extraction failed after retrying. Try again. If it persists, the document may be corrupted or too complex.
429 — rate_limited
You've exceeded your per-minute or per-month rate limit. Check the Retry-After header. Upgrade your plan for higher limits.

5xx Server Errors

500 — internal_error
Unexpected server error. Please try again or contact support.

Pricing

Simple, transparent pricing. No credit multipliers, no enterprise-gating.

Free
$0/mo
100 extractions/mo
  • 10 req/min rate limit
  • Haiku model only
  • No credit card required
Get started free →
Best Value
Pro
$99/mo
10,000 extractions/mo
  • 60 req/min rate limit
  • Haiku + Sonnet
  • Priority support
Get started →
Scale
$249/mo
50,000 extractions/mo
  • 120 req/min rate limit
  • All models + Priority
  • SLA + dedicated support
Get started →

Overage Pricing

When you exceed your monthly quota, additional extractions are billed at $0.05 per extraction.

TipMonitor your usage with GET /v1/usage or check the X-RateLimit-Remaining-Month header.