DocuExtract API

Send a document, get JSON back. No templates. No training. Works in 5 minutes.

DocuExtract converts unstructured documents — invoices, receipts, contracts, resumes, bank statements — into clean, validated JSON using Claude AI. You send a document (image, PDF, or URL), specify what you want extracted, and receive structured data in seconds.

Base URL: https://docuextract.dev/v1

Feature	Detail
Authentication	Bearer token (API key)
Request format	JSON (`Content-Type: application/json`)
Response format	JSON
Max file size	10 MB
Supported formats	PDF, PNG, JPG, WEBP (base64 or URL)
Default model	Claude Haiku 4.5 (fast)
Accurate model	Claude Sonnet 4.6 (complex documents)

Quick Start

Extract structured data from a document in 3 steps.

Get your API key

Go to your dashboard to get your free API key. It looks like dk_live_xxxxxxxxxxxxxxxx.

Make your first extraction

Send a document URL (or base64) to /v1/extract:

bash

curl https://docuextract.dev/v1/extract \
  -H "Authorization: Bearer dk_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": "https://example.com/invoice.pdf",
    "type": "invoice"
  }'

javascript

const response = await fetch('https://docuextract.dev/v1/extract', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer dk_live_YOUR_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    document: 'https://example.com/invoice.pdf',
    type: 'invoice',
  }),
});

const result = await response.json();
console.log(result.data);
// { vendor_name: "Acme Corp", total_amount: 1250.00, ... }

python

import requests

response = requests.post(
    'https://docuextract.dev/v1/extract',
    headers={
        'Authorization': 'Bearer dk_live_YOUR_KEY',
        'Content-Type': 'application/json',
    },
    json={
        'document': 'https://example.com/invoice.pdf',
        'type': 'invoice',
    }
)

result = response.json()
print(result['data'])
# {'vendor_name': 'Acme Corp', 'total_amount': 1250.0, ...}

Use the structured data

The response contains the extracted fields, confidence score, and processing metadata:

json

{
  "data": {
    "vendor_name": "Acme Corp",
    "invoice_number": "INV-2024-0847",
    "invoice_date": "2024-03-15",
    "due_date": "2024-04-15",
    "subtotal": 1000.00,
    "tax_amount": 250.00,
    "total_amount": 1250.00,
    "currency": "USD",
    "line_items": [
      { "description": "Consulting services", "quantity": 10, "unit_price": 100.00, "total": 1000.00 }
    ]
  },
  "metadata": {
    "type": "invoice",
    "confidence": 0.96,
    "model": "claude-haiku-4-5-20251001",
    "processing_time_ms": 1847,
    "page_count": 1
  }
}
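Before passing extracted values downstream, it can help to run a basic arithmetic check on them. The sketch below assumes the invoice fields shown in the sample response above (`subtotal`, `tax_amount`, `total_amount`, `line_items`). The helper name `check_invoice_totals` and the one-cent tolerance are illustrative choices, not part of the API.

```python
from decimal import Decimal


def check_invoice_totals(data: dict, tolerance: str = "0.01") -> list[str]:
    """Return a list of problems; an empty list means the totals agree."""
    problems = []
    tol = Decimal(tolerance)

    # Convert through str() so floats such as 1250.0 become exact decimals.
    subtotal = Decimal(str(data["subtotal"]))
    tax = Decimal(str(data["tax_amount"]))
    total = Decimal(str(data["total_amount"]))

    if abs(subtotal + tax - total) > tol:
        problems.append("subtotal + tax_amount does not equal total_amount")

    line_items = data.get("line_items") or []
    line_sum = sum((Decimal(str(item["total"])) for item in line_items), Decimal("0"))
    if line_items and abs(line_sum - subtotal) > tol:
        problems.append("line item totals do not add up to subtotal")

    return problems
```

An empty result does not prove the extraction is correct; it only shows that the numbers are internally consistent.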

Authentication

All API endpoints (except GET /v1/health) require authentication via a Bearer token in the Authorization header.

http

Authorization: Bearer dk_live_xxxxxxxxxxxxxxxxxxxxxxxx

API Key Format

API keys start with dk_live_ followed by 32 random characters. Keys are generated when you sign up and can be regenerated from your dashboard.

Keep your API key secret. Never expose your API key in client-side code or public repositories. Use environment variables to store it securely.
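One way to follow this advice is to build the request headers from an environment variable. This is a minimal sketch: the variable name `DOCUEXTRACT_API_KEY` and the helper `auth_headers` are hypothetical names chosen for illustration.

```python
import os


def auth_headers(env_var: str = "DOCUEXTRACT_API_KEY") -> dict:
    """Build request headers from an API key stored in an environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    # The documented key format starts with dk_live_.
    if not api_key.startswith("dk_live_"):
        raise ValueError("API keys are expected to start with dk_live_")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the `headers` argument of `requests.post`, as in the Quick Start example.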

Rate Limit Headers

Every authenticated response includes rate limit information:

Header	Description
`X-RateLimit-Limit-Minute`	Maximum requests per minute for your plan
`X-RateLimit-Remaining-Minute`	Remaining requests this minute
`X-RateLimit-Limit-Month`	Maximum extractions per month for your plan
`X-RateLimit-Remaining-Month`	Remaining extractions this month
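These headers can be read into a small dictionary after each call. The sketch below assumes the header values are plain integers, which the table implies but does not state. `parse_rate_limits` is a hypothetical helper name.

```python
RATE_LIMIT_HEADERS = {
    "minute_limit": "X-RateLimit-Limit-Minute",
    "minute_remaining": "X-RateLimit-Remaining-Minute",
    "month_limit": "X-RateLimit-Limit-Month",
    "month_remaining": "X-RateLimit-Remaining-Month",
}


def parse_rate_limits(headers) -> dict:
    """Convert the rate limit headers to integers; missing headers become None."""
    # Lower-case the keys so a plain dict works as well as a case-insensitive one.
    lowered = {name.lower(): value for name, value in headers.items()}
    limits = {}
    for key, header in RATE_LIMIT_HEADERS.items():
        raw = lowered.get(header.lower())
        limits[key] = int(raw) if raw is not None else None
    return limits
```

With the `requests` library, `parse_rate_limits(response.headers)` would return the four values for the most recent call.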

POST /v1/extract

Extract structured data from a document. This is the core endpoint.

POST https://docuextract.dev/v1/extract

Request Body

Field	Type	Required	Description
`document`	string	Yes	Document as base64-encoded string or a publicly accessible URL
`type`	string	No	Document type hint. One of: `invoice`, `receipt`, `bank_statement`, `resume`, `contract`, `form`, `id_document`. Auto-detected if omitted.
`model`	string	No	`"fast"` (default) or `"accurate"`. Fast uses Claude Haiku; accurate uses Claude Sonnet for complex/multi-page documents.
`schema`	object	No	Custom JSON schema describing the fields to extract. When provided, the extraction is guided by your schema.
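For a local file, the request body can be assembled from the raw bytes. This sketch rests on several assumptions the table does not settle: that the 10 MB limit applies to the raw file size and means 10 × 1024 × 1024 bytes, and that `document` takes a bare base64 string with no `data:` prefix. `build_extract_body` is a hypothetical helper, and any `schema` value is passed through unchanged.

```python
import base64

# Assumption: 10 MB is counted in binary megabytes on the raw file.
MAX_FILE_BYTES = 10 * 1024 * 1024


def build_extract_body(document: bytes, doc_type=None, model="fast", schema=None) -> dict:
    """Build a JSON-serializable body for POST /v1/extract from raw file bytes."""
    if len(document) > MAX_FILE_BYTES:
        raise ValueError("document exceeds the 10 MB limit")
    if model not in ("fast", "accurate"):
        raise ValueError('model must be "fast" or "accurate"')

    body = {
        "document": base64.b64encode(document).decode("ascii"),
        "model": model,
    }
    # Optional fields are only sent when the caller supplies them.
    if doc_type is not None:
        body["type"] = doc_type
    if schema is not None:
        body["schema"] = schema
    return body
```

The result can be sent with `requests.post(url, headers=..., json=body)`. Checking the size locally avoids a round trip that would end in a 413 response.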

Response

json

{
  "data": { /* extracted fields */ },
  "metadata": {
    "type": "invoice",
    "confidence": 0.96,
    "model": "claude-haiku-4-5-20251001",
    "processing_time_ms": 1847,
    "page_count": 1
  }
}

POST /v1/detect

Detect the type of a document without extracting its data.

POST https://docuextract.dev/v1/detect

Response

json

{
  "type": "invoice",
  "confidence": 0.98
}

GET /v1/usage

Retrieve your current usage statistics for the billing period.

GET https://docuextract.dev/v1/usage

Response

json

{
  "used": 847,
  "limit": 5000,
  "plan": "pro",
  "period_end": "2024-04-24",
  "breakdown": [
    { "date": "2024-03-24", "count": 42 }
  ]
}
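The `used` and `limit` fields are enough to compute how much quota remains. The following sketch uses a hypothetical helper, `usage_summary`; its default 80% warning threshold is an illustrative choice that mirrors the `usage.limit.approaching` webhook event.

```python
def usage_summary(usage: dict, warn_ratio: float = 0.8) -> dict:
    """Summarize a /v1/usage response: remaining quota and whether it is running low."""
    used = usage["used"]
    limit = usage["limit"]
    # Treat a zero limit as fully consumed to avoid dividing by zero.
    ratio = used / limit if limit else 1.0
    return {
        "remaining": max(limit - used, 0),
        "ratio": ratio,
        "approaching_limit": ratio >= warn_ratio,
    }
```

For the sample response above (847 of 5000 used), this reports 4153 extractions remaining and no warning.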

GET /v1/health

Health check endpoint. No authentication required.

GET https://docuextract.dev/v1/health

json

{ "status": "ok", "version": "1.0.0" }

POST /v1/billing/checkout

Create a Stripe Checkout session to subscribe to a paid plan. Returns a URL to redirect your user to for payment.

POST https://docuextract.dev/v1/billing/checkout

Request Body

Field	Type	Required	Description
`plan`	string	Yes	Plan to subscribe to. One of: `starter`, `pro`, `scale`

Response

json

{ "url": "https://checkout.stripe.com/c/pay/cs_live_..." }

Redirect the user to this URL. After payment, Stripe redirects back to your dashboard.

POST /v1/billing/portal

Create a Stripe Billing Portal session for subscription management.

POST https://docuextract.dev/v1/billing/portal

Requires an active Stripe subscription. Free plan users will receive a 400 error.

Response

json

{ "url": "https://billing.stripe.com/p/session/..." }

Webhooks

Webhooks send real-time HTTP POST notifications to your server when events occur in DocuExtract — extractions complete, usage limits approach, or billing events happen.

Instead of polling the API, register an HTTPS endpoint and we'll push events to you. Each delivery is signed with HMAC-SHA256 so you can verify authenticity.

How it works

Register a webhook endpoint in your Dashboard → Webhooks
Select which events to subscribe to
Copy the signing secret (shown once)
We POST a JSON payload to your URL when subscribed events occur
Verify the signature and process the event

Webhooks use hybrid payloads: the webhook body contains a summary (event type, extraction ID, confidence, document type). To get the full extracted data, call GET /v1/extractions/{extraction_id}, substituting the ext_ ID from the payload's extractionId field.
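The follow-up call can be prepared directly from the webhook body. This sketch only builds the URL. It assumes the ID sits at `data.extractionId`, as in the sample payload under Payload Format. `extraction_url` is a hypothetical helper name, and the response shape of the extractions endpoint is not documented on this page.

```python
from urllib.parse import quote

BASE_URL = "https://docuextract.dev/v1"


def extraction_url(event: dict) -> str:
    """Build the GET /v1/extractions/{id} URL for a webhook event."""
    extraction_id = event.get("data", {}).get("extractionId")
    if not extraction_id:
        raise ValueError("event carries no extractionId")
    if not extraction_id.startswith("ext_"):
        raise ValueError("extraction IDs are expected to start with ext_")
    # Percent-encode the ID so it is always safe to place in a URL path.
    return f"{BASE_URL}/extractions/{quote(extraction_id, safe='')}"
```

The URL can then be fetched with the same Bearer token used for other authenticated endpoints.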

Event Types

DocuExtract supports 9 event types across three categories. Event access is gated by plan.

Event	Category	Description	Plans
`extraction.completed`	Core	Extraction finished successfully	All
`extraction.failed`	Core	Extraction encountered an error	All
`usage.limit.approaching`	Usage	Usage crossed 80% of monthly limit	Starter+
`usage.limit.reached`	Usage	Monthly extraction limit exhausted	Starter+
`subscription.created`	Billing	New subscription created	Pro+
`subscription.updated`	Billing	Subscription plan changed	Pro+
`subscription.cancelled`	Billing	Subscription cancelled	Pro+
`invoice.payment_succeeded`	Billing	Invoice payment processed	Pro+
`invoice.payment_failed`	Billing	Invoice payment failed	Pro+

Endpoint limits by plan

Plan	Endpoints	Available Events
Free	1	Core events only
Starter	3	Core + Usage
Pro	5	All events
Scale	10	All events
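The two tables above can be encoded as a lookup, for example to validate a subscription list before registering an endpoint. This is a sketch that reads "Starter+" as Starter, Pro, and Scale, and "Pro+" as Pro and Scale. The names below are hypothetical and not part of the API.

```python
PLAN_ORDER = ["free", "starter", "pro", "scale"]
ENDPOINT_LIMITS = {"free": 1, "starter": 3, "pro": 5, "scale": 10}

# Lowest plan that can subscribe to each event category.
CATEGORY_MIN_PLAN = {"core": "free", "usage": "starter", "billing": "pro"}

EVENT_CATEGORY = {
    "extraction.completed": "core",
    "extraction.failed": "core",
    "usage.limit.approaching": "usage",
    "usage.limit.reached": "usage",
    "subscription.created": "billing",
    "subscription.updated": "billing",
    "subscription.cancelled": "billing",
    "invoice.payment_succeeded": "billing",
    "invoice.payment_failed": "billing",
}


def available_events(plan: str) -> list[str]:
    """Return the event types a plan can subscribe to, sorted by name."""
    rank = PLAN_ORDER.index(plan)
    return sorted(
        event
        for event, category in EVENT_CATEGORY.items()
        if PLAN_ORDER.index(CATEGORY_MIN_PLAN[category]) <= rank
    )
```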

Payload Format

Every webhook delivery sends a JSON payload in the following envelope format:

json

{
  "id": "evt_a1b2c3d4e5f67890",
  "event": "extraction.completed",
  "created": 1775059200,
  "data": {
    "extractionId": "ext_9z8y7x42",
    "status": "success",
    "confidence": 0.9942,
    "documentType": "invoice"
  }
}

Headers

Header	Description
`X-DocuExtract-Signature`	HMAC-SHA256 signature: `sha256=<hex>`
`X-DocuExtract-Event`	Event type (e.g. `extraction.completed`)
`X-DocuExtract-Delivery`	Unique delivery ID (UUID) for idempotency
`Content-Type`	`application/json`
`User-Agent`	`DocuExtract-Webhooks/1.0`

Signature Verification

Every webhook delivery includes an X-DocuExtract-Signature header containing an HMAC-SHA256 signature of the request body using your endpoint's signing secret. Always verify this signature before processing events.

Node.js

javascript

const crypto = require('crypto');

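function verifyWebhook(rawBody, signatureHeader, secret) {
  const expected = Buffer.from('sha256=' + crypto
    .createHmac('sha256', secret)
    .update(rawBody)
    .digest('hex'));
  // A missing header becomes an empty buffer and fails the length check.
  const received = Buffer.from(signatureHeader || '');
  // timingSafeEqual throws if the lengths differ, so compare lengths first.
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);
}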

// Express example
app.post('/webhook', express.raw({ type: 'application/json' }), (req, res) => {
  const signature = req.headers['x-docuextract-signature'];
  if (!verifyWebhook(req.body, signature, process.env.WEBHOOK_SECRET)) {
    return res.status(401).send('Invalid signature');
  }
  const event = JSON.parse(req.body);
  console.log('Received:', event.event, event.data);
  res.status(200).send('OK');
});

Python

python

import hashlib
import hmac
import json
import os

from flask import Flask, request, abort

app = Flask(__name__)
# Read the signing secret from the environment instead of hardcoding it.
WEBHOOK_SECRET = os.environ["WEBHOOK_SECRET"]

def verify_signature(payload: bytes, signature: str, secret: str) -> bool:
    expected = "sha256=" + hmac.new(
        secret.encode(), payload, hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)

@app.route("/webhook", methods=["POST"])
def handle_webhook():
    signature = request.headers.get("X-DocuExtract-Signature", "")
    if not verify_signature(request.data, signature, WEBHOOK_SECRET):
        abort(401)
    event = json.loads(request.data)
    print(f"Received: {event['event']}", event["data"])
    return "OK", 200

Retry Policy

If your endpoint returns a non-2xx status code or times out, DocuExtract retries the delivery automatically:

Attempt	Delay	Timeout
1st (initial)	Immediate	10 seconds
2nd retry	~30 seconds	10 seconds
3rd retry	~5 minutes	10 seconds
4th retry (final)	~30 minutes	10 seconds

After 4 failed attempts, the delivery is marked as failed. You can view delivery history in your dashboard and use the "Send Test" button to verify connectivity.
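To estimate how long a delivery may keep arriving (for example, when deciding how long to retain processed delivery IDs), the table can be turned into a schedule. The sketch assumes each delay is measured from the previous attempt and ignores the time spent waiting on timeouts; the table does not specify either point. The delays are approximate, as the "~" in the table indicates.

```python
from datetime import datetime, timedelta

# Approximate delays from the retry table, assumed to be measured
# from the previous attempt.
RETRY_DELAYS = [
    timedelta(0),
    timedelta(seconds=30),
    timedelta(minutes=5),
    timedelta(minutes=30),
]


def delivery_schedule(first_attempt: datetime) -> list[datetime]:
    """Return the approximate time of each of the 4 delivery attempts."""
    schedule = []
    current = first_attempt
    for delay in RETRY_DELAYS:
        current = current + delay
        schedule.append(current)
    return schedule
```

Under these assumptions the final attempt lands roughly 35.5 minutes after the first.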

Best Practices

Respond quickly. Return a 2xx status within 5 seconds. If you need to do heavy processing, acknowledge the webhook first, then process asynchronously.
Verify signatures. Always validate the X-DocuExtract-Signature header before processing. Never skip this step.
Handle duplicates. Use the X-DocuExtract-Delivery header (unique UUID per delivery attempt) as an idempotency key. Store processed delivery IDs and skip duplicates.
Use HTTPS only. Webhook endpoints must use HTTPS. HTTP URLs are rejected at registration time.
Fetch full data separately. Webhook payloads contain summaries. For full extraction results, call GET /v1/extractions/{id} with the extractionId from the payload.
Monitor delivery health. Check your webhook delivery logs in the dashboard to catch failures early. Use the "Send Test" button after deploying endpoint changes.
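The "respond quickly" and "handle duplicates" practices can be combined in a small inbox that records delivery IDs and queues events for later processing. This is a minimal in-memory sketch. `WebhookInbox` is a hypothetical class, and a real deployment would likely keep the seen IDs in a persistent store shared across processes.

```python
import queue


class WebhookInbox:
    """Deduplicate deliveries by ID and queue new events for a background worker."""

    def __init__(self):
        self._seen = set()
        self.pending = queue.Queue()

    def accept(self, delivery_id: str, event: dict) -> bool:
        """Queue the event and return True, or return False for a duplicate."""
        if delivery_id in self._seen:
            return False
        self._seen.add(delivery_id)
        # Queueing is fast, so the HTTP handler can return 2xx right away.
        self.pending.put(event)
        return True
```

A handler would call `accept` with the `X-DocuExtract-Delivery` header value after verifying the signature, and return 200 in both cases so that duplicates are not retried.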

Document Types

DocuExtract automatically detects document types, or you can specify one explicitly.

Type	Description	Key Fields Extracted
`invoice`	Vendor invoices and billing statements	vendor name, invoice number, dates, line items, totals, payment terms
`receipt`	Purchase receipts from retail, restaurants, etc.	merchant name, date, items purchased, subtotal, tax, total, payment method
`bank_statement`	Bank and credit card statements	account number, period, opening/closing balance, transactions
`resume`	CVs and resumes	name, contact info, work experience, education, skills
`contract`	Legal agreements and contracts	parties, effective date, termination date, key obligations, governing law
`form`	Filled forms (applications, surveys, intake forms)	all labeled fields and their values
`id_document`	ID cards, passports, driver's licenses	name, date of birth, expiry, document number, issuing authority
`unknown`	Fallback for unrecognized types	best-effort extraction of all visible structured data

Error Codes

All errors return a JSON response with an error object containing code and message fields.

json

{
  "error": {
    "code": "unauthorized",
    "message": "Invalid or missing API key"
  }
}

4xx Client Errors

401 — unauthorized

The API key is missing, malformed, or revoked.

400 — invalid_request

A required field is missing or a field value is invalid (includes inaccessible document URLs).

413 — file_too_large

The document exceeds the 10 MB size limit.

415 — unsupported_format

The file format is not supported. Use PDF, PNG, JPG, or WEBP.

422 — extraction_failed

The AI extraction failed after retrying. Try again. If it persists, the document may be corrupted or too complex.

429 — rate_limited

You've exceeded your per-minute or per-month rate limit. Check the Retry-After header. Upgrade your plan for higher limits.

5xx Server Errors

500 — internal_error

Unexpected server error. Please try again or contact support.
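A client can use the `code` field to decide whether a retry makes sense. In this sketch, the retryable set contains the codes whose descriptions above suggest trying again. The sketch assumes `Retry-After` carries a whole number of seconds, which this page does not confirm. `classify_error` is a hypothetical helper name.

```python
# Codes whose descriptions suggest trying again.
RETRYABLE_CODES = {"extraction_failed", "rate_limited", "internal_error"}


def classify_error(body: dict, headers=None) -> dict:
    """Summarize an error response and say whether a retry is reasonable."""
    error = body.get("error", {})
    code = error.get("code", "unknown")

    retry_after = None
    raw = (headers or {}).get("Retry-After")
    # Assumption: Retry-After is a number of seconds, not an HTTP date.
    if raw is not None and raw.strip().isdigit():
        retry_after = int(raw)

    return {
        "code": code,
        "message": error.get("message", ""),
        "retryable": code in RETRYABLE_CODES,
        "retry_after": retry_after,
    }
```

Errors such as `unauthorized` or `file_too_large` are reported as not retryable, because repeating the same request would fail the same way.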

Pricing

Simple, transparent pricing. No credit multipliers, no enterprise-gating.

Free

$0/mo

50 extractions/mo

5 req/min rate limit
Haiku model only
No credit card required

Overage Pricing

When you exceed your monthly quota, additional extractions are billed at per-plan rates: $0.04/call (Starter), $0.025/call (Pro), $0.015/call (Scale). Free plan blocks requests at the limit.
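The overage charge for a month follows from the per-call rates above. In this sketch the monthly quota is passed in by the caller, since this section lists quotas only for the Free plan. `overage_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Per-call overage rates in USD, taken from the paragraph above.
OVERAGE_RATES = {
    "starter": Decimal("0.04"),
    "pro": Decimal("0.025"),
    "scale": Decimal("0.015"),
}


def overage_cost(plan: str, used: int, monthly_quota: int) -> Decimal:
    """Return the overage charge in USD for extractions beyond the monthly quota."""
    if plan == "free":
        # The Free plan blocks requests at the limit instead of billing overage.
        raise ValueError("the Free plan has no overage billing")
    extra = max(used - monthly_quota, 0)
    return OVERAGE_RATES[plan] * extra
```

For example, 200 extractions over quota on the Pro plan would cost 200 × $0.025 = $5.00.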

Tip: Monitor your usage with GET /v1/usage or check the X-RateLimit-Remaining-Month header.