DocuExtract API

Send a document, get JSON back. No templates. No training. Works in 5 minutes.

DocuExtract converts unstructured documents — invoices, receipts, contracts, resumes, bank statements — into clean, validated JSON using Claude AI. You send a document (image, PDF, or URL), specify what you want extracted, and receive structured data in seconds.

Base URL: https://docuextract.dev/v1

| Feature | Detail |
| --- | --- |
| Authentication | Bearer token (API key) |
| Request format | JSON (Content-Type: application/json) |
| Response format | JSON |
| Max file size | 10 MB |
| Supported formats | PDF, PNG, JPG, WEBP (base64 or URL) |
| Default model | Claude Haiku 4.5 (fast) |
| Accurate model | Claude Sonnet 4.6 (complex documents) |

Quick Start

Extract structured data from a document in 3 steps.

1. Get your API key

Go to your dashboard to get your free API key. It looks like dk_live_xxxxxxxxxxxxxxxx.

2. Make your first extraction

Send a document URL (or base64) to /v1/extract:

bash
curl https://docuextract.dev/v1/extract \
  -H "Authorization: Bearer dk_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": "https://example.com/invoice.pdf",
    "type": "invoice"
  }'
javascript
const response = await fetch('https://docuextract.dev/v1/extract', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer dk_live_YOUR_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    document: 'https://example.com/invoice.pdf',
    type: 'invoice',
  }),
});

const result = await response.json();
console.log(result.data);
// { vendor_name: "Acme Corp", total_amount: 1250.00, ... }
python
import requests

response = requests.post(
    'https://docuextract.dev/v1/extract',
    headers={
        'Authorization': 'Bearer dk_live_YOUR_KEY',
        'Content-Type': 'application/json',
    },
    json={
        'document': 'https://example.com/invoice.pdf',
        'type': 'invoice',
    }
)

result = response.json()
print(result['data'])
# {'vendor_name': 'Acme Corp', 'total_amount': 1250.0, ...}

3. Use the structured data

The response contains the extracted fields, confidence score, and processing metadata:

json
{
  "data": {
    "vendor_name": "Acme Corp",
    "invoice_number": "INV-2024-0847",
    "invoice_date": "2024-03-15",
    "due_date": "2024-04-15",
    "subtotal": 1000.00,
    "tax_amount": 250.00,
    "total_amount": 1250.00,
    "currency": "USD",
    "line_items": [
      { "description": "Consulting services", "quantity": 10, "unit_price": 100.00, "total": 1000.00 }
    ]
  },
  "metadata": {
    "type": "invoice",
    "confidence": 0.96,
    "model": "claude-haiku-4-5-20251001",
    "processing_time_ms": 1847,
    "page_count": 1
  }
}
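
Before the extracted values flow into downstream systems, it can help to gate on the confidence score and cross-check the arithmetic. The sketch below runs against a trimmed copy of the sample response above. The `check_invoice` helper and the 0.9 threshold are assumptions made for illustration and are not part of the API.

```python
from decimal import Decimal

# Hypothetical threshold; tune it for your own documents.
MIN_CONFIDENCE = 0.9


def check_invoice(result: dict) -> list[str]:
    """Return a list of problems found in an invoice extraction result."""
    problems = []
    data, meta = result["data"], result["metadata"]

    if meta["confidence"] < MIN_CONFIDENCE:
        problems.append(f"low confidence: {meta['confidence']}")

    # Convert via str so that floats such as 1250.0 become exact decimals.
    subtotal = Decimal(str(data["subtotal"]))
    tax = Decimal(str(data["tax_amount"]))
    total = Decimal(str(data["total_amount"]))
    if subtotal + tax != total:
        problems.append("subtotal + tax_amount does not equal total_amount")

    line_sum = sum(Decimal(str(item["total"])) for item in data["line_items"])
    if line_sum != subtotal:
        problems.append("line item totals do not add up to subtotal")

    return problems


sample = {
    "data": {
        "subtotal": 1000.00,
        "tax_amount": 250.00,
        "total_amount": 1250.00,
        "line_items": [
            {"description": "Consulting services", "quantity": 10,
             "unit_price": 100.00, "total": 1000.00}
        ],
    },
    "metadata": {"confidence": 0.96},
}

print(check_invoice(sample))
```

An empty list means the result passed every check; anything else is a candidate for manual review or a retry with the "accurate" model.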

Authentication

All API endpoints (except GET /v1/health) require authentication via a Bearer token in the Authorization header.

http
Authorization: Bearer dk_live_xxxxxxxxxxxxxxxxxxxxxxxx

API Key Format

API keys start with dk_live_ followed by 32 random characters. Keys are generated when you sign up and can be regenerated from your dashboard.

Keep your API key secret: Never expose your API key in client-side code or public repositories. Use environment variables to store it securely.
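
One way to follow this advice is to read the key from the environment at startup and fail fast when it is absent. In the sketch below, the variable name `DOCUEXTRACT_API_KEY` and the `auth_headers` helper are hypothetical choices, not names defined by the API.

```python
import os


def auth_headers(env=os.environ) -> dict:
    """Build request headers from an API key stored in the environment."""
    # DOCUEXTRACT_API_KEY is a hypothetical variable name; use whatever
    # your deployment defines.
    key = env.get("DOCUEXTRACT_API_KEY", "")
    if not key.startswith("dk_live_"):
        raise RuntimeError("DOCUEXTRACT_API_KEY is missing or malformed")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed directly as the `headers` argument of `requests.post`.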

Rate Limit Headers

Every authenticated response includes rate limit information:

| Header | Description |
| --- | --- |
| X-RateLimit-Limit-Minute | Maximum requests per minute for your plan |
| X-RateLimit-Remaining-Minute | Remaining requests this minute |
| X-RateLimit-Limit-Month | Maximum extractions per month for your plan |
| X-RateLimit-Remaining-Month | Remaining extractions this month |
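
A client can read these headers after each call to decide whether to slow down. The following is a minimal sketch that assumes each header carries a plain integer; the `read_rate_limits` helper and the sample values are illustrative only.

```python
def read_rate_limits(headers) -> dict:
    """Extract the four rate limit headers as integers (None if absent)."""
    names = {
        "limit_minute": "X-RateLimit-Limit-Minute",
        "remaining_minute": "X-RateLimit-Remaining-Minute",
        "limit_month": "X-RateLimit-Limit-Month",
        "remaining_month": "X-RateLimit-Remaining-Month",
    }
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    return {
        key: int(lowered[name.lower()]) if name.lower() in lowered else None
        for key, name in names.items()
    }


# Illustrative values, not real output from the service.
limits = read_rate_limits({
    "X-RateLimit-Limit-Minute": "60",
    "X-RateLimit-Remaining-Minute": "57",
    "X-RateLimit-Limit-Month": "5000",
    "X-RateLimit-Remaining-Month": "4153",
})
print(limits["remaining_minute"], limits["remaining_month"])
```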

POST /v1/extract

Extract structured data from a document. This is the core endpoint.

POST https://docuextract.dev/v1/extract

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| document | string | Yes | Document as base64-encoded string or a publicly accessible URL |
| type | string | No | Document type hint. One of: invoice, receipt, bank_statement, resume, contract, form, id_document. Auto-detected if omitted. |
| model | string | No | "fast" (default) or "accurate". Fast uses Claude Haiku; accurate uses Claude Sonnet for complex/multi-page documents. |
| schema | object | No | Custom JSON schema describing the fields to extract. When provided, the extraction is guided by your schema. |
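
For local files, the request body has to be assembled from raw bytes. The sketch below shows one way to do that. It assumes the `document` field accepts plain base64 text without a data URI prefix, that the 10 MB limit applies to the raw file, and that `schema` takes a JSON Schema style object; none of these details is confirmed here, and `build_extract_body` is a hypothetical helper.

```python
import base64
import json

MAX_BYTES = 10 * 1024 * 1024  # assumption: 1 MB is counted as 1024 * 1024 bytes


def build_extract_body(file_bytes: bytes, doc_type=None, model="fast", schema=None) -> dict:
    """Assemble a JSON body for POST /v1/extract from raw file bytes."""
    if len(file_bytes) > MAX_BYTES:
        raise ValueError("document exceeds the 10 MB limit")
    if model not in ("fast", "accurate"):
        raise ValueError("model must be 'fast' or 'accurate'")
    body = {
        # Assumption: plain base64 text, with no data: URI prefix.
        "document": base64.b64encode(file_bytes).decode("ascii"),
        "model": model,
    }
    # Optional fields are left out entirely when not supplied.
    if doc_type is not None:
        body["type"] = doc_type
    if schema is not None:
        body["schema"] = schema
    return body


# Hypothetical custom schema; the exact schema dialect is an assumption.
schema = {
    "type": "object",
    "properties": {
        "po_number": {"type": "string"},
        "total_amount": {"type": "number"},
    },
}
body = build_extract_body(b"%PDF-1.7 example bytes", doc_type="invoice",
                          model="accurate", schema=schema)
print(json.dumps(body, indent=2))
```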

Response

json
{
  "data": { /* extracted fields */ },
  "metadata": {
    "type": "invoice",
    "confidence": 0.96,
    "model": "claude-haiku-4-5-20251001",
    "processing_time_ms": 1847,
    "page_count": 1
  }
}

POST /v1/detect

Detect the type of a document without extracting its data.

POST https://docuextract.dev/v1/detect

Response

json
{
  "type": "invoice",
  "confidence": 0.98
}

GET /v1/usage

Retrieve your current usage statistics for the billing period.

GET https://docuextract.dev/v1/usage

Response

json
{
  "used": 847,
  "limit": 5000,
  "plan": "pro",
  "period_end": "2024-04-24",
  "breakdown": [
    { "date": "2024-03-24", "count": 42 }
  ]
}
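
The `used` and `limit` fields are enough to derive the remaining quota and how close the account is to its cap. This is a minimal sketch over the sample response above; `usage_summary` is a hypothetical helper, and the 80% mark mirrors the threshold given for the usage.limit.approaching webhook event.

```python
def usage_summary(usage: dict) -> dict:
    """Derive remaining quota and percent used from a /v1/usage response."""
    used, limit = usage["used"], usage["limit"]
    percent = round(100 * used / limit, 1) if limit else 0.0
    return {
        "remaining": max(limit - used, 0),
        "percent_used": percent,
        # 80% is the threshold named for the usage.limit.approaching event.
        "approaching_limit": percent >= 80,
    }


sample = {"used": 847, "limit": 5000, "plan": "pro", "period_end": "2024-04-24"}
print(usage_summary(sample))
```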

GET /v1/health

Health check endpoint. No authentication required.

GET https://docuextract.dev/v1/health

json
{ "status": "ok", "version": "1.0.0" }

POST /v1/billing/checkout

Create a Stripe Checkout session to subscribe to a paid plan. Returns a URL to redirect your user to for payment.

POST https://docuextract.dev/v1/billing/checkout

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| plan | string | Yes | Plan to subscribe to. One of: starter, pro, scale |

Response

json
{ "url": "https://checkout.stripe.com/c/pay/cs_live_..." }

Redirect the user to this URL. After payment, Stripe redirects back to your dashboard.


POST /v1/billing/portal

Create a Stripe Billing Portal session for subscription management.

POST https://docuextract.dev/v1/billing/portal

Requires an active Stripe subscription. Free plan users will receive a 400 error.

Response

json
{ "url": "https://billing.stripe.com/p/session/..." }

Webhooks

Webhooks send real-time HTTP POST notifications to your server when events occur in DocuExtract — extractions complete, usage limits approach, or billing events happen.

Instead of polling the API, register an HTTPS endpoint and we'll push events to you. Each delivery is signed with HMAC-SHA256 so you can verify authenticity.

How it works

  1. Register a webhook endpoint in your Dashboard → Webhooks
  2. Select which events to subscribe to
  3. Copy the signing secret (shown once)
  4. We POST a JSON payload to your URL when subscribed events occur
  5. Verify the signature and process the event

Webhooks use hybrid payloads: the webhook body contains a summary (event type, extraction ID, confidence, document type). To get the full extracted data, call GET /v1/extractions/{extraction_id} using the ext_ ID from the payload's extractionId field.


Event Types

DocuExtract supports 9 event types across three categories. Event access is gated by plan.

| Event | Category | Description | Plans |
| --- | --- | --- | --- |
| extraction.completed | Core | Extraction finished successfully | All |
| extraction.failed | Core | Extraction encountered an error | All |
| usage.limit.approaching | Usage | Usage crossed 80% of monthly limit | Starter+ |
| usage.limit.reached | Usage | Monthly extraction limit exhausted | Starter+ |
| subscription.created | Billing | New subscription created | Pro+ |
| subscription.updated | Billing | Subscription plan changed | Pro+ |
| subscription.cancelled | Billing | Subscription cancelled | Pro+ |
| invoice.payment_succeeded | Billing | Invoice payment processed | Pro+ |
| invoice.payment_failed | Billing | Invoice payment failed | Pro+ |

Endpoint limits by plan

| Plan | Endpoints | Available Events |
| --- | --- | --- |
| Free | 1 | Core events only |
| Starter | 3 | Core + Usage |
| Pro | 5 | All events |
| Scale | 10 | All events |
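
The gating rules above can be encoded as a small lookup so that an integration checks locally before trying to register an endpoint. This is a sketch only: the lowercase plan identifiers (in particular "free") and the `can_subscribe` helper are assumptions, and the server remains the authority on what a plan allows.

```python
CORE = {"extraction.completed", "extraction.failed"}
USAGE = {"usage.limit.approaching", "usage.limit.reached"}
BILLING = {
    "subscription.created", "subscription.updated", "subscription.cancelled",
    "invoice.payment_succeeded", "invoice.payment_failed",
}

# Endpoint limit and event set per plan, transcribed from the tables above.
PLANS = {
    "free": (1, CORE),
    "starter": (3, CORE | USAGE),
    "pro": (5, CORE | USAGE | BILLING),
    "scale": (10, CORE | USAGE | BILLING),
}


def can_subscribe(plan: str, event: str, endpoints_in_use: int = 0) -> bool:
    """True if the plan allows this event and has a free endpoint slot."""
    max_endpoints, events = PLANS[plan]
    return event in events and endpoints_in_use < max_endpoints


print(can_subscribe("starter", "usage.limit.reached"))
print(can_subscribe("starter", "invoice.payment_failed"))
```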

Payload Format

Every webhook delivery sends a JSON payload in the following envelope format:

json
{
  "id": "evt_a1b2c3d4e5f67890",
  "event": "extraction.completed",
  "created": 1775059200,
  "data": {
    "extractionId": "ext_9z8y7x42",
    "status": "success",
    "confidence": 0.9942,
    "documentType": "invoice"
  }
}
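
A handler will usually parse this envelope and derive the URL for the follow-up fetch. The sketch below assumes `created` is a Unix timestamp in seconds (UTC) and that only extraction events carry `extractionId`; `summarize_event` is a hypothetical helper, not part of an official SDK.

```python
import json
from datetime import datetime, timezone

BASE_URL = "https://docuextract.dev/v1"


def summarize_event(raw_body: bytes) -> dict:
    """Parse a webhook envelope and work out where to fetch the full result."""
    event = json.loads(raw_body)
    extraction_id = event["data"].get("extractionId")
    return {
        "event_id": event["id"],
        "event": event["event"],
        # Assumption: "created" is a Unix timestamp in seconds, UTC.
        "created": datetime.fromtimestamp(event["created"], tz=timezone.utc),
        # Non-extraction events are assumed to have no extractionId.
        "details_url": (
            f"{BASE_URL}/extractions/{extraction_id}" if extraction_id else None
        ),
    }


raw = json.dumps({
    "id": "evt_a1b2c3d4e5f67890",
    "event": "extraction.completed",
    "created": 1775059200,
    "data": {
        "extractionId": "ext_9z8y7x42",
        "status": "success",
        "confidence": 0.9942,
        "documentType": "invoice",
    },
}).encode()

summary = summarize_event(raw)
print(summary["created"].isoformat(), summary["details_url"])
```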

Headers

| Header | Description |
| --- | --- |
| X-DocuExtract-Signature | HMAC-SHA256 signature: sha256=<hex> |
| X-DocuExtract-Event | Event type (e.g. extraction.completed) |
| X-DocuExtract-Delivery | Unique delivery ID (UUID) for idempotency |
| Content-Type | application/json |
| User-Agent | DocuExtract-Webhooks/1.0 |

Signature Verification

Every webhook delivery includes an X-DocuExtract-Signature header containing an HMAC-SHA256 signature of the request body using your endpoint's signing secret. Always verify this signature before processing events.

Node.js

javascript
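const crypto = require('crypto');
const express = require('express');

const app = express();

// Returns true only if the header matches the HMAC-SHA256 of the raw body.
function verifyWebhook(rawBody, signatureHeader, secret) {
  const expected = Buffer.from('sha256=' + crypto
    .createHmac('sha256', secret)
    .update(rawBody)
    .digest('hex'));
  const received = Buffer.from(signatureHeader || '');
  // timingSafeEqual throws when lengths differ, so compare lengths first.
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);
}

// Express example: express.raw() keeps req.body as a Buffer of the signed bytes.
app.post('/webhook', express.raw({ type: 'application/json' }), (req, res) => {
  const signature = req.headers['x-docuextract-signature'];
  if (!verifyWebhook(req.body, signature, process.env.WEBHOOK_SECRET)) {
    return res.status(401).send('Invalid signature');
  }
  const event = JSON.parse(req.body.toString('utf8'));
  console.log('Received:', event.event, event.data);
  res.status(200).send('OK');
});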

Python

python
import hmac
import hashlib
import json
from flask import Flask, request, abort

app = Flask(__name__)
WEBHOOK_SECRET = "whsec_your_signing_secret"

def verify_signature(payload: bytes, signature: str, secret: str) -> bool:
    expected = "sha256=" + hmac.new(
        secret.encode(), payload, hashlib.sha256
    ).hexdigest()
    # Compare as bytes: compare_digest raises TypeError on non-ASCII str input.
    return hmac.compare_digest(expected.encode(), signature.encode())

@app.route("/webhook", methods=["POST"])
def handle_webhook():
    signature = request.headers.get("X-DocuExtract-Signature", "")
    if not verify_signature(request.data, signature, WEBHOOK_SECRET):
        abort(401)
    event = json.loads(request.data)
    print(f"Received: {event['event']}", event["data"])
    return "OK", 200

Retry Policy

If your endpoint returns a non-2xx status code or times out, DocuExtract retries the delivery automatically:

| Attempt | Delay | Timeout |
| --- | --- | --- |
| 1st (initial) | Immediate | 10 seconds |
| 2nd (retry) | ~30 seconds | 10 seconds |
| 3rd (retry) | ~5 minutes | 10 seconds |
| 4th (final retry) | ~30 minutes | 10 seconds |

After 4 failed attempts, the delivery is marked as failed. You can view delivery history in your dashboard and use the "Send Test" button to verify connectivity.


Best Practices

  • Respond quickly. Return a 2xx status within 5 seconds. If you need to do heavy processing, acknowledge the webhook first, then process asynchronously.
  • Verify signatures. Always validate the X-DocuExtract-Signature header before processing. Never skip this step.
  • Handle duplicates. Use the X-DocuExtract-Delivery header (unique UUID per delivery attempt) as an idempotency key. Store processed delivery IDs and skip duplicates.
  • Use HTTPS only. Webhook endpoints must use HTTPS. HTTP URLs are rejected at registration time.
  • Fetch full data separately. Webhook payloads contain summaries. For full extraction results, call GET /v1/extractions/{id} with the extractionId value from the payload.
  • Monitor delivery health. Check your webhook delivery logs in the dashboard to catch failures early. Use the "Send Test" button after deploying endpoint changes.
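
The "respond quickly" and "handle duplicates" practices above can be combined in one small function that a handler calls once the signature check has passed. This is a sketch under stated assumptions: the in-memory set and queue stand in for a persistent store and a real job queue, and `accept_delivery` is a hypothetical helper.

```python
import json
import queue

processed_ids = set()        # in production, use a shared persistent store
work_queue = queue.Queue()   # drained by a background worker (not shown)


def accept_delivery(delivery_id: str, raw_body: bytes) -> str:
    """Acknowledge a verified webhook quickly and defer the heavy work.

    delivery_id is the X-DocuExtract-Delivery header value. Call this only
    after the signature check has passed.
    """
    if delivery_id in processed_ids:
        return "duplicate"
    processed_ids.add(delivery_id)
    # Parsing is cheap; anything slower belongs in the worker.
    work_queue.put(json.loads(raw_body))
    return "queued"


body = b'{"id": "evt_a1b2c3d4e5f67890", "event": "extraction.completed", "data": {}}'
print(accept_delivery("delivery-1", body))
print(accept_delivery("delivery-1", body))
```

Both outcomes should be answered with a 2xx status so that a duplicate is not retried again.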

Document Types

DocuExtract automatically detects document types, or you can specify one explicitly.

| Type | Description | Key Fields Extracted |
| --- | --- | --- |
| invoice | Vendor invoices and billing statements | vendor name, invoice number, dates, line items, totals, payment terms |
| receipt | Purchase receipts from retail, restaurants, etc. | merchant name, date, items purchased, subtotal, tax, total, payment method |
| bank_statement | Bank and credit card statements | account number, period, opening/closing balance, transactions |
| resume | CVs and resumes | name, contact info, work experience, education, skills |
| contract | Legal agreements and contracts | parties, effective date, termination date, key obligations, governing law |
| form | Filled forms (applications, surveys, intake forms) | all labeled fields and their values |
| id_document | ID cards, passports, driver's licenses | name, date of birth, expiry, document number, issuing authority |
| unknown | Fallback for unrecognized types | best-effort extraction of all visible structured data |

Error Codes

All errors return a JSON response with an error object containing code and message fields.

json
{
  "error": {
    "code": "unauthorized",
    "message": "Invalid or missing API key"
  }
}

4xx Client Errors

401 — unauthorized
The API key is missing, malformed, or revoked.
400 — invalid_request
A required field is missing or a field value is invalid (includes inaccessible document URLs).
413 — file_too_large
The document exceeds the 10 MB size limit.
415 — unsupported_format
The file format is not supported. Use PDF, PNG, JPG, or WEBP.
422 — extraction_failed
The AI extraction failed after retrying. Try again. If it persists, the document may be corrupted or too complex.
429 — rate_limited
You've exceeded your per-minute or per-month rate limit. Check the Retry-After header. Upgrade your plan for higher limits.

5xx Server Errors

500 — internal_error
Unexpected server error. Please try again or contact support.
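
On the client side, these codes suggest a simple policy: retry the ones described as transient (rate_limited, extraction_failed, internal_error) and surface the rest immediately. The sketch below is one possible policy, not a prescribed one. It assumes Retry-After holds a whole number of seconds, and the attempt cap and exponential backoff are illustrative choices.

```python
RETRYABLE = {"rate_limited", "extraction_failed", "internal_error"}


def plan_retry(status: int, body: dict, headers: dict,
               attempt: int, max_attempts: int = 3):
    """Return seconds to wait before retrying, or None to give up.

    attempt is 1 for the first failed call, 2 for the second, and so on.
    """
    code = body.get("error", {}).get("code", "")
    if code not in RETRYABLE or attempt >= max_attempts:
        return None
    if status == 429:
        # Assumption: Retry-After carries a whole number of seconds.
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            return float(retry_after)
    # Exponential backoff otherwise: 1 s, 2 s, 4 s, ...
    return float(2 ** (attempt - 1))


print(plan_retry(429, {"error": {"code": "rate_limited"}}, {"Retry-After": "12"}, 1))
print(plan_retry(401, {"error": {"code": "unauthorized"}}, {}, 1))
```

When the monthly quota rather than the per-minute limit is exhausted, waiting is unlikely to help, so a long Retry-After value is a reasonable signal to stop and check GET /v1/usage.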

Pricing

Simple, transparent pricing. No credit multipliers, no enterprise-gating.

Free: $0/mo
50 extractions/mo
  • 5 req/min rate limit
  • Haiku model only
  • No credit card required

Pro (Best Value): $99/mo
5,000 extractions/mo
  • 60 req/min rate limit
  • Haiku + Sonnet (3x cost)
  • Priority support

Scale: $249/mo
20,000 extractions/mo
  • 120 req/min rate limit
  • All models + Priority
  • SLA + dedicated support

Overage Pricing

When you exceed your monthly quota, additional extractions are billed at per-plan rates: $0.04/call (Starter), $0.025/call (Pro), $0.015/call (Scale). Free plan blocks requests at the limit.
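
To make the overage arithmetic concrete, the sketch below computes the extra charge for a month, using the per-call rates quoted above. The `overage_cost` helper is hypothetical, and the quota is passed in as a parameter because not every plan's quota is listed in this section.

```python
from decimal import Decimal

# Per-call overage rates in USD, from the paragraph above.
OVERAGE_RATE = {
    "starter": Decimal("0.04"),
    "pro": Decimal("0.025"),
    "scale": Decimal("0.015"),
}


def overage_cost(plan: str, used: int, quota: int) -> Decimal:
    """Cost of extractions beyond the monthly quota for a paid plan."""
    if plan not in OVERAGE_RATE:
        # The Free plan blocks requests at the limit instead of billing.
        raise ValueError(f"no overage billing for plan {plan!r}")
    extra = max(used - quota, 0)
    return OVERAGE_RATE[plan] * extra


# Pro includes 5,000 extractions, so 5,400 calls means 400 billed as overage.
print(overage_cost("pro", 5400, 5000))
```

Decimal is used instead of float so that rates such as 0.025 multiply without rounding drift.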

Tip: Monitor your usage with GET /v1/usage or check the X-RateLimit-Remaining-Month header.