DocuExtract API
Send a document, get JSON back. No templates. No training. Works in 5 minutes.
DocuExtract converts unstructured documents — invoices, receipts, contracts, resumes, bank statements — into clean, validated JSON using Claude AI. You send a document (image, PDF, or URL), specify what you want extracted, and receive structured data in seconds.
Base URL: https://docuextract.dev/v1
| Feature | Detail |
|---|---|
| Authentication | Bearer token (API key) |
| Request format | JSON (Content-Type: application/json) |
| Response format | JSON |
| Max file size | 10 MB |
| Supported formats | PDF, PNG, JPG, WEBP (base64 or URL) |
| Default model | Claude Haiku 4.5 (fast) |
| Accurate model | Claude Sonnet 4.6 (complex documents) |
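For local files, the limits in the table translate into a small pre-flight check. The following is a minimal sketch, not an official client: `encode_document` is a hypothetical helper name, and it assumes the 10 MB limit applies to the raw file before base64 encoding and that the API accepts a bare base64 string (neither is stated above).

```python
import base64
from pathlib import Path

# Assumption: "10 MB" means 10 * 1024 * 1024 bytes of raw file content.
MAX_FILE_BYTES = 10 * 1024 * 1024
# Assumption: ".jpeg" is accepted alongside ".jpg".
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".webp"}


def encode_document(path: str) -> str:
    """Return the file's contents as a base64 string, or raise ValueError."""
    file = Path(path)
    if file.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported format: {file.suffix or '(none)'}")
    raw = file.read_bytes()
    if len(raw) > MAX_FILE_BYTES:
        raise ValueError(f"file is {len(raw)} bytes; limit is {MAX_FILE_BYTES}")
    return base64.b64encode(raw).decode("ascii")
```

The returned string can be passed as the `document` field in place of a URL.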
Quick Start
Extract structured data from a document in 3 steps.
Get your API key
Go to your dashboard to get your free API key. It looks like dk_live_xxxxxxxxxxxxxxxx.
Make your first extraction
Send a document URL (or base64) to /v1/extract:
cURL
curl https://docuextract.dev/v1/extract \
  -H "Authorization: Bearer dk_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": "https://example.com/invoice.pdf",
    "type": "invoice"
  }'
JavaScript
const response = await fetch('https://docuextract.dev/v1/extract', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer dk_live_YOUR_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    document: 'https://example.com/invoice.pdf',
    type: 'invoice',
  }),
});

const result = await response.json();
console.log(result.data);
// { vendor_name: "Acme Corp", total_amount: 1250.00, ... }
Python
import requests

response = requests.post(
    'https://docuextract.dev/v1/extract',
    headers={
        'Authorization': 'Bearer dk_live_YOUR_KEY',
        'Content-Type': 'application/json',
    },
    json={
        'document': 'https://example.com/invoice.pdf',
        'type': 'invoice',
    },
)

result = response.json()
print(result['data'])
# {'vendor_name': 'Acme Corp', 'total_amount': 1250.0, ...}
Use the structured data
The response contains the extracted fields, confidence score, and processing metadata:
{
"data": {
"vendor_name": "Acme Corp",
"invoice_number": "INV-2024-0847",
"invoice_date": "2024-03-15",
"due_date": "2024-04-15",
"subtotal": 1000.00,
"tax_amount": 250.00,
"total_amount": 1250.00,
"currency": "USD",
"line_items": [
{ "description": "Consulting services", "quantity": 10, "unit_price": 100.00, "total": 1000.00 }
]
},
"metadata": {
"type": "invoice",
"confidence": 0.96,
"model": "claude-haiku-4-5-20251001",
"processing_time_ms": 1847,
"page_count": 1
}
}
Authentication
All API endpoints (except GET /v1/health) require authentication via a Bearer token in the Authorization header.
Authorization: Bearer dk_live_xxxxxxxxxxxxxxxxxxxxxxxx
API Key Format
API keys start with dk_live_ followed by 32 random characters. Keys are generated when you sign up and can be regenerated from your dashboard.
Rate Limit Headers
Every authenticated response includes rate limit information:
| Header | Description |
|---|---|
| X-RateLimit-Limit-Minute | Maximum requests per minute for your plan |
| X-RateLimit-Remaining-Minute | Remaining requests this minute |
| X-RateLimit-Limit-Month | Maximum extractions per month for your plan |
| X-RateLimit-Remaining-Month | Remaining extractions this month |
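A client can read these headers after each call to decide whether to slow down. The sketch below assumes the header values are plain integers (the format is not specified above); `parse_rate_limits` and `should_pause` are hypothetical helper names.

```python
from typing import Dict, Mapping, Optional

# Header names as documented in the table above.
RATE_LIMIT_HEADERS = {
    "limit_minute": "X-RateLimit-Limit-Minute",
    "remaining_minute": "X-RateLimit-Remaining-Minute",
    "limit_month": "X-RateLimit-Limit-Month",
    "remaining_month": "X-RateLimit-Remaining-Month",
}


def parse_rate_limits(headers: Mapping[str, str]) -> Dict[str, Optional[int]]:
    """Extract the four rate limit values; missing or non-numeric ones become None."""
    lowered = {name.lower(): value for name, value in headers.items()}
    limits: Dict[str, Optional[int]] = {}
    for key, header in RATE_LIMIT_HEADERS.items():
        value = lowered.get(header.lower())
        # Assumption: values are non-negative integers sent as decimal strings.
        limits[key] = int(value) if value is not None and value.strip().isdigit() else None
    return limits


def should_pause(limits: Mapping[str, Optional[int]]) -> bool:
    """True when either the per-minute or the monthly allowance is used up."""
    return limits["remaining_minute"] == 0 or limits["remaining_month"] == 0
```

With the `requests` library, `parse_rate_limits(response.headers)` works directly because the lookup is case-insensitive.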
POST /v1/extract
Extract structured data from a document. This is the core endpoint.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| document | string | Yes | Document as base64-encoded string or a publicly accessible URL |
| type | string | No | Document type hint. One of: invoice, receipt, bank_statement, resume, contract, form, id_document. Auto-detected if omitted. |
| model | string | No | "fast" (default) or "accurate". Fast uses Claude Haiku; accurate uses Claude Sonnet for complex/multi-page documents. |
| schema | object | No | Custom JSON schema describing the fields to extract. When provided, the extraction is guided by your schema. |
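Validating the optional fields locally catches typos before a request is sent. This is a rough sketch under stated assumptions: `build_extract_body` is a hypothetical helper, and the example schema uses standard JSON Schema keywords, which is a guess because the exact schema dialect is not described above.

```python
from typing import Any, Dict, Optional

# Allowed values copied from the request body table.
DOCUMENT_TYPES = {
    "invoice", "receipt", "bank_statement", "resume",
    "contract", "form", "id_document",
}
MODELS = {"fast", "accurate"}


def build_extract_body(
    document: str,
    doc_type: Optional[str] = None,
    model: Optional[str] = None,
    schema: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /v1/extract, omitting unset optional fields."""
    if not document:
        raise ValueError("document is required (base64 string or URL)")
    body: Dict[str, Any] = {"document": document}
    if doc_type is not None:
        if doc_type not in DOCUMENT_TYPES:
            raise ValueError(f"unknown type: {doc_type}")
        body["type"] = doc_type
    if model is not None:
        if model not in MODELS:
            raise ValueError(f"unknown model: {model}")
        body["model"] = model
    if schema is not None:
        body["schema"] = schema
    return body


body = build_extract_body(
    "https://example.com/invoice.pdf",
    doc_type="invoice",
    model="accurate",
    # Assumed schema shape; adjust to whatever the API actually expects.
    schema={"type": "object", "properties": {"po_number": {"type": "string"}}},
)
```

Leaving `doc_type` unset keeps the field out of the body, so auto-detection applies.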
Response
{
"data": { /* extracted fields */ },
"metadata": {
"type": "invoice",
"confidence": 0.96,
"model": "claude-haiku-4-5-20251001",
"processing_time_ms": 1847,
"page_count": 1
}
}
POST /v1/detect
Detect the type of a document without extracting its data.
Response
{
"type": "invoice",
"confidence": 0.98
}
GET /v1/usage
Retrieve your current usage statistics for the billing period.
Response
{
"used": 847,
"limit": 5000,
"plan": "pro",
"period_end": "2024-04-24",
"breakdown": [
{ "date": "2024-03-24", "count": 42 }
]
}
GET /v1/health
Health check endpoint. No authentication required.
{ "status": "ok", "version": "1.0.0" }
POST /v1/billing/checkout
Create a Stripe Checkout session to subscribe to a paid plan. Returns a URL to redirect your user to for payment.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| plan | string | Yes | Plan to subscribe to. One of: starter, pro, scale |
Response
{ "url": "https://checkout.stripe.com/c/pay/cs_live_..." }
Redirect the user to this URL. After payment, Stripe redirects back to your dashboard.
POST /v1/billing/portal
Create a Stripe Billing Portal session for subscription management.
Response
{ "url": "https://billing.stripe.com/p/session/..." }
Webhooks
Webhooks send real-time HTTP POST notifications to your server when events occur in DocuExtract — extractions complete, usage limits approach, or billing events happen.
Instead of polling the API, register an HTTPS endpoint and we'll push events to you. Each delivery is signed with HMAC-SHA256 so you can verify authenticity.
How it works
- Register a webhook endpoint in your Dashboard → Webhooks
- Select which events to subscribe to
- Copy the signing secret (shown once)
- We POST a JSON payload to your URL when subscribed events occur
- Verify the signature and process the event
Webhooks use hybrid payloads: the webhook body contains only a summary (event type, extraction ID, confidence, document type). To get the full extracted data, call GET /v1/extractions/{extraction_id} using the ext_-prefixed extractionId from the payload.
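The follow-up call can be prepared straight from the webhook body. The sketch below only builds the request and does not send it; `extraction_request` is a hypothetical helper, and the shape of the response from this endpoint is not documented here, so nothing is assumed about it.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://docuextract.dev/v1"


def extraction_request(webhook_body: bytes, api_key: str) -> urllib.request.Request:
    """Build the GET /v1/extractions/{extraction_id} request for a webhook event."""
    event = json.loads(webhook_body)
    # The summary payload carries the ID under data.extractionId.
    extraction_id = event["data"]["extractionId"]
    url = f"{BASE_URL}/extractions/{quote(extraction_id, safe='')}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
```

Passing the result to `urllib.request.urlopen` performs the call; verify the webhook signature before doing so.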
Event Types
DocuExtract supports 9 event types across three categories. Event access is gated by plan.
| Event | Category | Description | Plans |
|---|---|---|---|
| extraction.completed | Core | Extraction finished successfully | All |
| extraction.failed | Core | Extraction encountered an error | All |
| usage.limit.approaching | Usage | Usage crossed 80% of monthly limit | Starter+ |
| usage.limit.reached | Usage | Monthly extraction limit exhausted | Starter+ |
| subscription.created | Billing | New subscription created | Pro+ |
| subscription.updated | Billing | Subscription plan changed | Pro+ |
| subscription.cancelled | Billing | Subscription cancelled | Pro+ |
| invoice.payment_succeeded | Billing | Invoice payment processed | Pro+ |
| invoice.payment_failed | Billing | Invoice payment failed | Pro+ |
Endpoint limits by plan
| Plan | Endpoints | Available Events |
|---|---|---|
| Free | 1 | Core events only |
| Starter | 3 | Core + Usage |
| Pro | 5 | All events |
| Scale | 10 | All events |
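The two tables above can be encoded as lookup data to check a webhook configuration before registering it. This is an illustrative sketch: the lowercase plan identifiers (including `free`) and the helper names are assumptions, and the API remains the authority on what a plan allows.

```python
from typing import Iterable, List

# Event categories from the Event Types table.
EVENT_CATEGORY = {
    "extraction.completed": "core",
    "extraction.failed": "core",
    "usage.limit.approaching": "usage",
    "usage.limit.reached": "usage",
    "subscription.created": "billing",
    "subscription.updated": "billing",
    "subscription.cancelled": "billing",
    "invoice.payment_succeeded": "billing",
    "invoice.payment_failed": "billing",
}

# Categories and endpoint counts from the plan table.
PLAN_CATEGORIES = {
    "free": {"core"},
    "starter": {"core", "usage"},
    "pro": {"core", "usage", "billing"},
    "scale": {"core", "usage", "billing"},
}
MAX_ENDPOINTS = {"free": 1, "starter": 3, "pro": 5, "scale": 10}


def blocked_events(plan: str, events: Iterable[str]) -> List[str]:
    """Return the requested events that the plan cannot subscribe to."""
    allowed = PLAN_CATEGORIES[plan]
    # Unknown event names map to None and are reported as blocked.
    return [e for e in events if EVENT_CATEGORY.get(e) not in allowed]


def can_add_endpoint(plan: str, current_endpoints: int) -> bool:
    """True if the plan still has room for one more webhook endpoint."""
    return current_endpoints < MAX_ENDPOINTS[plan]
```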
Payload Format
Every webhook delivery sends a JSON payload in the following envelope format:
{
"id": "evt_a1b2c3d4e5f67890",
"event": "extraction.completed",
"created": 1775059200,
"data": {
"extractionId": "ext_9z8y7x42",
"status": "success",
"confidence": 0.9942,
"documentType": "invoice"
}
}
Headers
| Header | Description |
|---|---|
| X-DocuExtract-Signature | HMAC-SHA256 signature: sha256=<hex> |
| X-DocuExtract-Event | Event type (e.g. extraction.completed) |
| X-DocuExtract-Delivery | Unique delivery ID (UUID) for idempotency |
| Content-Type | application/json |
| User-Agent | DocuExtract-Webhooks/1.0 |
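Parsing the envelope into a typed object keeps handler code tidy. The sketch below assumes `created` is a Unix timestamp in seconds (the sample value is consistent with that, but the unit is not stated); `WebhookEvent` and `parse_event` are hypothetical names.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass(frozen=True)
class WebhookEvent:
    id: str
    event: str
    created: datetime
    data: Dict[str, Any]


def parse_event(raw_body: bytes) -> WebhookEvent:
    """Parse a webhook body into a WebhookEvent; raises KeyError on missing fields."""
    payload = json.loads(raw_body)
    return WebhookEvent(
        id=payload["id"],
        event=payload["event"],
        # Assumption: seconds since the Unix epoch, interpreted as UTC.
        created=datetime.fromtimestamp(payload["created"], tz=timezone.utc),
        data=payload["data"],
    )
```

Call `parse_event` only after the signature check has passed, and on the same raw bytes that were verified.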
Signature Verification
Every webhook delivery includes an X-DocuExtract-Signature header containing an HMAC-SHA256 signature of the request body using your endpoint's signing secret. Always verify this signature before processing events.
Node.js
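const crypto = require('crypto');
const express = require('express');

const app = express();

// Returns true only if the header matches the HMAC-SHA256 of the raw body.
function verifyWebhook(rawBody, signatureHeader, secret) {
  const expected = Buffer.from(
    'sha256=' + crypto.createHmac('sha256', secret).update(rawBody).digest('hex')
  );
  const received = Buffer.from(signatureHeader || '');
  // timingSafeEqual throws if the lengths differ, so compare lengths first.
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);
}

// Express example. express.raw() keeps req.body as a Buffer holding the exact
// bytes that were signed; a parsed JSON body would not verify.
app.post('/webhook', express.raw({ type: 'application/json' }), (req, res) => {
  const signature = req.headers['x-docuextract-signature'];
  if (!verifyWebhook(req.body, signature, process.env.WEBHOOK_SECRET)) {
    return res.status(401).send('Invalid signature');
  }
  const event = JSON.parse(req.body);
  console.log('Received:', event.event, event.data);
  res.status(200).send('OK');
});
Python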
import hmac
import hashlib
import json

from flask import Flask, request, abort

app = Flask(__name__)
WEBHOOK_SECRET = "whsec_your_signing_secret"


def verify_signature(payload: bytes, signature: str, secret: str) -> bool:
    expected = "sha256=" + hmac.new(
        secret.encode(), payload, hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)


@app.route("/webhook", methods=["POST"])
def handle_webhook():
    signature = request.headers.get("X-DocuExtract-Signature", "")
    if not verify_signature(request.data, signature, WEBHOOK_SECRET):
        abort(401)
    event = json.loads(request.data)
    print(f"Received: {event['event']}", event["data"])
    return "OK", 200
Retry Policy
If your endpoint returns a non-2xx status code or times out, DocuExtract retries the delivery automatically:
| Attempt | Delay | Timeout |
|---|---|---|
| 1st (initial) | Immediate | 10 seconds |
| 2nd (retry) | ~30 seconds | 10 seconds |
| 3rd (retry) | ~5 minutes | 10 seconds |
| 4th (final retry) | ~30 minutes | 10 seconds |
After 4 failed attempts, the delivery is marked as failed. You can view delivery history in your dashboard and use the "Send Test" button to verify connectivity.
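To estimate how long an outage your endpoint can ride out, the schedule can be turned into cumulative offsets. This worked example assumes each delay is measured from the start of the previous attempt, which the table does not specify, and it treats the approximate delays as exact.

```python
from itertools import accumulate
from typing import List, Sequence

# Approximate delays before each of the four attempts, in seconds.
RETRY_DELAYS_S = (0, 30, 5 * 60, 30 * 60)
ATTEMPT_TIMEOUT_S = 10


def attempt_offsets(delays: Sequence[int] = RETRY_DELAYS_S) -> List[int]:
    """Seconds after the event at which each attempt starts (assumed model)."""
    return list(accumulate(delays))


def delivery_window(delays: Sequence[int] = RETRY_DELAYS_S) -> int:
    """Seconds from the event until the final attempt would time out."""
    return attempt_offsets(delays)[-1] + ATTEMPT_TIMEOUT_S


print(attempt_offsets())   # [0, 30, 330, 2130]
print(delivery_window())   # 2140
```

Under these assumptions, an endpoint that stays down for more than about 36 minutes misses the event, so reconciling against the delivery history in the dashboard is worth planning for.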
Best Practices
- Respond quickly. Return a 2xx status within 5 seconds. If you need to do heavy processing, acknowledge the webhook first, then process asynchronously.
- Verify signatures. Always validate the X-DocuExtract-Signature header before processing. Never skip this step.
- Handle duplicates. Use the X-DocuExtract-Delivery header (unique UUID per delivery attempt) as an idempotency key. Store processed delivery IDs and skip duplicates.
- Use HTTPS only. Webhook endpoints must use HTTPS. HTTP URLs are rejected at registration time.
- Fetch full data separately. Webhook payloads contain summaries. For full extraction results, call GET /v1/extractions/{extraction_id} with the extractionId from the payload.
- Monitor delivery health. Check your webhook delivery logs in the dashboard to catch failures early. Use the "Send Test" button after deploying endpoint changes.
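The "respond quickly" and "handle duplicates" practices combine naturally: record the delivery ID, queue the body, and return 2xx right away while a worker drains the queue. The sketch below keeps seen IDs in memory for illustration only; a real deployment would likely need a persistent store, and `DeliveryDeduper` is a hypothetical name.

```python
import queue
import threading
from typing import Set, Tuple


class DeliveryDeduper:
    """Queue each delivery once, keyed by the X-DocuExtract-Delivery value."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self._lock = threading.Lock()
        self.pending: "queue.Queue[Tuple[str, bytes]]" = queue.Queue()

    def accept(self, delivery_id: str, raw_body: bytes) -> bool:
        """Return True if the delivery was queued, False if it was a duplicate."""
        with self._lock:
            if delivery_id in self._seen:
                return False
            self._seen.add(delivery_id)
        # Heavy processing happens later, off the request path.
        self.pending.put((delivery_id, raw_body))
        return True
```

In a handler, call `accept` after the signature check and return 200 whether or not it returns True, so a duplicate does not trigger further retries.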
Document Types
DocuExtract automatically detects document types, or you can specify one explicitly.
| Type | Description | Key Fields Extracted |
|---|---|---|
| invoice | Vendor invoices and billing statements | vendor name, invoice number, dates, line items, totals, payment terms |
| receipt | Purchase receipts from retail, restaurants, etc. | merchant name, date, items purchased, subtotal, tax, total, payment method |
| bank_statement | Bank and credit card statements | account number, period, opening/closing balance, transactions |
| resume | CVs and resumes | name, contact info, work experience, education, skills |
| contract | Legal agreements and contracts | parties, effective date, termination date, key obligations, governing law |
| form | Filled forms (applications, surveys, intake forms) | all labeled fields and their values |
| id_document | ID cards, passports, driver's licenses | name, date of birth, expiry, document number, issuing authority |
| unknown | Fallback for unrecognized types | best-effort extraction of all visible structured data |
Error Codes
All errors return a JSON response with an error object containing code and message fields.
{
"error": {
"code": "unauthorized",
"message": "Invalid or missing API key"
}
}
4xx Client Errors
Rate-limited responses include a Retry-After header. Upgrade your plan for higher limits.
5xx Server Errors
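Both error classes share the envelope shown above, so a client can map them to a single exception type. This is a sketch under assumptions: treating 5xx responses and any response carrying Retry-After as retryable is a client-side policy chosen here, not documented behavior, and only the seconds form of Retry-After is parsed. `DocuExtractError` and `error_from_response` are hypothetical names.

```python
import json
from typing import Mapping, Optional


class DocuExtractError(Exception):
    def __init__(self, status: int, code: str, message: str,
                 retry_after: Optional[int] = None) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.retry_after = retry_after
        # Assumed policy: retry server errors and anything with Retry-After.
        self.retryable = status >= 500 or retry_after is not None


def error_from_response(status: int, body: bytes,
                        headers: Mapping[str, str]) -> Optional[DocuExtractError]:
    """Return a DocuExtractError for 4xx/5xx responses, or None on success."""
    if status < 400:
        return None
    try:
        parsed = json.loads(body)
        err = parsed.get("error", {}) if isinstance(parsed, dict) else {}
    except ValueError:
        err = {}  # body was not JSON, e.g. a proxy error page
    if not isinstance(err, dict):
        err = {}
    lowered = {k.lower(): v for k, v in headers.items()}
    raw = lowered.get("retry-after", "").strip()
    retry_after = int(raw) if raw.isdigit() else None
    return DocuExtractError(
        status,
        str(err.get("code", "unknown")),
        str(err.get("message", "")),
        retry_after,
    )
```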
Pricing
Simple, transparent pricing. No credit multipliers, no enterprise-gating.
Free
- 5 req/min rate limit
- Haiku model only
- No credit card required
Starter
- 30 req/min rate limit
- Haiku + Sonnet (3x cost)
- Email support
Pro
- 60 req/min rate limit
- Haiku + Sonnet (3x cost)
- Priority support
Scale
- 120 req/min rate limit
- All models + Priority
- SLA + dedicated support
Overage Pricing
When you exceed your monthly quota, additional extractions are billed at per-plan rates: $0.04/call (Starter), $0.025/call (Pro), $0.015/call (Scale). Free plan blocks requests at the limit.
To track your remaining quota, call GET /v1/usage or check the X-RateLimit-Remaining-Month header.
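The overage rates above make the extra charge easy to estimate from the `used`, `limit`, and `plan` fields of a GET /v1/usage response. This worked example assumes `used` keeps counting past `limit` on paid plans, which is not confirmed above; `overage_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Per-call overage rates in USD, from the Overage Pricing paragraph.
OVERAGE_RATE_USD = {
    "starter": Decimal("0.04"),
    "pro": Decimal("0.025"),
    "scale": Decimal("0.015"),
}


def overage_cost(plan: str, used: int, limit: int) -> Decimal:
    """Estimated overage charge in USD for the current billing period."""
    if plan == "free":
        # The Free plan blocks requests at the limit, so there is no overage.
        return Decimal("0")
    extra_calls = max(0, used - limit)
    return OVERAGE_RATE_USD[plan] * extra_calls


# Pro plan, 5,200 extractions against a 5,000 limit: 200 calls at $0.025 each.
print(overage_cost("pro", 5200, 5000))  # 5.000
```

`Decimal` avoids the rounding drift that floats would introduce when summing many small per-call charges.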