DocuExtract API
Send a document, get JSON back. No templates. No training. Works in 5 minutes.
DocuExtract converts unstructured documents — invoices, receipts, contracts, resumes, bank statements — into clean, validated JSON using Claude AI. You send a document (image, PDF, or URL), specify what you want extracted, and receive structured data in seconds.
Base URL: https://docuextract.dev/v1

| Feature | Detail |
|---|---|
| Authentication | Bearer token (API key) |
| Request format | JSON (Content-Type: application/json) |
| Response format | JSON |
| Max file size | 10 MB |
| Supported formats | PDF, PNG, JPG, WEBP (base64 or URL) |
| Default model | Claude Haiku 4.5 (fast) |
| Accurate model | Claude Sonnet 4.6 (complex documents) |
Quick Start
Extract structured data from a document in 3 steps.
Get your API key
Sign up at your dashboard to get your free API key. It looks like dk_live_xxxxxxxxxxxxxxxx.
Make your first extraction
Send a document URL (or base64) to /v1/extract:
cURL
curl https://docuextract.dev/v1/extract \
  -H "Authorization: Bearer dk_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": "https://example.com/invoice.pdf",
    "type": "invoice"
  }'

JavaScript
const response = await fetch('https://docuextract.dev/v1/extract', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer dk_live_YOUR_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    document: 'https://example.com/invoice.pdf',
    type: 'invoice',
  }),
});

const result = await response.json();
console.log(result.data);
// { vendor_name: "Acme Corp", total_amount: 1250.00, ... }

Python
import requests

response = requests.post(
    'https://docuextract.dev/v1/extract',
    headers={
        'Authorization': 'Bearer dk_live_YOUR_KEY',
        'Content-Type': 'application/json',
    },
    json={
        'document': 'https://example.com/invoice.pdf',
        'type': 'invoice',
    },
)

result = response.json()
print(result['data'])
# {'vendor_name': 'Acme Corp', 'total_amount': 1250.0, ...}
Use the structured data
The response contains the extracted fields, confidence score, and processing metadata:
{
"data": {
"vendor_name": "Acme Corp",
"invoice_number": "INV-2024-0847",
"invoice_date": "2024-03-15",
"due_date": "2024-04-15",
"subtotal": 1000.00,
"tax_amount": 250.00,
"total_amount": 1250.00,
"currency": "USD",
"line_items": [
{ "description": "Consulting services", "quantity": 10, "unit_price": 100.00, "total": 1000.00 }
]
},
"metadata": {
"type": "invoice",
"confidence": 0.96,
"model": "claude-haiku-4-5-20251001",
"processing_time_ms": 1847,
"page_count": 1
}
}

Authentication
All API endpoints (except GET /v1/health) require authentication via a Bearer token in the Authorization header.
Authorization: Bearer dk_live_xxxxxxxxxxxxxxxxxxxxxxxx

API Key Format
API keys start with dk_live_ followed by 32 random characters. Keys are generated when you sign up and can be regenerated from your dashboard.
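A minimal sketch of checking a key's shape before sending it, based only on the format described above. The helper name `build_auth_headers` is hypothetical, and the check assumes the 32 characters contain no whitespace; the actual character set is not documented here.

```python
API_KEY_PREFIX = "dk_live_"
API_KEY_RANDOM_LENGTH = 32  # per the key format described above


def build_auth_headers(api_key: str) -> dict:
    """Return headers for an authenticated JSON request (hypothetical helper)."""
    if not api_key.startswith(API_KEY_PREFIX):
        raise ValueError("API key must start with 'dk_live_'")
    suffix = api_key[len(API_KEY_PREFIX):]
    # Assumption: the random part is exactly 32 non-whitespace characters.
    if len(suffix) != API_KEY_RANDOM_LENGTH or any(c.isspace() for c in suffix):
        raise ValueError("API key must have 32 characters after the prefix")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing fast on a malformed key avoids spending a request on a call that would be rejected as unauthorized anyway.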
Rate Limit Headers
Every authenticated response includes rate limit information:
| Header | Description |
|---|---|
| X-RateLimit-Limit-Minute | Maximum requests per minute for your plan |
| X-RateLimit-Remaining-Minute | Remaining requests this minute |
| X-RateLimit-Limit-Month | Maximum extractions per month for your plan |
| X-RateLimit-Remaining-Month | Remaining extractions this month |
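The four headers in the table can be read into a small structure so client code can decide when to slow down. This is a sketch only: `RateLimitStatus` and `parse_rate_limits` are hypothetical names, and the header values are assumed to be plain integers.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RateLimitStatus:
    limit_minute: int
    remaining_minute: int
    limit_month: int
    remaining_month: int

    @property
    def exhausted(self) -> bool:
        """True when either the per-minute or the monthly budget is used up."""
        return self.remaining_minute <= 0 or self.remaining_month <= 0


def parse_rate_limits(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def read(name: str) -> int:
        # Assumption: each header carries a plain base-10 integer.
        return int(lowered[name.lower()])

    return RateLimitStatus(
        limit_minute=read("X-RateLimit-Limit-Minute"),
        remaining_minute=read("X-RateLimit-Remaining-Minute"),
        limit_month=read("X-RateLimit-Limit-Month"),
        remaining_month=read("X-RateLimit-Remaining-Month"),
    )
```

With the `requests` library, `parse_rate_limits(response.headers)` should work directly, since `response.headers` is a mapping of header names to string values.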
POST /v1/extract
Extract structured data from a document. This is the core endpoint.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| document | string | Yes | Document as base64-encoded string or a publicly accessible URL |
| type | string | No | Document type hint. One of: invoice, receipt, bank_statement, resume, contract, form, id_document. Auto-detected if omitted. |
| model | string | No | "fast" (default) or "accurate". Fast uses Claude Haiku; accurate uses Claude Sonnet for complex/multi-page documents. |
| schema | object | No | Custom JSON schema describing the fields to extract. When provided, the extraction is guided by your schema. |
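The fields in the table could be assembled into a request body along the following lines. This is a sketch under assumptions: `build_extract_payload` is a hypothetical helper, the 10 MB limit is assumed to apply to the raw file before base64 encoding, and the exact schema dialect the API accepts is not specified here, so any schema you pass is forwarded as-is.

```python
import base64
from typing import Any, Dict, Optional, Union

DOCUMENT_TYPES = frozenset({
    "invoice", "receipt", "bank_statement", "resume",
    "contract", "form", "id_document",
})
MODELS = frozenset({"fast", "accurate"})
# Assumption: "10 MB" means 10 * 1024 * 1024 bytes of raw file content.
MAX_FILE_BYTES = 10 * 1024 * 1024


def build_extract_payload(
    document: Union[str, bytes],
    doc_type: Optional[str] = None,
    model: str = "fast",
    schema: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /v1/extract (hypothetical helper)."""
    if isinstance(document, bytes):
        # Raw file content: enforce the size limit, then base64-encode it.
        if len(document) > MAX_FILE_BYTES:
            raise ValueError("document exceeds the 10 MB limit")
        document = base64.b64encode(document).decode("ascii")
    # A str is passed through unchanged: either a URL or an already-encoded file.
    payload: Dict[str, Any] = {"document": document}
    if doc_type is not None:
        if doc_type not in DOCUMENT_TYPES:
            raise ValueError(f"unsupported type: {doc_type!r}")
        payload["type"] = doc_type
    if model not in MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    payload["model"] = model
    if schema is not None:
        payload["schema"] = schema
    return payload
```

For example, `build_extract_payload("https://example.com/invoice.pdf", doc_type="invoice", model="accurate")` produces a body that can be passed as the `json=` argument of `requests.post`.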
Response
{
"data": { /* extracted fields */ },
"metadata": {
"type": "invoice",
"confidence": 0.96,
"model": "claude-haiku-4-5-20251001",
"processing_time_ms": 1847,
"page_count": 1
}
}

POST /v1/detect
Detect the type of a document without extracting its data.
Response
{
"type": "invoice",
"confidence": 0.98
}

GET /v1/usage
Retrieve your current usage statistics for the billing period.
Response
{
"used": 847,
"limit": 10000,
"plan": "pro",
"period_end": "2024-04-24",
"breakdown": [
{ "date": "2024-03-24", "count": 42 }
]
}

GET /v1/health
Health check endpoint. No authentication required.
{ "status": "ok", "version": "1.0.0" }

POST /v1/billing/checkout
Create a Stripe Checkout session to subscribe to a paid plan. Returns a URL to redirect your user to for payment.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| plan | string | Yes | Plan to subscribe to. One of: starter, pro, scale |
Response
{ "url": "https://checkout.stripe.com/c/pay/cs_live_..." }

Redirect the user to this URL. After payment, Stripe redirects back to your dashboard.
POST /v1/billing/portal
Create a Stripe Billing Portal session for subscription management.
Response
{ "url": "https://billing.stripe.com/p/session/..." }

Document Types
DocuExtract automatically detects document types, or you can specify one explicitly.
| Type | Description | Key Fields Extracted |
|---|---|---|
| invoice | Vendor invoices and billing statements | vendor name, invoice number, dates, line items, totals, payment terms |
| receipt | Purchase receipts from retail, restaurants, etc. | merchant name, date, items purchased, subtotal, tax, total, payment method |
| bank_statement | Bank and credit card statements | account number, period, opening/closing balance, transactions |
| resume | CVs and resumes | name, contact info, work experience, education, skills |
| contract | Legal agreements and contracts | parties, effective date, termination date, key obligations, governing law |
| form | Filled forms (applications, surveys, intake forms) | all labeled fields and their values |
| id_document | ID cards, passports, driver's licenses | name, date of birth, expiry, document number, issuing authority |
| unknown | Fallback for unrecognized types | best-effort extraction of all visible structured data |
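Because the `unknown` type is a best-effort fallback, a client may want to route such results, along with low-confidence ones, to manual review. The sketch below assumes the response shape shown for /v1/extract; `needs_review` is a hypothetical helper and the 0.9 threshold is an arbitrary illustration, not a recommended value.

```python
from typing import Any, Mapping


def needs_review(result: Mapping[str, Any], min_confidence: float = 0.9) -> bool:
    """Flag an extraction result for manual review (hypothetical helper)."""
    metadata = result.get("metadata", {})
    # An unrecognized or missing type means only best-effort data is available.
    if metadata.get("type", "unknown") == "unknown":
        return True
    # Assumption: confidence is a float between 0 and 1, as in the sample response.
    return metadata.get("confidence", 0.0) < min_confidence


sample = {
    "data": {"vendor_name": "Acme Corp"},
    "metadata": {"type": "invoice", "confidence": 0.96},
}
print(needs_review(sample))  # False
```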
Error Codes
All errors return a JSON response with an error object containing code and message fields.
{
"error": {
"code": "unauthorized",
"message": "Invalid or missing API key"
}
}

4xx Client Errors
When a rate limit is exceeded, check the Retry-After header. Upgrade your plan for higher limits.

5xx Server Errors
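Handling of the error object, rate limits, and server errors could be sketched as follows. The names `DocuExtractError`, `parse_error`, and `retry_delay` are hypothetical. The sketch assumes Retry-After carries a whole number of seconds and that 5xx responses are safe to retry with exponential backoff; neither is confirmed by this reference.

```python
from typing import Any, Mapping, Optional


class DocuExtractError(Exception):
    """Carries the documented error code and message (hypothetical class)."""

    def __init__(self, status: int, code: str, message: str) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def parse_error(status: int, body: Mapping[str, Any]) -> DocuExtractError:
    # Errors arrive as {"error": {"code": ..., "message": ...}}.
    error = body.get("error") or {}
    return DocuExtractError(
        status,
        error.get("code", "unknown_error"),
        error.get("message", ""),
    )


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base_seconds: float = 1.0,
    cap_seconds: float = 30.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if a retry is pointless."""
    lowered = {name.lower(): value for name, value in headers.items()}
    retry_after = lowered.get("retry-after", "").strip()
    # Assumption: Retry-After is an integer number of seconds; other forms are ignored.
    if retry_after.isdigit():
        return float(retry_after)
    if 500 <= status <= 599:
        # Exponential backoff: 1s, 2s, 4s, ... capped at cap_seconds.
        return min(cap_seconds, base_seconds * (2 ** attempt))
    return None
```

A caller would sleep for the returned delay and retry, and raise the result of `parse_error` when `retry_delay` returns `None` or the attempt budget runs out.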
Pricing
Simple, transparent pricing. No credit multipliers, no enterprise-gating.
- 10 req/min rate limit
- Haiku model only
- No credit card required

- 30 req/min rate limit
- Haiku + Sonnet
- Email support

- 60 req/min rate limit
- Haiku + Sonnet
- Priority support

- 120 req/min rate limit
- All models + Priority
- SLA + dedicated support
Overage Pricing
When you exceed your monthly quota, additional extractions are billed at $0.05 per extraction.
To track your usage, call GET /v1/usage or check the X-RateLimit-Remaining-Month header.
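As a worked example of the overage arithmetic, the sketch below estimates the extra charge from a /v1/usage response. The function names are hypothetical, and the sketch assumes the `used` field keeps counting past `limit` once the quota is exceeded, which this reference does not state.

```python
from decimal import Decimal
from typing import Any, Mapping

OVERAGE_RATE_USD = Decimal("0.05")  # per extraction beyond the monthly quota


def remaining_extractions(usage: Mapping[str, Any]) -> int:
    """Extractions left in the current period, never negative."""
    return max(0, usage["limit"] - usage["used"])


def overage_cost(usage: Mapping[str, Any]) -> Decimal:
    """Estimated overage charge in USD for the current period."""
    # Assumption: "used" may exceed "limit" once the quota is exhausted.
    extra = max(0, usage["used"] - usage["limit"])
    return OVERAGE_RATE_USD * extra


within_quota = {"used": 847, "limit": 10000, "plan": "pro"}
print(remaining_extractions(within_quota))  # 9153
print(overage_cost(within_quota))           # 0.00

over_quota = {"used": 10120, "limit": 10000, "plan": "pro"}
print(overage_cost(over_quota))             # 6.00
```

`Decimal` is used instead of `float` so that currency amounts do not pick up binary rounding errors.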