DocParse
LLM-powered structured extraction — sync for short docs, async queue + webhook for large PDFs.
Built-in schemas
Invoice
invoice_number, vendor, amount, due_date, line_items
Loan Application
applicant, loan_amount, income, employment, credit_score
W-2 Tax Form
employee, employer, tax_year, wages, federal_tax
NDA
parties, effective_date, duration, governing_law
Contract
parties, dates, value, governing_law, termination
Async Jobs
Jobs submitted via POST /v1/doc/jobs. Enter your API key to view recent jobs.
Quick start — sync (short docs)
import httpx
client = httpx.Client(
base_url="https://api.mawlaia.com",
headers={"Authorization": "Bearer mwl_live_..."},
)
resp = client.post("/v1/doc/extract", json={
"schema_name": "invoice",
"text": "Invoice #INV-2024-001 from Acme Corp...\nTotal due: $1,250.00",
})
print(resp.json())
# {"schema_name": "invoice", "fields": {"invoice_number": {"value": "INV-2024-001", ...}}}Quick start — async queue (large PDFs)
import time, httpx
client = httpx.Client(
base_url="https://api.mawlaia.com",
headers={"Authorization": "Bearer mwl_live_..."},
)
# Submit job — returns immediately with job_id
job = client.post("/v1/doc/jobs", json={
"schema_name": "contract",
"text": long_pdf_text, # pass extracted text from pdfplumber
"webhook_url": "https://yourapp.com/webhooks/docparse", # optional
}).json()
# Poll until done
while job["status"] in ("pending", "processing"):
time.sleep(2)
job = client.get(f"/v1/doc/jobs/{job['job_id']}").json()
print(job["result"]) # {"parties": {"value": [...], "confidence": 0.97}, ...}
# Or receive via webhook — no polling needed:
# POST https://yourapp.com/webhooks/docparse
# {"job_id": "...", "status": "completed", "schema_name": "contract", "fields": {...}}Custom fields
resp = client.post("/v1/doc/extract", json={
"text": "...",
"fields": [
{"name": "patient_name", "description": "Full name of the patient", "required": True},
{"name": "diagnosis", "description": "Primary diagnosis code (ICD-10)", "type": "string"},
{"name": "visit_date", "description": "Date of the visit", "type": "date"},
],
})