mawlaia

DocParse

LLM-powered structured extraction: a sync endpoint for short docs, and an async queue with webhook delivery for large PDFs.

Built-in schemas

Invoice

invoice_number, vendor, amount, due_date, line_items

Loan Application

applicant, loan_amount, income, employment, credit_score

W-2 Tax Form

employee, employer, tax_year, wages, federal_tax

NDA

parties, effective_date, duration, governing_law

Contract

parties, dates, value, governing_law, termination

Async Jobs

Jobs are submitted via POST /v1/doc/jobs.
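A submitted job can then be polled at GET /v1/doc/jobs/{job_id}, as the async quick start further down does in a bare loop. The following is a minimal sketch of the same idea with a deadline and a growing delay between polls. The helper name `wait_for_job`, its parameters, and the backoff factor are hypothetical choices for illustration; only the "pending" and "processing" statuses come from this page.

```python
import time
from typing import Any, Callable


def wait_for_job(
    fetch_job: Callable[[str], dict[str, Any]],
    job_id: str,
    timeout: float = 300.0,
    interval: float = 2.0,
    max_interval: float = 30.0,
) -> dict[str, Any]:
    """Poll fetch_job(job_id) until the job leaves pending/processing.

    fetch_job is injected so the helper works with any HTTP client.
    Raises TimeoutError if the next poll would land after the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] not in ("pending", "processing"):
            return job
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        time.sleep(interval)
        # Back off gradually (assumed factor of 1.5) to reduce request volume.
        interval = min(interval * 1.5, max_interval)


# Possible usage with an httpx.Client named `client`:
# job = wait_for_job(lambda jid: client.get(f"/v1/doc/jobs/{jid}").json(), job_id)
```

Passing the fetch function in keeps the helper independent of the HTTP library and makes it easy to exercise with a fake in tests.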

Quick start — sync (short docs)

import httpx

client = httpx.Client(
    base_url="https://api.mawlaia.com",
    headers={"Authorization": "Bearer mwl_live_..."},
)

resp = client.post("/v1/doc/extract", json={
    "schema_name": "invoice",
    "text": "Invoice #INV-2024-001 from Acme Corp...\nTotal due: $1,250.00",
})
resp.raise_for_status()  # raise on 4xx/5xx instead of printing an error body
print(resp.json())
# {"schema_name": "invoice", "fields": {"invoice_number": {"value": "INV-2024-001", ...}}}
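Each extracted field sits under `fields` with a `value`, and the async example on this page also shows a `confidence` score next to it. A minimal sketch of flattening such a response and flagging low-confidence fields for manual review could look like this. The helper name `split_by_confidence`, the 0.8 threshold, and the assumption that every field carries a numeric `confidence` are illustrative and not part of the documented API.

```python
from typing import Any


def split_by_confidence(
    response: dict[str, Any], threshold: float = 0.8
) -> tuple[dict[str, Any], list[str]]:
    """Return (accepted values by field name, names of fields needing review)."""
    accepted: dict[str, Any] = {}
    needs_review: list[str] = []
    for name, field in response.get("fields", {}).items():
        # Assumption: a missing confidence is treated as 0.0, so it is always reviewed.
        confidence = field.get("confidence", 0.0)
        if confidence >= threshold:
            accepted[name] = field.get("value")
        else:
            needs_review.append(name)
    return accepted, needs_review


# Hand-written sample in the response shape shown on this page, not real API output.
sample = {
    "schema_name": "invoice",
    "fields": {
        "invoice_number": {"value": "INV-2024-001", "confidence": 0.99},
        "due_date": {"value": None, "confidence": 0.41},
    },
}
accepted, needs_review = split_by_confidence(sample)
print(accepted)      # {'invoice_number': 'INV-2024-001'}
print(needs_review)  # ['due_date']
```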

Quick start — async queue (large PDFs)

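import time

import httpx
import pdfplumber

client = httpx.Client(
    base_url="https://api.mawlaia.com",
    headers={"Authorization": "Bearer mwl_live_..."},
)

# Extract the text locally first ("contract.pdf" is a placeholder path)
with pdfplumber.open("contract.pdf") as pdf:
    long_pdf_text = "\n".join(page.extract_text() or "" for page in pdf.pages)

# Submit job: returns immediately with job_id
job = client.post("/v1/doc/jobs", json={
    "schema_name": "contract",
    "text": long_pdf_text,               # text extracted with pdfplumber above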
    "webhook_url": "https://yourapp.com/webhooks/docparse",  # optional
}).json()

# Poll until done
while job["status"] in ("pending", "processing"):
    time.sleep(2)
    job = client.get(f"/v1/doc/jobs/{job['job_id']}").json()

print(job["result"])  # {"parties": {"value": [...], "confidence": 0.97}, ...}

# Or receive via webhook — no polling needed:
# POST https://yourapp.com/webhooks/docparse
# {"job_id": "...", "status": "completed", "schema_name": "contract", "fields": {...}}
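On the receiving side, the webhook is an ordinary POST carrying the JSON payload shown above. Below is a minimal sketch of a receiver built on the standard library. The names `parse_job_event` and `DocParseWebhook` are hypothetical, and the 204/400 responses are an assumed convention. This page does not say whether webhook requests are signed, so no signature check is shown.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Any


def parse_job_event(body: bytes) -> dict[str, Any]:
    """Decode a webhook body and check the keys from the example payload."""
    event = json.loads(body)
    if not isinstance(event, dict):
        raise ValueError("webhook payload must be a JSON object")
    missing = {"job_id", "status"} - event.keys()
    if missing:
        raise ValueError(f"webhook payload missing keys: {sorted(missing)}")
    return event


class DocParseWebhook(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = parse_job_event(self.rfile.read(length))
        except ValueError:  # also covers json.JSONDecodeError
            self.send_response(400)
            self.end_headers()
            return
        if event["status"] == "completed":
            # Replace with real handling, e.g. storing the extracted fields.
            print(event["job_id"], event.get("fields"))
        self.send_response(204)
        self.end_headers()
```

To try it locally, the handler can be served with `HTTPServer(("", 8000), DocParseWebhook).serve_forever()` from `http.server`; a production deployment would normally sit behind a proper web framework and HTTPS.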

Custom fields

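# Reuses the client from the quick start; describes the fields inline instead of naming a built-in schema
resp = client.post("/v1/doc/extract", json={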
    "text": "...",
    "fields": [
        {"name": "patient_name", "description": "Full name of the patient", "required": True},
        {"name": "diagnosis",    "description": "Primary diagnosis code (ICD-10)", "type": "string"},
        {"name": "visit_date",   "description": "Date of the visit", "type": "date"},
    ],
})
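Since some custom fields are marked `"required": True`, it can be useful to check the response against the request before trusting it. The sketch below assumes that custom-field responses use the same `fields` / `value` shape as the built-in schemas, which this page does not state explicitly. The helper name `missing_required` is hypothetical.

```python
from typing import Any


def missing_required(
    field_specs: list[dict[str, Any]], response: dict[str, Any]
) -> list[str]:
    """Return names of required fields that came back absent, None, or empty."""
    extracted = response.get("fields", {})
    missing: list[str] = []
    for spec in field_specs:
        if not spec.get("required", False):
            continue
        value = extracted.get(spec["name"], {}).get("value")
        if value is None or value == "":
            missing.append(spec["name"])
    return missing


# Field definitions from the request above; the response is a hand-written sample.
specs = [
    {"name": "patient_name", "description": "Full name of the patient", "required": True},
    {"name": "diagnosis", "description": "Primary diagnosis code (ICD-10)", "type": "string"},
]
sample_response = {
    "fields": {
        "patient_name": {"value": None},
        "diagnosis": {"value": "J45.909"},
    }
}
print(missing_required(specs, sample_response))  # ['patient_name']
```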