An intelligent document processing API & UI powered by the Qwen3-VL multimodal model. It extracts structured data from scanned forms, invoices, and documents, ignoring boilerplate text and focusing on business values.
- Strict Data Extraction: Ignores legal text and instructions, extracts only values.
- Smart Formatting: Converts tables and forms into structured JSON.
- Visual Verification: Interactive UI highlights extracted fields on the image.
- Multiple Exports: Download results as JSON, CSV, or Excel.
- Dual Mode: Works as a Web UI (Gradio) and a REST API (FastAPI) simultaneously.
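To make the export feature above concrete, here is a minimal sketch of serializing an extraction result to JSON and CSV with the Python standard library only. The field names and the flat key/value shape of `sample` are assumptions made for illustration; the project's actual output schema and export code may differ.

```python
import csv
import io
import json


def to_json(fields: dict) -> str:
    """Serialize extracted fields as pretty-printed JSON."""
    return json.dumps(fields, indent=2, ensure_ascii=False)


def to_csv(fields: dict) -> str:
    """Serialize extracted fields as a two-column CSV (field, value)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["field", "value"])
    for name, value in fields.items():
        # Nested values (for example table rows) are stored as JSON strings.
        if isinstance(value, (dict, list)):
            value = json.dumps(value, ensure_ascii=False)
        writer.writerow([name, value])
    return buffer.getvalue()


# Hypothetical extraction result, not real model output.
sample = {
    "invoice_number": "INV-001",
    "total": "120.50",
    "items": [{"name": "Widget", "qty": 2}],
}

if __name__ == "__main__":
    print(to_json(sample))
    print(to_csv(sample))
```

Storing nested values as JSON strings keeps the CSV to one row per field; a real exporter might instead expand table rows into their own sheet or file.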
- Clone the repository:
    git clone https://github.com/ituvtu/qwen-doc-parser.git
    cd qwen-doc-parser
- Create a virtual environment and install dependencies:
    python -m venv .venv
    # Windows:
    .venv\Scripts\activate
    # Mac/Linux:
    source .venv/bin/activate
    pip install -r requirements.txt
- Set up environment variables: copy .env.example to .env and add your Hugging Face token:
    HF_TOKEN=your_token_here
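For readers who want to check the token setup before starting the server, the following is a rough sketch of how `HF_TOKEN` could be resolved: first from the process environment, then from a `.env` file. The helper names `parse_env_file` and `get_hf_token` are hypothetical and are not taken from the project, which may load its configuration differently.

```python
import os
from pathlib import Path


def parse_env_file(path: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_hf_token(env_file: str = ".env") -> str:
    """Return HF_TOKEN from the environment, falling back to the .env file."""
    token = os.environ.get("HF_TOKEN") or parse_env_file(env_file).get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to .env or export it.")
    return token


if __name__ == "__main__":
    # Print only the length so the token itself is never echoed.
    print("HF_TOKEN found, length:", len(get_hf_token()))
```

Preferring the environment over the file mirrors the two Docker options in the usage section, where the token arrives either through `--env-file` or through `-e HF_TOKEN=...`.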
▶️ Usage

Run with Docker (Recommended)

Option A: Using .env file (Best for security)
    docker build -t qwen-doc-parser .
    docker run -p 7860:7860 --env-file .env qwen-doc-parser

Option B: Passing token directly
    docker run -p 7860:7860 -e HF_TOKEN=hf_YourTokenHere qwen-doc-parser

Run Locally
    uvicorn app.main:app --host 0.0.0.0 --port 7860 --reload
Open your browser at http://localhost:7860.
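If the page does not load right away, the server may still be starting. As a small convenience, this sketch polls the port until it accepts TCP connections. The function names, retry count, and delay are illustrative assumptions; only the host and port 7860 come from the commands above.

```python
import socket
import time


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_server(host: str = "localhost", port: int = 7860,
                    attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll the port up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if is_port_open(host, port):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    print("ready" if wait_for_server() else "not reachable")
```

An open port only shows that the process is listening; it does not confirm that the model has finished loading.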
You can use the API to extract data programmatically:
    curl -X POST "http://localhost:7860/api/v1/extract" \
      -H "accept: application/json" \
      -H "Content-Type: multipart/form-data" \
      -F "file=@/path/to/invoice.jpg"

The project includes a Python script to verify the API functionality immediately.
- Open test_api.py and update the IMAGE_PATH variable to point to your test image.
- Run the script:
    python test_api.py
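The curl request can also be sent from Python without third-party packages. The sketch below is not the contents of test_api.py; it is an independent illustration that builds the multipart body by hand and posts it with `urllib`. It assumes the endpoint replies with JSON (as the `accept` header in the curl example suggests), and the 120-second timeout is an arbitrary placeholder.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Endpoint taken from the curl example above.
API_URL = "http://localhost:7860/api/v1/extract"


def build_multipart(field: str, filename: str, data: bytes):
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def extract(image_path: str, url: str = API_URL, timeout: float = 120.0):
    """POST the image to the extract endpoint and return the parsed JSON reply."""
    path = Path(image_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Replace with the path to your own test image.
    print(json.dumps(extract("/path/to/invoice.jpg"), indent=2))
```

The form field is named `file` to match the `-F "file=@..."` argument in the curl command.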