What Is PaddleOCR? A Simple Guide to OCR and Document AI
Learn what PaddleOCR is, how it extracts text and structure from images and PDFs, and when to use PP-OCRv5, PP-StructureV3, PaddleOCR-VL, MCP, and serving for OCR and Document AI workflows.
Image: GitHub Open Graph preview for the PaddlePaddle/PaddleOCR repository.1
Quick summary
PaddleOCR is an open-source OCR and Document AI toolkit from PaddlePaddle. The official repository describes it as a toolkit that turns PDFs and image documents into structured data for AI, supports 100+ languages, and outputs LLM-ready JSON/Markdown data.2
In plain terms: if you have a scanned contract, receipt image, invoice, boarding pass, document screenshot, PDF with tables, or any image containing text, PaddleOCR can detect text, recognize it, return coordinates, and in more advanced pipelines, parse document layout into Markdown or structured JSON.
PaddleOCR is mainly useful for three tasks:
- General OCR: extract text from images or PDFs.
- Document parsing: preserve layout such as headings, paragraphs, tables, formulas, figures and charts.
- AI/RAG pipelines: transform document images into structured data for chatbots, search, agents and retrieval systems.
What problem does PaddleOCR solve?
Business data is often locked in visual documents:
- scanned PDFs;
- photos of forms;
- receipts and invoices;
- contracts;
- financial reports;
- tables inside PDFs;
- slide screenshots;
- multi-column documents;
- handwritten notes;
- documents with seals, formulas and charts.
LLMs work better with structured text than with raw images or scanned PDFs. PaddleOCR bridges that gap:
Image / PDF / scan
↓
PaddleOCR
↓
Text, coordinates, layout, tables, Markdown, JSON
↓
RAG / chatbot / database / automation workflow
What stands out in the PaddleOCR repository?
The GitHub repository had more than 79k stars when accessed, is licensed under Apache-2.0, and describes itself as a toolkit for converting PDF/image documents into structured data for AI.2
The README highlights:
- Support for 100+ languages.
- PP-OCRv5 for multilingual OCR.
- PP-StructureV3 for converting complex PDFs/images into Markdown or JSON.
- PaddleOCR-VL, a compact 0.9B VLM series for document parsing.
- Integrations with AI/RAG/agent projects such as Dify, RAGFlow, Pathway and Cherry Studio.
- Deployment across hardware such as NVIDIA GPUs, Intel CPUs, Kunlunxin XPUs and other AI accelerators.2
What PaddleOCR is not
| PaddleOCR is | PaddleOCR is not |
|---|---|
| An OCR and Document AI toolkit | A PDF editor |
| A system for extracting text and layout from images/PDFs | Only a translation tool |
| A CLI, Python API, serving system and MCP integration | Just one model |
| Useful for RAG/LLM/agents | Perfect on every low-quality scan |
| Deployable locally or as a service | Always a tiny lightweight package |
If you only need text from simple images, PP-OCRv5 is usually enough. If you need structure, tables and Markdown from complex documents, look at PP-StructureV3 or PaddleOCR-VL.
Main components
PP-OCRv5
PP-OCRv5 is the general OCR pipeline. The docs define OCR as technology that converts text in images into editable text; the general OCR pipeline extracts text from images and returns it as text.3
The general OCR pipeline can include:
- document orientation classification;
- document unwarping;
- text line orientation classification;
- text detection;
- text recognition.
The README says PP-OCRv5 supports 100+ languages and improves accuracy by 13% over the previous generation across many scenarios.2
PP-StructureV3
PP-StructureV3 is for document structure, not just text. The docs describe layout analysis as identifying and extracting text blocks, titles, paragraphs, images, tables and other layout elements. PP-StructureV3 improves layout region detection, table recognition, formula recognition, multi-column reading order recovery, chart understanding and Markdown conversion.4
Use PP-StructureV3 for:
- multi-column PDFs;
- reports with tables;
- formulas;
- seals;
- figures and charts;
- research papers;
- contracts;
- RAG pipelines that need Markdown.
PaddleOCR-VL
PaddleOCR-VL is a vision-language model series for document parsing. The docs describe it as a compact 0.9B VLM that supports 109 languages and recognizes complex elements such as text, tables, formulas and charts.5
Use PaddleOCR-VL when:
- documents are complex;
- layout is difficult;
- scans are warped, skewed or photographed;
- output should be Markdown or structured data;
- you have suitable GPU/inference infrastructure.
Installation
Step 1: Create a Python environment
python -m venv .venv
source .venv/bin/activate
Windows PowerShell:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
Step 2: Install an inference engine
The Quick Start says PaddleOCR supports unified inference-engine configuration, with support for PaddlePaddle and Transformers.6
CPU:
python -m pip install paddlepaddle==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
Linux GPU example for CUDA 11.8:
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
Transformers engine:
python -m pip install "transformers>=5.8.0"
Step 3: Install paddleocr
Default features:
python -m pip install paddleocr
Full functionality:
python -m pip install "paddleocr[all]"
The Installation docs say the default package covers general OCR and document image preprocessing. Optional groups include doc-parser, ie, trans, doc2md and all.7
For basic image/PDF OCR, start with:
python -m pip install paddleocr
For document parsing, Markdown conversion, information extraction and more pipelines:
python -m pip install "paddleocr[all]"
CLI usage
General OCR:
paddleocr ocr -i ./image.png \
--use_doc_orientation_classify False \
--use_doc_unwarping False \
--use_textline_orientation False \
--engine paddle
Using the Transformers engine:
paddleocr ocr -i ./image.png \
--use_doc_orientation_classify False \
--use_doc_unwarping False \
--use_textline_orientation False \
--engine transformers
Text detection only:
paddleocr text_detection -i ./image.png --engine paddle
Text recognition only:
paddleocr text_recognition -i ./text_crop.png --engine paddle
PP-StructureV3:
paddleocr pp_structurev3 -i ./document.png \
--use_doc_orientation_classify False \
--use_doc_unwarping False \
--engine paddle
These commands follow the official Quick Start examples.8
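When the CLI is driven from a script, it can help to build the argument list in one place instead of concatenating strings. The following is a minimal sketch that only reuses the flags shown in the commands above; `OcrCliOptions` and `build_ocr_command` are hypothetical helper names, not part of PaddleOCR.

```python
import shlex
from dataclasses import dataclass


@dataclass
class OcrCliOptions:
    """Toggles mirroring the flags used in the `paddleocr ocr` examples above."""
    use_doc_orientation_classify: bool = False
    use_doc_unwarping: bool = False
    use_textline_orientation: bool = False
    engine: str = "paddle"


def build_ocr_command(image_path: str, options: OcrCliOptions) -> list[str]:
    """Return the argv list for `paddleocr ocr` without running it."""
    return [
        "paddleocr", "ocr", "-i", image_path,
        "--use_doc_orientation_classify", str(options.use_doc_orientation_classify),
        "--use_doc_unwarping", str(options.use_doc_unwarping),
        "--use_textline_orientation", str(options.use_textline_orientation),
        "--engine", options.engine,
    ]


command = build_ocr_command("./image.png", OcrCliOptions())
# Print a shell-safe version of the command for logging or copy-paste.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it; keeping it as a list avoids shell-quoting problems with unusual file names.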
Python API
Basic OCR:
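from paddleocr import PaddleOCR

# Turn off the optional preprocessing stages; keep detection and recognition.
ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    engine="paddle",
)

result = ocr.predict("./image.png")
for res in result:
    res.print()                  # print the result to the console
    res.save_to_img("output")    # save the visualization image under ./output
    res.save_to_json("output")   # save the JSON result under ./output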
The official Quick Start uses the same PaddleOCR(...).predict() pattern and then calls print(), save_to_img() and save_to_json().9
Simple batch example:
from pathlib import Path

from paddleocr import PaddleOCR

ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    engine="paddle",
)

input_dir = Path("scans")
output_dir = Path("ocr_json")
output_dir.mkdir(exist_ok=True)

for img in input_dir.glob("*"):
    # Skip anything that is not an image or PDF.
    if img.suffix.lower() not in [".png", ".jpg", ".jpeg", ".pdf"]:
        continue
    result = ocr.predict(str(img))
    for res in result:
        res.save_to_json(str(output_dir))
Which pipeline should you choose?
| Need | Recommended option | Main output |
|---|---|---|
| Extract text from a simple image | PP-OCRv5 / paddleocr ocr | text + coordinates + confidence |
| Detect text boxes only | Text Detection module | bounding boxes |
| Recognize cropped text only | Text Recognition module | text |
| Convert complex PDFs to Markdown | PP-StructureV3 | Markdown/structured output |
| Handle tables, formulas and charts | PP-StructureV3 or PaddleOCR-VL | layout + Markdown/JSON |
| Parse difficult multilingual documents | PaddleOCR-VL | VLM-based document parsing |
| Give AI agents OCR tools | MCP Server or Agent Skills | OCR/parsing tools |
| Serve OCR to many apps | PaddleX Serving | HTTP service |
PaddleOCR for RAG and document chatbots
A basic RAG pipeline:
PDF / scanned image
↓
PaddleOCR / PP-StructureV3 / PaddleOCR-VL
↓
Markdown or JSON
↓
Chunking
↓
Embedding
↓
Vector database
↓
Document chatbot
For simple OCR, use PP-OCRv5. For structure-aware Markdown, use PP-StructureV3. For very difficult layouts, test PaddleOCR-VL.
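The chunking step in the pipeline above can be sketched with the standard library alone. This is a minimal illustration, assuming the parser has already produced Markdown with `#`-style headings; `Chunk` and `chunk_markdown` are hypothetical names, and the fixed character cap is a simplification (a real pipeline would more likely split on sentences or tokens).

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # nearest heading above the text, "" if none
    text: str


HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(markdown: str, max_chars: int = 1000) -> list[Chunk]:
    """Split Markdown at headings, then cap each chunk at max_chars characters."""
    sections: list[tuple[str, list[str]]] = [("", [])]
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            # A heading starts a new section.
            sections.append((match.group(2).strip(), []))
        else:
            sections[-1][1].append(line)

    chunks: list[Chunk] = []
    for heading, lines in sections:
        body = "\n".join(lines).strip()
        if not body:
            continue
        # Hard cut every max_chars characters; this may split mid-word.
        for start in range(0, len(body), max_chars):
            chunks.append(Chunk(heading, body[start:start + max_chars]))
    return chunks
```

Keeping the heading on every chunk gives the embedding and retrieval steps some context about where the text came from. Note that this sketch would also treat `#` lines inside code blocks as headings.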
Deploying as an API service
The Serving docs describe serving as a common production deployment method: inference capability is packaged as a service, and clients access it through network requests. Client code can be written in a different language from the server-side implementation.10
The docs recommend PaddleX for serving. Basic serving uses:
paddlex --install serving
paddlex --serve --pipeline OCR
The example server runs with Uvicorn at http://0.0.0.0:8080.11
Deployment shape:
Client app
↓ HTTP
PaddleOCR/PaddleX Serving
↓
GPU/CPU inference
↓
JSON/Markdown result
Use serving when:
- multiple applications need OCR;
- OCR should run on a GPU server;
- OCR should be separated from the main backend;
- clients are written in Java, Go, Node.js or other languages;
- you need independent scaling.
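A client for such a service only needs to send an HTTP request. The sketch below uses the Python standard library and assumes the server from the example above is reachable on port 8080 of the local machine. The `/ocr` path, the `file` and `fileType` field names, and the meaning of the `fileType` values are assumptions about the serving API, so check them against the Serving docs for the installed version.

```python
import base64
import json
import urllib.request

# Assumed endpoint: host, path and port must match your own deployment.
SERVER_URL = "http://localhost:8080/ocr"


def build_ocr_request(file_bytes: bytes, is_pdf: bool = False) -> urllib.request.Request:
    """Build a JSON POST request carrying the document as base64 text."""
    payload = {
        "file": base64.b64encode(file_bytes).decode("ascii"),
        "fileType": 0 if is_pdf else 1,  # assumed convention: 0 = PDF, 1 = image
    }
    return urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_ocr_request(request: urllib.request.Request, timeout_s: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

A caller would read the file with `Path("image.png").read_bytes()`, pass the bytes to `build_ocr_request`, and hand the result to `send_ocr_request`. The explicit timeout keeps a stuck OCR worker from blocking the client indefinitely.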
MCP Server for AI agents
PaddleOCR provides an MCP Server for LLM applications. The docs say the lightweight MCP server integrates text recognition, layout parsing and other PaddleOCR capabilities into large-model applications.12
Supported MCP tools/pipelines:
| Pipeline | MCP tool | Description |
|---|---|---|
| OCR | ocr | Detects and recognizes text in images/PDFs |
| PP-StructureV3 | pp_structurev3 | Extracts layout elements and converts to Markdown |
| PaddleOCR-VL | paddleocr_vl | VLM-based layout parsing and Markdown output |
| PaddleOCR-VL-1.5/1.6 | paddleocr_vl | Upgraded VLM pipelines |
Use MCP when you want Claude Desktop, Cursor, OpenClaw or another agent to call OCR/document-parsing tools.
Agent Skills
PaddleOCR also provides official Agent Skills. The docs say these skills package routing rules, calling steps, configuration requirements and best practices so Skills-enabled AI apps can handle OCR and document parsing more reliably.13
Main skills:
| Skill | Use case | Output |
|---|---|---|
| paddleocr-text-recognition | Extract plain text from images/PDFs | line-level text, bounding boxes, confidence |
| paddleocr-doc-parsing | Preserve headings, paragraphs, tables, formulas and layout | Markdown / structured output |
Install with the skills CLI:
npx skills add PaddlePaddle/PaddleOCR -g --skill paddleocr-text-recognition -y
npx skills add PaddlePaddle/PaddleOCR -g --skill paddleocr-doc-parsing -y
If the network is slow:
git clone https://github.com/PaddlePaddle/PaddleOCR.git
npx skills add ./PaddleOCR/skills/paddleocr-text-recognition
npx skills add ./PaddleOCR/skills/paddleocr-doc-parsing
The docs list prerequisites including Python 3.9+, PaddleOCR 3.6.0+ and an AI Studio access token for the skills.14
Personal setup guide
Minimal local setup:
python -m venv .venv
source .venv/bin/activate
python -m pip install paddlepaddle==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
python -m pip install paddleocr
Test:
paddleocr ocr -i ./image.png \
--use_doc_orientation_classify False \
--use_doc_unwarping False \
--use_textline_orientation False \
--engine paddle
For document parsing:
python -m pip install "paddleocr[all]"
paddleocr pp_structurev3 -i ./document.png --engine paddle
Team deployment guide
Phase 1: local evaluation
- Select 20–50 real sample documents.
- Test PP-OCRv5 on simple images.
- Test PP-StructureV3 on complex PDFs.
- Compare JSON and Markdown outputs.
- Record issues such as wrong order, missed diacritics, broken tables, bad scans and low confidence.
Phase 2: standardize the pipeline
- Pick the right pipeline for each document type.
- Define input quality requirements: DPI, file size, image format.
- Store raw input, output JSON/Markdown and logs.
- Pin model and package versions.
- Add human review for business-critical output.
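One way to support version pinning is to record the installed package versions next to each OCR output, so a result can later be traced to the environment that produced it. A minimal sketch using `importlib.metadata` follows; the helper names are hypothetical, and the package list is only an example based on the install commands in this guide.

```python
import json
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_snapshot(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each package name to its installed version (None when missing)."""
    return {name: installed_version(name) for name in packages}


snapshot = version_snapshot(["paddleocr", "paddlepaddle", "paddlepaddle-gpu"])
print(json.dumps(snapshot, indent=2))
```

Storing this snapshot with the raw input and the JSON/Markdown output makes regressions after an upgrade easier to diagnose.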
Phase 3: deploy a service
- Use PaddleX Serving or your own FastAPI wrapper.
- Run OCR workers separately from the main backend.
- Limit file size, pages and timeout.
- Use a queue for large PDFs.
- Use GPU for high throughput.
- Monitor latency, errors, memory and GPU usage.
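The file-size and page limits from this phase can be enforced before a document ever reaches an OCR worker. The sketch below is one possible shape for that check: the limit values are placeholders, `UploadLimits` and `validate_upload` are hypothetical names, and the page count is assumed to be supplied by the caller (for example from a PDF library).

```python
from dataclasses import dataclass
from pathlib import Path

# Same extensions as the batch example in this guide.
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg", ".pdf"}


@dataclass(frozen=True)
class UploadLimits:
    max_bytes: int = 20 * 1024 * 1024  # placeholder limit, tune per deployment
    max_pages: int = 50                # placeholder limit, tune per deployment


def validate_upload(
    filename: str,
    size_bytes: int,
    page_count: int,
    limits: UploadLimits = UploadLimits(),
) -> list[str]:
    """Return a list of problems; an empty list means the upload is accepted."""
    problems: list[str] = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported file type: {suffix or '(none)'}")
    if size_bytes > limits.max_bytes:
        problems.append(f"file too large: {size_bytes} > {limits.max_bytes} bytes")
    if page_count > limits.max_pages:
        problems.append(f"too many pages: {page_count} > {limits.max_pages}")
    return problems
```

Returning all problems at once, rather than raising on the first, lets the API report every reason for a rejection in a single response. Timeouts are not covered here and would be enforced by the worker or queue.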
Phase 4: integrate RAG and agents
- Store intermediate Markdown for debugging.
- Use MCP or Agent Skills for AI-agent workflows.
- Prefer local/self-hosted processing for sensitive data.
- Delete temporary files after processing.
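Temporary-file cleanup is easiest to get right when it is tied to a scope rather than left to a later job. A minimal sketch with the standard library, where `ocr_workspace` is a hypothetical helper name:

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def ocr_workspace(prefix: str = "ocr-") -> Iterator[Path]:
    """Yield a private scratch directory that is deleted on exit, even on errors."""
    with tempfile.TemporaryDirectory(prefix=prefix) as tmp:
        yield Path(tmp)


with ocr_workspace() as workdir:
    upload = workdir / "upload.png"
    upload.write_bytes(b"placeholder bytes, not a real image")
    # Run OCR on str(upload) here and copy out only what must be kept.

# At this point the directory and everything in it has been removed.
```

Because the directory is removed when the `with` block exits, an exception during OCR does not leave sensitive uploads on disk.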
Production and security notes
OCR often handles sensitive documents such as IDs, invoices, contracts, payroll files and internal reports. For production:
- do not log full OCR content if it includes personal data;
- limit file size and PDF page count;
- run OCR in a container or isolated worker;
- scan uploaded files if they come from the internet;
- restrict folder read/write permissions;
- delete temporary files;
- enforce timeouts;
- do not expose OCR services publicly without authentication;
- use HTTPS and token authentication;
- separate test and production environments;
- audit output if it affects financial/legal decisions;
- do not treat OCR confidence as absolute truth.
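The last two points can be combined into a simple review gate: lines below a confidence threshold go to a human, and the rest are accepted provisionally. The sketch below assumes the OCR result exposes parallel lists of texts and scores, here called `rec_texts` and `rec_scores`; those names and the 0.90 threshold are assumptions to verify against the JSON your installed version writes.

```python
from dataclasses import dataclass


@dataclass
class OcrLine:
    text: str
    score: float


def split_by_confidence(
    rec_texts: list[str],
    rec_scores: list[float],
    threshold: float = 0.90,  # placeholder value, calibrate on your own samples
) -> tuple[list[OcrLine], list[OcrLine]]:
    """Return (accepted, needs_review) lines based on the recognition score."""
    if len(rec_texts) != len(rec_scores):
        raise ValueError("rec_texts and rec_scores must have the same length")
    accepted: list[OcrLine] = []
    needs_review: list[OcrLine] = []
    for text, score in zip(rec_texts, rec_scores):
        line = OcrLine(text, float(score))
        if line.score >= threshold:
            accepted.append(line)
        else:
            needs_review.append(line)
    return accepted, needs_review
```

A high score does not guarantee a correct reading, so for financial or legal fields it is reasonable to sample accepted lines for audit as well.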
When should you use PaddleOCR?
Use PaddleOCR when:
- you need local/offline OCR;
- you need multilingual OCR;
- you process many scanned images or PDFs;
- you need coordinates and confidence scores;
- you need document parsing into Markdown/JSON;
- you build RAG from scanned documents;
- you want OCR tools inside an AI agent;
- you need to self-host an OCR API.
Be careful when:
- scans are extremely low-quality;
- results affect legal or financial decisions;
- there is no human review;
- hardware resources are limited;
- input documents are untrusted and not sandboxed.
PaddleOCR vs MarkItDown
| Criteria | PaddleOCR | MarkItDown |
|---|---|---|
| Main focus | OCR and Document AI for images/PDFs | Converting many file formats to Markdown |
| Strength | text recognition, layout, tables, formulas, parsing | DOCX/PPTX/XLSX/HTML/PDF to Markdown |
| Best input | images, scans, PDFs | digital documents in many formats |
| Output | text, coordinates, JSON, Markdown | Markdown |
| Choose when | document is visual/scanned or needs OCR | document already has extractable text |
They can complement each other: MarkItDown handles many digital formats, while PaddleOCR handles scanned images, OCR and complex visual layouts.
FAQ
What is PaddleOCR?
PaddleOCR is an open-source OCR and Document AI toolkit from PaddlePaddle that converts images/PDFs into text, JSON, Markdown and structured data for LLM/RAG workflows.2
Does PaddleOCR support multiple languages?
Yes. The README states that PaddleOCR supports 100+ languages.2
Do I need a GPU?
Not for small tests; a CPU is enough. For large documents, high throughput or heavier models, a GPU is strongly recommended.
What is the difference between PP-OCRv5 and PP-StructureV3?
PP-OCRv5 focuses on text detection and recognition. PP-StructureV3 focuses on document layout parsing, tables, formulas, images and Markdown/structured output.3,4
What is PaddleOCR-VL?
PaddleOCR-VL is a compact VLM series for document parsing, supporting many languages and complex elements such as tables, formulas and charts.5
Does PaddleOCR provide MCP support?
Yes. PaddleOCR provides an MCP Server with tools such as ocr, pp_structurev3 and paddleocr_vl.12
Conclusion
PaddlePaddle/PaddleOCR is a practical repository for OCR and Document AI. Start with paddleocr ocr and the Python PaddleOCR(...).predict() API for simple extraction. Use PP-StructureV3 when you need layout-aware Markdown or structured output. Evaluate PaddleOCR-VL for difficult multilingual documents and complex layouts.
For production, accuracy is only one part of the work. You also need input validation, sandboxing, temporary-file cleanup, authentication, timeouts, version pinning, monitoring and human review for sensitive workflows.
References
1. GitHub Open Graph preview image for PaddlePaddle/PaddleOCR. https://opengraph.githubassets.com/paddleocr-guide/PaddlePaddle/PaddleOCR
2. GitHub. PaddlePaddle/PaddleOCR. https://github.com/PaddlePaddle/PaddleOCR
3. PaddleOCR Documentation. “General OCR Pipeline Usage Tutorial.” https://www.paddleocr.ai/latest/en/version3.x/pipeline_usage/OCR.html
4. PaddleOCR Documentation. “PP-StructureV3 Usage Tutorial.” https://www.paddleocr.ai/latest/en/version3.x/pipeline_usage/PP-StructureV3.html
5. PaddleOCR Documentation. “PaddleOCR-VL Usage Tutorial.” https://www.paddleocr.ai/latest/en/version3.x/pipeline_usage/PaddleOCR-VL.html
6. PaddleOCR Documentation. “Quick Start.” https://www.paddleocr.ai/latest/en/quick_start.html
7. PaddleOCR Documentation. “Installation.” https://www.paddleocr.ai/latest/en/version3.x/installation.html
8. PaddleOCR Quick Start, command-line usage examples. https://www.paddleocr.ai/latest/en/quick_start.html
9. PaddleOCR Quick Start, Python script usage examples. https://www.paddleocr.ai/latest/en/quick_start.html
10. PaddleOCR Documentation. “Self-hosted Serving.” https://www.paddleocr.ai/latest/en/version3.x/inference_deployment/serving/serving.html
11. PaddleOCR Serving docs, Basic Serving via PaddleX. https://www.paddleocr.ai/latest/en/version3.x/inference_deployment/serving/serving.html
12. PaddleOCR Documentation. “MCP Server.” https://www.paddleocr.ai/latest/en/version3.x/integrations/mcp_server.html
13. PaddleOCR Documentation. “Agent Skills.” https://www.paddleocr.ai/latest/en/version3.x/integrations/skills.html
14. PaddleOCR Agent Skills docs, prerequisites and installation. https://www.paddleocr.ai/latest/en/version3.x/integrations/skills.html
Written by PixelRouter Editorial Team
We publish deep, authoritative guides on AI infrastructure, API gateway security, cloud financial management, and system optimizations for developers.