What Is PaddleOCR? A Simple Guide to OCR and Document AI

Learn what PaddleOCR is, how it extracts text and structure from images and PDFs, and when to use PP-OCRv5, PP-StructureV3, PaddleOCR-VL, MCP, and serving for OCR and Document AI workflows.

Published: Jun 4, 2026 | Updated: Jun 4, 2026 | Reading time: 10 min

Tags: PaddleOCR, OCR, Document AI, RAG, PDF OCR, PaddlePaddle

What Is PaddleOCR? A Simple Guide to PaddlePaddle/PaddleOCR for OCR and Document AI

Image: GitHub Open Graph preview for the PaddlePaddle/PaddleOCR repository.¹

Quick summary

PaddleOCR is an open-source OCR and Document AI toolkit from PaddlePaddle. The official repository describes it as a toolkit that turns PDFs and image documents into structured data for AI, supports 100+ languages, and outputs LLM-ready JSON/Markdown data.²

In plain terms: if you have a scanned contract, receipt image, invoice, boarding pass, document screenshot, PDF with tables, or any other image containing text, PaddleOCR can detect the text, recognize it and return its coordinates. More advanced pipelines can also parse the document layout into Markdown or structured JSON.

PaddleOCR is mainly useful for three tasks:

General OCR: extract text from images or PDFs.
Document parsing: preserve layout such as headings, paragraphs, tables, formulas, figures and charts.
AI/RAG pipelines: transform document images into structured data for chatbots, search, agents and retrieval systems.

What problem does PaddleOCR solve?

Business data is often locked in visual documents:

scanned PDFs;
photos of forms;
receipts and invoices;
contracts;
financial reports;
tables inside PDFs;
slide screenshots;
multi-column documents;
handwritten notes;
documents with seals, formulas and charts.

LLMs work better with structured text than with raw images or scanned PDFs. PaddleOCR bridges that gap:

Image / PDF / scan
        ↓
PaddleOCR
        ↓
Text, coordinates, layout, tables, Markdown, JSON
        ↓
RAG / chatbot / database / automation workflow

What stands out in the PaddleOCR repository?

The GitHub repository had more than 79k stars when accessed for this article. It is licensed under Apache-2.0 and describes itself as a toolkit for converting PDF/image documents into structured data for AI.²

The README highlights:

Support for 100+ languages.
PP-OCRv5 for multilingual OCR.
PP-StructureV3 for converting complex PDFs/images into Markdown or JSON.
PaddleOCR-VL, a compact 0.9B VLM series for document parsing.
Integrations with AI/RAG/agent projects such as Dify, RAGFlow, Pathway and Cherry Studio.
Deployment across hardware such as NVIDIA GPUs, Intel CPUs, Kunlunxin XPUs and other AI accelerators.²

What PaddleOCR is not

PaddleOCR is	PaddleOCR is not
An OCR and Document AI toolkit	A PDF editor
A system for extracting text and layout from images/PDFs	Only a translation tool
A CLI, Python API, serving system and MCP integration	Just one model
Useful for RAG/LLM/agents	Perfect on every low-quality scan
Deployable locally or as a service	Always a tiny lightweight package

If you only need text from simple images, PP-OCRv5 is usually enough. If you need structure, tables and Markdown from complex documents, look at PP-StructureV3 or PaddleOCR-VL.

Main components

PP-OCRv5

PP-OCRv5 is the general OCR pipeline. The docs define OCR as technology that converts text in images into editable text; the general OCR pipeline applies it to extract text from images and return it as text output.³

The general OCR pipeline can include:

document orientation classification;
document unwarping;
text line orientation classification;
text detection;
text recognition.

The README says PP-OCRv5 supports 100+ languages and improves accuracy by 13% over the previous generation across many scenarios.²

PP-StructureV3

PP-StructureV3 is for document structure, not just text. The docs describe layout analysis as identifying and extracting text blocks, titles, paragraphs, images, tables and other layout elements. PP-StructureV3 improves layout region detection, table recognition, formula recognition, multi-column reading order recovery, chart understanding and Markdown conversion.⁴

Use PP-StructureV3 for:

multi-column PDFs;
reports with tables;
formulas;
seals;
figures and charts;
research papers;
contracts;
RAG pipelines that need Markdown.

PaddleOCR-VL

PaddleOCR-VL is a vision-language model series for document parsing. The docs describe it as a compact 0.9B VLM that supports 109 languages and recognizes complex elements such as text, tables, formulas and charts.⁵

Use PaddleOCR-VL when:

documents are complex;
layout is difficult;
scans are warped, skewed or photographed;
output should be Markdown or structured data;
you have suitable GPU/inference infrastructure.

Installation

Step 1: Create a Python environment

python -m venv .venv
source .venv/bin/activate

Windows PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1

Step 2: Install an inference engine

The Quick Start says PaddleOCR uses a unified inference-engine configuration and supports both PaddlePaddle and Transformers as engines.⁶ Install the engine you plan to use.

CPU:

python -m pip install paddlepaddle==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

Linux GPU example for CUDA 11.8:

python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

Transformers engine:

python -m pip install "transformers>=5.8.0"

Step 3: Install paddleocr

Default features:

python -m pip install paddleocr

Full functionality:

python -m pip install "paddleocr[all]"

The Installation docs say the default package covers general OCR and document image preprocessing. Optional groups include doc-parser, ie, trans, doc2md and all.⁷

For basic image/PDF OCR, start with:

python -m pip install paddleocr

For document parsing, Markdown conversion, information extraction and more pipelines:

python -m pip install "paddleocr[all]"
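
After installation, it can help to confirm which packages actually landed in the active virtual environment. The sketch below uses only the standard library; the helper name installed_version is hypothetical, and the distribution names are the ones from the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names taken from the pip commands in this section.
    for name in ("paddlepaddle", "paddlepaddle-gpu", "paddleocr", "transformers"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Recording these versions next to your OCR output makes it easier to reproduce a result later.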

CLI usage

General OCR:

paddleocr ocr -i ./image.png \
  --use_doc_orientation_classify False \
  --use_doc_unwarping False \
  --use_textline_orientation False \
  --engine paddle

Using the Transformers engine:

paddleocr ocr -i ./image.png \
  --use_doc_orientation_classify False \
  --use_doc_unwarping False \
  --use_textline_orientation False \
  --engine transformers

Text detection only:

paddleocr text_detection -i ./image.png --engine paddle

Text recognition only:

paddleocr text_recognition -i ./text_crop.png --engine paddle

PP-StructureV3:

paddleocr pp_structurev3 -i ./document.png \
  --use_doc_orientation_classify False \
  --use_doc_unwarping False \
  --engine paddle

These commands follow the official Quick Start examples.⁸
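
If you want to drive the CLI from a script, one option is to build the argument list in Python and hand it to subprocess. This is a minimal sketch that mirrors the flags shown above; the function names build_ocr_command and run_ocr are hypothetical, and it assumes the paddleocr executable is on PATH.

```python
import subprocess
from typing import List


def build_ocr_command(image_path: str, engine: str = "paddle") -> List[str]:
    """Build the `paddleocr ocr` argument list used in the CLI examples."""
    return [
        "paddleocr", "ocr",
        "-i", image_path,
        "--use_doc_orientation_classify", "False",
        "--use_doc_unwarping", "False",
        "--use_textline_orientation", "False",
        "--engine", engine,
    ]


def run_ocr(image_path: str, engine: str = "paddle") -> subprocess.CompletedProcess:
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(
        build_ocr_command(image_path, engine),
        check=True,
        capture_output=True,
        text=True,
    )
```

Passing a list instead of a shell string avoids quoting problems with file names that contain spaces.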

Python API

Basic OCR:

from paddleocr import PaddleOCR

ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    engine="paddle",
)

result = ocr.predict("./image.png")

for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")

The official Quick Start uses the same PaddleOCR(...).predict() pattern and then calls print(), save_to_img() and save_to_json().⁹
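
Once results are saved as JSON, a common next step is to keep only the lines the model was confident about. The sketch below is an assumption-heavy illustration: it assumes the JSON contains parallel rec_texts and rec_scores lists, possibly nested under a res key, so check a real output file before relying on it. The threshold 0.8 and the sample data are made up for the example.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple


def confident_lines(payload: Dict[str, Any], min_score: float = 0.8) -> List[Tuple[str, float]]:
    """Return (text, score) pairs whose score is at least min_score."""
    # Assumption: results may sit under a "res" key or at the top level.
    data = payload.get("res", payload)
    texts = data.get("rec_texts", [])
    scores = data.get("rec_scores", [])
    return [(t, s) for t, s in zip(texts, scores) if s >= min_score]


def load_result(json_path: str) -> Dict[str, Any]:
    """Load one JSON file written by save_to_json()."""
    return json.loads(Path(json_path).read_text(encoding="utf-8"))


# Synthetic sample, not real PaddleOCR output.
sample = {"rec_texts": ["Invoice 1042", "T0tal: 1O.OO"], "rec_scores": [0.98, 0.61]}
print(confident_lines(sample))  # [('Invoice 1042', 0.98)]
```

Lines below the threshold are better routed to human review than silently dropped.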

Simple batch example:

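from pathlib import Path
from paddleocr import PaddleOCR

# Build the pipeline once and reuse it for every file.
ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    engine="paddle",
)

input_dir = Path("scans")
output_dir = Path("ocr_json")
output_dir.mkdir(parents=True, exist_ok=True)

SUPPORTED = {".png", ".jpg", ".jpeg", ".pdf"}

# Sort for a reproducible processing order.
for path in sorted(input_dir.iterdir()):
    # Skip subdirectories and unsupported file types.
    if not path.is_file() or path.suffix.lower() not in SUPPORTED:
        continue

    # Save every result returned for this file as JSON.
    result = ocr.predict(str(path))
    for res in result:
        res.save_to_json(str(output_dir))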

Which pipeline should you choose?

Need	Recommended option	Main output
Extract text from a simple image	PP-OCRv5 / `paddleocr ocr`	text + coordinates + confidence
Detect text boxes only	Text Detection module	bounding boxes
Recognize cropped text only	Text Recognition module	text
Convert complex PDFs to Markdown	PP-StructureV3	Markdown/structured output
Handle tables, formulas and charts	PP-StructureV3 or PaddleOCR-VL	layout + Markdown/JSON
Parse difficult multilingual documents	PaddleOCR-VL	VLM-based document parsing
Give AI agents OCR tools	MCP Server or Agent Skills	OCR/parsing tools
Serve OCR to many apps	PaddleX Serving	HTTP service
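
In an application that handles several document types, the table above can be encoded as a simple lookup. This is only a sketch: the keys are hypothetical labels invented for the example, and the values are the options named in the table.

```python
from typing import Dict

# Keys are hypothetical labels; values are the options from the table above.
PIPELINE_BY_NEED: Dict[str, str] = {
    "simple_text": "PP-OCRv5",
    "detect_boxes": "Text Detection module",
    "recognize_crops": "Text Recognition module",
    "pdf_to_markdown": "PP-StructureV3",
    "tables_formulas_charts": "PP-StructureV3 or PaddleOCR-VL",
    "difficult_multilingual": "PaddleOCR-VL",
    "agent_tools": "MCP Server or Agent Skills",
    "shared_service": "PaddleX Serving",
}


def recommend_pipeline(need: str) -> str:
    """Look up the recommended option; fail loudly on an unknown need."""
    try:
        return PIPELINE_BY_NEED[need]
    except KeyError:
        known = ", ".join(sorted(PIPELINE_BY_NEED))
        raise ValueError(f"Unknown need {need!r}; expected one of: {known}") from None
```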

PaddleOCR for RAG and document chatbots

A basic RAG pipeline:

PDF / scanned image
    ↓
PaddleOCR / PP-StructureV3 / PaddleOCR-VL
    ↓
Markdown or JSON
    ↓
Chunking
    ↓
Embedding
    ↓
Vector database
    ↓
Document chatbot

For simple OCR, use PP-OCRv5. For structure-aware Markdown, use PP-StructureV3. For very difficult layouts, test PaddleOCR-VL.
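
The chunking step in the pipeline above can be sketched as follows. This is one simple strategy among many, not something PaddleOCR prescribes: it splits the Markdown at headings and then caps each chunk at a fixed size. The function name chunk_markdown and the 1200-character limit are assumptions for illustration.

```python
import re
from typing import List

HEADING = re.compile(r"^#{1,6}\s")


def chunk_markdown(markdown: str, max_chars: int = 1200) -> List[str]:
    """Split Markdown at headings, then cap each chunk at max_chars."""
    sections: List[List[str]] = [[]]
    for line in markdown.splitlines():
        # Start a new section whenever a heading begins.
        if HEADING.match(line) and sections[-1]:
            sections.append([])
        sections[-1].append(line)

    chunks: List[str] = []
    for lines in sections:
        text = "\n".join(lines).strip()
        # Oversized sections are cut into fixed-size windows as a fallback.
        for start in range(0, len(text), max_chars):
            chunks.append(text[start:start + max_chars])
    return chunks
```

The fixed-size fallback can cut through a table, so for table-heavy documents a splitter that keeps table blocks intact is likely a better fit.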

Deploying as an API service

The Serving docs describe serving as a common production deployment method: inference capability is packaged as a service, and clients access it through network requests. Client code can be written in a different language from the server-side implementation.¹⁰

The docs recommend PaddleX for serving. Basic serving uses:

paddlex --install serving
paddlex --serve --pipeline OCR

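The example server runs with Uvicorn and listens on http://0.0.0.0:8080, meaning port 8080 on all network interfaces.¹¹ From the same machine, clients can reach it at http://localhost:8080.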

Deployment shape:

Client app
  ↓ HTTP
PaddleOCR/PaddleX Serving
  ↓
GPU/CPU inference
  ↓
JSON/Markdown result

Use serving when:

multiple applications need OCR;
OCR should run on a GPU server;
OCR should be separated from the main backend;
clients are written in Java, Go, Node.js or other languages;
you need independent scaling.
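
The client side of such a service can be sketched with the standard library alone. The details here are assumptions to verify against the serving API reference for your version: the /ocr path, the file and fileType fields, the meaning of the values 0 and 1, and the localhost address.

```python
import base64
import json
import urllib.request
from typing import Any, Dict

# Assumed endpoint and field names; verify them for your PaddleX version.
SERVER_URL = "http://localhost:8080/ocr"
FILE_TYPE_PDF = 0
FILE_TYPE_IMAGE = 1


def build_payload(file_bytes: bytes, file_type: int = FILE_TYPE_IMAGE) -> Dict[str, Any]:
    """Encode a file as Base64 inside a JSON-serializable request body."""
    return {"file": base64.b64encode(file_bytes).decode("ascii"), "fileType": file_type}


def call_ocr_service(file_bytes: bytes, timeout: float = 60.0) -> Dict[str, Any]:
    """POST one image to the service and return the decoded JSON response."""
    request = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(build_payload(file_bytes)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the contract is plain HTTP and JSON, the same request can be issued from Java, Go or Node.js clients.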

MCP Server for AI agents

PaddleOCR provides an MCP Server for LLM applications. The docs say the lightweight MCP server integrates text recognition, layout parsing and other PaddleOCR capabilities into large-model applications.¹²

Supported MCP tools/pipelines:

Pipeline	MCP tool	Description
OCR	`ocr`	Detects and recognizes text in images/PDFs
PP-StructureV3	`pp_structurev3`	Extracts layout elements and converts to Markdown
PaddleOCR-VL	`paddleocr_vl`	VLM-based layout parsing and Markdown output
PaddleOCR-VL-1.5/1.6	`paddleocr_vl`	Upgraded VLM pipelines

Use MCP when you want Claude Desktop, Cursor, OpenClaw or another agent to call OCR/document-parsing tools.
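
MCP clients are usually configured with a small JSON file. The sketch below generates one from Python. Everything specific in it is an assumption to confirm against the MCP Server docs: the paddleocr_mcp command, the PADDLEOCR_MCP_PIPELINE and PADDLEOCR_MCP_PPOCR_SOURCE variables, and the server label paddleocr-ocr, which is a hypothetical name.

```python
import json

# Assumed command and environment variable names; confirm them in the docs.
config = {
    "mcpServers": {
        "paddleocr-ocr": {  # hypothetical server label
            "command": "paddleocr_mcp",
            "args": [],
            "env": {
                "PADDLEOCR_MCP_PIPELINE": "OCR",
                "PADDLEOCR_MCP_PPOCR_SOURCE": "local",
            },
        }
    }
}

print(json.dumps(config, indent=2))
```

The printed JSON would go into the MCP configuration file of the client application, whose location depends on the client.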

Agent Skills

PaddleOCR also provides official Agent Skills. The docs say these skills package routing rules, calling steps, configuration requirements and best practices so Skills-enabled AI apps can handle OCR and document parsing more reliably.¹³

Main skills:

Skill	Use case	Output
`paddleocr-text-recognition`	Extract plain text from images/PDFs	line-level text, bounding boxes, confidence
`paddleocr-doc-parsing`	Preserve headings, paragraphs, tables, formulas and layout	Markdown / structured output

Install with the skills CLI:

npx skills add PaddlePaddle/PaddleOCR -g --skill paddleocr-text-recognition -y
npx skills add PaddlePaddle/PaddleOCR -g --skill paddleocr-doc-parsing -y

If the network is slow:

git clone https://github.com/PaddlePaddle/PaddleOCR.git
npx skills add ./PaddleOCR/skills/paddleocr-text-recognition
npx skills add ./PaddleOCR/skills/paddleocr-doc-parsing

The docs list prerequisites including Python 3.9+, PaddleOCR 3.6.0+ and an AI Studio access token for the skills.¹⁴

Personal setup guide

Minimal local setup:

python -m venv .venv
source .venv/bin/activate

python -m pip install paddlepaddle==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
python -m pip install paddleocr

Test:

paddleocr ocr -i ./image.png \
  --use_doc_orientation_classify False \
  --use_doc_unwarping False \
  --use_textline_orientation False \
  --engine paddle

For document parsing:

python -m pip install "paddleocr[all]"
paddleocr pp_structurev3 -i ./document.png --engine paddle

Team deployment guide

Phase 1: local evaluation

Select 20–50 real sample documents.
Test PP-OCRv5 on simple images.
Test PP-StructureV3 on complex PDFs.
Compare JSON and Markdown outputs.
Record issues such as wrong order, missed diacritics, broken tables, bad scans and low confidence.
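
To make the comparison in Phase 1 measurable, you can score OCR output against a hand-checked transcript. Character error rate is a common metric for this, though the article's sources do not prescribe it. The sketch below is a plain-Python version with hypothetical function names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length (0.0 is a perfect match)."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)
```

Tracking this number per document type shows where a heavier pipeline or human review is worth the cost.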

Phase 2: standardize the pipeline

Pick the right pipeline for each document type.
Define input quality requirements: DPI, file size, image format.
Store raw input, output JSON/Markdown and logs.
Pin model and package versions.
Add human review for business-critical output.

Phase 3: deploy a service

Use PaddleX Serving or your own FastAPI wrapper.
Run OCR workers separately from the main backend.
Limit file size, pages and timeout.
Use a queue for large PDFs.
Use GPU for high throughput.
Monitor latency, errors, memory and GPU usage.

Phase 4: integrate RAG and agents

Store intermediate Markdown for debugging.
Use MCP or Agent Skills for AI-agent workflows.
Prefer local/self-hosted processing for sensitive data.
Delete temporary files after processing.
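
Temporary-file cleanup is easy to get wrong when processing fails halfway. A minimal sketch, assuming a hypothetical process callback that performs the OCR step, is to do all work inside a temporary directory that is removed automatically:

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def process_in_temp_dir(source: Path, process: Callable[[Path], str]) -> str:
    """Copy the input to a private temp dir, process it, then delete the dir."""
    with tempfile.TemporaryDirectory(prefix="ocr_") as tmp:
        working_copy = Path(tmp) / source.name
        shutil.copy2(source, working_copy)
        # The directory and everything in it are removed when this block
        # exits, including when `process` raises.
        return process(working_copy)
```

The callback should return only what you intend to keep, such as the Markdown text, since the working copy no longer exists once the function returns.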

Production and security notes

OCR often handles sensitive documents such as IDs, invoices, contracts, payroll files and internal reports. For production:

do not log full OCR content if it includes personal data;
limit file size and PDF page count;
run OCR in a container or isolated worker;
scan uploaded files if they come from the internet;
restrict folder read/write permissions;
delete temporary files;
enforce timeouts;
do not expose OCR services publicly without authentication;
use HTTPS and token authentication;
separate test and production environments;
audit output if it affects financial/legal decisions;
do not treat OCR confidence as absolute truth.
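
Several of the points above, such as file-size and page-count limits, can be enforced before a file ever reaches the OCR worker. The sketch below is illustrative only: the limits and the name validate_upload are hypothetical, and an extension check does not prove what a file really contains.

```python
from pathlib import Path

# Hypothetical limits; tune them to your own workload.
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg", ".pdf"}
MAX_BYTES = 20 * 1024 * 1024
MAX_PDF_PAGES = 50


def validate_upload(filename: str, size_bytes: int, pdf_pages: int = 0) -> None:
    """Raise ValueError if an upload breaks any of the configured limits."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"Unsupported file type: {suffix or 'none'}")
    if size_bytes <= 0 or size_bytes > MAX_BYTES:
        raise ValueError(f"File size {size_bytes} is outside 1..{MAX_BYTES} bytes")
    if suffix == ".pdf" and pdf_pages > MAX_PDF_PAGES:
        raise ValueError(f"PDF has {pdf_pages} pages; limit is {MAX_PDF_PAGES}")
```

Rejecting bad input early keeps oversized or unexpected files away from the GPU worker and makes timeouts easier to reason about.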

When should you use PaddleOCR?

Use PaddleOCR when:

you need local/offline OCR;
you need multilingual OCR;
you process many scanned images or PDFs;
you need coordinates and confidence scores;
you need document parsing into Markdown/JSON;
you build RAG from scanned documents;
you want OCR tools inside an AI agent;
you need to self-host an OCR API.

Be careful when:

scans are extremely low-quality;
results affect legal or financial decisions;
there is no human review;
hardware resources are limited;
input documents are untrusted and not sandboxed.

PaddleOCR vs MarkItDown

Criteria	PaddleOCR	MarkItDown
Main focus	OCR and Document AI for images/PDFs	Converting many file formats to Markdown
Strength	text recognition, layout, tables, formulas, parsing	DOCX/PPTX/XLSX/HTML/PDF to Markdown
Best input	images, scans, PDFs	digital documents in many formats
Output	text, coordinates, JSON, Markdown	Markdown
Choose when	document is visual/scanned or needs OCR	document already has extractable text

They can complement each other: MarkItDown handles many digital formats, while PaddleOCR handles scanned images, OCR and complex visual layouts.

FAQ

What is PaddleOCR?

PaddleOCR is an open-source OCR and Document AI toolkit from PaddlePaddle that converts images/PDFs into text, JSON, Markdown and structured data for LLM/RAG workflows.²

Does PaddleOCR support multiple languages?

Yes. The README states that PaddleOCR supports 100+ languages.²

Do I need a GPU?

Not for small tests; a CPU is enough. For large documents, high throughput or heavier models, a GPU is strongly recommended.

What is the difference between PP-OCRv5 and PP-StructureV3?

PP-OCRv5 focuses on text detection and recognition. PP-StructureV3 focuses on document layout parsing, tables, formulas, images and Markdown/structured output.³ ⁴

What is PaddleOCR-VL?

PaddleOCR-VL is a compact VLM series for document parsing, supporting many languages and complex elements such as tables, formulas and charts.⁵

Does PaddleOCR provide MCP support?

Yes. PaddleOCR provides an MCP Server with tools such as ocr, pp_structurev3 and paddleocr_vl.¹²

Conclusion

PaddlePaddle/PaddleOCR is a practical repository for OCR and Document AI. Start with paddleocr ocr and the Python PaddleOCR(...).predict() API for simple extraction. Use PP-StructureV3 when you need layout-aware Markdown or structured output. Evaluate PaddleOCR-VL for difficult multilingual documents and complex layouts.

For production, accuracy is only one part of the work. You also need input validation, sandboxing, temporary-file cleanup, authentication, timeouts, version pinning, monitoring and human review for sensitive workflows.

References

1. GitHub Open Graph preview image for PaddlePaddle/PaddleOCR. https://opengraph.githubassets.com/paddleocr-guide/PaddlePaddle/PaddleOCR
2. GitHub. PaddlePaddle/PaddleOCR. https://github.com/PaddlePaddle/PaddleOCR
3. PaddleOCR Documentation. “General OCR Pipeline Usage Tutorial.” https://www.paddleocr.ai/latest/en/version3.x/pipeline_usage/OCR.html
4. PaddleOCR Documentation. “PP-StructureV3 Usage Tutorial.” https://www.paddleocr.ai/latest/en/version3.x/pipeline_usage/PP-StructureV3.html
5. PaddleOCR Documentation. “PaddleOCR-VL Usage Tutorial.” https://www.paddleocr.ai/latest/en/version3.x/pipeline_usage/PaddleOCR-VL.html
6. PaddleOCR Documentation. “Quick Start.” https://www.paddleocr.ai/latest/en/quick_start.html
7. PaddleOCR Documentation. “Installation.” https://www.paddleocr.ai/latest/en/version3.x/installation.html
8. PaddleOCR Quick Start, command-line usage examples. https://www.paddleocr.ai/latest/en/quick_start.html
9. PaddleOCR Quick Start, Python script usage examples. https://www.paddleocr.ai/latest/en/quick_start.html
10. PaddleOCR Documentation. “Self-hosted Serving.” https://www.paddleocr.ai/latest/en/version3.x/inference_deployment/serving/serving.html
11. PaddleOCR Serving docs, Basic Serving via PaddleX. https://www.paddleocr.ai/latest/en/version3.x/inference_deployment/serving/serving.html
12. PaddleOCR Documentation. “MCP Server.” https://www.paddleocr.ai/latest/en/version3.x/integrations/mcp_server.html
13. PaddleOCR Documentation. “Agent Skills.” https://www.paddleocr.ai/latest/en/version3.x/integrations/skills.html
14. PaddleOCR Agent Skills docs, prerequisites and installation. https://www.paddleocr.ai/latest/en/version3.x/integrations/skills.html

Written by PixelRouter Editorial Team

We publish deep, authoritative guides on AI infrastructure, API gateway security, cloud financial management, and system optimizations for developers.
