
What Is PaddleOCR? A Simple Guide to OCR and Document AI

Learn what PaddleOCR is, how it extracts text and structure from images and PDFs, and when to use PP-OCRv5, PP-StructureV3, PaddleOCR-VL, MCP, and serving for OCR and Document AI workflows.

Published: Jun 4, 2026 | Updated: Jun 4, 2026 | Reading time: 10 min
Tags: PaddleOCR, OCR, Document AI, RAG, PDF OCR, PaddlePaddle



Figure: GitHub Open Graph preview image for the PaddlePaddle/PaddleOCR repository.1

Quick summary

PaddleOCR is an open-source OCR and Document AI toolkit from PaddlePaddle. The official repository describes it as a toolkit that turns PDFs and image documents into structured data for AI, supports 100+ languages, and outputs LLM-ready JSON/Markdown data.2

In plain terms: if you have a scanned contract, a receipt photo, an invoice, a boarding pass, a document screenshot, a PDF with tables, or any other image containing text, PaddleOCR can detect the text, recognize it and return its coordinates. In the more advanced pipelines, it can also parse the document layout into Markdown or structured JSON.

PaddleOCR is mainly useful for three tasks:

  • General OCR: extract text from images or PDFs.
  • Document parsing: preserve layout such as headings, paragraphs, tables, formulas, figures and charts.
  • AI/RAG pipelines: transform document images into structured data for chatbots, search, agents and retrieval systems.

What problem does PaddleOCR solve?

Business data is often locked in visual documents:

  • scanned PDFs;
  • photos of forms;
  • receipts and invoices;
  • contracts;
  • financial reports;
  • tables inside PDFs;
  • slide screenshots;
  • multi-column documents;
  • handwritten notes;
  • documents with seals, formulas and charts.

LLMs work better with structured text than with raw images or scanned PDFs. PaddleOCR bridges that gap:

Image / PDF / scan
        ↓
PaddleOCR
        ↓
Text, coordinates, layout, tables, Markdown, JSON
        ↓
RAG / chatbot / database / automation workflow

What stands out in the PaddleOCR repository?

The GitHub repository had more than 79k stars at the time of access, is licensed under Apache-2.0, and describes itself as a toolkit for converting PDF/image documents into structured data for AI.2

The README highlights:

  • Support for 100+ languages.
  • PP-OCRv5 for multilingual OCR.
  • PP-StructureV3 for converting complex PDFs/images into Markdown or JSON.
  • PaddleOCR-VL, a compact 0.9B VLM series for document parsing.
  • Integrations with AI/RAG/agent projects such as Dify, RAGFlow, Pathway and Cherry Studio.
  • Deployment across hardware such as NVIDIA GPUs, Intel CPUs, Kunlunxin XPUs and other AI accelerators.2

What PaddleOCR is not

| PaddleOCR is | PaddleOCR is not |
|---|---|
| An OCR and Document AI toolkit | A PDF editor |
| A system for extracting text and layout from images/PDFs | Only a translation tool |
| A CLI, Python API, serving system and MCP integration | Just one model |
| Useful for RAG/LLM/agents | Perfect on every low-quality scan |
| Deployable locally or as a service | Always a tiny lightweight package |

If you only need text from simple images, PP-OCRv5 is usually enough. If you need structure, tables and Markdown from complex documents, look at PP-StructureV3 or PaddleOCR-VL.

Main components

PP-OCRv5

PP-OCRv5 is the general OCR pipeline. The docs define OCR as technology that converts text in images into editable text; the general OCR pipeline applies this to images and outputs the extracted text.3

The general OCR pipeline can include:

  • document orientation classification;
  • document unwarping;
  • text line orientation classification;
  • text detection;
  • text recognition.

The README says PP-OCRv5 supports 100+ languages and improves accuracy by 13% over the previous generation across many scenarios.2
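The three optional stages in the list above correspond to the use_doc_orientation_classify, use_doc_unwarping and use_textline_orientation switches that the CLI and Python examples in this guide turn off. One detail the list does not show: text line orientation classification works on the individual text lines found by detection, so in this sketch it sits between detection and recognition. The sketch does not call PaddleOCR; enabled_stages and the stage labels are hypothetical, and its defaults are not necessarily PaddleOCR's.

```python
def enabled_stages(
    *,
    use_doc_orientation_classify: bool = True,
    use_doc_unwarping: bool = True,
    use_textline_orientation: bool = True,
) -> list[str]:
    """List, in order, the stages that would run for a set of switches.

    Hypothetical illustration: the parameter names mirror PaddleOCR's switches,
    but the defaults here belong to this sketch, not necessarily to PaddleOCR.
    """
    stages = []
    # Page-level preprocessing runs before any text is located.
    if use_doc_orientation_classify:
        stages.append("doc_orientation_classify")
    if use_doc_unwarping:
        stages.append("doc_unwarping")
    # Detection always runs and produces the text-line regions.
    stages.append("text_detection")
    # Each detected line's orientation is checked (and corrected) before recognition.
    if use_textline_orientation:
        stages.append("textline_orientation_classify")
    # Recognition always runs and turns each line into text.
    stages.append("text_recognition")
    return stages


# The configuration used in this guide's examples: detection + recognition only.
print(enabled_stages(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
))
```

Turning the optional stages off, as the examples do, is a reasonable default for clean, upright scans; switch them on for photographed or rotated pages.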

PP-StructureV3

PP-StructureV3 is for document structure, not just text. The docs describe layout analysis as identifying and extracting text blocks, titles, paragraphs, images, tables and other layout elements. PP-StructureV3 improves layout region detection, table recognition, formula recognition, multi-column reading order recovery, chart understanding and Markdown conversion.4

Use PP-StructureV3 for:

  • multi-column PDFs;
  • reports with tables;
  • formulas;
  • seals;
  • figures and charts;
  • research papers;
  • contracts;
  • RAG pipelines that need Markdown.

PaddleOCR-VL

PaddleOCR-VL is a vision-language model series for document parsing. The docs describe it as a compact 0.9B VLM that supports 109 languages and recognizes complex elements such as text, tables, formulas and charts.5

Use PaddleOCR-VL when:

  • documents are complex;
  • layout is difficult;
  • scans are warped, skewed or photographed;
  • output should be Markdown or structured data;
  • you have suitable GPU/inference infrastructure.

Installation

Step 1: Create a Python environment

python -m venv .venv
source .venv/bin/activate

Windows PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1

Step 2: Install an inference engine

The Quick Start says PaddleOCR supports unified inference-engine configuration, with support for PaddlePaddle and Transformers.6

CPU:

python -m pip install paddlepaddle==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

Linux GPU example for CUDA 11.8:

python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

Transformers engine:

python -m pip install "transformers>=5.8.0"

Step 3: Install paddleocr

Default features:

python -m pip install paddleocr

Full functionality:

python -m pip install "paddleocr[all]"

The Installation docs say the default package covers general OCR and document image preprocessing. Optional groups include doc-parser, ie, trans, doc2md and all.7

In short: start with the default package for basic image/PDF OCR, and install paddleocr[all] when you need document parsing, Markdown conversion, information extraction and the other pipelines.

CLI usage

General OCR:

paddleocr ocr -i ./image.png \
  --use_doc_orientation_classify False \
  --use_doc_unwarping False \
  --use_textline_orientation False \
  --engine paddle

Using the Transformers engine:

paddleocr ocr -i ./image.png \
  --use_doc_orientation_classify False \
  --use_doc_unwarping False \
  --use_textline_orientation False \
  --engine transformers

Text detection only:

paddleocr text_detection -i ./image.png --engine paddle

Text recognition only:

paddleocr text_recognition -i ./text_crop.png --engine paddle

PP-StructureV3:

paddleocr pp_structurev3 -i ./document.png \
  --use_doc_orientation_classify False \
  --use_doc_unwarping False \
  --engine paddle

These commands follow the official Quick Start examples.8
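If you call the CLI from a script, build the command as an argument list rather than a shell string, so file names with spaces or shell metacharacters are passed through unchanged. A minimal sketch that only uses the subcommands and flags shown above; build_ocr_command and its parameters are hypothetical.

```python
import shlex


def build_ocr_command(
    image_path: str,
    *,
    subcommand: str = "ocr",
    engine: str = "paddle",
    enable_optional_stages: bool = False,
) -> list[str]:
    """Build a paddleocr CLI invocation as an argv list (no shell involved)."""
    cmd = ["paddleocr", subcommand, "-i", image_path]
    if subcommand in ("ocr", "pp_structurev3"):
        # The examples above pass these switches as literal True/False values.
        flag = str(enable_optional_stages)
        cmd += ["--use_doc_orientation_classify", flag, "--use_doc_unwarping", flag]
        if subcommand == "ocr":
            cmd += ["--use_textline_orientation", flag]
    cmd += ["--engine", engine]
    return cmd


cmd = build_ocr_command("./scans/invoice 01.png")
# shlex.join gives a copy-pasteable form for logs; pass `cmd` itself to subprocess.run.
print(shlex.join(cmd))
```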

Python API

Basic OCR:

from paddleocr import PaddleOCR

ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    engine="paddle",
)

result = ocr.predict("./image.png")

for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")

The official Quick Start uses the same PaddleOCR(...).predict() pattern and then calls print(), save_to_img() and save_to_json().9
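To use the results downstream, you can read back the JSON written by save_to_json(). The sketch below assumes each file holds parallel rec_texts and rec_scores lists, possibly nested under a top-level res key; these key names are an assumption, so check them against a file produced by your installed version first. load_ocr_lines and to_plain_text are hypothetical helpers.

```python
import json
from pathlib import Path


def load_ocr_lines(json_path: str) -> list[tuple[str, float]]:
    """Return (text, confidence) pairs from one saved OCR result file.

    Assumption: the file contains parallel "rec_texts" and "rec_scores" lists,
    optionally wrapped in a top-level "res" object.
    """
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    data = data.get("res", data)  # tolerate both wrapped and unwrapped layouts
    texts = data.get("rec_texts", [])
    scores = data.get("rec_scores", [])
    if len(texts) != len(scores):
        raise ValueError(f"{json_path}: {len(texts)} texts but {len(scores)} scores")
    return [(text, float(score)) for text, score in zip(texts, scores)]


def to_plain_text(lines: list[tuple[str, float]]) -> str:
    """Join recognized lines in the order the pipeline returned them."""
    return "\n".join(text for text, _ in lines)
```

Keeping the confidence next to each line, instead of only the joined text, lets later steps decide which lines need a second look.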

Simple batch example:

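from pathlib import Path
from paddleocr import PaddleOCR

# Create the pipeline once and reuse it, so models are loaded only once.
ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    engine="paddle",
)

SUPPORTED_SUFFIXES = {".png", ".jpg", ".jpeg", ".pdf"}

input_dir = Path("scans")
output_dir = Path("ocr_json")
output_dir.mkdir(parents=True, exist_ok=True)

for img in sorted(input_dir.glob("*")):
    # Skip subdirectories and unsupported file types.
    if not img.is_file() or img.suffix.lower() not in SUPPORTED_SUFFIXES:
        continue

    result = ocr.predict(str(img))
    # A PDF yields one result per page; an image yields a single result.
    for res in result:
        res.save_to_json(str(output_dir))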
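In a real batch, one corrupt file should not stop the whole run. A minimal sketch that wraps any per-file OCR call and records each outcome in a JSON Lines manifest; process_batch, the manifest format and its file name are hypothetical, not part of PaddleOCR.

```python
import json
from pathlib import Path
from typing import Callable, Iterable

SUPPORTED_SUFFIXES = {".png", ".jpg", ".jpeg", ".pdf"}


def process_batch(
    files: Iterable[Path],
    run_ocr: Callable[[Path], None],
    manifest_path: Path,
) -> dict[str, int]:
    """Run `run_ocr` on each supported file and log the outcome per file.

    One JSON object per line is appended to `manifest_path`, so failures can be
    inspected or retried later without digging through console output.
    """
    counts = {"ok": 0, "failed": 0, "skipped": 0}
    with manifest_path.open("a", encoding="utf-8") as manifest:
        for path in files:
            if path.suffix.lower() not in SUPPORTED_SUFFIXES:
                counts["skipped"] += 1
                continue
            record = {"file": str(path)}
            try:
                run_ocr(path)
                record["status"] = "ok"
            except Exception as exc:  # keep going; record why this file failed
                record["status"] = "failed"
                record["error"] = repr(exc)
            counts[record["status"]] += 1
            manifest.write(json.dumps(record) + "\n")
    return counts
```

With the loop above, run_ocr would be a small function that calls ocr.predict(str(path)) and saves each result; afterwards, a hypothetical ocr_manifest.jsonl tells you exactly which files to retry.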

Which pipeline should you choose?

| Need | Recommended option | Main output |
|---|---|---|
| Extract text from a simple image | PP-OCRv5 / paddleocr ocr | Text + coordinates + confidence |
| Detect text boxes only | Text Detection module | Bounding boxes |
| Recognize cropped text only | Text Recognition module | Text |
| Convert complex PDFs to Markdown | PP-StructureV3 | Markdown/structured output |
| Handle tables, formulas and charts | PP-StructureV3 or PaddleOCR-VL | Layout + Markdown/JSON |
| Parse difficult multilingual documents | PaddleOCR-VL | VLM-based document parsing |
| Give AI agents OCR tools | MCP Server or Agent Skills | OCR/parsing tools |
| Serve OCR to many apps | PaddleX Serving | HTTP service |
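The choice in the table can be written down as a small routing function, for example to pick a pipeline per document type in a batch job. This is a hypothetical sketch: the profile fields are labels you set yourself, and the returned strings are the pipeline names from the table, not API calls.

```python
from dataclasses import dataclass


@dataclass
class DocProfile:
    """Hypothetical description of a document type you need to process."""
    needs_layout: bool = False            # headings, reading order, Markdown output
    has_tables_or_formulas: bool = False
    difficult_layout: bool = False        # warped, photographed or complex multilingual pages
    gpu_available: bool = False           # PaddleOCR-VL needs suitable GPU/inference infrastructure


def choose_pipeline(doc: DocProfile) -> str:
    """Map a document profile to a pipeline name, following the table above."""
    if doc.difficult_layout and doc.gpu_available:
        return "PaddleOCR-VL"
    if doc.needs_layout or doc.has_tables_or_formulas or doc.difficult_layout:
        # Without suitable hardware, fall back to PP-StructureV3 for hard layouts.
        return "PP-StructureV3"
    return "PP-OCRv5"
```

The fallback from PaddleOCR-VL to PP-StructureV3 is one possible policy, not an official rule; validate it on your own samples before adopting it.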

PaddleOCR for RAG and document chatbots

A basic RAG pipeline:

PDF / scanned image
    ↓
PaddleOCR / PP-StructureV3 / PaddleOCR-VL
    ↓
Markdown or JSON
    ↓
Chunking
    ↓
Embedding
    ↓
Vector database
    ↓
Document chatbot

For simple OCR, use PP-OCRv5. For structure-aware Markdown, use PP-StructureV3. For very difficult layouts, test PaddleOCR-VL.
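The chunking step is where Markdown output pays off, because headings give natural split points. A minimal sketch, assuming ATX-style # headings in the Markdown produced by PP-StructureV3 or PaddleOCR-VL; chunk_markdown and the 1,500-character cap are illustrative choices, not PaddleOCR features.

```python
import re

# An ATX heading: 1-6 '#' characters, whitespace, then some text.
HEADING = re.compile(r"^#{1,6}\s+\S")


def chunk_markdown(markdown: str, max_chars: int = 1500) -> list[str]:
    """Split Markdown into heading-led chunks of at most max_chars where possible.

    Oversized sections are split further at blank lines (paragraph boundaries);
    a single paragraph longer than max_chars is kept whole rather than cut.
    Limitation: '#' lines inside fenced code blocks are treated as headings too.
    """
    # 1. Split into sections, each starting at a heading line.
    sections: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        if HEADING.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())

    # 2. Keep small sections as-is; pack paragraphs of large ones under the cap.
    chunks: list[str] = []
    for section in filter(None, sections):  # drop empty sections
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        buf = ""
        for para in section.split("\n\n"):
            candidate = f"{buf}\n\n{para}" if buf else para
            if len(candidate) <= max_chars or not buf:
                buf = candidate
            else:
                chunks.append(buf)
                buf = para
        if buf:
            chunks.append(buf)
    return chunks
```

Each chunk then goes to your embedding model. Splitting at headings keeps a table or paragraph together with the section title that gives it context, which often helps retrieval more than fixed-size windows.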

Deploying as an API service

The Serving docs describe serving as a common production deployment method: inference capability is packaged as a service, and clients access it through network requests. Client code can be written in a different language from the server-side implementation.10

The docs recommend PaddleX for serving. Basic serving uses:

paddlex --install serving
paddlex --serve --pipeline OCR

The example server runs with Uvicorn at http://0.0.0.0:8080.11
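Because the service speaks HTTP, a client can be written in any language. A minimal Python client sketch using only the standard library. It assumes the basic OCR service exposes POST /ocr and accepts a JSON body with a Base64-encoded file and a fileType field (0 for PDF, 1 for image); confirm the endpoint and field names in the Serving docs for your PaddleX version. build_ocr_request and send are hypothetical helpers.

```python
import base64
import json
import urllib.request


def build_ocr_request(
    file_bytes: bytes,
    is_pdf: bool,
    base_url: str = "http://localhost:8080",
) -> urllib.request.Request:
    """Build (but do not send) an HTTP request for the basic OCR service.

    Assumed API: POST {base_url}/ocr with {"file": <base64>, "fileType": 0 or 1}.
    """
    payload = {
        "file": base64.b64encode(file_bytes).decode("ascii"),
        "fileType": 0 if is_pdf else 1,  # assumed: 0 = PDF, 1 = image
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/ocr",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout_s: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

The server listens on 0.0.0.0, which means "all interfaces"; clients connect to localhost or the server's real host name. Always pass a timeout so a slow OCR job cannot hang the caller.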

Deployment shape:

Client app
  ↓ HTTP
PaddleOCR/PaddleX Serving
  ↓
GPU/CPU inference
  ↓
JSON/Markdown result

Use serving when:

  • multiple applications need OCR;
  • OCR should run on a GPU server;
  • OCR should be separated from the main backend;
  • clients are written in Java, Go, Node.js or other languages;
  • you need independent scaling.

MCP Server for AI agents

PaddleOCR provides an MCP Server for LLM applications. The docs say the lightweight MCP server integrates text recognition, layout parsing and other PaddleOCR capabilities into large-model applications.12

Supported MCP tools/pipelines:

| Pipeline | MCP tool | Description |
|---|---|---|
| OCR | ocr | Detects and recognizes text in images/PDFs |
| PP-StructureV3 | pp_structurev3 | Extracts layout elements and converts to Markdown |
| PaddleOCR-VL | paddleocr_vl | VLM-based layout parsing and Markdown output |
| PaddleOCR-VL-1.5/1.6 | paddleocr_vl | Upgraded VLM pipelines |

Use MCP when you want Claude Desktop, Cursor, OpenClaw or another agent to call OCR/document-parsing tools.

Agent Skills

PaddleOCR also provides official Agent Skills. The docs say these skills package routing rules, calling steps, configuration requirements and best practices so Skills-enabled AI apps can handle OCR and document parsing more reliably.13

Main skills:

| Skill | Use case | Output |
|---|---|---|
| paddleocr-text-recognition | Extract plain text from images/PDFs | Line-level text, bounding boxes, confidence |
| paddleocr-doc-parsing | Preserve headings, paragraphs, tables, formulas and layout | Markdown / structured output |

Install with the skills CLI:

npx skills add PaddlePaddle/PaddleOCR -g --skill paddleocr-text-recognition -y
npx skills add PaddlePaddle/PaddleOCR -g --skill paddleocr-doc-parsing -y

If downloading from GitHub is slow, clone the repository and install the skills from the local path:

git clone https://github.com/PaddlePaddle/PaddleOCR.git
npx skills add ./PaddleOCR/skills/paddleocr-text-recognition
npx skills add ./PaddleOCR/skills/paddleocr-doc-parsing

The docs list prerequisites including Python 3.9+, PaddleOCR 3.6.0+ and an AI Studio access token for the skills.14

Personal setup guide

Minimal local setup:

python -m venv .venv
source .venv/bin/activate

python -m pip install paddlepaddle==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
python -m pip install paddleocr

Test:

paddleocr ocr -i ./image.png \
  --use_doc_orientation_classify False \
  --use_doc_unwarping False \
  --use_textline_orientation False \
  --engine paddle

For document parsing:

python -m pip install "paddleocr[all]"
paddleocr pp_structurev3 -i ./document.png --engine paddle

Team deployment guide

Phase 1: local evaluation

  • Select 20–50 real sample documents.
  • Test PP-OCRv5 on simple images.
  • Test PP-StructureV3 on complex PDFs.
  • Compare JSON and Markdown outputs.
  • Record issues such as wrong order, missed diacritics, broken tables, bad scans and low confidence.
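To make the Phase 1 comparison measurable, you can hand-transcribe a few samples and compute a character error rate (CER) against the OCR text: the number of character insertions, deletions and substitutions needed to turn the OCR output into the reference, divided by the reference length. A minimal sketch using the standard Levenshtein distance; CER is one common metric among several, and the helper names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the row (len(b) + 1) short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, reference: str) -> float:
    """Edits needed to correct the OCR text, relative to the reference length."""
    if not reference:
        # Convention for this sketch: any output for an empty reference is 100% error.
        return 0.0 if not ocr_text else 1.0
    return levenshtein(ocr_text, reference) / len(reference)
```

When checking diacritics, normalize both strings first (for example with unicodedata.normalize("NFC", text)), so a precomposed "é" and "e" plus a combining accent are not counted as different.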

Phase 2: standardize the pipeline

  • Pick the right pipeline for each document type.
  • Define input quality requirements: DPI, file size, image format.
  • Store raw input, output JSON/Markdown and logs.
  • Pin model and package versions.
  • Add human review for business-critical output.
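For the human-review step, a simple policy is to send a page to a reviewer when too many of its recognized lines fall below a confidence threshold. A minimal sketch; the 0.90 threshold, the per-page list of (text, confidence) pairs and the ReviewDecision type are assumptions to tune on your own samples. Confidence is a useful signal, not proof that a line is correct.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReviewDecision:
    needs_review: bool
    low_confidence_lines: list[str] = field(default_factory=list)
    min_score: Optional[float] = None


def review_decision(
    lines: list[tuple[str, float]],
    threshold: float = 0.90,
    max_low_lines: int = 0,
) -> ReviewDecision:
    """Flag a page for human review when too many lines are below `threshold`.

    An empty page is also flagged: no recognized text often means a bad scan
    or a failed run rather than a genuinely blank document.
    """
    if not lines:
        return ReviewDecision(needs_review=True)
    low = [text for text, score in lines if score < threshold]
    return ReviewDecision(
        needs_review=len(low) > max_low_lines,
        low_confidence_lines=low,
        min_score=min(score for _, score in lines),
    )
```

Returning the low-confidence lines, not just a yes/no flag, lets the reviewer jump straight to the suspicious text.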

Phase 3: deploy a service

  • Use PaddleX Serving or your own FastAPI wrapper.
  • Run OCR workers separately from the main backend.
  • Limit file size, pages and timeout.
  • Use a queue for large PDFs.
  • Use GPU for high throughput.
  • Monitor latency, errors, memory and GPU usage.
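One way to enforce the timeout is to run each OCR job in its own process, for example the paddleocr CLI shown earlier, and stop it if it overruns. A minimal sketch using subprocess.run, whose timeout argument kills the child process when it expires; run_with_timeout is a hypothetical helper and the demo uses a stand-in command.

```python
import subprocess
import sys
from typing import Optional


def run_with_timeout(
    cmd: list[str], timeout_s: float
) -> Optional[subprocess.CompletedProcess]:
    """Run cmd without a shell; return None if it exceeds timeout_s.

    On timeout, subprocess.run kills the child process and waits for it,
    so a stuck job is stopped rather than left running.
    """
    try:
        return subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s, check=False
        )
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # Stand-in for a real job such as ["paddleocr", "ocr", "-i", "./image.png", ...].
    result = run_with_timeout([sys.executable, "-c", "print('ok')"], timeout_s=30)
    print(result.returncode if result else "timed out")
```

Only the direct child is killed; if a job starts its own subprocesses, running it in a container or process group makes cleanup more reliable.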

Phase 4: integrate RAG and agents

  • Store intermediate Markdown for debugging.
  • Use MCP or Agent Skills for AI-agent workflows.
  • Prefer local/self-hosted processing for sensitive data.
  • Delete temporary files after processing.

Production and security notes

OCR often handles sensitive documents such as IDs, invoices, contracts, payroll files and internal reports. For production:

  • do not log full OCR content if it includes personal data;
  • limit file size and PDF page count;
  • run OCR in a container or isolated worker;
  • scan uploaded files if they come from the internet;
  • restrict folder read/write permissions;
  • delete temporary files;
  • enforce timeouts;
  • do not expose OCR services publicly without authentication;
  • use HTTPS and token authentication;
  • separate test and production environments;
  • audit output if it affects financial/legal decisions;
  • do not treat OCR confidence as absolute truth.
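Several of these checks can run before a file ever reaches the OCR worker. A minimal sketch that enforces a size limit and checks that the file's leading bytes (its magic number) match an allowed type, instead of trusting the file extension. The 20 MB cap and the allowed types are illustrative assumptions, and this does not replace malware scanning or sandboxing.

```python
MAX_BYTES = 20 * 1024 * 1024  # illustrative limit; tune for your documents

# Leading bytes ("magic numbers") of the formats this sketch accepts.
SIGNATURES = {
    "pdf": b"%PDF-",
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
}


class RejectedUpload(ValueError):
    """Raised when an upload fails a pre-OCR check."""


def validate_upload(data: bytes, max_bytes: int = MAX_BYTES) -> str:
    """Return the detected file type, or raise RejectedUpload."""
    if not data:
        raise RejectedUpload("empty file")
    if len(data) > max_bytes:
        raise RejectedUpload(f"file is {len(data)} bytes; limit is {max_bytes}")
    for kind, signature in SIGNATURES.items():
        if data.startswith(signature):
            return kind
    raise RejectedUpload("content does not match an allowed file type")
```

A PDF page-count limit needs a PDF parsing library and is best applied in the isolated worker, since parsing untrusted PDFs is itself a risk.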

When should you use PaddleOCR?

Use PaddleOCR when:

  • you need local/offline OCR;
  • you need multilingual OCR;
  • you process many scanned images or PDFs;
  • you need coordinates and confidence scores;
  • you need document parsing into Markdown/JSON;
  • you build RAG from scanned documents;
  • you want OCR tools inside an AI agent;
  • you need to self-host an OCR API.

Be careful when:

  • scans are extremely low-quality;
  • results affect legal or financial decisions;
  • there is no human review;
  • hardware resources are limited;
  • input documents are untrusted and not sandboxed.

PaddleOCR vs MarkItDown

| Criteria | PaddleOCR | MarkItDown |
|---|---|---|
| Main focus | OCR and Document AI for images/PDFs | Converting many file formats to Markdown |
| Strength | Text recognition, layout, tables, formulas, parsing | DOCX/PPTX/XLSX/HTML/PDF to Markdown |
| Best input | Images, scans, PDFs | Digital documents in many formats |
| Output | Text, coordinates, JSON, Markdown | Markdown |
| Choose when | The document is visual/scanned or needs OCR | The document already has extractable text |

They can complement each other: MarkItDown handles many digital formats, while PaddleOCR handles scanned images, OCR and complex visual layouts.

FAQ

What is PaddleOCR?

PaddleOCR is an open-source OCR and Document AI toolkit from PaddlePaddle that converts images/PDFs into text, JSON, Markdown and structured data for LLM/RAG workflows.2

Does PaddleOCR support multiple languages?

Yes. The README states that PaddleOCR supports 100+ languages.2

Do I need a GPU?

Not for small tests; a CPU works. For large documents, high throughput or heavier models, a GPU is strongly recommended.

What is the difference between PP-OCRv5 and PP-StructureV3?

PP-OCRv5 focuses on text detection and recognition. PP-StructureV3 focuses on document layout parsing, tables, formulas, images and Markdown/structured output.3,4

What is PaddleOCR-VL?

PaddleOCR-VL is a compact VLM series for document parsing, supporting many languages and complex elements such as tables, formulas and charts.5

Does PaddleOCR provide MCP support?

Yes. PaddleOCR provides an MCP Server with tools such as ocr, pp_structurev3 and paddleocr_vl.12

Conclusion

PaddlePaddle/PaddleOCR is a practical repository for OCR and Document AI. Start with paddleocr ocr and the Python PaddleOCR(...).predict() API for simple extraction. Use PP-StructureV3 when you need layout-aware Markdown or structured output. Evaluate PaddleOCR-VL for difficult multilingual documents and complex layouts.

For production, accuracy is only one part of the work. You also need input validation, sandboxing, temporary-file cleanup, authentication, timeouts, version pinning, monitoring and human review for sensitive workflows.

References

Footnotes

  1. GitHub Open Graph preview image for PaddlePaddle/PaddleOCR. https://opengraph.githubassets.com/paddleocr-guide/PaddlePaddle/PaddleOCR

  2. GitHub. PaddlePaddle/PaddleOCR. https://github.com/PaddlePaddle/PaddleOCR

  3. PaddleOCR Documentation. “General OCR Pipeline Usage Tutorial.” https://www.paddleocr.ai/latest/en/version3.x/pipeline_usage/OCR.html

  4. PaddleOCR Documentation. “PP-StructureV3 Usage Tutorial.” https://www.paddleocr.ai/latest/en/version3.x/pipeline_usage/PP-StructureV3.html

  5. PaddleOCR Documentation. “PaddleOCR-VL Usage Tutorial.” https://www.paddleocr.ai/latest/en/version3.x/pipeline_usage/PaddleOCR-VL.html

  6. PaddleOCR Documentation. “Quick Start.” https://www.paddleocr.ai/latest/en/quick_start.html

  7. PaddleOCR Documentation. “Installation.” https://www.paddleocr.ai/latest/en/version3.x/installation.html

  8. PaddleOCR Quick Start, command-line usage examples. https://www.paddleocr.ai/latest/en/quick_start.html

  9. PaddleOCR Quick Start, Python script usage examples. https://www.paddleocr.ai/latest/en/quick_start.html

  10. PaddleOCR Documentation. “Self-hosted Serving.” https://www.paddleocr.ai/latest/en/version3.x/inference_deployment/serving/serving.html

  11. PaddleOCR Serving docs, Basic Serving via PaddleX. https://www.paddleocr.ai/latest/en/version3.x/inference_deployment/serving/serving.html

  12. PaddleOCR Documentation. “MCP Server.” https://www.paddleocr.ai/latest/en/version3.x/integrations/mcp_server.html

  13. PaddleOCR Documentation. “Agent Skills.” https://www.paddleocr.ai/latest/en/version3.x/integrations/skills.html

  14. PaddleOCR Agent Skills docs, prerequisites and installation. https://www.paddleocr.ai/latest/en/version3.x/integrations/skills.html


Written by PixelRouter Editorial Team

We publish deep, authoritative guides on AI infrastructure, API gateway security, cloud financial management, and system optimizations for developers.
