Best fit
- You need to compress files while keeping their text searchable.
- Files mix scanned pages and digital text pages.
- Future retrieval depends on keyword search and copyability.
Constraint workflow
Use this when legal, archive, or operations teams still need text retrieval after compression.
Reduce file size with structural edits first, before applying any compression that affects quality.
A balanced compression mode is less likely to break text search than a more aggressive one.
Quickly confirm searchable output before handoff.
Teams need compressed files that remain searchable for future lookup.
Different page types need different processing.
Checkpoint: Scanned sections are clearly tagged for OCR.
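One way to tag scanned pages is to check how much text each page already exposes. The sketch below is a minimal illustration, not a prescribed method: it assumes the per-page text has already been extracted by whatever PDF library you use, and the function name `tag_pages_for_ocr` and the `min_chars` threshold are hypothetical choices.

```python
from typing import List


def tag_pages_for_ocr(page_texts: List[str], min_chars: int = 20) -> List[str]:
    """Tag each page "text" or "ocr" by how much extractable text it has.

    page_texts: text already extracted per page (assumed input).
    min_chars:  hypothetical threshold; tune it on your own files.
    """
    tags = []
    for text in page_texts:
        # Count only non-whitespace characters so blank lines do not count as text.
        visible = sum(1 for ch in (text or "") if not ch.isspace())
        tags.append("text" if visible >= min_chars else "ocr")
    return tags


if __name__ == "__main__":
    pages = ["", "This page has a real digital text layer.", "  \n "]
    print(tag_pages_for_ocr(pages))
```

Pages tagged `ocr` are the ones to send through OCR; pages tagged `text` can skip it.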
Running OCR only on the scanned pages keeps the output cleaner and the processing faster.
Checkpoint: Known keywords are searchable in converted pages.
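The keyword checkpoint can be automated with a small comparison. This sketch assumes you have the text extracted from the converted pages and a list of terms you expect to find; `find_missing_keywords` is a hypothetical helper name. It ignores case and collapses whitespace, since line breaks in extracted text often split phrases.

```python
import re
from typing import Iterable, List


def _normalize(text: str) -> str:
    # Collapse runs of whitespace and ignore case so line breaks do not hide matches.
    return re.sub(r"\s+", " ", text).strip().casefold()


def find_missing_keywords(page_text: str, keywords: Iterable[str]) -> List[str]:
    """Return the keywords that cannot be found in the extracted text."""
    haystack = _normalize(page_text)
    return [kw for kw in keywords if _normalize(kw) not in haystack]


if __name__ == "__main__":
    extracted = "Master Services\nAgreement, Invoice 2041"
    expected = ["master services agreement", "invoice", "indemnity"]
    print(find_missing_keywords(extracted, expected))
```

An empty result means every target keyword was found; any returned term points to a page worth re-checking.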
Safe compression avoids losing searchability.
Checkpoint: Copy-paste sample text remains accurate.
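To make the copy-paste checkpoint measurable, you can compare a pasted snippet against the text you expected. The sketch below uses Python's standard `difflib` for a similarity score. The function name `copy_accuracy` is hypothetical, and any pass/fail threshold is an assumption you would need to set yourself.

```python
from difflib import SequenceMatcher


def copy_accuracy(expected: str, copied: str) -> float:
    """Similarity between expected and copied text, from 0.0 to 1.0.

    Whitespace is collapsed first because readers differ in how they
    emit line breaks and spacing when copying.
    """
    a = " ".join(expected.split())
    b = " ".join(copied.split())
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    reference = "Payment due in 30 days"
    print(copy_accuracy(reference, "Payment  due in\n30 days"))
    print(copy_accuracy(reference, "P@ym3nt du3 1n 3O d@ys"))
```

A score near 1.0 suggests the text layer survived compression; a clearly lower score suggests OCR or font-mapping damage.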
Expected outcome: Users can search and copy key terms reliably.
Avoid this: Flattening all pages into images during export, which removes the text layer.
| Signal | Likely cause | Recommended fix |
|---|---|---|
| Keyword search returns nothing | Pages were image-only after scan pipeline. | Run OCR on scanned sections and validate with target keywords. |
| Search works in one reader but fails in another | Text layer encoding is inconsistent. | Re-export with stable text rendering and re-test on two readers. |
| Copy-paste output is garbled | Font mapping or OCR language pack is mismatched. | Re-run OCR with the correct language profile and verify copy-paste snippets. |
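For the garbled copy-paste signal, a rough heuristic is to measure how much of the copied text consists of the Unicode replacement character (U+FFFD) or Private Use Area code points, which commonly appear when glyphs have no usable character mapping. This is a sketch under that assumption. The name `suspicious_char_ratio` is hypothetical, and a low ratio does not prove the text is correct.

```python
def suspicious_char_ratio(text: str) -> float:
    """Share of characters that are U+FFFD or in the Private Use Area."""
    if not text:
        return 0.0

    def is_suspicious(ch: str) -> bool:
        cp = ord(ch)
        # U+FFFD is the replacement character; U+E000..U+F8FF is the
        # Basic Multilingual Plane Private Use Area.
        return cp == 0xFFFD or 0xE000 <= cp <= 0xF8FF

    return sum(1 for ch in text if is_suspicious(ch)) / len(text)


if __name__ == "__main__":
    print(suspicious_char_ratio("Invoice"))
    print(suspicious_char_ratio("\ufffd\ue001ab"))
```

A noticeably non-zero ratio on a sample page is a hint to revisit font mapping or the OCR language profile before handoff.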