PDFBox Validation
The Apache PDFBox library provides the PreflightParser class, which we can use to validate a PDF document. Apache Preflight is a Java tool that implements a parser compliant with the ISO 19005-1 specification (also known as PDF/A-1).
Categories of Validation Error
If a validation fails, the ValidationResult object contains all causes of the failure. To make a failure easier to interpret, every error code has the form X[.Y[.Z]], where:
- X -> the category (Example – Font validation error)
- Y -> a subsection of the category (Example – “Font with Glyph (symbol) error”)
- Z -> the cause of the error (Example – “Font with a missing Glyph”)
Note: The subsection (‘Y‘) and the cause (‘Z‘) may be missing, depending on how precisely the error detail can be identified.
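To make the X[.Y[.Z]] form concrete, here is a minimal sketch in plain Java that splits such a code into its category, subsection and cause. The class ErrorCodeParts and its parse method are hypothetical helpers written only for this illustration; they are not part of PDFBox, and the sample code "3.1.2" is just a placeholder value.

```java
import java.util.Optional;

// Hypothetical helper (not a PDFBox class): splits an error code of the
// form X[.Y[.Z]] into category, optional subsection and optional cause.
final class ErrorCodeParts {
    final String category;             // X, always present
    final Optional<String> subsection; // Y, may be missing
    final Optional<String> cause;      // Z, may be missing

    private ErrorCodeParts(String category, Optional<String> subsection, Optional<String> cause) {
        this.category = category;
        this.subsection = subsection;
        this.cause = cause;
    }

    static ErrorCodeParts parse(String code) {
        // The limit -1 keeps trailing empty parts, so "3." is rejected below.
        String[] parts = code.trim().split("\\.", -1);
        if (parts.length > 3) {
            throw new IllegalArgumentException("Too many parts in error code: " + code);
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty part in error code: " + code);
            }
        }
        Optional<String> subsection = parts.length > 1 ? Optional.of(parts[1]) : Optional.empty();
        Optional<String> cause = parts.length > 2 ? Optional.of(parts[2]) : Optional.empty();
        return new ErrorCodeParts(parts[0], subsection, cause);
    }

    public static void main(String[] args) {
        ErrorCodeParts parts = ErrorCodeParts.parse("3.1.2"); // placeholder code
        System.out.println("category   = " + parts.category);
        System.out.println("subsection = " + parts.subsection.orElse("(missing)"));
        System.out.println("cause      = " + parts.cause.orElse("(missing)"));
    }
}
```

A code that consists of the category alone, such as "7", yields empty subsection and cause values, which matches the note above.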
Follow the steps below to validate a PDF document:
Load Existing Document
Store the path of the existing PDF file in a String variable, for example fileName.
Instantiate the parser with given PDF file
Instantiate the PreflightParser class and pass fileName to its constructor.
Call the parse() method
The parse() method parses the stream and populates the COSDocument object, which gives access to all aspects of the PDF document.
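Get preflight document and validate
Call getPreflightDocument() on the parser to obtain the PreflightDocument, then call its validate() method. The outcome is available through getResult().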
Example-
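The steps above can be put together as in the following sketch. It assumes the PDFBox 2.x Preflight API (the org.apache.pdfbox:preflight artifact on the classpath); in PDFBox 3.x the parser API differs, so this code may need adjusting there. The class name PdfAValidationExample, the helper method validate and the default path sample.pdf are hypothetical names chosen for illustration.

```java
import java.io.IOException;

import org.apache.pdfbox.preflight.PreflightDocument;
import org.apache.pdfbox.preflight.ValidationResult;
import org.apache.pdfbox.preflight.ValidationResult.ValidationError;
import org.apache.pdfbox.preflight.exception.SyntaxValidationException;
import org.apache.pdfbox.preflight.parser.PreflightParser;

// Sketch only: assumes the PDFBox 2.x Preflight API.
final class PdfAValidationExample {

    // Hypothetical helper that runs the steps and returns the result.
    static ValidationResult validate(String fileName) throws IOException {
        // Instantiate the parser with the given PDF file.
        PreflightParser parser = new PreflightParser(fileName);
        try {
            // Parse the stream and populate the COSDocument object.
            parser.parse();
            // Get the preflight document and validate it; the document
            // is closed automatically when the inner block ends.
            try (PreflightDocument document = parser.getPreflightDocument()) {
                document.validate();
                return document.getResult();
            }
        } catch (SyntaxValidationException e) {
            // A syntax error found while parsing still carries a result.
            return e.getResult();
        }
    }

    public static void main(String[] args) throws IOException {
        // Load the existing document: sample.pdf is a placeholder path.
        String fileName = args.length > 0 ? args[0] : "sample.pdf";
        ValidationResult result = validate(fileName);

        if (result.isValid()) {
            System.out.println("The file " + fileName + " is a valid PDF/A-1b file");
        } else {
            System.out.println("The file " + fileName + " is not valid, error(s):");
            for (ValidationError error : result.getErrorsList()) {
                // Each error code follows the X[.Y[.Z]] form described earlier.
                System.out.println(error.getErrorCode() + " : " + error.getDetails());
            }
        }
    }
}
```

Catching SyntaxValidationException separately matters because a file that cannot be parsed never produces a PreflightDocument; in that case the result attached to the exception is the only place where the errors are reported.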
Output:
After successful execution of the above program, a message is printed indicating whether the PDF file is valid.