Class ConformingPDFParser

    • Constructor Detail

      • ConformingPDFParser

        public ConformingPDFParser(java.io.File inputFile)
                            throws java.io.IOException
        Constructor.
        Parameters:
        inputFile - The file that contains the PDF document.
        Throws:
        java.io.IOException - If there is an error initializing the stream.
    • Method Detail

      • parse

        public void parse()
                   throws java.io.IOException
        This will parse the stream and populate the COSDocument object. This will close the stream when it is done parsing.
        Throws:
        java.io.IOException - If there is an error reading from the stream or corrupt data is found.
      • getDocument

        public COSDocument getDocument()
                                throws java.io.IOException
        This will get the document that was parsed. parse() must be called before this method. When you are done with this document you must call close() on it to release resources.
        Returns:
        The document that was parsed.
        Throws:
        java.io.IOException - If there is an error getting the document.
      • getPDDocument

        public PDDocument getPDDocument()
                                 throws java.io.IOException
        This will get the PD document that was parsed. When you are done with this document you must call close() on it to release resources.
        Returns:
        The document at the PD layer.
        Throws:
        java.io.IOException - If there is an error getting the document.
      • parseTrailerInformation

        protected long parseTrailerInformation()
                                        throws java.io.IOException,
                                               java.lang.NumberFormatException
        Throws:
        java.io.IOException
        java.lang.NumberFormatException
      • readByteBackwards

        protected byte readByteBackwards()
                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • readByte

        protected byte readByte()
                         throws java.io.IOException
        Throws:
        java.io.IOException
      • readBackwardUntilWhitespace

        protected java.lang.String readBackwardUntilWhitespace()
                                                        throws java.io.IOException
        Throws:
        java.io.IOException
      • consumeWhitespaceBackwards

        protected byte consumeWhitespaceBackwards()
                                           throws java.io.IOException
        This will read all bytes (backwards) until a non-whitespace character is found. To save you an extra read, the non-whitespace character is returned. If the current character is not whitespace, this method will just return the current char.
        Returns:
        the first non-whitespace character found
        Throws:
        java.io.IOException - if there is an error reading from the file
      • consumeWhitespace

        protected byte consumeWhitespace()
                                  throws java.io.IOException
        This will read all bytes until a non-whitespace character is found. To save you an extra read, the non-whitespace character is returned. If the current character is not whitespace, this method will just return the current char.
        Returns:
        the first non-whitespace character found
        Throws:
        java.io.IOException - if there is an error reading from the file
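        The two whitespace-skipping methods above differ only in direction. The following is a minimal sketch of that behavior. It assumes the data is an in-memory byte array rather than the parser's file input. WhitespaceCursor and its methods are hypothetical names and are not part of the PDFBox API. The white-space set follows the PDF specification (NUL, HT, LF, FF, CR, SP). Throwing an IOException when a boundary of the data is reached is an assumption.

```java
import java.io.IOException;

/**
 * Hypothetical sketch (not the PDFBox API): a cursor over an in-memory byte
 * array that skips PDF white-space forwards or backwards.
 */
final class WhitespaceCursor {
    private final byte[] data;
    private int offset; // index of the "current" byte

    WhitespaceCursor(byte[] data, int offset) {
        this.data = data;
        this.offset = offset;
    }

    int offset() {
        return offset;
    }

    // PDF white-space characters: NUL, HT, LF, FF, CR, SP.
    static boolean isWhitespace(int b) {
        return b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D || b == 0x20;
    }

    /** Moves forward past white-space and returns the first non-white-space byte. */
    byte consumeWhitespace() throws IOException {
        while (offset < data.length && isWhitespace(data[offset])) {
            offset++;
        }
        if (offset >= data.length) {
            // Assumption: running off the end is treated as an error.
            throw new IOException("Reached end of data while skipping white-space");
        }
        return data[offset]; // unchanged offset if the current byte was not white-space
    }

    /** Moves backward past white-space and returns the first non-white-space byte. */
    byte consumeWhitespaceBackwards() throws IOException {
        while (offset >= 0 && isWhitespace(data[offset])) {
            offset--;
        }
        if (offset < 0) {
            // Assumption: running off the start is treated as an error.
            throw new IOException("Reached start of data while skipping white-space");
        }
        return data[offset];
    }
}
```

        Returning the byte that stopped the scan, and leaving the offset on it, is what saves the caller an extra read.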
      • readLongBackwards

        protected long readLongBackwards()
                                  throws java.io.IOException,
                                         java.lang.NumberFormatException
        This will consume any whitespace (moving backwards), read in bytes until whitespace is found again and then parse the characters which have been read as a long. The current offset will then point at the first whitespace character which precedes the number.
        Returns:
        the parsed number
        Throws:
        java.io.IOException - if there is an error reading from the file
        java.lang.NumberFormatException - if the bytes read cannot be converted to a number
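        The three steps described for readLongBackwards are: skip white-space, collect the digits, and parse them. They can be sketched as follows. This sketch assumes an in-memory byte array and the PDF white-space set. BackwardLongReader is a hypothetical class, not the PDFBox implementation. The example input mimics the end of a file ("startxref" followed by a number), which is one place where reading backwards is natural.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: reads a decimal long backwards from a byte array. */
final class BackwardLongReader {
    private final byte[] data;
    private int offset; // index of the "current" byte

    BackwardLongReader(byte[] data, int offset) {
        this.data = data;
        this.offset = offset;
    }

    int offset() {
        return offset;
    }

    // PDF white-space characters: NUL, HT, LF, FF, CR, SP.
    private static boolean isWhitespace(int b) {
        return b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D || b == 0x20;
    }

    long readLongBackwards() throws IOException {
        // 1. Skip white-space, moving towards the start of the data.
        while (offset >= 0 && isWhitespace(data[offset])) {
            offset--;
        }
        if (offset < 0) {
            throw new IOException("No number found before start of data");
        }
        // 2. Collect bytes backwards until white-space (or the start of the data).
        int end = offset; // inclusive index of the last character of the token
        while (offset >= 0 && !isWhitespace(data[offset])) {
            offset--;
        }
        // offset now points at the white-space preceding the number (or -1).
        String token = new String(data, offset + 1, end - offset, StandardCharsets.US_ASCII);
        // 3. Long.parseLong throws NumberFormatException for non-numeric tokens.
        return Long.parseLong(token);
    }
}
```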
      • readInt

        protected int readInt()
                       throws java.io.IOException
        Description copied from class: BaseParser
        This will read an integer from the stream.
        Overrides:
        readInt in class BaseParser
        Returns:
        The integer that was read from the stream.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • readNumber

        protected COSNumber readNumber()
                                throws java.io.IOException
        This will read in a number and return the COS version of the number (be it a COSInteger or a COSFloat).
        Returns:
        the COSNumber which was read/parsed
        Throws:
        java.io.IOException
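        The integer-or-real decision made by readNumber can be illustrated without the COS classes. In this sketch, java.lang.Long and java.lang.Double stand in for COSInteger and COSFloat. PdfNumberSketch is a hypothetical helper, and rejecting invalid tokens with an IOException is an assumption. The accepted syntax follows the PDF specification: an optional sign, digits, and at most one period. Exponent forms such as 6.02E23 are not allowed.

```java
import java.io.IOException;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: classifies a PDF numeric token as an integer or a real.
 * Long and Double stand in for COSInteger and COSFloat.
 */
final class PdfNumberSketch {
    // Optional sign, then digits with an optional period, or a period followed by digits.
    private static final Pattern PDF_NUMBER = Pattern.compile("[+-]?(\\d+\\.?\\d*|\\.\\d+)");

    private PdfNumberSketch() {
    }

    static Number parseNumber(String token) throws IOException {
        if (!PDF_NUMBER.matcher(token).matches()) {
            throw new IOException("Not a PDF number: " + token);
        }
        try {
            if (token.indexOf('.') >= 0) {
                return Double.parseDouble(token); // a real (COSFloat in PDFBox)
            }
            return Long.parseLong(token); // an integer (COSInteger in PDFBox)
        } catch (NumberFormatException e) {
            // Reached, for example, when an integer does not fit in a long.
            throw new IOException("Number out of range: " + token, e);
        }
    }
}
```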
      • parseNumber

        protected COSNumber parseNumber(java.lang.String number)
                                 throws java.io.IOException
        Throws:
        java.io.IOException
      • processCosObject

        protected COSBase processCosObject(java.lang.String string)
                                    throws java.io.IOException
        Throws:
        java.io.IOException
      • readObjectBackwards

        protected COSBase readObjectBackwards()
                                       throws java.io.IOException
        Throws:
        java.io.IOException
      • readNameBackwards

        protected COSName readNameBackwards()
                                     throws java.io.IOException
        Throws:
        java.io.IOException
      • getObject

        public COSBase getObject(long objectNumber,
                                 long generation)
                          throws java.io.IOException
        Throws:
        java.io.IOException
      • readObject

        public COSBase readObject(long objectNumber,
                                  long generation)
                           throws java.io.IOException
        This will read an object from the input file at the current offset (currentOffset). If the object number and generation do not match the expected values and this parser is configured to throw an exception for non-conforming documents, an exception will be thrown.
        Parameters:
        objectNumber - the object number you expect to read
        generation - the generation number you expect this object to have
        Returns:
        the object being read.
        Throws:
        java.io.IOException
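        The following sketch shows how readObject(long, long) checks the expected object number and generation against an indirect-object header such as "12 0 obj". It assumes the header has already been read as a string on its own. ObjectHeaderCheck and its strict flag are hypothetical and stand in for the parser's configuration for non-conforming documents. When the flag is off, a mismatch is only reported through the return value.

```java
import java.io.IOException;

/** Hypothetical sketch: validates an indirect-object header such as "12 0 obj". */
final class ObjectHeaderCheck {
    private ObjectHeaderCheck() {
    }

    /**
     * Returns true when the header names the expected object. When strict is
     * true, a mismatch raises an IOException instead of returning false.
     */
    static boolean checkHeader(String header, long expectedNumber, long expectedGeneration,
                               boolean strict) throws IOException {
        String[] parts = header.trim().split("\\s+");
        if (parts.length != 3 || !parts[2].equals("obj")) {
            throw new IOException("Not an indirect object header: " + header);
        }
        long number;
        long generation;
        try {
            number = Long.parseLong(parts[0]);
            generation = Long.parseLong(parts[1]);
        } catch (NumberFormatException e) {
            throw new IOException("Malformed object header: " + header, e);
        }
        boolean matches = number == expectedNumber && generation == expectedGeneration;
        if (!matches && strict) {
            throw new IOException("Expected object " + expectedNumber + " " + expectedGeneration
                    + " but found " + number + " " + generation);
        }
        return matches;
    }
}
```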
      • readObject

        protected COSBase readObject()
                              throws java.io.IOException
        This actually reads the object data.
        Returns:
        the object which is read
        Throws:
        java.io.IOException
      • readString

        protected java.lang.String readString()
                                       throws java.io.IOException
        This will read the next string from the stream.
        Overrides:
        readString in class BaseParser
        Returns:
        The string that was read from the stream.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • readDictionaryBackwards

        protected COSDictionary readDictionaryBackwards()
                                                 throws java.io.IOException
        Throws:
        java.io.IOException
      • readLineBackwards

        protected java.lang.String readLineBackwards()
                                              throws java.io.IOException
        This will read a line, starting with the byte at the current offset and moving backwards until it finds a newline. Use this only when the data is known to be text rather than binary.
        Returns:
        the string which was read
        Throws:
        java.io.IOException - if there was an error reading data from the file
      • readLine

        protected java.lang.String readLine()
                                     throws java.io.IOException
        This will read a line, starting with the byte at the current offset and moving forward until it finds a newline. Use this only when the data is known to be text rather than binary.
        Overrides:
        readLine in class BaseParser
        Returns:
        the string which was read
        Throws:
        java.io.IOException - if there was an error reading data from the file
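        readLineBackwards and readLine can be sketched as a pair of scans over the same text. This sketch assumes an in-memory byte array and treats both CR and LF as newlines. It also assumes that each call steps over the end-of-line marker (CR, LF or CR LF), so that repeated calls walk line by line. That stepping behavior is a guess and is not documented above. LineCursor is a hypothetical class. ISO-8859-1 is used because it maps each byte to exactly one character, which only makes sense for text data.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: reads text lines forwards or backwards from a byte array. */
final class LineCursor {
    private final byte[] data;
    private int offset; // index of the "current" byte

    LineCursor(byte[] data, int offset) {
        this.data = data;
        this.offset = offset;
    }

    private static boolean isEol(int b) {
        return b == '\n' || b == '\r';
    }

    /** Reads forward from the offset up to (not including) the next CR or LF. */
    String readLine() {
        int start = offset;
        while (offset < data.length && !isEol(data[offset])) {
            offset++;
        }
        String line = new String(data, start, offset - start, StandardCharsets.ISO_8859_1);
        // Step past a CR, LF or CR LF end-of-line marker.
        if (offset < data.length && data[offset] == '\r') {
            offset++;
        }
        if (offset < data.length && data[offset] == '\n') {
            offset++;
        }
        return line;
    }

    /** Reads backward from the offset down to (not including) the previous CR or LF. */
    String readLineBackwards() {
        int end = offset; // inclusive index of the last character of the line
        while (offset >= 0 && !isEol(data[offset])) {
            offset--;
        }
        String line = new String(data, offset + 1, end - offset, StandardCharsets.ISO_8859_1);
        // Step backwards over an LF, CR or CR LF end-of-line marker.
        if (offset >= 0 && data[offset] == '\n') {
            offset--;
        }
        if (offset >= 0 && data[offset] == '\r') {
            offset--;
        }
        return line;
    }
}
```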
      • readWord

        protected java.lang.String readWord()
                                     throws java.io.IOException
        Throws:
        java.io.IOException
      • isRecursivlyRead

        public boolean isRecursivlyRead()
        Returns:
        the recursivlyRead flag
      • setRecursivlyRead

        public void setRecursivlyRead(boolean recursivlyRead)
        Parameters:
        recursivlyRead - the recursivlyRead flag to set