Diffstat (limited to 'doc/pdftotext.cat')
-rw-r--r-- | doc/pdftotext.cat | 68 |
1 files changed, 45 insertions, 23 deletions
diff --git a/doc/pdftotext.cat b/doc/pdftotext.cat
index 5c8a709..8217743 100644
--- a/doc/pdftotext.cat
+++ b/doc/pdftotext.cat
@@ -4,27 +4,27 @@ pdftotext(1) pdftotext(1)
 NAME
        pdftotext - Portable Document Format (PDF) to text converter (version
-       3.03)
+       3.04)

 SYNOPSIS
        pdftotext [options] [PDF-file [text-file]]

 DESCRIPTION
-       Pdftotext converts Portable Document Format (PDF) files to plain text.
+       Pdftotext converts Portable Document Format (PDF) files to plain text.

-       Pdftotext reads the PDF file, PDF-file, and writes a text file, text-
-       file. If text-file is not specified, pdftotext converts file.pdf to
+       Pdftotext reads the PDF file, PDF-file, and writes a text file, text-
+       file. If text-file is not specified, pdftotext converts file.pdf to
        file.txt. If text-file is '-', the text is sent to stdout.

 CONFIGURATION FILE
-       Pdftotext reads a configuration file at startup. It first tries to
+       Pdftotext reads a configuration file at startup. It first tries to
        find the user's private config file, ~/.xpdfrc. If that doesn't exist,
        it looks for a system-wide config file, typically /usr/local/etc/xpdfrc
-       (but this location can be changed when pdftotext is built). See the
+       (but this location can be changed when pdftotext is built). See the
        xpdfrc(5) man page for details.

 OPTIONS
-       Many of the following options can be set with configuration file com-
+       Many of the following options can be set with configuration file com-
        mands. These are listed in square brackets with the description of the
        corresponding command line option.

@@ -35,22 +35,44 @@ OPTIONS
               Specifies the last page to convert.

        -layout
-              Maintain (as best as possible) the original physical layout of
-              the text. The default is to 'undo' physical layout (columns,
-              hyphenation, etc.) and output the text in reading order.
+              Maintain (as best as possible) the original physical layout of
+              the text. The default is to 'undo' physical layout (columns,
+              hyphenation, etc.) and output the text in reading order. If the
+              -fixed option is given, character spacing within each line will
+              be determined by the specified character pitch.
+
+       -table Table mode is similar to physical layout mode, but optimized for
+              tabular data, with the goal of keeping rows and columns aligned
+              (at the expense of inserting extra whitespace). If the -fixed
+              option is given, character spacing within each line will be
+              determined by the specified character pitch.
+
+       -lineprinter
+              Line printer mode uses a strict fixed-character-pitch and
+              -height layout. That is, the page is broken into a grid, and
+              characters are placed into that grid. If the grid spacing is
+              too small for the actual characters, the result is extra white-
+              space. If the grid spacing is too large, the result is missing
+              whitespace. The grid spacing can be specified using the -fixed
+              and -linespacing options. If one or both are not given on the
+              command line, pdftotext will attempt to compute appropriate
+              value(s).
+
+       -raw   Keep the text in content stream order. Depending on how the PDF
+              file was generated, this may or may not be useful.

        -fixed number
-              Assume fixed-pitch (or tabular) text, with the specified charac-
-              ter width (in points). This forces physical layout mode.
+              Specify the character pitch (character width), in points, for
+              physical layout, table, or line printer mode. This is ignored
+              in all other modes.

-       -raw   Keep the text in content stream order. This is a hack which
-              often "undoes" column formatting, etc. Use of raw mode is no
-              longer recommended.
+       -linespacing number
+              Specify the line spacing, in points, for line printer mode.
+              This is ignored in all other modes.

-       -htmlmeta
-              Generate a simple HTML file, including the meta information.
-              This simply wraps the text in <pre> and </pre> and prepends the
-              meta headers.
+       -clip  Text which is hidden because of clipping is removed before doing
+              layout, and then added back in. This can be helpful for tables
+              where clipped (invisible) text would overlap the next column.

        -enc encoding-name
               Sets the encoding to use for text output. The encoding-name
@@ -102,14 +124,14 @@ EXIT CODES
        99     Other error.

 AUTHOR
-       The pdftotext software and documentation are copyright 1996-2011 Glyph
+       The pdftotext software and documentation are copyright 1996-2014 Glyph
        & Cog, LLC.

 SEE ALSO
-       xpdf(1), pdftops(1), pdfinfo(1), pdffonts(1), pdfdetach(1),
-       pdftoppm(1), pdfimages(1), xpdfrc(5)
+       xpdf(1), pdftops(1), pdftohtml(1), pdfinfo(1), pdffonts(1), pdfde-
+       tach(1), pdftoppm(1), pdftopng(1), pdfimages(1), xpdfrc(5)

        http://www.foolabs.com/xpdf/

-                                15 August 2011                  pdftotext(1)
+                                  28 May 2014                   pdftotext(1)
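
The option text in this diff ties -fixed and -linespacing to particular output modes. As a rough illustration, here is a minimal Python sketch that assembles a pdftotext 3.04 command line from those rules. The function name, the mode names, and the choice to raise an error (where pdftotext itself is documented as simply ignoring the option) are assumptions of this sketch, not part of xpdf.

```python
from typing import List, Optional

# Output modes described in the diff. "reading" is this sketch's own label
# for the default reading-order output, which takes no mode flag.
MODE_FLAGS = {
    "reading": None,
    "layout": "-layout",
    "table": "-table",
    "lineprinter": "-lineprinter",
    "raw": "-raw",
}

# Per the new man page text, -fixed applies only to these modes.
FIXED_MODES = {"layout", "table", "lineprinter"}


def build_pdftotext_args(
    pdf_file: str,
    text_file: str = "-",  # '-' sends the text to stdout
    mode: str = "reading",
    fixed: Optional[float] = None,        # character pitch, in points
    linespacing: Optional[float] = None,  # line spacing, in points
    clip: bool = False,
) -> List[str]:
    """Return an argv list following: pdftotext [options] [PDF-file [text-file]]."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    # pdftotext ignores these options in other modes; this sketch is stricter
    # and rejects them so that a no-op option does not go unnoticed.
    if fixed is not None and mode not in FIXED_MODES:
        raise ValueError("-fixed only applies to layout, table, or lineprinter mode")
    if linespacing is not None and mode != "lineprinter":
        raise ValueError("-linespacing only applies to lineprinter mode")

    args = ["pdftotext"]
    flag = MODE_FLAGS[mode]
    if flag is not None:
        args.append(flag)
    if fixed is not None:
        args += ["-fixed", str(fixed)]
    if linespacing is not None:
        args += ["-linespacing", str(linespacing)]
    if clip:
        args.append("-clip")
    args += [pdf_file, text_file]
    return args
```

The returned list can be handed to subprocess.run(args, check=True) on a system where pdftotext 3.04 or later is installed. Leaving fixed and linespacing unset in lineprinter mode lets pdftotext compute the grid spacing itself, as the new text describes.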