Diffstat (limited to 'doc/pdftotext.cat')
-rw-r--r-- doc/pdftotext.cat 68
1 files changed, 45 insertions, 23 deletions
diff --git a/doc/pdftotext.cat b/doc/pdftotext.cat
index 5c8a709..8217743 100644
--- a/doc/pdftotext.cat
+++ b/doc/pdftotext.cat
@@ -4,27 +4,27 @@ pdftotext(1) pdftotext(1)
NAME
pdftotext - Portable Document Format (PDF) to text converter (version
- 3.03)
+ 3.04)
SYNOPSIS
pdftotext [options] [PDF-file [text-file]]
DESCRIPTION
- Pdftotext converts Portable Document Format (PDF) files to plain text.
+ Pdftotext converts Portable Document Format (PDF) files to plain text.
- Pdftotext reads the PDF file, PDF-file, and writes a text file, text-
- file. If text-file is not specified, pdftotext converts file.pdf to
+ Pdftotext reads the PDF file, PDF-file, and writes a text file, text-
+ file. If text-file is not specified, pdftotext converts file.pdf to
file.txt. If text-file is '-', the text is sent to stdout.
CONFIGURATION FILE
- Pdftotext reads a configuration file at startup. It first tries to
+ Pdftotext reads a configuration file at startup. It first tries to
find the user's private config file, ~/.xpdfrc. If that doesn't exist,
it looks for a system-wide config file, typically /usr/local/etc/xpdfrc
- (but this location can be changed when pdftotext is built). See the
+ (but this location can be changed when pdftotext is built). See the
xpdfrc(5) man page for details.
OPTIONS
- Many of the following options can be set with configuration file com-
+ Many of the following options can be set with configuration file com-
mands. These are listed in square brackets with the description of the
corresponding command line option.
@@ -35,22 +35,44 @@ OPTIONS
Specifies the last page to convert.
-layout
- Maintain (as best as possible) the original physical layout of
- the text. The default is to 'undo' physical layout (columns,
- hyphenation, etc.) and output the text in reading order.
+ Maintain (as best as possible) the original physical layout of
+ the text. The default is to 'undo' physical layout (columns,
+ hyphenation, etc.) and output the text in reading order. If the
+ -fixed option is given, character spacing within each line will
+ be determined by the specified character pitch.
+
+ -table Table mode is similar to physical layout mode, but optimized for
+ tabular data, with the goal of keeping rows and columns aligned
+ (at the expense of inserting extra whitespace). If the -fixed
+ option is given, character spacing within each line will be
+ determined by the specified character pitch.
+
+ -lineprinter
+ Line printer mode uses a strict fixed-character-pitch and
+ -height layout. That is, the page is broken into a grid, and
+ characters are placed into that grid. If the grid spacing is
+ too small for the actual characters, the result is extra white-
+ space. If the grid spacing is too large, the result is missing
+ whitespace. The grid spacing can be specified using the -fixed
+ and -linespacing options. If one or both are not given on the
+ command line, pdftotext will attempt to compute appropriate
+ value(s).
+
+ -raw Keep the text in content stream order. Depending on how the PDF
+ file was generated, this may or may not be useful.
-fixed number
- Assume fixed-pitch (or tabular) text, with the specified charac-
- ter width (in points). This forces physical layout mode.
+ Specify the character pitch (character width), in points, for
+ physical layout, table, or line printer mode. This is ignored
+ in all other modes.
- -raw Keep the text in content stream order. This is a hack which
- often "undoes" column formatting, etc. Use of raw mode is no
- longer recommended.
+ -linespacing number
+ Specify the line spacing, in points, for line printer mode.
+ This is ignored in all other modes.
- -htmlmeta
- Generate a simple HTML file, including the meta information.
- This simply wraps the text in <pre> and </pre> and prepends the
- meta headers.
+ -clip Text which is hidden because of clipping is removed before doing
+ layout, and then added back in. This can be helpful for tables
+ where clipped (invisible) text would overlap the next column.
-enc encoding-name
Sets the encoding to use for text output. The encoding-name
@@ -102,14 +124,14 @@ EXIT CODES
99 Other error.
AUTHOR
- The pdftotext software and documentation are copyright 1996-2011 Glyph
+ The pdftotext software and documentation are copyright 1996-2014 Glyph
& Cog, LLC.
SEE ALSO
- xpdf(1), pdftops(1), pdfinfo(1), pdffonts(1), pdfdetach(1),
- pdftoppm(1), pdfimages(1), xpdfrc(5)
+ xpdf(1), pdftops(1), pdftohtml(1), pdfinfo(1), pdffonts(1), pdfde-
+ tach(1), pdftoppm(1), pdftopng(1), pdfimages(1), xpdfrc(5)
http://www.foolabs.com/xpdf/
- 15 August 2011 pdftotext(1)
+ 28 May 2014 pdftotext(1)
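
Note on the new -lineprinter text: the grid behaviour it describes (grid too small gives extra whitespace, grid too large gives missing whitespace) can be illustrated with a small sketch. This is not pdftotext's implementation. The function name place_on_grid, the (x, y, ch) glyph tuples, and the rule of moving a colliding character to the next free column are all assumptions made for illustration only.

```python
def place_on_grid(glyphs, pitch, line_spacing):
    """Place glyphs on a fixed character grid (illustrative only).

    glyphs: iterable of (x, y, ch), with x and y in points measured
            from the top-left corner of the page (hypothetical input format).
    pitch: grid cell width in points (the role played by -fixed).
    line_spacing: grid cell height in points (the role played by -linespacing).
    """
    rows = {}
    # Process glyphs top to bottom, left to right.
    for x, y, ch in sorted(glyphs, key=lambda g: (g[1], g[0])):
        cells = rows.setdefault(int(y // line_spacing), {})
        col = int(x // pitch)
        # Assumed collision rule: if the cell is taken, use the next free one.
        while col in cells:
            col += 1
        cells[col] = ch

    lines = []
    for r in range(max(rows) + 1 if rows else 0):
        cells = rows.get(r, {})
        width = max(cells) + 1 if cells else 0
        lines.append("".join(cells.get(c, " ") for c in range(width)))
    return lines


# "ab cd" set in a font whose characters are 6 points wide.
sample = [(0, 0, "a"), (6, 0, "b"), (18, 0, "c"), (24, 0, "d")]
matching = place_on_grid(sample, pitch=6, line_spacing=12)
too_small = place_on_grid(sample, pitch=3, line_spacing=12)
too_large = place_on_grid(sample, pitch=12, line_spacing=12)
```

With a matching pitch the word gap survives as a single space. Halving the pitch spreads the characters over twice as many cells, and doubling it squeezes them together so the gap disappears.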