[UFO Chicago] grep on a PDF
Neil R. Ormos
ormos at ripco.com
Mon Mar 17 19:53:40 PDT 2014
Brian Sobolak wrote:
> Hi --
>
> I have a daily report that is delivered to me as a PDF, on which I'd like
> to do two simple operations:
>
> - 'wc' the whole thing to count roughly the number of lines
> - 'grep' to count the number of certain words.
>
> I'm reasonably certain that the underlying document is parseable as it
> isn't a scanned image or anything like that.
>
> I looked around briefly for 'pdf2txt' or something similar and didn't find
> it in my toolkit; I also started the gs man page before being interrupted.
>
> Has anyone tried this?
Try pdftotext(1). It comes in the poppler-utils
package, at least in Debian.

Output can be organized in three different ways:
the default tries to follow reading order, -layout
preserves the physical page layout as best it can,
and -raw keeps the text in content-stream order.
If the default output is unusable, try the other
options.
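
For the two operations asked about, something like the following might work. It is only a rough sketch: the function names are made up for illustration, and it assumes a grep that supports -o and -w (GNU and BSD grep both do). Giving "-" as the output file makes pdftotext write to stdout.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from any package.

# Usage: pdf_text FILE [pdftotext options...]
# Convert a PDF to plain text on stdout ("-" means stdout).
pdf_text() {
    file=$1
    shift
    pdftotext "$@" "$file" -
}

# Print the number of lines read from stdin.
count_lines() {
    wc -l | tr -d ' '
}

# Usage: count_word WORD
# Print how many times WORD occurs on stdin, matching whole
# words only and ignoring case. grep -o emits one match per
# line, so wc -l counts occurrences rather than matching lines.
count_word() {
    grep -o -i -w -- "$1" | wc -l | tr -d ' '
}
```

With a report named report.pdf (a placeholder), "pdf_text report.pdf | count_lines" gives the rough line count, and "pdf_text report.pdf -layout | count_word error" counts occurrences of the word "error" using the -layout ordering.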
--Neil