<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Mar 17, 2014 at 9:53 PM, Neil R. Ormos <span dir="ltr"><<a href="mailto:ormos@ripco.com" target="_blank">ormos@ripco.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="">Brian Sobolak wrote:<br>
<br>
> Hi --<br>
><br>
> I have a daily report, delivered to me as a PDF, on which I'd like to do<br>
> two simple operations:<br>
><br>
> - 'wc' the whole thing to count roughly the number of lines<br>
> - 'grep' to count occurrences of certain words.<br>
><br>
> I'm reasonably certain that the underlying document is parseable as it<br>
> isn't a scanned image or anything like that.<br>
><br>
> I looked around briefly for 'pdf2txt' or something similar and didn't find<br>
> it in my toolkit; I also started reading the gs man page before being interrupted.<br>
><br>
> Has anyone tried this?<br>
<br>
</div>Try pdftotext(1). It comes in the poppler-utils<br>
package, at least in Debian.<br>
<br>
Output can be ordered/organized in three different<br>
ways, depending on whether you use the -layout<br>
option, the -raw option, or the default (none of<br>
these options). If the default output is<br>
unusable, try the other options.<br></blockquote><div><br><br>pdftotext would be my first choice, but if you want to get lost in options and be ready for whatever PDF you encounter, there is pdftk:<br><br><a href="http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/">http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/</a><br>
<br><a href="https://packages.debian.org/wheezy/pdftk">https://packages.debian.org/wheezy/pdftk</a><br><br></div><div>You can use it to import a data file and fill out a PDF form. (I needed that because I couldn't figure out how to fill out the form and save it.)<br>
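For the original counting question, here is a minimal sketch of the pdftotext route. It assumes a POSIX shell and GNU or BSD grep (for the non-POSIX -o and -w flags). The names count_lines and count_word are hypothetical helpers, not part of poppler-utils or any other package. They just read pdftotext's output on stdin.<br>

```shell
# Hypothetical helpers: both read plain text on stdin, e.g. the output of
#   pdftotext -layout report.pdf -
# where report.pdf is a placeholder name and "-" sends the text to stdout.

# Print the number of lines on stdin. wc -l counts newline characters,
# so the result is the rough line count asked for. tr strips the padding
# some wc implementations add.
count_lines() {
    wc -l | tr -d ' '
}

# Print how many times the word $1 occurs on stdin, matching whole words only.
# grep -c would count matching *lines*; -o prints every match on its own
# line, so wc -l counts each occurrence. "--" protects words starting with "-".
count_word() {
    grep -o -w -- "$1" | wc -l | tr -d ' '
}
```

Usage, with report.pdf standing in for the daily report and "error" as an example word: pdftotext -layout report.pdf - | count_lines and pdftotext report.pdf - | count_word error. If the default output splits words oddly, try the -layout or -raw options mentioned above before counting.<br>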
</div><div><br> <br clear="all"></div></div>-- <br>Carl K
</div></div>