<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Mar 17, 2014 at 9:53 PM, Neil R. Ormos <span dir="ltr">&lt;<a href="mailto:ormos@ripco.com" target="_blank">ormos@ripco.com</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="">Brian Sobolak wrote:<br>

<br>

> Hi --<br>

><br>

> I have a daily report that is delivered to me as a PDF, and I'd like to do<br>

> two simple operations on it:<br>

><br>

>  - 'wc' the whole thing to count roughly the number of lines<br>

>  - 'grep' to count the number of certain words.<br>

><br>

> I'm reasonably certain that the underlying document is parseable as it<br>

> isn't a scanned image or anything like that.<br>

><br>

> I looked around briefly for 'pdf2txt' or something similar and didn't find<br>

> it in my toolkit; I also started the gs man page before being interrupted.<br>

><br>

> Has anyone tried this?<br>

<br>

</div>Try pdftotext(1).  It comes in the poppler-utils<br>

package, at least in Debian.<br>

<br>

Output can be ordered/organized in three different<br>

ways, depending on whether you use the -layout<br>

option, the -raw option, or the default (none of<br>

these options).  If the default output is<br>

unusable, try the other options.<br></blockquote><div><br><br>pdftotext would be my first choice, but if you want to get lost in options and be ready to take on whatever PDF you encounter, there is also pdftk:<br><br><a href="http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/">http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/</a><br>


<br><a href="https://packages.debian.org/wheezy/pdftk">https://packages.debian.org/wheezy/pdftk</a><br><br></div><div>You can also use it to import a data file and fill out a PDF form. (I did that because I couldn't figure out how to fill the form out and save it.)<br>
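<br>For the original wc/grep question, and for the form filling, something along these lines might work. This is only a sketch: the function names and file names are made up for illustration, it assumes pdftotext (poppler-utils) and pdftk are installed, and grep -o is a widely available extension rather than strict POSIX.<br>

```shell
#!/bin/sh
# Sketch only: function names and file names here are hypothetical.
# Assumes pdftotext (poppler-utils) and pdftk are installed and on PATH.

# Roughly "wc -l" for a PDF. Passing "-" as the output file makes
# pdftotext write the extracted text to stdout.
# tr strips the padding that some wc implementations add.
pdf_lines() {
    pdftotext "$1" - | wc -l | tr -d ' '
}

# Count occurrences of a whole word. grep -c would count matching lines,
# so use -o (one match per output line) and count those lines instead.
pdf_word_count() {
    pdftotext "$1" - | grep -o -w -e "$2" | wc -l | tr -d ' '
}

# Fill a PDF form from an FDF (or XFDF) data file using pdftk's fill_form.
fill_pdf_form() {
    pdftk "$1" fill_form "$2" output "$3"
}

# Example calls (report.pdf, form.pdf and data.fdf are placeholders):
#   pdf_lines report.pdf
#   pdf_word_count report.pdf ERROR
#   fill_pdf_form form.pdf data.fdf filled.pdf
```

<br>If the default text order gives odd counts, adding -layout or -raw before the file name in the pdftotext calls (as Neil suggests) may help.<br>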


</div><div><br> <br clear="all"></div></div>-- <br>Carl K

</div></div>