Compiling tesseract under OSX

Tesseract is a venerable OCR tool that runs from the command line.

While you can get hold of it on OSX by using Homebrew or MacPorts, if you're like me you won't like the bloat associated with these effective but unwieldy tools.

My objective was to compile tesseract on OSX, and it turns out to be a fairly straightforward process, as long as you're willing to accept the rather rough hacks needed to get it going.

Here are the steps I used:

Prerequisites:
  • Xcode development tools
  • leptonica
  • libtiff, libpng, libjpeg (for leptonica)
Note that libtiff depends on libjpeg, so you'll need to install libjpeg first.
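If you want to double-check the prerequisites before going any further, here's a rough sketch of a shell check. It assumes everything was installed under /usr/local (the prefix used throughout this post) and that leptonica's library is called liblept, as the -llept flag in step 5 implies; have_lib and check_prereqs are names I've made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if lib<name>.dylib or lib<name>.a exists in
# <prefix>/lib. The prefix defaults to /usr/local, as used in this post.
have_lib() {
    name=$1
    prefix=${2:-/usr/local}
    for ext in dylib a; do
        if [ -e "$prefix/lib/lib$name.$ext" ]; then
            return 0
        fi
    done
    return 1
}

# Print a found/missing line for each library, in the order you'd install
# them (libjpeg before libtiff, leptonica last). Assumes leptonica's library
# is named liblept, matching the -llept flag used later on.
check_prereqs() {
    prefix=${1:-/usr/local}
    for lib in jpeg png tiff lept; do
        if have_lib "$lib" "$prefix"; then
            echo "found:   lib$lib"
        else
            echo "missing: lib$lib"
        fi
    done
}
```

Running check_prereqs (or check_prereqs /some/other/prefix) prints one found/missing line per library. For the Xcode tools themselves, xcode-select -p prints the active developer directory if they're installed.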

1) Install leptonica

Leptonica is a library of image processing functions. Tesseract makes heavy use of it, so you'll need to install the leptonica libraries first. You should also make sure you've installed the image format libraries you want to use with tesseract, e.g. libpng, libtiff and libjpeg.

Download leptonica, then do the usual make / make install routine. You'll end up with a few .dylib and .a files in your /usr/local/ directories. Now we can get on to compiling tesseract.
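For reference, the leptonica build boils down to the standard autotools sequence. Leptonica's release tarballs ship a configure script, so this sketch wraps configure, make and make install in a small hypothetical function (build_autotools_pkg). The source directory is wherever you unpacked the download, and MAKE is only there in case you want to swap in a different make.

```shell
#!/bin/sh
# Hypothetical helper: run configure / make / make install for an unpacked
# autotools source tree. Stops at the first failing step. Runs in a subshell
# so the cd doesn't change your current directory.
build_autotools_pkg() {
    src_dir=$1
    prefix=${2:-/usr/local}
    make_cmd=${MAKE:-make}
    (
        cd "$src_dir" || exit 1
        ./configure --prefix="$prefix" &&
            "$make_cmd" &&
            "$make_cmd" install
    )
}
```

For example, build_autotools_pkg path/to/leptonica-source /usr/local. Depending on who owns /usr/local on your machine, the make install step may need sudo, in which case run that last step by hand.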

2) Fix your devtools path before running autogen.sh


Your Xcode devtools should be in your /Users/your_name directory, in a folder called 'devtools', and you need to add its autotools bin directory to your PATH. Note that if you're compiling tesseract as root (via sudo or su), you'll need to point to the devtools directory belonging to your own user. The root user is unlikely to have devtools, so if you've su'd, don't forget to hardcode your user name:

PATH=$PATH:/Users/your_user_name/devtools/autotools-bin/bin/
export PATH
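If you'd rather not edit the user name by hand every time, something like this could work. It's a sketch that assumes the devtools layout shown above. SUDO_USER is set by sudo to the name of the user who invoked it, but a plain su doesn't set it, so in that case pass your user name explicitly. add_devtools_path is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: append <user>'s devtools autotools bin directory to
# PATH, but only if it isn't already there. User name comes from the first
# argument, else SUDO_USER (set by sudo), else USER.
add_devtools_path() {
    user=${1:-${SUDO_USER:-$USER}}
    dir="/Users/$user/devtools/autotools-bin/bin"
    case ":$PATH:" in
        *":$dir:"*) ;;                 # already present, leave PATH alone
        *) PATH="$PATH:$dir" ;;
    esac
    export PATH
}
```

Use it as add_devtools_path (as yourself or under sudo), or add_devtools_path your_user_name after su. The function has to be sourced into your current shell; running it as a separate script wouldn't change your shell's PATH.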

3) Run autogen.sh

You may need to do this as root via su (see step 2 for pointing to devtools correctly).

(Thanks to http://emop.tamu.edu/Installing-Tesseract-Mac for this tip)
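Before running autogen.sh it's worth checking that the autotools are actually visible on your PATH. The tool list below (aclocal, autoconf, automake, libtoolize) is my assumption about what a typical autogen.sh needs, not something taken from tesseract's script, and missing_tools and run_autogen are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: print each named tool that can't be found on PATH,
# and return non-zero if any of them are missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return $status
}

# Hypothetical wrapper: only run ./autogen.sh when the (assumed) autotools
# are all visible; otherwise list the missing ones on stderr and fail.
run_autogen() {
    if missing_tools aclocal autoconf automake libtoolize >&2; then
        ./autogen.sh
    else
        echo "autotools not on PATH; see step 2" >&2
        return 1
    fi
}
```

Run run_autogen from the top of the tesseract source tree, after setting PATH as in step 2.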

4) Set leptonica header location

Leptonica installs its headers in /usr/local/include/leptonica by default (not /usr/local/include). The configure script might not pick this up, so set the environment variable LIBLEPT_HEADERSDIR to /usr/local/include/leptonica

  OR

Edit ./configure and add /usr/local/include/leptonica to the leptonica header paths on line 17692 (the exact line number will depend on your version of tesseract):

 LIBLEPT_HEADERSDIR="/usr/local/include/leptonica /usr/local/include /usr/include /opt/local/include/leptonica"
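Either option can be scripted. The environment variable route is a one-liner, e.g. LIBLEPT_HEADERSDIR=/usr/local/include/leptonica ./configure. For the edit route, matching the text is less brittle than relying on a line number. This sketch assumes the assignment in configure starts with LIBLEPT_HEADERSDIR=" as in the line quoted above; add_lept_headers_dir is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to every line of the form
#   LIBLEPT_HEADERSDIR="..."
# in the given configure script. Writes back with cat (not mv) so the
# script keeps its executable permission.
add_lept_headers_dir() {
    file=${1:-./configure}
    dir=${2:-/usr/local/include/leptonica}
    scratch=$(mktemp "${TMPDIR:-/tmp}/configure.XXXXXX") || return 1
    sed "s|^\([[:space:]]*LIBLEPT_HEADERSDIR=\"\)|\1$dir |" "$file" > "$scratch" &&
        cat "$scratch" > "$file"
    rc=$?
    rm -f "$scratch"
    return $rc
}
```

Running it twice adds the directory twice, which just repeats a search path, but it's tidier to run it once on a fresh configure.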

5) Remove the leptonica test-program check

For some reason, the tiny C program the configure script uses to test leptonica would not work on my rig. But... I know leptonica is there and running fine. To stop configure quitting at this step (it produces the message "leptonica library with pdf support (>= 1.71) is missing"), change line 17748 from:
as_fn_error $? "leptonica library with pdf support (>= 1.71) is missing" "$LINENO" 5
to just a simple message (and continue):
    $as_echo_n "Pretending we compiled a leptonica program"

You also need to force the LIBS variable to pick up leptonica. On line 17754, after the closing "fi", enter:
  LIBS="-llept $LIBS"
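Here's a sketch that applies both step-5 edits by matching text rather than line numbers. Two assumptions, so check the result by eye: the error line contains the message quoted above, and the LIBS line is inserted right after the replacement message (on the branch where configure thinks leptonica is missing) rather than after the "fi". skip_lept_check is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: replace configure's "leptonica ... is missing" error
# with a harmless message, and add LIBS="-llept $LIBS" straight after it.
# All other lines are copied through unchanged.
skip_lept_check() {
    file=${1:-./configure}
    scratch=$(mktemp "${TMPDIR:-/tmp}/configure.XXXXXX") || return 1
    awk '
        index($0, "as_fn_error") && index($0, "leptonica library with pdf support") {
            print "    $as_echo_n \"Pretending we compiled a leptonica program\""
            print "    LIBS=\"-llept $LIBS\""
            next
        }
        { print }
    ' "$file" > "$scratch" && cat "$scratch" > "$file"
    rc=$?
    rm -f "$scratch"
    return $rc
}
```

Run it as skip_lept_check ./configure, then look over the patched section before running configure.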

6) Remove training options

To use tesseract's training tools, you'd need pango, cairo and icu-dev, as well as the PKG_CHECK_MODULES macro (which comes with pkg-config, and was not available on my rig).
The easiest (if hackiest) workaround is to comment out the PKG_CHECK_MODULES line, then set the matching have_xxx variable to false, i.e.:

#PKG_CHECK_MODULES(pango, pango, have_pango=true, have_pango=false)
have_pango=false

Do this for pango (above) and cairo too.
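The same hack can be scripted. This sketch assumes each line begins (after optional indentation) with PKG_CHECK_MODULES(pango, or PKG_CHECK_MODULES(cairo, like the pango line above; point it at whichever file contains those lines on your setup. disable_training_checks is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: comment out the pango/cairo PKG_CHECK_MODULES lines
# and follow each with have_<module>=false. Lines already commented out
# (starting with #) don't match, so re-running doesn't double them up.
disable_training_checks() {
    file=$1
    scratch=$(mktemp "${TMPDIR:-/tmp}/training.XXXXXX") || return 1
    awk '
        /^[ \t]*PKG_CHECK_MODULES[(](pango|cairo),/ {
            match($0, /[(](pango|cairo)/)
            name = substr($0, RSTART + 1, RLENGTH - 1)
            print "#" $0                  # keep the original line, commented out
            print "have_" name "=false"   # and pretend the module is absent
            next
        }
        { print }
    ' "$file" > "$scratch" && cat "$scratch" > "$file"
    rc=$?
    rm -f "$scratch"
    return $rc
}
```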

7) Get a copy of eng.traineddata

You can get a copy of eng.traineddata from GitHub. Download the file and install it in /usr/local/share/tessdata.
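Once downloaded, installing the file is just a copy, but a small guard against an empty or misnamed download doesn't hurt. This is a sketch: install_traineddata is a made-up helper, and the destination defaults to the /usr/local/share/tessdata directory above (writing there may need sudo).

```shell
#!/bin/sh
# Hypothetical helper: copy a downloaded .traineddata file into a tessdata
# directory (created if needed), refusing empty or wrongly named files.
install_traineddata() {
    src=$1
    dest_dir=${2:-/usr/local/share/tessdata}
    case $src in
        *.traineddata) ;;
        *) echo "not a .traineddata file: $src" >&2; return 1 ;;
    esac
    [ -s "$src" ] || { echo "empty or missing: $src" >&2; return 1; }
    mkdir -p "$dest_dir" &&
        cp "$src" "$dest_dir/" &&
        chmod 644 "$dest_dir/$(basename "$src")"
}
```

For example, install_traineddata ~/Downloads/eng.traineddata. It's worth grabbing a data file that matches your tesseract version, as data built for a different major version may not work.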

8) Configure and make

You should now be able to run ./configure, then make and make install, which gives you tesseract in /usr/local/bin.
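As a last sanity check, something like this confirms the binary ended up where you expect. check_installed is a hypothetical helper, and /usr/local is assumed as the prefix, matching the rest of this post.

```shell
#!/bin/sh
# Hypothetical helper: print the path of <prefix>/bin/<prog> if it exists
# and is executable; otherwise complain on stderr and return non-zero.
check_installed() {
    prog=$1
    prefix=${2:-/usr/local}
    if [ -x "$prefix/bin/$prog" ]; then
        echo "$prefix/bin/$prog"
    else
        echo "$prog not found in $prefix/bin" >&2
        return 1
    fi
}
```

For example: check_installed tesseract && tesseract --version, then try it on an image with tesseract some_image.png out, which should write the recognised text to out.txt.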
