Compiling tesseract under OSX
Tesseract is a venerable OCR tool that runs from the command line.
While you can get hold of it on OSX by using homebrew or MacPorts, if you're like me, you don't like the bloat associated with these effective but unwieldy tools.
My objective was to compile tesseract on OSX myself, and it turns out to be a fairly straightforward process, as long as you're willing to accept the rather rough hacks needed to get it going.
Here are the steps I used:
Pre-requisites:
- XCode development tools
- leptonica
- libtiff, libpng, libjpeg (for leptonica)
1) Install leptonica
Leptonica is a library of image processing functions. Tesseract makes heavy use of Leptonica, so you'll need to install the libraries first. You should also ensure you've installed the image format libraries you want to use with tesseract, e.g. libpng, libtiff and libjpeg.
Download Leptonica, and then do the make/make install thing. You'll end up with a few .dylibs and .a files in your /usr/local/ directories.
Now we get onto compiling tesseract.
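Before moving on, it can be worth confirming that the Leptonica install really landed where the later steps expect it. Here is a minimal sketch of such a check. The function name check_lept_install is my own invention, and it assumes the libraries are named liblept* (matching the -llept flag used later) and that the headers include allheaders.h:

```shell
# Report whether Leptonica's libraries and headers are visible under a prefix.
# The prefix defaults to /usr/local. Returns 0 only if both are found.
check_lept_install() {
    lept_prefix="${1:-/usr/local}"
    lept_status=0
    # Any of liblept.a, liblept.dylib, liblept.N.dylib counts as a hit.
    if ls "$lept_prefix"/lib/liblept* >/dev/null 2>&1; then
        echo "libraries: found in $lept_prefix/lib"
    else
        echo "libraries: missing from $lept_prefix/lib"
        lept_status=1
    fi
    if [ -f "$lept_prefix/include/leptonica/allheaders.h" ]; then
        echo "headers: found in $lept_prefix/include/leptonica"
    else
        echo "headers: missing from $lept_prefix/include/leptonica"
        lept_status=1
    fi
    return "$lept_status"
}

# Usage (uncomment to run against the default prefix):
# check_lept_install /usr/local
```

If either line reports "missing", the configure hacks further down will only hide the problem rather than fix it.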
2) Fix your devtools path before running autogen.sh
Your XCode devtools should be in your /Users/your_name directory. There will be a directory called 'devtools' there, and you need to add it to your path. Note that if you're using root (sudo or su) to compile tesseract, you'll need to point to the devtools directory belonging to a specific user. The root user is unlikely to have devtools, so if you've 'su'd, then don't forget to hardcode your user name:
PATH=$PATH:/Users/your_user_name/devtools/autotools-bin/bin/
export PATH
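If you end up re-running this a few times (I did), the path gets appended again on every attempt. A small helper avoids that. This is only a sketch, path_append is a name I made up, and it refuses directories that don't exist so a typo in the user name shows up straight away:

```shell
# Append a directory to PATH once, and only if it actually exists.
path_append() {
    case ":$PATH:" in
        *":$1:"*)
            ;;  # already on PATH, nothing to do
        *)
            [ -d "$1" ] || { echo "no such directory: $1" >&2; return 1; }
            PATH="$PATH:$1"
            ;;
    esac
    export PATH
}

# Usage, with the same placeholder user name as above:
# path_append /Users/your_user_name/devtools/autotools-bin/bin
```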
3) Run autogen.sh
You may need to do this as su (see the previous step for pointing devtools correctly).
(Thanks to http://emop.tamu.edu/Installing-Tesseract-Mac for this tip)
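If autogen.sh falls over, the usual cause is that one of the autotools isn't on the path of the user you're running as. A quick way to check is sketched below. missing_tools is a hypothetical helper, and the tool names in the usage line are my assumption about what autogen.sh calls, so check the script itself for the real list:

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage (the tool list is an assumption, adjust it to match autogen.sh):
# missing_tools aclocal autoconf autoheader automake
```

No output means everything in the list was found.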
4) Set leptonica header location
Leptonica's headers are installed in /usr/local/include/leptonica by default (not /usr/local/include). The configure script might not pick this up, so set the environment variable LIBLEPT_HEADERSDIR to /usr/local/include/leptonica
OR
Edit ./configure and add /usr/local/include/leptonica to the leptonica paths on line 17692:
LIBLEPT_HEADERSDIR="/usr/local/include/leptonica /usr/local/include /usr/include /opt/local/include/leptonica"
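For the environment variable route, you can let the shell work out which directory actually holds the headers rather than guessing. The sketch below walks a list of candidate directories and prints the first one containing allheaders.h, which I'm assuming is the header configure looks for. find_lept_headers is a made-up name:

```shell
# Print the first directory in the argument list that contains allheaders.h.
# Returns 1 if none of the candidates has it.
find_lept_headers() {
    for dir in "$@"; do
        if [ -f "$dir/allheaders.h" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Usage, with the same candidate directories as the configure line above:
# LIBLEPT_HEADERSDIR=$(find_lept_headers /usr/local/include/leptonica \
#     /usr/local/include /usr/include /opt/local/include/leptonica) \
#     && export LIBLEPT_HEADERSDIR
```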
5) Remove the check of leptonica small program
For some reason, the tiny C program that the configure script uses to test leptonica would not work on my rig. But... I know leptonica is there and running fine. To stop configure quitting at this step (it produces the message "leptonica library with pdf support (>= 1.71) is missing"), change line 17748 from:
as_fn_error $? "leptonica library with pdf support (>= 1.71) is missing" "$LINENO" 5
to just a simple message (and continue):
$as_echo_n "Pretending we compiled a leptonica program"
You also need to force the LIBS environment variable to pick up leptonica. On line 17754, after the fi, enter:
LIBS="-llept $LIBS"
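The line numbers above are from the version I was building and will drift between releases, so matching on the error text is a bit more robust than counting lines. Here is a rough sketch of the first replacement done with sed. patch_lept_check is a hypothetical helper, it keeps a .orig backup, and it only swaps the error call. The LIBS line still has to be added by hand after the fi:

```shell
# Replace the fatal leptonica error in a configure script with a plain message.
# A backup is kept as <file>.orig; writing back into the original file
# preserves its executable bit.
patch_lept_check() {
    cfg="$1"
    cp "$cfg" "$cfg.orig" || return 1
    sed 's/as_fn_error \$? "leptonica library[^"]*is missing" "\$LINENO" 5/$as_echo_n "Pretending we compiled a leptonica program"/' \
        "$cfg.orig" > "$cfg"
}

# Usage, from the tesseract source directory:
# patch_lept_check ./configure
```

If the error message wording differs in your version, the sed simply matches nothing and the file is left unchanged, so diff against the .orig copy to confirm it did something.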
6) Remove training options
To use tesseract's training functions, you'd need pango, cairo and icu-dev as well as the function PKG_CHECK_MODULES (which was not available on my rig).
The easiest (if hackiest) way to do this is to simply comment out the PKG_CHECK_MODULES line, then set the have_xxx variable to false, i.e.
#PKG_CHECK_MODULES(pango, pango, have_pango=true, have_pango=false)
have_pango=false
Do this for pango (above) and cairo too.
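If you'd rather not hand-edit, the same change can be scripted. This is only a sketch: disable_pkg_check is my own name, it takes the file to edit as an argument, and it assumes each check sits on a single line containing have_<name>= as in the example above:

```shell
# Comment out the PKG_CHECK_MODULES line for one package and force
# have_<name>=false right after it. A backup is kept as <file>.bak.
disable_pkg_check() {
    pkg_file="$1"
    pkg_name="$2"
    cp "$pkg_file" "$pkg_file.bak" || return 1
    awk -v name="$pkg_name" '
        $0 ~ /^[ \t]*PKG_CHECK_MODULES/ && index($0, "have_" name "=") {
            print "#" $0
            print "have_" name "=false"
            next
        }
        { print }
    ' "$pkg_file.bak" > "$pkg_file"
}

# Usage (pass whichever file you were editing by hand):
# disable_pkg_check ./configure pango
# disable_pkg_check ./configure cairo
```

Lines that are already commented out don't match, so running it twice does no harm.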