I plan to extend qt-box-editor with some additional features (e.g. generating boxes), but for that I need tesseract and Leptonica as libraries for Windows. Thanks to the great work of Tom Powers, there is already a Leptonica library built with VC++, and there will be a tesseract C++ library for version 3.02.
However, using a library built by a different compiler is not recommended, and my project uses MinGW. I ran some tests building tesseract with cmake+MinGW, but I plan to use Leptonica too, so I decided to compile both Leptonica and tesseract with MinGW. Here is a short tutorial on how to do it.
You need to have MinGW and MSYS installed. If you do not have them, please follow the great tutorial about compiling OpenTTD on MinGW.
Also try to install the following packages (you should receive an error message if they are already installed):
mingw-get install msys-wget
mingw-get install msys-bzip2
mingw-get install msys-patch
mingw-get install msys-autoconf
mingw-get install msys-libtool
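To see which of these tools are still missing before you start, a small check like the one below may help. It is only a sketch: the function name missing_tools is made up for illustration, and it assumes each package provides a command of the same name on the PATH.

```shell
# Hypothetical helper: print every tool from the argument list
# that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools assumed to come from the packages installed above.
missing_tools wget bzip2 patch autoconf libtool
```

If nothing is printed, all listed tools were found.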
Create the directory '/usr/src' in “MinGW Shell” and go there:
mkdir -p /usr/src
cd /usr/src
This will be our “build directory”.
You are expected to download each (source) package from its own site yourself. Be aware that other (newer) versions of the libraries/programs may be available. If you download a different version of a library, you will need to change the package and directory names accordingly.
If you do not have svn installed (e.g. TortoiseSVN), please install it as described on wiki.openttd.org. If you do not plan to test the current svn code of tesseract, you do not need svn; you will need to download the latest package instead.
sourceware.org provides the pthreads-win32 package.
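One way to make version changes less error-prone is to keep the version numbers in variables and derive the archive and directory names from them. This is just a sketch; the variable names are my own, and it assumes the archives follow the “name-version” pattern used in this tutorial (some packages, e.g. jpeg and jbigkit, unpack to differently named directories).

```shell
# Versions used in this tutorial; change them if you download other releases.
ZLIB_VERSION=1.2.6
LIBPNG_VERSION=1.5.9
TIFF_VERSION=3.9.5

# Archive and directory names derived from the versions.
zlib_archive="zlib-${ZLIB_VERSION}.tar.gz"
zlib_dir="zlib-${ZLIB_VERSION}"
libpng_archive="libpng-${LIBPNG_VERSION}.tar.xz"
libpng_dir="libpng-${LIBPNG_VERSION}"
tiff_archive="tiff-${TIFF_VERSION}.tar.gz"
tiff_dir="tiff-${TIFF_VERSION}"

# Show the commands that would unpack and enter each package.
echo "tar xf ${zlib_archive} && cd ${zlib_dir}"
echo "tar xf ${libpng_archive} && cd ${libpng_dir}"
echo "tar xf ${tiff_archive} && cd ${tiff_dir}"
```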
tar xf pthreads-w32-2-8-0-release.tar.gz
cd pthreads-w32-2-8-0-release
export CPPFLAGS="-DPTW32_STATIC_LIB"
make clean GC-static
cp -iv pthread.h semaphore.h sched.h /mingw/include/
cp -iv libpthreadGC2.a /mingw/lib/libpthread.a
cd ..
Leptonica supports several image formats. The respective libraries have to be installed before Leptonica is configured and installed.
zlib is a free, general-purpose, legally unencumbered (that is, not covered by any patents) lossless data-compression library. It is needed by libpng.
tar xf zlib-1.2.6.tar.gz
cd zlib-1.2.6
make -f win32/Makefile.gcc
Then change (line 33) ‘SHARED_MODE=0’ to ‘SHARED_MODE=1’ in “win32/Makefile.gcc” and run:
BINARY_PATH=/usr/local/bin \
INCLUDE_PATH=/usr/local/include \
LIBRARY_PATH=/usr/local/lib \
make -f win32/Makefile.gcc install
cd ..
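If you prefer not to edit “win32/Makefile.gcc” by hand, the SHARED_MODE change can be scripted with sed before running the install command. A minimal sketch, assuming the file contains a line starting with ‘SHARED_MODE=0’; the function name enable_shared_mode is made up for illustration.

```shell
# Hypothetical helper: switch SHARED_MODE from 0 to 1 in the given makefile.
# Writes to a temporary file first so the original is only replaced on success.
enable_shared_mode() {
    sed 's/^SHARED_MODE=0/SHARED_MODE=1/' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage (inside the zlib source directory):
#   enable_shared_mode win32/Makefile.gcc
```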
xz provides the lzma support needed by the TIFF library.
tar xf xz-5.0.3.tar.bz2
cd xz-5.0.3
./configure
make -j 4 && make install
cd ..
libpng is the reference library for PNG, an open, extensible image format with lossless compression.
tar xf libpng-1.5.9.tar.xz
cd libpng-1.5.9
./configure
make && make install
cd ..
giflib is a library for reading and writing gif images. It is API and ABI compatible with libungif which was in wide use while the LZW compression algorithm was patented.
tar xf giflib-4.1.6.tar.bz2
cd giflib-4.1.6
./autogen.sh
LDFLAGS="-no-undefined -Wl,--as-needed" ./configure
make -j 4 && make install
cd ..
jpeg-8d is a free library for JPEG image compression.
tar xf jpegsrc.v8d.tar.gz
cd jpeg-8d
./configure
make -j 4 && make install
cd ..
JBIG-KIT implements a highly effective data compression algorithm for bi-level high-resolution images such as fax pages or scanned documents. It can be used by the TIFF library. This package is optional.
tar xf jbigkit-2.0.tar.gz
cd jbigkit
wget http://www.sk-spell.sk.cx/file_download/99/autotools_support.patch.gz
gzip -cd autotools_support.patch.gz | patch -p1
./autogen.sh
./configure
make && make install
cd ..
libtiff provides support for the Tag Image File Format (TIFF), a widely used format for storing image data.
You need to use version 3.9.5; 4.0.1 did not work with tesseract/leptonica (I need to do more testing to find out why).
tar xf tiff-3.9.5.tar.gz
cd tiff-3.9.5
./autogen.sh
./configure
make -j 4 && make install
cd ..
WebP is an image format that does lossy compression of digital photographic images.
tar xf libwebp-0.1.3.tar.gz
cd libwebp-0.1.3
./autogen.sh
LDFLAGS="-no-undefined -Wl,--as-needed" \
CPPFLAGS=-DQGLOBAL_H ./configure
make && make install
cd ..
tar xf leptonica-1.68.tar.gz
cd leptonica-1.68
./autobuild
./configure
For version 1.68 you need to apply a patch (this should be fixed in the next version):
wget "http://leptonica.googlecode.com/issues/attachment?aid=560001000&name=zlib-include.patch&token=say6dkQyRWJp2MvoOO1hmTqXAtU%3A1332684407152" -O zlib-include.patch
patch -p1 <zlib-include.patch
Then you can continue:
make -j 4 && make install
cd ..
If you want to test recent code from svn, you need to check it out first:
svn checkout \
https://tesseract-ocr.googlecode.com/svn/trunk/ \
tesseract-ocr
WARNING: Because of the number and size of the language data files, the svn repository is bigger than 624M! Alternatively, you can download a current snapshot of the svn repository WITHOUT language data files and uncompress it like the other packages.
The build process consists of:
cd tesseract-ocr
./autogen.sh
LDFLAGS="-no-undefined -Wl,--as-needed" \
./configure --disable-tessdata-prefix
make -j 4 && make install
The option '--disable-tessdata-prefix' prevents “TESSDATA_PREFIX” from being set to the installation directory (usually “/usr/local/share” or “/usr/share”) and built in. With this option, the “tessdata” directory is expected to be in the same place as the executable (or library), unless the environment variable “TESSDATA_PREFIX” is set.
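The lookup order described above can be illustrated in shell. This is only a sketch of the behaviour as described in this post, not tesseract's actual code; the function name tessdata_location and the example paths are made up.

```shell
# Illustration only: print the directory expected to contain "tessdata".
# $1 is the full path of the tesseract executable (hypothetical example path).
tessdata_location() {
    if [ -n "${TESSDATA_PREFIX:-}" ]; then
        # The environment variable wins when it is set and non-empty.
        printf '%s\n' "$TESSDATA_PREFIX"
    else
        # Otherwise fall back to the directory of the executable.
        dirname "$1"
    fi
}

# Usage:
#   tessdata_location /usr/local/bin/tesseract.exe
```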
For a release version, you may want to set ‘CPPFLAGS="-DNDEBUG"’ before ‘./configure’:
LDFLAGS="-no-undefined -Wl,--as-needed" \
CPPFLAGS="-DNDEBUG" ./configure \
--disable-tessdata-prefix
If you get tesseract from svn, you can install all language files with:
make install LANGS=
If you want to install just the English, German and Spanish language files, run:
make install LANGS="spa eng deu"
If you compiled tesseract from a package, you need to download and install the language files manually (uncompress them and copy them to the tessdata directory).
Do not mix different versions of language data! E.g. you cannot use 2.0x language files with tesseract 3.0x. You cannot use a language file of a higher version with a lower version of tesseract (e.g. a 3.02 language file with tesseract 3.01). But you can use a 3.01 language file with tesseract 3.02.
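The rule above can be written down as a small check. This is a sketch of the rule as stated here, nothing tesseract provides itself; the function name lang_data_usable is made up, and it assumes two-part version numbers such as 3.01 or 3.02.

```shell
# Hypothetical helper: succeed if language data of version $1
# can be used with tesseract version $2 (both in "major.minor" form).
lang_data_usable() {
    data_major=${1%%.*}; data_minor=${1#*.}
    tess_major=${2%%.*}; tess_minor=${2#*.}
    # Same major series, and the data must not be newer than the program.
    [ "$data_major" -eq "$tess_major" ] && [ "$data_minor" -le "$tess_minor" ]
}

# Usage:
#   lang_data_usable 3.01 3.02 && echo "ok to use"
```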
And final info:
$ tesseract -v
tesseract 3.02
 leptonica-1.68
  libgif 4.1.6 : libjpeg 8d : libpng 1.5.9 : libtiff 3.9.5 : zlib 1.2.6
WebP is not listed there, but it is supported:
convert eurotext.tif eurotext.png
cwebp -q 100 eurotext.png -o eurotext.webp
tesseract.exe eurotext.webp eurotext-webp
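If you want to check the version output in a script, e.g. to confirm that a given image library was picked up, something like the following may do. It is a sketch: the function name lists_library is made up, and it assumes the output format shown above (library name followed by a space and a version that starts with a digit).

```shell
# Hypothetical helper: read "tesseract -v" style output on stdin and
# succeed if the library named in $1 is listed with a version.
lists_library() {
    grep -q "$1 [0-9]"
}

# Usage (2>&1 in case the version info is written to stderr):
#   tesseract -v 2>&1 | lists_library libtiff && echo "libtiff found"
```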