tesseract-ocr-en
These pages document my tests of tesseract-ocr 3.00 and related software. Most tests were done on 64-bit Mandriva Linux.
In the past I tried to test and train tesseract (2.04) for the Slovak language. Version 3.00 brought support for many languages, including Slovak. The quality of its OCR output varies with the input scans. Unfortunately, the training data are not available, so I cannot improve the Slovak language data. The training process is also still not documented in full detail.
For this reason I started looking for a way to create a language file from my own data. I assume that the reader is familiar with the ReadMe, FAQ and Training process.
articles
- Lector 0.3.0 released – created on: 26. November 2011 | modified on: 26. November 2011
- QT4 tesseract - simple gui for tesseract – created on: 10. October 2011 | modified on: 10. October 2011
- tesseract training for Slovak Fraktur script – created on: 24. April 2011 | modified on: 24. April 2011
- QT Box Editor 1.03 is out – created on: 24. March 2011 | modified on: 24. March 2011
- QT Box Editor 1.02 is out – created on: 11. February 2011 | modified on: 11. February 2011
- CowBoxer - box editor for tesseract-ocr 3.00 – created on: 20. January 2011 | modified on: 20. January 2011
- Scan Tailor - post-processing tool for scanned pages – created on: 12. November 2010 | modified on: 12. November 2010
- Slovak wordlist – created on: 30. September 2010 | modified on: 30. September 2010
- Linux OCR Software Comparison – created on: 25. August 2010 | modified on: 25. August 2010
- Windows build of recent tesseract code (revision 454) – created on: 24. August 2010 | modified on: 24. August 2010
- New version of leptonica library (1.66) – created on: 23. August 2010 | modified on: 23. August 2010
- rss for tesseract-ocr-en – created on: 21. August 2010 | modified on: 21. August 2010
- tesseract-ocr-en: variables – created on: 18. August 2010 | modified on: 20. August 2010
- tesseract-ocr-en: dictionary creating – created on: 30. April 2010 | modified on: 18. May 2010
- tesseract-ocr-en: Clustering and Compute the Character Set – created on: 27. April 2010 | modified on: 28. April 2010
download
- leptonlib-1.66-0.src.rpm [6MB] (downloaded: 26x) – SRPM package of leptonica 1.66
- tesseract-ocr-r319.tar.gz [4MB] (downloaded: 147x) – tesseract 3.00 svn revision 319 with the English language data file only
- zdpo-fontname.patch.gz [620B] (downloaded: 235x) – patch that solves issues with input names/font names during the training process
- tesseract-unicharambigs.3.00.tar.gz [3KB] (downloaded: 332x) – unicharambigs files extracted from tesseract 3.00 lang.traineddata
- tesseract-3.00.slk-orig.tar.gz [1MB] (downloaded: 210x) – original Slovak language data for tesseract 3.00 (from svn revision)
- tesseract-3.00.slk-0.3.tar.gz [4MB] (downloaded: 211x) – Slovak language data created for tesseract 2.04 and reused for tesseract 3.00
gui for tesseract: