OCR on Linux

OCR (Optical Character Recognition) comes in very handy when you are trying to convert the text in a scanned document or image to plain text. What you need is an OCR engine like tesseract-ocr. It is available in Synaptic on Ubuntu. To install tesseract-ocr on Ubuntu, run:

$ sudo apt-get install tesseract-ocr
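
Once the engine is installed, you can try it straight from the command line. The following is a minimal sketch, not a definitive recipe: the file name scan.png and the output name scan are hypothetical placeholders, and it assumes the English language data for Tesseract is installed. Tesseract takes the image, an output base name, and optionally a language with -l, and adds the .txt extension itself.

```shell
# Hypothetical names: replace scan.png with your own scanned image.
input="scan.png"
output_base="scan"   # tesseract appends .txt to this base name

if command -v tesseract >/dev/null 2>&1 && [ -f "$input" ]; then
    # Recognize English text in the image and write it to scan.txt
    tesseract "$input" "$output_base" -l eng
    # Show the recognized text
    cat "$output_base.txt"
else
    echo "tesseract or $input not found; skipping OCR"
fi
```

The check at the top only keeps the script from failing when the engine or the image is missing. For a one-off conversion, the single tesseract line is all you need.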

Tesseract has a GUI front-end available – gImageReader. To install on Ubuntu, run:

$ sudo add-apt-repository ppa:sandromani/gimagereader
$ sudo apt-get update
$ sudo apt-get install gimagereader

Many applications use the engine, for example gscan2pdf, which creates PDF documents from scans. To install on Ubuntu:

$ sudo apt-get install gscan2pdf
