Hi guys, and welcome to the last part of our Java OCR tutorial! We’re going to learn how to deploy the Sampler app to our Android phone. The Sampler application lets you capture images of characters (or sets of characters) to use as training samples. This is going to be a really short post, and I’m really excited to share another OCR engine with you guys. So let’s get started.
Creating A Project From Existing Source
The Sampler app can be found here: /trunk/demos/sampler. Simply create a new Android Project and make sure you check the ‘Create project from existing source’ checkbox. You could also import it into your workspace; either way works.
Again, you’ll need to link some classes / libraries:
- javaocr-core – /trunk/core/src/main/src/java
- javaocr-camera-utils – /trunk/demos/camera-utils/src/main/java
Examining The Code
No OCR is happening here, obviously; this app is for sampling. What we’re interested in is the saveSample() method. This is where all the SD card storing action happens.
final DataOutputStream dos = new DataOutputStream(new FileOutputStream(outdir.getAbsolutePath() + "/" + exp + "_" + (System.currentTimeMillis() / 1000) + ".dat"));
You’ll notice right away that it creates a file whose name is based on the characters you are sampling, followed by a timestamp in seconds. So if you’re taking samples of the characters ABC, it’ll create a file with a filename like ABC_12*timestamp*514.dat
This can be a real issue when sampling special characters, because Android doesn’t allow you to create files with certain special characters in their names. Bummer.
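One possible workaround, sketched below, is to hex-encode the label before using it in the file name. This is not part of Java OCR: SampleFileNames and its methods are hypothetical names made up for illustration, and anything that reads the label back from the file name (the Trainer, if it works that way) would need the matching decode step.

```java
import java.nio.charset.Charset;

/** Hypothetical helper (not part of Java OCR): makes sample labels safe to use in file names. */
final class SampleFileNames {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    private SampleFileNames() {}

    /** Hex-encodes the UTF-8 bytes of the label, so only 0-9 and a-f end up in the name. */
    static String encodeLabel(String label) {
        StringBuilder sb = new StringBuilder();
        for (byte b : label.getBytes(UTF8)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Reverses encodeLabel(). */
    static String decodeLabel(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, UTF8);
    }

    /** Same naming scheme as saveSample(): label, underscore, timestamp in seconds, ".dat". */
    static String fileName(String label, long timestampMillis) {
        return encodeLabel(label) + "_" + (timestampMillis / 1000) + ".dat";
    }
}
```

With this scheme, a sample of the characters A? would be stored under a name starting with 413f_ instead of A?_, at the cost of file names that are no longer readable at a glance.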
Putting It All Together
Take samples with the Sampler, copy all the .dat files from the SD card into a folder on your PC, and point the Trainer at that folder via the DAT_SOURCE_DIR variable. Then retrieve the moments.json and freespaces.json files generated by the Trainer and put them inside your Recognizer app, under /res/raw/.
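Before running the Trainer, it can help to check how many samples you actually collected per label. The sketch below is a standalone desktop helper, not part of Java OCR; SampleFolderCheck, labelOf and countByLabel are hypothetical names, and it only assumes the label_timestamp.dat naming scheme described earlier.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Java OCR): counts .dat samples per label in a folder. */
final class SampleFolderCheck {
    private SampleFolderCheck() {}

    /** Returns the label part of a name like "ABC_1350000000.dat", or null if it does not match. */
    static String labelOf(String fileName) {
        if (!fileName.endsWith(".dat")) {
            return null;
        }
        // The timestamp follows the LAST underscore, so labels may contain underscores too.
        int sep = fileName.lastIndexOf('_');
        return sep <= 0 ? null : fileName.substring(0, sep);
    }

    /** Maps each label to its number of sample files; empty if the folder cannot be listed. */
    static Map<String, Integer> countByLabel(File dir) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        File[] files = dir.listFiles(); // null when dir is missing or not a directory
        if (files == null) {
            return counts;
        }
        for (File f : files) {
            String label = f.isFile() ? labelOf(f.getName()) : null;
            if (label != null) {
                Integer old = counts.get(label);
                counts.put(label, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }
}
```

Calling SampleFolderCheck.countByLabel(new File(path)) on the folder you use for DAT_SOURCE_DIR and printing the resulting map gives a quick overview of which characters still need more samples.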
Note: In the Recognizer app, make sure you’re pointing to the right trained files. You can set them in the
InputStream inputStream = getResources().openRawResource(R.raw.freespaces);
inputStream = getResources().openRawResource(R.raw.moments);
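If you want to double-check that the right files got bundled, one option is to read a raw resource into a String and log it or compare it with the file on your PC. The sketch below is not how the Recognizer itself consumes these streams; RawResourceText and readFully are hypothetical names, and it assumes the JSON files are UTF-8 text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not part of Java OCR): reads a whole stream into a String. */
final class RawResourceText {
    private RawResourceText() {}

    /** Reads the stream to the end as UTF-8 text (an assumption) and closes it. */
    static String readFully(InputStream in) {
        try {
            try {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toString("UTF-8");
            } finally {
                in.close();
            }
        } catch (IOException e) {
            // Wrapped so callers do not have to declare a checked exception.
            throw new IllegalStateException("Could not read resource", e);
        }
    }
}
```

In an Activity, this could be called as RawResourceText.readFully(getResources().openRawResource(R.raw.moments)). Since the helper closes the stream, open a fresh one before handing it to the recognizer.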
And that’s basically it.
Java OCR is a really solid OCR engine, and it’s native to Java, so it’s ideal if you want to integrate an OCR engine into your Java project without having to wrap one with JNI, or if you want to learn how OCR works.
The problem is, it’s not fully ported to Android and requires serious sampling and training. After more than a hundred samples, I decided to use Tesseract instead.
Anyway, I’ll be covering a few OCR engines in the following days. Hope you guys enjoyed Java OCR.