What is OCR?
Optical character recognition (also called optical character reader, or OCR) is the process of reading printed or handwritten text and converting it into machine-encoded text. It is an active area of research in artificial intelligence, pattern recognition, and computer vision.
So how does it work? Put simply, to a computer an image is nothing but a collection of pixels. During OCR processing, the image is scanned for light and dark areas to identify each character.
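To make the "light and dark areas" idea concrete, here is a minimal sketch of the first step such a scan might take: turning a grayscale pixel grid into a dark/light mask. The class name `Binarize`, the fixed threshold of 128, and the sample patch are assumptions made for illustration. Real OCR engines go far beyond this (adaptive thresholds, character segmentation, trained classifiers), so treat it as a toy model rather than how any particular engine works.

```java
class Binarize {
    /** Marks a pixel as dark (true) when its gray level is below the threshold. */
    static boolean[][] binarize(int[][] gray, int threshold) {
        boolean[][] dark = new boolean[gray.length][];
        for (int y = 0; y < gray.length; y++) {
            dark[y] = new boolean[gray[y].length];
            for (int x = 0; x < gray[y].length; x++) {
                dark[y][x] = gray[y][x] < threshold;
            }
        }
        return dark;
    }

    /** Renders the mask as text: '#' for dark pixels, '.' for light ones. */
    static String render(boolean[][] dark) {
        StringBuilder sb = new StringBuilder();
        for (boolean[] row : dark) {
            for (boolean d : row) {
                sb.append(d ? '#' : '.');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up 3x3 grayscale patch (0 = black, 255 = white) containing an "L" shape.
        int[][] patch = {
            {20, 240, 250},
            {25, 245, 255},
            {30, 35, 40},
        };
        System.out.print(render(binarize(patch, 128)));
        // Prints:
        // #..
        // #..
        // ###
    }
}
```

Once the dark pixels are separated from the background like this, a recognizer can compare the resulting shapes against known characters.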
Emanuel Goldberg, an Israeli physicist and inventor, developed a machine in 1914 that could read characters and convert them into standard telegraph code. Around the same time, in 1913, Edmund Fournier d’Albe invented the optophone, which was used mainly by blind people to scan text. It produced time-varying chords of tones, each of which identified a letter. That was the beginning of OCR. With the advent of computers and the internet, OCR is now widely available, often for free, through products such as Adobe Acrobat and Google Drive.
Where is OCR used?
OCR is used for tasks such as:
- Automatic data entry from documents such as checks, passports, invoices, and bank statements
- Automatic number plate recognition
- Scanning text and reading it aloud for blind people
- Extracting business card information and storing it in a contact list
OCR on Android devices:
In this post, we will learn how to implement OCR in Android applications. To implement it, we will use the Mobile Vision Text API, which provides an easy way to integrate OCR on almost all Android devices.
We have previously explored how Face Detection works (check details here). Text detection is similar to face detection. You can pull the code directly from GitHub (link) and run it in Android Studio.
- Create a project in Android Studio with one blank Activity. Add the Google Play services dependency to it:

    implementation 'com.google.android.gms:play-services-vision:11.8.0'

- Add the camera permission to the manifest file:

    <uses-permission android:name="android.permission.CAMERA"/>

- Our main and only Activity file is MainActivity.java, and its layout XML file is activity_main.xml. activity_main.xml:

    <?xml version="1.0" encoding="utf-8"?>
    <RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
        xmlns:tools="http://schemas.android.com/tools"
        android:id="@+id/activity_main"
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        tools:context="com.example.textdetectionsample.MainActivity">

        <SurfaceView
            android:id="@+id/surfaceView"
            android:layout_width="match_parent"
            android:layout_height="match_parent"
            android:paddingBottom="60dp" />

        <TextView
            android:id="@+id/textView"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:layout_alignParentBottom="true"
            android:layout_centerHorizontal="true"
            android:text="dfsadfa"
            android:textColor="#ffffff"
            android:textSize="30sp" />

    </RelativeLayout>

We have one SurfaceView to show the camera view and one TextView to show the detected text.
- In MainActivity, check whether the camera permission has been granted. If not, request it:

    if (ActivityCompat.checkSelfPermission(this, Manifest.permission.CAMERA) == PackageManager.PERMISSION_GRANTED) {
        startTextRecognizer();
    } else {
        askCameraPermission();
    }

- On receiving the permission, create a TextRecognizer object.
- Create a CameraSource object to start the camera.
- Set a processor on the TextRecognizer to detect any text visible in the camera preview. The processor receives a callback with the detections, which we use to update the TextView shown over the preview. The code for starting the camera source looks like:

    // Build a camera source that feeds preview frames to the text recognizer.
    mCameraSource = new CameraSource.Builder(getApplicationContext(), mTextRecognizer)
            .setFacing(CameraSource.CAMERA_FACING_BACK)
            .setRequestedPreviewSize(1280, 1024)
            .setRequestedFps(15.0f)
            .setAutoFocusEnabled(true)
            .build();

    mSurfaceView.getHolder().addCallback(new SurfaceHolder.Callback() {
        @Override
        public void surfaceCreated(SurfaceHolder holder) {
            // Start the preview only if the camera permission has been granted.
            if (ActivityCompat.checkSelfPermission(getApplicationContext(), Manifest.permission.CAMERA) == PackageManager.PERMISSION_GRANTED) {
                try {
                    mCameraSource.start(mSurfaceView.getHolder());
                } catch (IOException e) {
                    e.printStackTrace();
                }
            } else {
                askCameraPermission();
            }
        }

        @Override
        public void surfaceChanged(SurfaceHolder holder, int format, int width, int height) {
        }

        @Override
        public void surfaceDestroyed(SurfaceHolder holder) {
            // Stop the camera when the preview surface goes away.
            mCameraSource.stop();
        }
    });

- And to start the text recognition processor:

    mTextRecognizer.setProcessor(new Detector.Processor<TextBlock>() {
        @Override
        public void release() {
        }

        @Override
        public void receiveDetections(Detector.Detections<TextBlock> detections) {
            // Concatenate the text of every detected block, separated by spaces.
            SparseArray<TextBlock> items = detections.getDetectedItems();
            StringBuilder stringBuilder = new StringBuilder();
            for (int i = 0; i < items.size(); ++i) {
                TextBlock item = items.valueAt(i);
                if (item != null && item.getValue() != null) {
                    stringBuilder.append(item.getValue()).append(" ");
                }
            }

            // Detections arrive off the main thread, so post the UI update to the main looper.
            final String fullText = stringBuilder.toString();
            Handler handler = new Handler(Looper.getMainLooper());
            handler.post(new Runnable() {
                @Override
                public void run() {
                    mTextView.setText(fullText);
                }
            });
        }
    });
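The permission check earlier calls askCameraPermission(), but the handling of the user's answer is not shown in this post. The sketch below isolates the decision such a callback has to make, written as plain Java so it can be run outside Android. The class name `CameraPermissionResult`, the method `isCameraGranted`, and the request code 100 are assumptions for illustration, not taken from the sample project. The two result constants mirror the values of `PackageManager.PERMISSION_GRANTED` (0) and `PackageManager.PERMISSION_DENIED` (-1).

```java
class CameraPermissionResult {
    // Arbitrary request code chosen for this sketch; any int unique within the app works.
    static final int REQUEST_CAMERA = 100;

    // Same values as PackageManager.PERMISSION_GRANTED and PackageManager.PERMISSION_DENIED.
    static final int PERMISSION_GRANTED = 0;
    static final int PERMISSION_DENIED = -1;

    /**
     * Returns true only when the callback belongs to our camera request and the
     * single requested permission was granted. An empty result array means the
     * permission dialog was interrupted, which is treated as a denial.
     */
    static boolean isCameraGranted(int requestCode, int[] grantResults) {
        return requestCode == REQUEST_CAMERA
                && grantResults != null
                && grantResults.length > 0
                && grantResults[0] == PERMISSION_GRANTED;
    }

    public static void main(String[] args) {
        System.out.println(isCameraGranted(REQUEST_CAMERA, new int[] {PERMISSION_GRANTED})); // true
        System.out.println(isCameraGranted(REQUEST_CAMERA, new int[] {PERMISSION_DENIED}));  // false
        System.out.println(isCameraGranted(REQUEST_CAMERA, new int[0]));                     // false
    }
}
```

In the Activity, askCameraPermission() would presumably call `ActivityCompat.requestPermissions(this, new String[]{Manifest.permission.CAMERA}, REQUEST_CAMERA)`, and `onRequestPermissionsResult(int requestCode, String[] permissions, int[] grantResults)` could then call startTextRecognizer() when a check like the one above returns true.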
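How to run the project:
- Use Android Studio 3.0 or later.
- Import the project.
- Run it on a phone.
Ensure that Google Play services is installed on the phone and that the phone is connected to the internet. On first use, the Mobile Vision library downloads its text recognition component through Google Play services, and detection does not work until that download has finished.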
Output:
You will get an output similar to the following image after executing the project:
Conclusion:
Using the Google Mobile Vision API, we can easily integrate face detection, text detection, or barcode detection into an Android app. Google offers the same features for iOS devices as well. If you want to learn more about the Mobile Vision API, you can check the reference doc here.