Which object detection model gives the best results on text images when speed is not a concern?

I want to develop a model that crops the equations out of maths questions, because people like me struggle a lot doing this manually for research purposes. Is this possible? And if so, out of all the object recognition models out there, which one will produce the best results on text images?

The options I know of are TensorFlow's Object Detection API, R-CNN, Fast R-CNN, Faster R-CNN, and YOLO (v1 to v5).

If there are any others, please suggest them. What I want to do is detect the gray equation areas in this image.

[Image: example question with the equation regions shaded gray]

Note: The grey region shown in the image is just for demonstration. My actual images are simple cropped questions from books, with a white background and black letters (in most of the books).

Cross Validated Asked on November 21, 2021

One Answer

Note that there are two problems in this case: segmentation and classification. A neural net might be a solution for both steps, because you can easily generate zillions of labelled test images. Nevertheless, a classic approach should yield comparable results with much less effort:

  1. Use a simple page segmentation algorithm like run-length smearing or bounding box merging to segment the image into regions.
  2. Classify each region with an arbitrary classifier. You can use a NN on all normalized input pixels for this, but other classifiers like kNN should also work with gradient histograms as features (the gradients are computed on quasi-grayscale images, which are generated from the onebit images by blurring). Gradient histograms were the state-of-the-art features before the renaissance of neural nets.
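To make step 2 more concrete, here is a minimal sketch of gradient-histogram features with a kNN classifier, written in plain NumPy. It is only an illustration under assumptions: the function names (box_blur, gradient_histogram, knn_predict), the 3x3 box blur, the 8 orientation bins, and the magnitude weighting are choices made for this example and are not taken from any particular library.

```python
import numpy as np

def box_blur(img, k=3):
    """Blur a 2-D array with a k x k mean filter (edges are padded)."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def gradient_histogram(onebit, bins=8):
    """Histogram of gradient orientations, weighted by gradient magnitude.

    The onebit region is blurred first to obtain a quasi-grayscale image,
    and the histogram is normalized to sum to 1 (hypothetical design choice).
    """
    gray = box_blur(onebit)
    gy, gx = np.gradient(gray)          # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx)          # orientation in [-pi, pi]
    hist, _ = np.histogram(angle, bins=bins, range=(-np.pi, np.pi),
                           weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist

def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k nearest training features."""
    distances = np.linalg.norm(train_features - query, axis=1)
    nearest = np.argsort(distances)[:k]
    labels, counts = np.unique(train_labels[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

In practice you would compute gradient_histogram for every labelled region (for example "text" versus "equation"), stack the results into train_features, and call knn_predict on the histogram of each new region. Regions of very different size may need to be scaled to a common size first.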

Out of curiosity, I tried step one with the Python library Gamera, using the following code:

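from gamera.core import *
init_gamera()  # load Gamera's plugins before calling any image methods

# load the page and convert it to a onebit (black-and-white) image
img = load_image("MathExpressionInputExample.png")
img = img.to_onebit()

# segment the page into regions with run-length smearing
segments = img.runlength_smearing()

# now you could process each segment (e.g. saving it to a file)
for seg in segments:
    pass  # do some stuff

# visualize the result
color_ccs = img.graph_color_ccs(segments)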

The result looks reasonable to me (note that the colors only indicate the segmentation, with adjacent segments having different colors):

Segmentation result
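If you cannot run Gamera, the idea behind run-length smearing can be sketched in plain NumPy. This is a simplified illustration and not Gamera's implementation: the function names and the three gap parameters are hypothetical, and sensible gap values depend on the resolution and layout of your scans. Short background runs are filled along rows and along columns, the two results are combined with a logical AND, a final horizontal pass closes the remaining small gaps, and the connected components of the smeared image give the region bounding boxes.

```python
import numpy as np
from collections import deque

def smear_rows(binary, max_gap):
    """Fill background runs of at most max_gap pixels between two
    foreground pixels in the same row."""
    out = binary.copy()
    for row in out:
        cols = np.flatnonzero(row)
        for left, right in zip(cols[:-1], cols[1:]):
            if 1 < right - left <= max_gap + 1:
                row[left + 1:right] = True
    return out

def run_length_smearing(binary, gap_x, gap_y, gap_smooth):
    """Smear along rows and columns, AND the results, then smooth rows."""
    horizontal = smear_rows(binary, gap_x)
    vertical = smear_rows(binary.T, gap_y).T
    return smear_rows(horizontal & vertical, gap_smooth)

def bounding_boxes(mask):
    """Return (top, left, bottom, right) for each 8-connected component."""
    height, width = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    boxes = []
    for y, x in zip(*np.nonzero(mask)):
        if seen[y, x]:
            continue
        seen[y, x] = True
        queue = deque([(y, x)])
        top, left, bottom, right = y, x, y, x
        while queue:
            cy, cx = queue.popleft()
            top, bottom = min(top, cy), max(bottom, cy)
            left, right = min(left, cx), max(right, cx)
            # visit the 8 neighbours that lie inside the image
            for ny in range(max(cy - 1, 0), min(cy + 2, height)):
                for nx in range(max(cx - 1, 0), min(cx + 2, width)):
                    if mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
        boxes.append((int(top), int(left), int(bottom), int(right)))
    return boxes

# Toy page: two nearby "words" on one line and one separate blob.
page = np.zeros((12, 30), dtype=bool)
page[2:4, 2:4] = True
page[2:4, 6:8] = True
page[8:10, 20:22] = True
regions = bounding_boxes(run_length_smearing(page, 3, 3, 3))
```

Each box in regions can then be used to crop the original image, and the crops can be passed on to the classifier from step 2.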

Answered by cdalitz on November 21, 2021
