Handwritten flowchart to code using deep learning

David Betancourt Montellano
4 min read · Apr 15, 2023


This article is an overview of a 2019–2020 school project focused on recognizing handwritten flowcharts, mostly with convolutional neural networks. From the recognition results, the system generates C source code and a reconstructed digital version of the diagram.

“Hola mundo” (hello world) example.

The project produced several artifacts, including two deep learning models, a dataset for shapes and connectors, a module for generating C source code based on recognition results, another module for creating digital versions, and finally, a handler with a graphical user interface (GUI) to facilitate the inference and training process on a local computer.

The inspiration for this project stemmed from a similar project in which a handwritten mockup of a website was provided and the system automatically generated the corresponding HTML, CSS, and JavaScript code. If you are reading this after March 2023, you may be aware that GPT-4’s multimodal image capabilities can potentially handle tasks like this, and possibly even recognize handwritten flowcharts as well.

Inference pipeline

First of all, the image is converted to grayscale. As you can see, there are two detection flows that operate independently (in the implementation they do not run concurrently because of limited resource availability).
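As a minimal sketch of this first step, grayscale conversion can be done with the standard ITU-R BT.601 luminance weights (the same convention OpenCV's `cvtColor` uses); the function name here is illustrative, not the project's actual API:

```python
import numpy as np

def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB image to a single-channel grayscale image
    using the ITU-R BT.601 luminance weights."""
    weights = np.array([0.299, 0.587, 0.114])
    # Weighted sum over the color axis, rounded back to 8-bit pixels
    return np.rint(rgb @ weights).astype(np.uint8)
```

In practice the project would load the image with a library such as OpenCV or Pillow before this step.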

At this point I’d like to clarify some terms:

  • Classification means assigning a class or category to an object.
  • Localization means finding the (x1, y1) and (x2, y2) coordinates of a specific object inside the image; the rectangle drawn from those coordinates is called a bounding box.
  • Detection is localization plus classification of an object inside the image.
  • Finally, recognition refers to detecting all the elements in the flowchart image (shapes and text).
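The terms above can be made concrete with a small data structure: a detection is just a class label plus a bounding box. This is an illustrative sketch, not the project's actual types:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One detected object: a class label plus its bounding box."""
    label: str  # classification result, e.g. "decision", "process", "arrow"
    x1: int     # top-left corner
    y1: int
    x2: int     # bottom-right corner
    y2: int

    @property
    def box_area(self) -> int:
        """Area of the bounding box in pixels."""
        return (self.x2 - self.x1) * (self.y2 - self.y1)
```

Recognition of a whole flowchart then amounts to producing a list of such detections covering every shape, connector, and text region.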

On the one hand, there is the text flow, which binarizes the image, locates the text using Keras-OCR, and then classifies it using a Convolutional Neural Network (CNN) combined with a Long Short-Term Memory (LSTM) network. After this step, a new window appears where the user can verify the accuracy of the text detection in separate text inputs. If the user corrects any text, they can choose to run continual learning, which trains (forward pass and backpropagation) the text models to improve performance in subsequent detections. This approach allows the models to adapt to the user’s specific handwriting style.
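The trigger for that continual-learning pass can be sketched as follows: collect only the crops whose OCR output the user actually edited, and feed those pairs back for fine-tuning. The function and dictionary names are illustrative assumptions, not the project's real API:

```python
def corrected_samples(detections: dict, user_text: dict) -> list:
    """Return (crop_id, corrected_text) pairs where the user edited the
    OCR prediction; only these crops are used for the fine-tuning pass."""
    return [(crop, user_text[crop])
            for crop, predicted in detections.items()
            if user_text.get(crop, predicted) != predicted]
```

Fine-tuning only on the corrected samples keeps the extra training cheap while steering the text models toward the user's handwriting.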

On the other hand, the shapes-and-connectors flow preprocesses the image by applying unsharp masking to emphasize image features. A Faster R-CNN model is then used for object detection (the objects being shapes and connectors), with image feature extraction carried out by a VGG-16 backbone.
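Unsharp masking works by subtracting a blurred copy of the image to isolate high-frequency detail, then adding that detail back scaled by a strength factor. A minimal NumPy sketch, using a 3x3 box blur for simplicity (the project may well use a Gaussian blur instead):

```python
import numpy as np

def unsharp_mask(gray: np.ndarray, amount: float = 1.0) -> np.ndarray:
    """Sharpen a grayscale image: out = original + amount * (original - blurred)."""
    padded = np.pad(gray.astype(float), 1, mode="edge")
    # 3x3 box blur computed as the mean of the nine shifted windows
    blurred = sum(
        padded[i:i + gray.shape[0], j:j + gray.shape[1]]
        for i in range(3) for j in range(3)
    ) / 9.0
    sharpened = gray + amount * (gray - blurred)
    return np.clip(sharpened, 0, 255).astype(np.uint8)
```

Flat regions are unchanged (original minus blurred is zero there), while edges of shapes and strokes are amplified, which is exactly what helps the detector.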

Object detection of shapes and connectors.

After the detection process, a directed acyclic graph (DAG) is built using the detections as nodes, with a Conway diagram assisting in the construction. Finally, the DAG is consumed by both the code constructor module and the digital diagram module.
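A toy version of the graph-building step: each arrow connects the shape containing its tail point to the shape containing its head point. The names and data layout here are illustrative assumptions, not the project's real code:

```python
from collections import defaultdict

def build_dag(shapes: dict, arrows: list) -> dict:
    """Build an adjacency list from detected shapes and connectors.

    `shapes` maps a node id to its bounding box (x1, y1, x2, y2);
    `arrows` is a list of ((tail_x, tail_y), (head_x, head_y)) pairs.
    """
    def containing(point):
        # Find the shape whose bounding box contains the given point
        px, py = point
        for node, (x1, y1, x2, y2) in shapes.items():
            if x1 <= px <= x2 and y1 <= py <= y2:
                return node
        return None

    dag = defaultdict(list)
    for tail, head in arrows:
        src, dst = containing(tail), containing(head)
        if src is not None and dst is not None and src != dst:
            dag[src].append(dst)  # edge follows the arrow's direction
    return dict(dag)
```

Once the graph exists, emitting C code is a traversal: each node type (start, process, decision, print) maps to a statement template.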

Results

When performing recognition through the whole pipeline, the challenging part was the text. To address this, the aforementioned continual learning technique was implemented; after several iterations with the same text style, text detection starts to improve. Performance could be better if the text dataset were more oriented toward mathematical expressions or programming. The IAM dataset, which is in English, was used, and this may have been a factor, since the system was tested with Spanish flowcharts.

Algorithm to generate the n-th term of the Fibonacci sequence.
Print even numbers from 2 to 100.

The detection of shapes and connectors was tested on a set of 56 flowcharts from various algorithms, in which 75% of the tested images were fully recognized. The object bounding boxes for most classes were accurate, with only slightly lower performance observed for vertical arrows.

Text classification was evaluated on the test split of the IAM dataset, achieving a 66.7% success rate in the tests performed. Using the Character Error Rate (CER) metric, we obtained an 8.2% error, which means 91.8% of characters were classified correctly.
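For reference, CER is the Levenshtein edit distance between the recognized text and the ground truth, divided by the ground-truth length. A self-contained sketch of the metric:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance / reference length."""
    m, n = len(reference), len(hypothesis)
    # prev[j] = edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / m if m else 0.0
```

For example, recognizing "mientrs" against the ground truth "mientras" is one missing character out of eight, a CER of 12.5%.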

Code, dataset and paper

The project repository is publicly available on GitHub under the MIT license; it describes how to set it up for testing and how to use our models through the graphical interface to process flowcharts. A paper about this project was published in 2022 in the International Journal of Computer Applications; you can find it here.

The same setup section of the README includes links to download the datasets.
