The Document Understanding Transformer (Donut) is a transformer-based model with OCR-like functionality: it reads fields straight from a document image.
You can fine-tune it with less effort than other OCR models. The annotations need no ROIs and no coordinates.
That makes it easy to prepare training data and to extract specific information from an input image. As for deployment, since it's a transformer, in practice it needs GPU acceleration to run at a reasonable speed. If you're looking for a cheaper solution for document parsing, a YOLO model may be a good choice.
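To illustrate why training data is easy to prepare, here is a minimal sketch of building one training record in the `metadata.jsonl` layout described in the clovaai/donut README, where each image is paired with a JSON of target fields and nothing else. The helper `make_record`, the image name, and the values are hypothetical; the field names are borrowed from the sample output in this post.

```python
import json

def make_record(file_name, fields):
    """Build one metadata.jsonl line: an image file name plus its target fields."""
    # The ground truth is stored as a JSON *string* with a "gt_parse" key.
    # There are no bounding boxes or coordinates anywhere in the record.
    ground_truth = json.dumps({"gt_parse": fields})
    return json.dumps({"file_name": file_name, "ground_truth": ground_truth})

# Hypothetical sample: the image name and values are illustrative only.
line = make_record(
    "bill_0001.jpg",
    {"billdate": "2024/03/20", "billamount": "9555"},
)
print(line)
```

Writing one such line per image into `metadata.jsonl` next to the image files should be enough annotation for fine-tuning, assuming your dataset follows that layout.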
A sample result:
clovaai/donut
{'predictions': [{'billdate': '2024/03/20', 'billamount': '9555', 'etd': '2024/04/06', 'previousbillamount': '16658'}]}
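All values in the prediction come back as strings, so you will probably want to convert them before using them. The following is a minimal sketch, assuming the dates always use the `YYYY/MM/DD` form and the amounts are plain integers as in the sample above. The `normalize` helper and the two field sets are hypothetical, not part of Donut.

```python
from datetime import date, datetime

# Sample output copied from above.
result = {'predictions': [{'billdate': '2024/03/20', 'billamount': '9555',
                           'etd': '2024/04/06', 'previousbillamount': '16658'}]}

# Assumption: which fields hold dates and which hold integer amounts.
DATE_FIELDS = {"billdate", "etd"}
AMOUNT_FIELDS = {"billamount", "previousbillamount"}

def normalize(prediction):
    """Convert string fields of one prediction into date and int values."""
    out = {}
    for key, value in prediction.items():
        if key in DATE_FIELDS:
            out[key] = datetime.strptime(value, "%Y/%m/%d").date()
        elif key in AMOUNT_FIELDS:
            out[key] = int(value)
        else:
            out[key] = value  # leave unknown fields untouched
    return out

bill = normalize(result["predictions"][0])
difference = bill["billamount"] - bill["previousbillamount"]
print(bill["billdate"], difference)  # 2024-03-20 -7103
```

Because the model generates this text, a field can be missing or malformed on a hard image. Real code should catch `ValueError` from the conversions instead of assuming clean input.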
Tips for Donut model training data
Tips for YOLO model training data