The Document Understanding Transformer (Donut) is a transformer-based model with OCR functionality: it reads text directly from a document image. Fine-tuning it takes less effort than fine-tuning other OCR models, because the training data needs no regions of interest (ROIs) and no annotation coordinates. This makes it easy to prepare training data and to extract specific information from an input image.
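To illustrate how little annotation is needed, here is a minimal sketch of building one training record. It assumes the `metadata.jsonl` layout described in the clovaai/donut repository, where each line holds a `file_name` and a `ground_truth` JSON string wrapping a `gt_parse` object. The file name `bill_0001.jpg` and the helper `make_metadata_line` are hypothetical, so check the layout against the repository before relying on it.

```python
import json

# Hypothetical field values for one training image. The keys are the
# fields the fine-tuned model should learn to extract.
sample_fields = {
    "billdate": "2024/03/20",
    "billamount": "9555",
    "etd": "2024/04/06",
    "previousbillamount": "16658",
}


def make_metadata_line(file_name, fields):
    """Build one metadata.jsonl line for a single image (hypothetical helper)."""
    # Assumed layout: ground_truth is itself a JSON string containing gt_parse.
    ground_truth = json.dumps({"gt_parse": fields})
    return json.dumps({"file_name": file_name, "ground_truth": ground_truth})


# "bill_0001.jpg" is a placeholder image name, not a file from the project.
line = make_metadata_line("bill_0001.jpg", sample_fields)
print(line)
```

Each record pairs an image with the field values it contains, with no bounding boxes or coordinates.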
Example output from a fine-tuned model:
{'predictions': [{'billdate': '2024/03/20', 'billamount': '9555', 'etd': '2024/04/06', 'previousbillamount': '16658'}]}
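All values in the prediction above are strings, so downstream code usually needs to convert them. The following is a minimal sketch of that step, assuming the dates use the `YYYY/MM/DD` format and the amounts are whole numbers, as in the sample output. The helper `parse_bill` is hypothetical and not part of Donut.

```python
from datetime import datetime

# The sample prediction shown above.
result = {
    "predictions": [
        {
            "billdate": "2024/03/20",
            "billamount": "9555",
            "etd": "2024/04/06",
            "previousbillamount": "16658",
        }
    ]
}


def parse_bill(prediction):
    """Convert the string fields of one prediction into typed values."""
    # Assumption: dates are formatted as YYYY/MM/DD and amounts are integers.
    date_format = "%Y/%m/%d"
    return {
        "billdate": datetime.strptime(prediction["billdate"], date_format).date(),
        "etd": datetime.strptime(prediction["etd"], date_format).date(),
        "billamount": int(prediction["billamount"]),
        "previousbillamount": int(prediction["previousbillamount"]),
    }


bill = parse_bill(result["predictions"][0])
change = bill["billamount"] - bill["previousbillamount"]
print(change)  # -7103
```

A real pipeline would also need to handle missing or malformed fields, since the model can return incomplete predictions.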
clovaai/donut