Abstract: Image captioning is a multi-modal problem linking computer vision and natural language processing, which combines image analysis and text generation challenges. In the literature, most of ...