I use Python 3.6.4, PyTorch 1.0.0, torchvision 0.2.1, and scipy 1.2.1.
The results reported in the paper 'Deep Cross-Modal Projection Learning for Image-Text Matching' on CUHK-PEDES are {top-1 = 49.37%, top-10 = 79.27%}, but I only get {top-1 = 38.35%, top-10 = 63.39%} with MobileNetV1 as the backbone and {top-1 = 41.44%, top-10 = 65.66%} with ResNet-152. I wonder whether anyone has been able to reproduce the reported results; if it is convenient, please share the training details and hyperparameters.
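For reference, this is roughly how I compute the top-k accuracies above (a minimal sketch of my own evaluation, not the paper's code; the function name, argument shapes, and ID handling are my assumptions):

```python
import torch
import torch.nn.functional as F

def topk_accuracy(text_feats, image_feats, query_ids, gallery_ids, ks=(1, 10)):
    """Text-to-image retrieval: a query counts as a hit at rank k if any of
    its top-k ranked gallery images shares the query's person ID.
    text_feats:  [num_queries, dim], image_feats: [num_gallery, dim]
    query_ids:   [num_queries],      gallery_ids: [num_gallery]
    (shapes and ID convention are my own assumptions)"""
    text_feats = F.normalize(text_feats, dim=1)
    image_feats = F.normalize(image_feats, dim=1)
    sims = text_feats.mm(image_feats.t())             # cosine similarity matrix
    _, order = sims.topk(max(ks), dim=1, largest=True)  # top-k gallery indices per query
    ranked_ids = gallery_ids[order]                    # person IDs in ranked order
    results = {}
    for k in ks:
        hits = (ranked_ids[:, :k] == query_ids.unsqueeze(1)).any(dim=1)
        results['top-{}'.format(k)] = hits.float().mean().item()
    return results
```

If my evaluation differs from yours (e.g. gallery construction or ID matching), that could also explain part of the gap, so corrections are welcome.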