
Train an mAP 0.71 model by modifying 'mask' & 'scale'  #23

Open

Description

@cory8249
Contributor

I traced the YOLOv2 C code over the last few days, and I think there is a misunderstanding about 'mask' and 'scale'.

In this pytorch repo, the mask is used in the loss function. It helps the network focus on the correct anchor boxes instead of punishing other, irrelevant boxes.
self.iou_loss = nn.MSELoss(size_average=False)(iou_pred * iou_mask, _ious * iou_mask) / num_boxes

So how do we calculate the right scale_mask?

YOLO's mask is based on the predicted objectness (0~1) of the box.
So if the box's predicted objectness is high (e.g. 0.9) but there is no ground truth at that position, it should be punished. The punishment is noobject_scale * (0 - predicted objectness):
l.delta[obj_index] = l.noobject_scale * (0 - l.output[obj_index]);
Hence, this rule helps the network learn to give reasonable confidence to each box.
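As a minimal sketch of the rule above (plain Python, illustrative values only, not code from either repo):

```python
def noobject_delta(pred_objectness, noobject_scale=1.0):
    # darknet: l.delta[obj_index] = l.noobject_scale * (0 - l.output[obj_index])
    return noobject_scale * (0.0 - pred_objectness)

# A confidently wrong box gets a larger correction than a timid one.
print(noobject_delta(0.9))  # -0.9
print(noobject_delta(0.1))  # -0.1
```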

However, in this repo
_iou_mask[best_ious <= cfg.iou_thresh] = cfg.noobject_scale
does not consider objectness. It punishes every unqualified box with the same value, so the detector learns objectness very poorly.

This is the most obvious case; other 'mask' and 'scale' values are also implemented the wrong way, and YOLO actually has a more complicated policy for these scale_masks (some if-else conditions). I also find that YOLO's loss is calculated before the exp() and log() transforms, not after.
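To make the difference concrete, here is a toy contrast of the two policies (hypothetical boxes and thresholds, not tensors from the repo):

```python
iou_thresh, noobject_scale = 0.6, 1.0
best_ious       = [0.2, 0.5, 0.9]   # best IoU of each box with any gt box
pred_objectness = [0.9, 0.1, 0.8]   # predicted confidence of each box

# Repo's policy: every box below the IoU threshold gets the same constant weight.
mask_repo = [noobject_scale if iou <= iou_thresh else 0.0 for iou in best_ious]

# Darknet-style gradient (as described above): the punishment also scales with
# the predicted objectness, so confident false positives are hit harder.
delta_darknet = [noobject_scale * (0.0 - p) if iou <= iou_thresh else 0.0
                 for iou, p in zip(best_ious, pred_objectness)]

print(mask_repo)      # [1.0, 1.0, 0.0]
print(delta_darknet)  # [-0.9, -0.1, 0.0]
```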

By fixing the scale_mask bug, VOC07 test mAP (trained on VOC07+12 trainval) increases from 0.67 to 0.71, which is much closer to yolo-voc-weights.h5 (0.7221).

You can refer to my code in darknet_v2.py. I am still debugging and it is not completed yet; it is just to point out what I found.

Activity

longcw

longcw commented on May 16, 2017

Owner

Thank you!

JesseYang

JesseYang commented on May 17, 2017


@cory8249
In my understanding, the l.delta in darknet source code is the minus derivative of the loss with respect to the input value.

If the mask for those positions without ground truth boxes is just l.noobject_scale, then the loss is defined as l.noobject_scale / 2 * (pred_iou - gt_iou) ^ 2, and the gt_iou is 0. In this case, the minus derivative with respect to pred_iou should be: l.noobject_scale * (0 - pred_iou), which is consistent with the darknet source code: l.delta[obj_index] = l.noobject_scale * (0 - l.output[obj_index]).

From the equation loss = l.noobject_scale / 2 * (pred_iou - gt_iou) ^ 2, the punishment for positions without gt boxes depends on both the noobject_scale and the pred_iou. The minus derivative l.noobject_scale * (0 - pred_iou) also shows this. Thus as pred_iou grows (from 0 to 1), the punishment already grows, and it is not necessary to incorporate pred_iou into the mask to increase the punishment.

So I think the previous implementation _iou_mask[best_ious < cfg['iou_thresh']] = cfg['noobject_scale'] * 1 is reasonable and consistent with darknet source code.
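The derivative claim above is easy to check numerically; this is a small sketch with an illustrative noobject_scale, not code from the repo:

```python
# With loss = noobject_scale / 2 * (pred_iou - 0)^2, the minus derivative
# should be noobject_scale * (0 - pred_iou), and it already grows in
# magnitude with pred_iou without any extra term in the mask.
noobject_scale = 1.0  # illustrative value

def loss(pred_iou):
    return noobject_scale / 2.0 * (pred_iou - 0.0) ** 2

def minus_derivative(pred_iou, eps=1e-6):
    # central finite-difference estimate of -d(loss)/d(pred_iou)
    return -(loss(pred_iou + eps) - loss(pred_iou - eps)) / (2 * eps)

for p in (0.1, 0.5, 0.9):
    analytic = noobject_scale * (0.0 - p)
    assert abs(minus_derivative(p) - analytic) < 1e-6
```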

longcw changed the title from "Train an mAP 0.71 model by fixing wrong implementation of 'mask' & 'scale'" to "Train an mAP 0.71 model by modifying 'mask' & 'scale'" on May 17, 2017
yangyu12

yangyu12 commented on Dec 3, 2017


Hi @cory8249,
I found your yolo2-pytorch code in your repository, but I find it hard to compare it with longcw's original version.
Could you please list all the modifications you made to raise the mAP to 0.71?
BTW, is darknet_training_v3.py the script that obtains 0.71 mAP?

Erotemic

Erotemic commented on Mar 27, 2018


@JesseYang Your argument makes sense to me, and I tend to agree with it, but when I look in the current source code I see that @cory8249's version is being used. Why is this? It seems like iou_mask should simply be cfg.noobject_scale wherever there is no object. Is this wrong?

xuzijian

xuzijian commented on Apr 9, 2018


I agree with @JesseYang's points, and in order to match the original code, I guess it should be
_iou_mask[best_ious < cfg['iou_thresh']] = math.sqrt(0.5*cfg['noobject_scale']) (and likewise for the high-IoU anchors).
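The square root matters because the repo's MSE loss squares the mask along with the residual; a quick sketch (illustrative values) of why sqrt(0.5 * noobject_scale) recovers darknet's noobject_scale / 2 weighting:

```python
import math

noobject_scale = 1.0  # illustrative value
mask = math.sqrt(0.5 * noobject_scale)

# The repo computes MSE on masked values, so the mask enters squared:
#   (mask * pred - mask * gt)^2 = mask^2 * (pred - gt)^2
pred_iou, gt_iou = 0.8, 0.0
masked_mse = (mask * pred_iou - mask * gt_iou) ** 2

# darknet's implicit loss for a no-object box: noobject_scale / 2 * (pred - gt)^2
darknet_loss = noobject_scale / 2.0 * (pred_iou - gt_iou) ** 2

assert abs(masked_mse - darknet_loss) < 1e-9
```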

I'm just running an experiment to test this setting.
The results are quite similar (and a little better with 416*416 input) to what I got from the 'master' version, which is currently 72.3%.

