Slim Retraining #1877
Comments
Hi @eshirima - the best way is to just run the eval.py binary. We typically run this binary in parallel with training, pointing it at the directory holding the checkpoint that is being trained. The eval.py binary will write logs to an [...] You want to see that the mAP has "lifted off" in the first few hours, and then you want to see when it converges. Without looking at these plots, it's hard to tell how many steps you need.
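For anyone who wants a rough numeric check on "when it converges" instead of eyeballing the plots, here is a minimal sketch. It assumes you have already collected the mAP (or loss) values into a plain Python list; the helper names `ema` and `has_plateaued`, the smoothing weight, and the thresholds are all hypothetical choices, not part of the Object Detection API.

```python
def ema(values, weight=0.6):
    """Exponentially smooth a series; a higher weight smooths more heavily."""
    if not values:
        return []
    smoothed = []
    last = values[0]
    for v in values:
        last = weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed


def has_plateaued(values, window=5, tolerance=0.01):
    """Return True if the smoothed series varied by less than `tolerance`
    over the last `window` points. Both thresholds are arbitrary defaults."""
    if len(values) < window + 1:
        return False  # not enough points to judge
    recent = ema(values)[-window:]
    return max(recent) - min(recent) < tolerance
```

Smoothing first keeps a single noisy evaluation from being read as a change in trend. The window and tolerance would need tuning for the metric and for how often evaluation runs.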
@jch1 Thank you so much for your help both here and on SO!!! I just finished my training and it works really well.
Hi @eshirima, I'm trying to retrain an SSD model with my own dataset and I reach a loss of around 2-5, but when I run prediction nothing is detected because the scores are around 0.01. I'm training a binary classifier like you. Could you tell me what your label map looks like and how you created your TFRecord file? I followed the TensorFlow tutorial, but now I really don't know where the error is. My config file is like yours. Thanks
@oscarorti I shared my experience on SO
To run train.py and eval.py at the same time on a single GPU, add the following lines in the
This will use 50% of the GPU memory for the training process, leaving the rest of the memory for eval.py.
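The lines from that comment did not survive on this page. As a sketch of the kind of setting it describes (not necessarily the commenter's exact code), TensorFlow's session config has a `per_process_gpu_memory_fraction` option that caps how much GPU memory one process may claim. Where the config object gets passed in depends on the version of the training script, so treat the placement as an assumption.

```python
import tensorflow as tf

# On older TF 1.x releases this is spelled tf.ConfigProto; tf.compat.v1
# is used here so the snippet also works on TF 1.14+ and TF 2.x.
session_config = tf.compat.v1.ConfigProto(allow_soft_placement=True)

# Cap this process at roughly half of the GPU memory, so a second
# process (for example eval.py) can use the remainder.
session_config.gpu_options.per_process_gpu_memory_fraction = 0.5

# The config is then handed to whatever creates the session, for example
# tf.compat.v1.Session(config=session_config).
```

The fraction limits memory only, not compute, so training and evaluation still share the GPU's cores.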
I finally started the training process of the object detection API on my own dataset. Since none of the currently available models included my object class, I got rid of the checkpoint options in my configuration file.
A snap of the logged info so far
From step 0 until now, my loss has decreased dramatically, but for the past couple of hours it has been fluctuating between 1 and 2. My questions are:
My config file: