Description
Issue summary
I can successfully build caffe with make all, make pycaffe and make test without errors.
When I run make runtest, it stops immediately; when I train the MNIST model, it stops early and gives the same error.
I didn't change anything; I just cloned the repository and ran make. I have struggled with this issue for a long time. Can anybody help me find out what is wrong? Thanks
*** Error in `.build_debug/tools/caffe': double free or corruption (out): 0x0000000002119160 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f01f4ea87e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x7fe0a)[0x7f01f4eb0e0a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f01f4eb498c]
/usr/lib/x86_64-linux-gnu/libprotobuf.so.9(_ZN6google8protobuf8internal28DestroyDefaultRepeatedFieldsEv+0x1f)[0x7f01f61be8af]
/usr/lib/x86_64-linux-gnu/libprotobuf.so.9(_ZN6google8protobuf23ShutdownProtobufLibraryEv+0x8b)[0x7f01f61bdb3b]
/usr/lib/x86_64-linux-gnu/libmirprotobuf.so.3(+0x20329)[0x7f01d04fd329]
/lib64/ld-linux-x86-64.so.2(+0x10c17)[0x7f01f85e8c17]
/lib/x86_64-linux-gnu/libc.so.6(+0x39ff8)[0x7f01f4e6aff8]
/lib/x86_64-linux-gnu/libc.so.6(+0x3a045)[0x7f01f4e6b045]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf7)[0x7f01f4e51837]
.build_debug/tools/caffe[0x426dd9]
I attached my Makefile.config:
Makefile.config.pdf
I also attached the full debug output for the record.
debug output.pdf
Your system configuration
Operating system: Ubuntu 16.04 Desktop
Compiler: gcc
CUDA version (if applicable): 8.0
CUDNN version (if applicable): 5.1
BLAS: atlas
Python or MATLAB version (for pycaffe and matcaffe respectively): anaconda python 2.7
Best,
Weldon
Activity
shelhamer commented on Apr 14, 2017
Sorry, this seems to be a system issue. Please ask installation questions on the mailing list.
From https://github.com/BVLC/caffe/blob/master/CONTRIBUTING.md:
cailile commented on Apr 24, 2017
Hi, I encountered this problem recently when running the caffe/ssd branch. The cause turned out to be that caffe was linked against both libprotobuf.so and libprotobuf-lite.so at the same time, which causes the same allocated memory to be freed twice. You can check whether you have this double-linking problem by listing the libraries the built caffe binary links to:
ldd caffe | grep proto
In my case, caffe was linked to libprotobuf.so.10, libprotobuf-lite.so.10 and libmirprotobuf.so.3 at the same time, and the latter two were pulled in through opencv_highgui. After I removed OpenCV's highgui library from caffe's Makefile, along with the functions in the source files that use it, the problem was gone.
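The `ldd caffe | grep proto` check can also be scripted. The following is a minimal Python sketch, not part of Caffe: the helper names `protobuf_libs`, `has_double_protobuf` and `check_binary` are hypothetical, and it assumes `ldd` is on the PATH and prints the usual `name => path (address)` lines. It flags a binary that pulls in both the full libprotobuf and libprotobuf-lite.

```python
import subprocess


def protobuf_libs(ldd_output):
    """Return the set of protobuf-related library names listed in ldd output."""
    libs = set()
    for line in ldd_output.splitlines():
        # Typical ldd line: "\tlibprotobuf.so.9 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.9 (0x...)"
        name = line.strip().split(" ", 1)[0]
        if "protobuf" in name:
            libs.add(name)
    return libs


def has_double_protobuf(ldd_output):
    """True if both the full libprotobuf and libprotobuf-lite are linked."""
    libs = protobuf_libs(ldd_output)
    full = any(n.startswith("libprotobuf.so") for n in libs)
    lite = any(n.startswith("libprotobuf-lite.so") for n in libs)
    return full and lite


def check_binary(path):
    """Run ldd on `path`; return (double_linked, sorted protobuf library names)."""
    out = subprocess.check_output(["ldd", path]).decode()
    return has_double_protobuf(out), sorted(protobuf_libs(out))
```

For example, `check_binary(".build_debug/tools/caffe")` should return a first element of `True` when both libraries end up in the same binary. Other protobuf-based libraries such as libmirprotobuf.so.3 are listed but not counted as the full or lite library.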
Hope this helps and good luck!
jmuncaster commented on Jun 4, 2017
@cailile thank you for your comment, I encountered this problem recently and you helped me to fix it. The GTK build of opencv_highgui was responsible for bringing in libprotobuf-lite.so. The fix that I did, which does not require changing the source code, was to rebuild OpenCV against Qt5 instead of GTK, and rebuild caffe. On Ubuntu 16.04 the qt5 package is "qt5-default" and the OpenCV cmake option is WITH_QT.
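To confirm which GUI backend an installed OpenCV was actually built with, the text returned by `cv2.getBuildInformation()` can be inspected. Below is a rough Python sketch; the helper `gui_backends` is hypothetical, and it assumes the `GUI:` section contains lines such as `QT: YES (...)` and `GTK+ 3.x: NO`, whose exact wording varies between OpenCV versions.

```python
def gui_backends(build_info):
    """Report whether Qt and GTK are enabled, based on OpenCV build-info text."""
    enabled = {"QT": False, "GTK": False}
    for line in build_info.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        is_yes = value.strip().upper().startswith("YES")
        # Match "QT" exactly so that "QT OpenGL support" is not mistaken for it
        if key == "QT":
            enabled["QT"] = enabled["QT"] or is_yes
        # Covers "GTK+", "GTK+ 2.x" and "GTK+ 3.x" style keys
        elif key.startswith("GTK+"):
            enabled["GTK"] = enabled["GTK"] or is_yes
    return enabled
```

With OpenCV installed, `print(gui_backends(cv2.getBuildInformation()))` should show whether the GTK backend, which according to the comment above is what brought in libprotobuf-lite, is still enabled after rebuilding with WITH_QT.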
jontitalukdar commented on Jun 13, 2017
@cailile I have encountered exactly the same problem while installing the caffe/ssd branch. However, the solution you described is a bit unclear to me, and it would really help if you could elaborate on how you solved it. Thanks a lot.
cailile commented on Jun 15, 2017
@jontitalukdar Here are some more comments. The solution I have currently adopted is to roll back to Ubuntu 14.04, because simply excluding opencv_highgui when building caffe only solves the problem on the caffe side: later, when I wanted to import both caffe and cv2 in Python, the problem came up again. I am not sure whether libprotobuf and libprotobuf-lite can be made to work together in one process. @jmuncaster's solution is worth a try; if he had posted it earlier, I might not have had to roll back to Ubuntu 14.04 :)
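Because the backtrace in the original report shows the crash happening inside `ShutdownProtobufLibrary` while the process exits, one way to test a combination of Python modules is to import them in a fresh interpreter and check how that interpreter exits. A minimal Python sketch follows; `imports_cleanly` is a hypothetical helper, and reading a glibc double-free abort as return code -6 (SIGABRT) assumes Linux.

```python
import subprocess
import sys


def imports_cleanly(modules, python=sys.executable):
    """Import `modules` in a child interpreter and let it exit normally.

    The double free in this thread happens at process exit, so running the
    imports in a subprocess also exercises the shutdown path.
    Returns (ok, returncode); a negative returncode means the child was
    killed by a signal (e.g. -6 for SIGABRT on Linux).
    """
    stmt = "import " + ", ".join(modules)
    proc = subprocess.Popen([python, "-c", stmt],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    proc.communicate()  # wait for exit and drain the pipes
    return proc.returncode == 0, proc.returncode
```

For instance, `imports_cleanly(["caffe", "cv2"])` returning `(False, -6)` would point to the same double free, while `(True, 0)` suggests the combination survives shutdown. Trying both import orders may be worthwhile.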
jontitalukdar commented on Jun 15, 2017
@cailile Thank you so much for your reply. You are absolutely correct: opencv_highgui causes problems when importing both caffe and cv2 within the same script. Moreover, I had installed OpenCV in a Python virtual environment, which caused some further errors. Removing either of the two libraries, libprotobuf or libprotobuf-lite, might cause other unforeseen problems later.
So I tried rebuilding OpenCV using Qt5 instead of GTK, as proposed by @jmuncaster, and it worked!
I cleaned the original OpenCV build and then reinstalled it with Qt5.
make clean
mkdir build
cd build/
cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=/usr/local -DFORCE_VTK=ON -DWITH_TBB=ON -DWITH_V4L=ON -DWITH_QT=ON -DWITH_OPENGL=ON -DWITH_CUBLAS=ON -DCUDA_NVCC_FLAGS="-D_FORCE_INLINES" -DWITH_GDAL=ON -DWITH_XINE=ON -DBUILD_EXAMPLES=ON ..
I also added the OpenCV library path to the Caffe Makefile.config and then rebuilt ssd/caffe using make.
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial /usr/local/share/OpenCV/3rdparty/lib/
It seems to work for me. I will keep a close watch in case any other discrepancies crop up, but it works for now!
Thank you so much for your help @cailile :)
novate commented on Dec 22, 2017
@cailile Thank you so much for your solution. The double free or corruption problem is now gone. The side effect is that when caffe is built without highgui, we can't use features such as webcam input or writing detections out as video.
@jontitalukdar Here is a suggestion: when building OpenCV, I strongly suggest adding -D WITH_GTK=NO. Without it, my build automatically used GTK whenever it could find the GTK packages on the computer, and I don't know why.
What's more, I couldn't install qt5-default (I don't know why, but apt-get failed with lots of unmet dependencies), so I used Qt4 instead when compiling OpenCV, and it works.
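Before running make, one way to double-check that options like WITH_GTK and WITH_QT actually took effect is to read them back from `CMakeCache.txt` in the OpenCV build directory, where CMake stores cache entries as `NAME:TYPE=value`. A rough Python sketch; `cache_options` and `cmake_bool` are hypothetical helpers, and the truth test only covers CMake's common boolean spellings.

```python
def cache_options(cache_text, names):
    """Return {name: value} for the requested entries in CMakeCache.txt text."""
    found = {}
    for line in cache_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines ("#" or "//")
        if not line or line.startswith(("#", "//")):
            continue
        lhs, sep, value = line.partition("=")
        if not sep:
            continue
        name = lhs.split(":", 1)[0]  # drop the ":TYPE" suffix
        if name in names:
            found[name] = value
    return found


def cmake_bool(value):
    """Simplified CMake truth test: ON/YES/TRUE/Y/1 (case-insensitive) are true."""
    return value.strip().upper() in ("ON", "YES", "TRUE", "Y", "1")
```

For example, reading `build/CMakeCache.txt` and passing `{"WITH_GTK", "WITH_QT"}` should give a false WITH_GTK and a true WITH_QT if the configuration matches this suggestion. The cache only records what was requested, so the installed library is still worth checking afterwards.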
wishinger-li commented on Dec 26, 2017
@cailile Thanks for your suggestion. It worked on my computer, but I have another problem.
Code that ran smoothly three months ago now fails with an error when I run it.
So what could have changed during this period?
panecho commented on Aug 9, 2018
I solved it according to #5777.
laker-sprus commented on Oct 26, 2018
Nice. This also works for the "./upgrade_net_proto_binary" abort problem.