Problem with dynamic loading and caffe.proto #1917

Closed
TomKae opened this issue Feb 20, 2015 · 38 comments
@TomKae

TomKae commented Feb 20, 2015

After switching to the current caffe version today, dynamic loading of several libraries, which contain different classifiers for specific tasks, does not work anymore. We've got the following error message:

[libprotobuf ERROR google/protobuf/descriptor_database.cc:57] File already exists in database: caffe.proto

Furthermore we are using Ubuntu 14.04 and the following version of libprotobuf:

/usr/lib/x86_64-linux-gnu/libprotobuf.so.8
/usr/lib/x86_64-linux-gnu/libprotobuf.so.8.0.0

Before updating Caffe everything was fine, and we could load more than one library containing a Caffe net. It seems that the part of the build that generates the Caffe-specific proto header and .cc file has changed. Is there any connection to our problem? And how can we fix it?

Best, Tom

@shelhamer shelhamer added the JL label Feb 20, 2015
@shelhamer
Member

There is a known issue with protobuf in loading dynamically linked libraries that all link to protobuf: https://code.google.com/p/protobuf/issues/detail?id=128. @longjon may be able to comment from his experience on this. A possible workaround is to combine your separate library-classifier combinations into a single library-classifiers arrangement with different calls for each model -- but in my own work I've only worked with a single libcaffe.so linked library that may execute different models depending on the calling code.

@TomKae
Author

TomKae commented Feb 21, 2015

Well, the point is that it worked fine until I updated Caffe yesterday and checked out the latest master (because of the modification of the net constructor!). Even with the same protobuf version, our implementation works with an older version of Caffe on other Ubuntu 14.04 systems we have. But that is a Caffe version we obtained as a snapshot (zip file) more than half a year ago, so unfortunately we don't have a version number for it. I have also used a newer version that worked in our framework with dynamic loading. Unfortunately I removed that tree before checking out the current version, so I don't know its exact version either.

We found that in the new Caffe version the CMakeList files and structure changed for building the protobuf part (compared to our old Caffe snapshot). So it seems that these changes cause our trouble. I would like to test the last 2-3 Caffe master versions. Is there a way to check out these older versions?

One more issue: Handling different classifiers in different libraries is a perfect way to allow for a flexible system structure, which is essential for a modular system concept. But even in this dynamic framework we only load "libcaffe.so" once which is a part of the main application. This is the usual way to handle shared objects.

@shelhamer
Member

Is there a way to check out these older versions?

You can check out any version of the project since it is versioned through git. You can also look at our releases and pick a favorite.

CMakeList files and structure changed

#1667 overhauled the CMake build.

One more issue: Handling different classifiers in different libraries is a perfect way to allow for a flexible system structure, which is essential for a modular system concept. But even in this dynamic framework we only load "libcaffe.so" once which is a part of the main application. This is the usual way to handle shared objects.

Right, that's sensible and fine -- what I was trying to say and why I linked the protobuf issue is that multiple libraries linked to protobuf (like Caffe classifiers) can conflict if they have a shared message. The protobuf issue suggests statically linking your classifier modules to libprotobuf. Earlier the whole Caffe project was static linked but we've switched to dynamic linking.
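
To make the conflict concrete, here is a toy model in Python of what the error message describes. It is only a sketch: `ToyDescriptorDatabase` and `load_plugins` are made-up names for illustration, not protobuf's API, and the real registry lives inside the protobuf runtime. The assumption is that the generated code for caffe.proto registers itself under its file name when a library is loaded, and that a second registration under the same name is rejected.

```python
class ToyDescriptorDatabase:
    """Toy stand-in for a registry of .proto files keyed by file name."""

    def __init__(self):
        self._files = {}

    def add(self, filename, registered_by):
        # A second registration under an existing name is rejected,
        # mirroring the wording of the error in this issue.
        if filename in self._files:
            raise RuntimeError(f"File already exists in database: {filename}")
        self._files[filename] = registered_by


def load_plugins(plugins, shared_database=None):
    """Register caffe.proto once per plugin and return the plugins that loaded.

    With shared_database set, every plugin registers into the same registry
    (one protobuf runtime shared by several copies of the generated code).
    Without it, each plugin gets a private registry (roughly what linking
    protobuf statically into each plugin gives you).
    """
    loaded = []
    for name in plugins:
        if shared_database is not None:
            database = shared_database
        else:
            database = ToyDescriptorDatabase()
        database.add("caffe.proto", registered_by=name)
        loaded.append(name)
    return loaded
```

With one shared registry the second plugin fails with the message from the report; with one registry per plugin both load.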

@longjon
Contributor

longjon commented Feb 21, 2015

You may want to try the Makefile/Makefile.config build; at present the CMake build is still community-supported, so I can't offer specific help with that.

The error you're getting does suggest that libcaffe is being loaded twice. You may want to check this with LD_DEBUG.

If you can produce a minimal non-working example using the Makefile build, I may be able to look into the issue.
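
One possible way to do the LD_DEBUG check is sketched below in Python. It assumes a glibc system where the dynamic loader, with `LD_DEBUG=libs` set, writes lines containing `calling init: <path>` to stderr; the function names are hypothetical helpers, not part of Caffe or protobuf. The parser groups initialized libraries by base name and reports any name that was loaded from more than one path.

```python
import os
import posixpath
import re
import subprocess
from collections import defaultdict

# Assumed log format (glibc): "<pid>:  calling init: /path/to/libfoo.so.N"
INIT_LINE = re.compile(r"calling init:\s*(\S+)")


def capture_loader_log(command):
    """Run command with LD_DEBUG=libs and return everything written to stderr.

    The program's own stderr output is mixed into the result.
    """
    env = dict(os.environ, LD_DEBUG="libs")
    result = subprocess.run(
        command,
        env=env,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
        errors="replace",
    )
    return result.stderr


def libraries_with_multiple_paths(log_text):
    """Map library base names to their paths when more than one path was used."""
    paths_by_name = defaultdict(set)
    for match in INIT_LINE.finditer(log_text):
        path = match.group(1)
        # "libprotobuf.so.8" and "libprotobuf.so.10" both become "libprotobuf".
        name = posixpath.basename(path).split(".so")[0]
        paths_by_name[name].add(path)
    return {
        name: sorted(paths)
        for name, paths in paths_by_name.items()
        if len(paths) > 1
    }
```

For example, `libraries_with_multiple_paths(capture_loader_log(["./my_app"]))`, where `./my_app` is a placeholder for your own executable. Note that a copy of a library that was linked statically into a plugin will not show up as a separate entry here, since the loader only sees the plugin itself.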

@TomKae
Author

TomKae commented Feb 24, 2015

Thanks for your fast replies. We solved our problem, which was caused by linking "libproto.a" into our dynamic libraries. This causes the above error and is no longer necessary with the new Caffe version. Now everything works. :-)

Thank you for your great support & best regards, Tom

@tianzhi0549
Contributor

Yep, linking caffe against libprotobuf.a instead of libprotobuf.so could solve this issue.

@denny1108

@tianzhi0549 Can you show in detail how to link Caffe against libprotobuf.a instead of libprotobuf.so? Thank you so much.

@tianzhi0549
Contributor

@denny1108 I changed caffe's Makefile. Specifically, I added -Wl,-Bstatic -lprotobuf -Wl,-Bdynamic to LDFLAGS and removed protobuf from LIBRARIES.

I have uploaded my Makefile as a gist (https://gist.github.com/tianzhi0549/773c8dbc383c0cb80e7b). You can check it to see what changes I made (lines 172 and 369).
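
As a rough illustration of what that Makefile change does to the final link line, the Python sketch below builds the `-l` arguments with protobuf wrapped in `-Wl,-Bstatic` / `-Wl,-Bdynamic`. The function name and the example library list are made up for illustration and are not Caffe's actual LIBRARIES value; only the three flags come from the comment above.

```python
def link_arguments(libraries, static=("protobuf",)):
    """Return linker arguments with the chosen libraries linked statically.

    -Wl,-Bstatic tells GNU ld to prefer static archives for the -l options
    that follow; -Wl,-Bdynamic switches back for the remaining libraries.
    """
    static_libs = [lib for lib in libraries if lib in static]
    dynamic_libs = [lib for lib in libraries if lib not in static]

    args = []
    if static_libs:
        args.append("-Wl,-Bstatic")
        args.extend(f"-l{lib}" for lib in static_libs)
        args.append("-Wl,-Bdynamic")
    args.extend(f"-l{lib}" for lib in dynamic_libs)
    return args
```

Calling `link_arguments(["glog", "protobuf", "m"])` gives the static protobuf group first and the remaining libraries after it. These arguments need to appear after the object files on the link line so that the archive can resolve their references.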

@nbubis

nbubis commented Jan 27, 2016

@tianzhi0549 Changing the Makefile gives:

LD -o .build_debug/lib/libcaffe.so
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libprotobuf.a(common.o): relocation R_X86_64_32S against `_ZTVN6google8protobuf7ClosureE' can not be used when making a shared object; recompile with -fPIC
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libprotobuf.a: error adding symbols: Bad value
collect2: error: ld returned 1 exit status

Any ideas on how to solve this?

@tianzhi0549
Contributor

@nbubis Try to run make clean before running make.

@nbubis

nbubis commented Jan 27, 2016

This is after make clean.

@tianzhi0549
Contributor

@nbubis Sorry for that.
I find this issue is similar to yours. You could give it a try. It seems that protobuf isn't installed correctly. Thank you:-).

@denny1108

@tianzhi0549 Thank you so much. In the end, I uninstalled the old protobuf and installed a new version, which solved my problem. It seems that the old protobuf on my machine could only create one network instance for Caffe.

@puneetdabulya

@denny1108 Can you please tell me which version of protobuf worked for you? I am also facing this issue. Basically, when I run MATLAB + Caffe code for the first time, it works fine, but rerunning the same code crashes. If I restart MATLAB, the same code runs fine again. It's painful to restart MATLAB every time I need to rerun. (It would be helpful if you could share a URL or details of exactly what you did to fix this issue.)

@nbubis

nbubis commented Feb 2, 2016

@puneetdabulya & whoever else runs into this issue:

The issue seems to be memory deallocation by the caffe code. Using a static version of protobuf therefore solves the issue, since the protobuf lib is loaded separately by each instance of caffe.

To solve:

1. Uninstall any protobuf compilers you may currently have on your system.
2. Download the protobuf C++ source code (v3 beta worked) from GitHub.
3. Configure protobuf with ./configure --disable-shared, and then build and install as usual.
4. Rebuild Caffe with the corrected Makefile posted above by @tianzhi0549.

Good luck!
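
The protobuf build step above can be sketched as a small Python helper that only assembles the commands, so they can be reviewed before anything runs. The helper names and the optional `prefix` argument are assumptions for illustration; `--disable-shared` is the flag from the steps above, and `--prefix` is the usual autoconf install-location option.

```python
import shlex


def static_protobuf_build_commands(prefix=None, extra_configure_args=()):
    """Return the configure/build/install commands as argument lists."""
    configure = ["./configure", "--disable-shared"]
    if prefix:
        # Optional: install somewhere other than the default location.
        configure.append(f"--prefix={prefix}")
    configure.extend(extra_configure_args)
    return [configure, ["make"], ["make", "install"]]


def render(commands):
    """Format the commands as shell lines for review or copy and paste."""
    return "\n".join(shlex.join(command) for command in commands)
```

To actually run them, something like `subprocess.run(command, cwd=source_dir, check=True)` for each command would do, where `source_dir` is wherever you unpacked the protobuf sources. The install step may need elevated privileges depending on the prefix.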

@puneetdabulya

@nbubis Thank you for the instructions. For those who get stuck on the following error after following them:

LD -o .build_debug/lib/libcaffe.so
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libprotobuf.a(common.o): relocation R_X86_64_32S against `_ZTVN6google8protobuf7ClosureE' can not be used when making a shared object; recompile with -fPIC
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libprotobuf.a: error adding symbols: Bad value
collect2: error: ld returned 1 exit status

While installing protobuf, edit src/Makefile, in CXXFLAGS add -fPIC and recompile. It would fix this error.
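
If you would rather script that edit than do it by hand, a minimal Python sketch follows. It assumes the Makefile assigns the variable on a single line of the form `CXXFLAGS = ...` (it does not handle `+=`, `:=`, or continuation lines), and `add_flag` is a hypothetical helper name.

```python
import re


def add_flag(makefile_text, variable, flag):
    """Append flag to a single-line 'VARIABLE = ...' assignment if missing."""
    pattern = re.compile(
        rf"^({re.escape(variable)}[ \t]*=)(.*)$",
        re.MULTILINE,
    )

    def append(match):
        current_values = match.group(2).split()
        if flag in current_values:
            # Already present: leave the line untouched.
            return match.group(0)
        return f"{match.group(1)}{match.group(2).rstrip()} {flag}"

    return pattern.sub(append, makefile_text)
```

For example, read `src/Makefile`, pass its text through `add_flag(text, "CXXFLAGS", "-fPIC")`, and write the result back before rebuilding. Passing `CXXFLAGS=-fPIC` on the `./configure` command line is another common way to get the same effect with autoconf-based builds.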

@sallymmx

I followed the instructions above, but now I have a new problem. When I run "make matcaffe", something goes wrong:
kaffe@kaffe:~/Documents/wmm/mat_faster_rcnn-master/external/caffe$ make matcaffe
MEX matlab/+caffe/private/caffe_.cpp
Building with 'g++'.
/home/kaffe/Documents/wmm/mat_faster_rcnn-master/external/caffe/matlab/+caffe/private/caffe_.cpp:21:35: fatal error: include/caffe/caffe.hpp: No such file or directory
compilation terminated.

make: *** [matlab/+caffe/private/caffe_.mexa64] Error 255
However, caffe.hpp does exist in include/caffe/.
Please help me with this!
@puneetdabulya @nbubis @tianzhi0549

@szm-R

szm-R commented Mar 6, 2016

Hi,
I did the things mentioned by nbubis but still matlab crashes after the first run :(

@NoaArbel

NoaArbel commented Mar 8, 2016

Hello,

After downloading the protobuf C++ source code, running configure, make and install, and recompiling Caffe, everything worked fine in MATLAB (i.e. matcaffe), but pycaffe does not!
I get an error while importing caffe in my Python code:
from google.protobuf import symbol_database as _symbol_database
ImportError: cannot import name symbol_database

Has anyone had a problem like this? What do you think I can do to fix it?

Thanks.
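
A quick way to see what the Python side is actually picking up is sketched below. It is only a diagnostic under the assumption that the ImportError comes from the Python `protobuf` package on the path lacking the `symbol_database` module; the helper names are made up for illustration.

```python
import importlib.util
from importlib import metadata


def module_available(dotted_name):
    """Return True if the module can be located, False otherwise."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises instead of returning None.
        return False


def installed_version(distribution):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None
```

Here `module_available("google.protobuf.symbol_database")` and `installed_version("protobuf")` would tell you whether the module exists and which package version is installed in the interpreter that runs pycaffe.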

@puneetdabulya

I installed pycaffe recently. Most protobuf-related issues can be solved by installing tensorflow using pip. First install pip and then search for how to install tensorflow.

Hope it works.

Thanks


@NoaArbel

NoaArbel commented Mar 8, 2016

It worked!
Thanks @puneetdabulya

@phunghx

phunghx commented Mar 10, 2016

I'm facing this problem too. Even though I reinstalled the latest protobuf, my MATLAB program still crashes on the second run. Please help me resolve this error. Thank you very much!

@raingo

raingo commented Apr 25, 2016

The root cause is that some third-party library, such as OpenCV, is also built with caffe.proto.

In my case, the opencv-contrib dnn module had to be disabled.
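
To find out which libraries carry such a copy, one rough check is to look for the proto file name inside the binaries, similar to running `strings` and searching the output. The Python sketch below assumes that the generated code embeds the file name "caffe.proto" as plain bytes in the compiled library; the function names are hypothetical.

```python
from pathlib import Path


def contains_proto_name(data, proto_name=b"caffe.proto"):
    """Return True if the raw bytes contain the .proto file name."""
    return proto_name in data


def libraries_embedding(paths, proto_name=b"caffe.proto"):
    """Return the paths whose file contents contain the .proto file name.

    Each file is read fully into memory, which is fine for typical
    shared libraries but not meant for very large files.
    """
    return [
        str(path)
        for path in paths
        if contains_proto_name(Path(path).read_bytes(), proto_name)
    ]
```

Run it over the shared libraries your program loads. A match is only a hint, because a plain substring search also matches longer names that merely end in "caffe.proto".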

@KentChun33333

@raingo
Would you share how to disable the opencv-contrib dnn module?
I am currently facing a similar problem...

@raingo

raingo commented May 9, 2016

@KentChun33333 cmake -D BUILD_opencv_dnn=OFF

@ZiangYan

ZiangYan commented Sep 5, 2016

@raingo works for me, thanks!

@KentChun33333 you could remove dnn directory from opencv_contrib/modules, and re-compile opencv with opencv_contrib again.

@agilmor

agilmor commented Oct 6, 2016

@raingo workaround worked also for me, thanks!

So, is this a problem in protobuf, in Caffe, or in OpenCV?
We should report it where it belongs in order to be properly fixed... right? ;-)

@ZiangYan

ZiangYan commented Oct 6, 2016

@agilmor I think it's a problem in OpenCV's third-party module named dnn.

@dtmoodie

dtmoodie commented Dec 1, 2016

Hello,
I'm having a similar problem but I've traced the root of the problem to a ros install.
I have a program that loads an interface for caffe via a dynamically loaded plugin as well as an interface for ROS via another plugin. Only one of these plugins can be loaded at a time.
I used LD_DEBUG=libs which revealed that the ROS plugin is only loading libprotobuf.so not caffe's libproto.a. To me this indicates that it's not just an issue with loading libproto.a, but it's an issue of anything linking protobuf.

The exact error is:
[libprotobuf ERROR google/protobuf/descriptor_database.cc:57] File already exists in database: caffe.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:1018] CHECK failed: generated_database_->Add(encoded_file_descriptor, size):
terminate called after throwing an instance of 'google::protobuf::FatalException'
what(): CHECK failed: generated_database_->Add(encoded_file_descriptor, size):

It does work correctly if I build against a static libprotobuf.a.

@xingkongliang

I ran "make clean" and then "make". That solved the problem.

@ShangxuanWu

Another possibility is that you messed up your bashrc file (when you have multiple Caffe versions).

@zhujiagang

The above instructions did not work for me. I configured protobuf with ./configure --enable-shared --with-pic instead, and then it worked.

@sainisanjay

@dtmoodie, I have the same error. Can you please elaborate on the solution? Where do I have to use LD_DEBUG=libs?

@sainisanjay

@TomKae please guide me to solve the error.

@TomKae
Author

TomKae commented Aug 24, 2017 via email

@dtmoodie

My solution can be found here: https://github.com/dtmoodie/docker_scripts/blob/master/deploy/Dockerfile

I compile protobuf from source with the correct flags and then compile Caffe against a static protobuf.

@sainisanjay

Thanks @TomKae and @dtmoodie for your responses. However, my problem was solved by the solution at the following link:
https://xiaobai1217.github.io/2017/08/07/fast_rcnn/#more

@Dawson-huang

My solution:
Rebuilding OpenCV with -D BUILD_opencv_dnn=OFF and then rebuilding Caffe solved the issue, as described at this link:
https://stackoverflow.com/questions/43661767/raspberry-pi-2-with-caffe-protobuf-error
Note that I had already done the following beforehand:
added -Wl,-Bstatic -lprotobuf -Wl,-Bdynamic to LDFLAGS and removed protobuf from LIBRARIES.
These steps may not be needed!
