Description
Hi,
I am trying to compile TF for an ARM device with the NEON-related optimizations enabled.
Basically, I am following these Raspberry Pi instructions (my device is a Sabre Board, not a Pi): https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/makefile
but with slightly different flags, namely:
make -f tensorflow/contrib/makefile/Makefile OPTFLAGS="-Ofast -mfpu=neon -funsafe-math-optimizations -ftree-vectorize" HOST_OS=LINUX
because my CPU does not support vfpv4:
cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 10 (v7l)
BogoMIPS : 7.54
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10
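The vfpv4 conclusion above can be checked mechanically before choosing an -mfpu= value. A minimal sketch, assuming only POSIX shell and grep; the features string is copied verbatim from the cpuinfo dump above so it runs on any host (on the device itself you would read /proc/cpuinfo):

```shell
# Check whether a given FPU/SIMD feature appears in the kernel's
# "Features" line before selecting an -mfpu= value.
features="half thumb fastmult vfp edsp neon vfpv3 tls vfpd32"

has_feature() {
  # Word match so "vfpv3" does not spuriously satisfy a "vfpv4" query.
  echo "$features" | grep -qw "$1"
}

has_feature neon && echo "neon: present"
if has_feature vfpv4; then echo "vfpv4: present"; else echo "vfpv4: absent"; fi
```

This confirms that -mfpu=neon is valid here but -mfpu=neon-vfpv4 would not be.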
The error I got:
/root/tensorflow_exp/tensorflow/contrib/makefile/downloads/gemmlowp/meta/streams_arm_32.h: In static member function ‘static void gemmlowp::meta::GemmExecutorPackLHS::ExecuteDispatch3D(const P&) [with P = gemmlowp::meta::GemmParams<unsigned char, int, gemmlowp::meta::ColumnMajorWithSum, gemmlowp::meta::RowMajorWithSum, gemmlowp::meta::QuantizedStaticPreprocessedAsInt32, gemmlowp::meta::RowMajor>; int m = 1; int n = 8; int k = 8; int m_leftovers = 0; int n_leftovers = 7; int k_leftovers = 4]’:
/root/tensorflow_exp/tensorflow/contrib/makefile/downloads/gemmlowp/meta/streams_arm_32.h:4211:59: error: can’t find a register in class ‘LO_REGS’ while reloading ‘asm’
"d25", "d26", "d27", "d28", "d29", "cc", "memory");
^
/root/tensorflow_exp/tensorflow/contrib/makefile/downloads/gemmlowp/meta/streams_arm_32.h:4211:59: error: ‘asm’ operand has impossible constraints
make: *** [/root/tensorflow_exp/tensorflow/contrib/makefile/gen/obj/tensorflow/core/kernels/meta_support.o] Error 1
As far as I understand, it is complaining about having too few registers, which puzzles me, since ARM has 16 general-purpose registers. If I remove the -mfpu=neon
flag, everything works like a charm.
I would greatly appreciate any suggestions.
Environment info
Ubuntu 14.04.4 LTS
Linux sabresd 4.1.15-1.0.0+g3924425 #1 SMP PREEMPT Sun Mar 13 14:09:51 CST 2016 armv7l armv7l armv7l GNU/Linux
Installed version of CUDA and cuDNN:
No CUDA installed
TF commit hash: 1fcd6d1294564066c6f92b121a3aaf4ed186dc1a
Activity
aselle commented on Oct 31, 2016
This looks like a problem compiling gemmlowp, so you should perhaps file a bug at http://github.com/google/gemmlowp/issues. Their GitHub README says that this is sometimes caused by an insufficient compiler.
aselle commented on Oct 31, 2016
@maciekcc, could you please comment on this?
maciekcc commented on Oct 31, 2016
I'm on it. What compiler are you using? Some quick things you might check: try adding -fomit-frame-pointer if it's not enabled (it saves one extra register), or remove -mthumb if you are using it (i.e., compile in ARM mode with -marm).
dprylipko commented on Nov 1, 2016
@aselle : Yes, it's gemmlowp code.
@maciekcc:
Did you mean -mno-thumb-interwork? Thanks for the hint, but I tried this:
make -f tensorflow/contrib/makefile/Makefile OPTFLAGS="-Ofast -mfpu=neon -funsafe-math-optimizations -ftree-vectorize -mno-thumb-interwork -fomit-frame-pointer" HOST_OS=LINUX
with the same result...
dprylipko commented on Nov 3, 2016
I tried to compile TF on an RPi 3 Model B: it worked fine, except for some minor issues with the Makefile.
samjabrahams commented on Jan 16, 2017
Currently updating my TensorFlow on Raspberry Pi guide and compiling version 0.12 (and hopefully 1.0.0-alpha afterward); just wanted to note that adding
--copt="-fomit-frame-pointer"
as a flag to bazel build
let me get past the gemmlowp compilation steps. Here's hoping everything else is peachy!
dprylipko commented on Jan 16, 2017
I managed to build TF on the RPi as well, but I still can't do it on the Sabre Board.
samjabrahams commented on Jan 16, 2017
I should have been more specific: my last comment was about compiling TensorFlow with Bazel instead of using the Makefile.
pczekalski commented on Apr 22, 2017
Same problem on an Orange Pi+ 2E. Did you find a solution?
eric-ibarra commented on Jun 22, 2017
I'm still having this issue compiling with gcc-4.8.4. It works if I compile without the NEON optimization; however, I believe there is significant performance to be gained by using the NEON optimization with Eigen.
godardt commented on Aug 2, 2017
Any updates on this? I'm still having the issue with TF 1.3.0 rc1 while compiling with Bazel.
maciekcc commented on Aug 2, 2017
The fix has been in gemmlowp since May, actually: google/gemmlowp@9941cad
I will check whether it ended up in TensorFlow.
godardt commented on Aug 3, 2017
@maciekcc Doesn't seem so. The most recent gemmlowp commit referenced from the Bazel workspace is this one.
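One way to check this yourself is to read the commit pinned for gemmlowp in your TF checkout. A sketch, assuming the pin lives in tensorflow/workspace.bzl as in TF 1.x; the here-doc (with an all-zero placeholder hash) stands in for that file so the extraction step is runnable as-is:

```shell
# Extract the gemmlowp commit a TensorFlow checkout pins. On a real
# checkout you would grep the file directly:
#   grep -o 'gemmlowp/archive/[0-9a-f]*' tensorflow/workspace.bzl
cat > /tmp/workspace.bzl <<'EOF'
  native.new_http_archive(
      name = "gemmlowp",
      urls = ["https://github.com/google/gemmlowp/archive/0000000000000000000000000000000000000000.tar.gz"],
  )
EOF
grep -o 'gemmlowp/archive/[0-9a-f]\{40\}' /tmp/workspace.bzl | cut -d/ -f3
```

If the printed hash predates google/gemmlowp@9941cad, the fix has not landed in that TF revision.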
zhonghuaxi commented on Aug 8, 2017
+1