Error compiling on ARM with -mfpu=neon: can’t find a register in class ‘LO_REGS’ while reloading ‘asm’ #5303

Closed
@dprylipko

Hi,

I am trying to compile TensorFlow for an ARM device with NEON optimizations enabled.
I am following the Raspberry Pi instructions (my device is a Sabre Board, not a Pi): https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/makefile

but with slightly different flags, namely:
make -f tensorflow/contrib/makefile/Makefile OPTFLAGS="-Ofast -mfpu=neon -funsafe-math-optimizations -ftree-vectorize" HOST_OS=LINUX

because my CPU does not support vfpv4:

cat /proc/cpuinfo
processor	: 0
model name	: ARMv7 Processor rev 10 (v7l)
BogoMIPS	: 7.54
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0xc09
CPU revision	: 10

The error I got:

/root/tensorflow_exp/tensorflow/contrib/makefile/downloads/gemmlowp/meta/streams_arm_32.h: In static member function ‘static void gemmlowp::meta::GemmExecutorPackLHS::ExecuteDispatch3D(const P&) [with P = gemmlowp::meta::GemmParams<unsigned char, int, gemmlowp::meta::ColumnMajorWithSum, gemmlowp::meta::RowMajorWithSum, gemmlowp::meta::QuantizedStaticPreprocessedAsInt32, gemmlowp::meta::RowMajor>; int m = 1; int n = 8; int k = 8; int m_leftovers = 0; int n_leftovers = 7; int k_leftovers = 4]’:
/root/tensorflow_exp/tensorflow/contrib/makefile/downloads/gemmlowp/meta/streams_arm_32.h:4211:59: error: can’t find a register in class ‘LO_REGS’ while reloading ‘asm’
         "d25", "d26", "d27", "d28", "d29", "cc", "memory");
                                                           ^
/root/tensorflow_exp/tensorflow/contrib/makefile/downloads/gemmlowp/meta/streams_arm_32.h:4211:59: error: ‘asm’ operand has impossible constraints
make: *** [/root/tensorflow_exp/tensorflow/contrib/makefile/gen/obj/tensorflow/core/kernels/meta_support.o] Error 1

As far as I understand, the compiler is complaining that it has too few registers available, which surprises me, since ARM has 16 registers. If I remove the -mfpu=neon flag, everything works like a charm.

I would greatly appreciate any suggestions.

Environment info

Ubuntu 14.04.4 LTS
Linux sabresd 4.1.15-1.0.0+g3924425 #1 SMP PREEMPT Sun Mar 13 14:09:51 CST 2016 armv7l armv7l armv7l GNU/Linux

Installed version of CUDA and cuDNN:
No CUDA installed
TF commit hash: 1fcd6d1294564066c6f92b121a3aaf4ed186dc1a

aselle commented on Oct 31, 2016
Contributor

This looks like a problem compiling gemmlowp, so you should perhaps file a bug at http://github.com/google/gemmlowp/issues. Their GitHub README says that this is sometimes caused by an insufficient compiler.

aselle commented on Oct 31, 2016
Contributor

@maciekcc, could you please comment on this?

maciekcc commented on Oct 31, 2016

I'm on it. What compiler are you using? Some quick things you might check: try adding -fomit-frame-pointer if it's not enabled (it saves one extra register), or remove '-mthumb' (-mno-thumb) if you are using that.

dprylipko commented on Nov 1, 2016
Author

@aselle : Yes, it's gemmlowp code.
@maciekcc:

# gcc --version
gcc (Ubuntu/Linaro 4.8.5-2ubuntu1~14.04.1) 4.8.5

Did you mean -mno-thumb-interwork? Thanks for the hint, but I tried this:

make -f tensorflow/contrib/makefile/Makefile OPTFLAGS="-Ofast -mfpu=neon -funsafe-math-optimizations -ftree-vectorize -mno-thumb-interwork -fomit-frame-pointer" HOST_OS=LINUX

with the same result...

dprylipko commented on Nov 3, 2016
Author

I tried to compile TF on RPi 3 model B:

make -f tensorflow/contrib/makefile/Makefile HOST_OS=PI TARGET=PI OPTFLAGS="-Os -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize" CXX=g++-4.8

It worked fine, except for some minor issues with the Makefile.

samjabrahams commented on Jan 16, 2017
Contributor

Currently updating my TensorFlow on Raspberry Pi guide and compiling version 0.12 (and hopefully 1.0.0-alpha afterward); just wanted to note that adding --copt="-fomit-frame-pointer" as a flag to bazel build let me get past the gemmlowp compilation steps.

Here's hoping everything else is peachy!

dprylipko commented on Jan 16, 2017
Author

I managed to build TF on the RPi as well, but I still can't build it on the Sabre Board.

samjabrahams commented on Jan 16, 2017
Contributor

I should have been more specific: my last comment was about compiling TensorFlow through Bazel instead of using the Makefile.

pczekalski commented on Apr 22, 2017

Same problem on Orange Pi+ 2E. Did you find a solution?

eric-ibarra commented on Jun 22, 2017

I'm still having this issue when compiling with gcc-4.8.4. It works if I compile without the NEON optimization. However, I believe there is significant performance to be gained by using the NEON optimization with Eigen.

godardt commented on Aug 2, 2017

Any updates on this? I'm still having the issue with TF 1.3.0 rc1:

/tensorflow/core/kernels/_objs/quantized_ops/tensorflow/core/kernels/meta_support.pic.o): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
In file included from external/gemmlowp/meta/streams.h:293:0,
                 from external/gemmlowp/meta/quantized_mul_kernels.h:22,
                 from ./tensorflow/core/kernels/meta_support.h:21,
                 from tensorflow/core/kernels/meta_support.cc:18:
external/gemmlowp/meta/streams_arm_32.h: In static member function 'static void gemmlowp::meta::GemmExecutorPackLHS::ExecuteDispatch3D(const P&) [with P = gemmlowp::meta::GemmParams<unsigned char, int, gemmlowp::meta::ColumnMajorWithSum, gemmlowp::meta::RowMajorWithSum, gemmlowp::meta::QuantizedStaticPreprocessedAsInt32, gemmlowp::meta::RowMajor>; int m = 1; int n = 8; int k = 8; int m_leftovers = 0; int n_leftovers = 7; int k_leftovers = 4]':
external/gemmlowp/meta/streams_arm_32.h:4211:59: error: can't find a register in class 'LO_REGS' while reloading 'asm'
         "d25", "d26", "d27", "d28", "d29", "cc", "memory");
                                                           ^
external/gemmlowp/meta/streams_arm_32.h:4211:59: error: 'asm' operand has impossible constraints
Target //tensorflow:libtensorflow.so failed to build
INFO: Elapsed time: 17200.207s, Critical Path: 601.22s

with:

$ gcc --version
gcc (Ubuntu/Linaro 4.8.5-4ubuntu2) 4.8.5

and:

$ cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 4 (v7l)
BogoMIPS        : 76.80
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

processor       : 1
model name      : ARMv7 Processor rev 4 (v7l)
BogoMIPS        : 76.80
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

processor       : 2
model name      : ARMv7 Processor rev 4 (v7l)
BogoMIPS        : 76.80
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

processor       : 3
model name      : ARMv7 Processor rev 4 (v7l)
BogoMIPS        : 76.80
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

Hardware        : BCM2709
Revision        : a02082
Serial          : 0000000081b86d8e

while compiling with Bazel with:

bazel build -c opt --copt="-mfpu=neon-vfpv4" --copt="-funsafe-math-optimizations" --copt="-ftree-vectorize" --copt="-fomit-frame-pointer" --local_resources 1024,1.0,1.0 --verbose_failures //tensorflow:libtensorflow.so

maciekcc commented on Aug 2, 2017

The fix is already in gemmlowp (since May, actually): google/gemmlowp@9941cad

I will check if it ended up in tensorflow.

godardt commented on Aug 3, 2017

@maciekcc It doesn't seem so. The most recent commit from the bazel workspace is this one.

zhonghuaxi commented on Aug 8, 2017

+1

make -f tensorflow/contrib/makefile/Makefile TARGET=PI OPTFLAGS="-O3 -mcpu=cortex-a15 -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize -Llibs" -j100 CXX=/opt/toolchains/gcc-linaro-arm-linux-gnueabihf-4.8-2013.09_linux/bin/arm-linux-gnueabihf-g++
/home/zxi/projects/tensorflow/tensorflow/contrib/makefile/downloads/gemmlowp/meta/streams_arm_32.h: In static member function ‘static void gemmlowp::meta::GemmExecutorPackLHS::ExecuteDispatch3D(const P&) [with P = gemmlowp::meta::GemmParams<unsigned char, int, gemmlowp::meta::ColumnMajorWithSum, gemmlowp::meta::RowMajorWithSum, gemmlowp::meta::QuantizedStaticPreprocessedAsInt32, gemmlowp::meta::RowMajor>; int m = 1; int n = 8; int k = 8; int m_leftovers = 0; int n_leftovers = 7; int k_leftovers = 4]’:
/home/zxi/projects/tensorflow/tensorflow/contrib/makefile/downloads/gemmlowp/meta/streams_arm_32.h:4211:59: error: can’t find a register in class ‘LO_REGS’ while reloading ‘asm’
         "d25", "d26", "d27", "d28", "d29", "cc", "memory");