Error compiling on ARM with -mfpu=neon: can’t find a register in class ‘LO_REGS’ while reloading ‘asm’

Hi,

I am trying to compile TF for an ARM device and to exploit NEON-related optimizations.
Basically, I follow these instructions for PI (my device is a Sabre Board, not PI): https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/makefile

but with a bit different flags, namely:
`make -f tensorflow/contrib/makefile/Makefile OPTFLAGS="-Ofast -mfpu=neon -funsafe-math-optimizations -ftree-vectorize" HOST_OS=LINUX`

because my CPU does not support vfpv4:

```
cat /proc/cpuinfo
processor	: 0
model name	: ARMv7 Processor rev 10 (v7l)
BogoMIPS	: 7.54
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0xc09
CPU revision	: 10
```

The error I got:
```
/root/tensorflow_exp/tensorflow/contrib/makefile/downloads/gemmlowp/meta/streams_arm_32.h: In static member function ‘static void gemmlowp::meta::GemmExecutorPackLHS::ExecuteDispatch3D(const P&) [with P = gemmlowp::meta::GemmParams<unsigned char, int, gemmlowp::meta::ColumnMajorWithSum, gemmlowp::meta::RowMajorWithSum, gemmlowp::meta::QuantizedStaticPreprocessedAsInt32, gemmlowp::meta::RowMajor>; int m = 1; int n = 8; int k = 8; int m_leftovers = 0; int n_leftovers = 7; int k_leftovers = 4]’:
/root/tensorflow_exp/tensorflow/contrib/makefile/downloads/gemmlowp/meta/streams_arm_32.h:4211:59: error: can’t find a register in class ‘LO_REGS’ while reloading ‘asm’
         "d25", "d26", "d27", "d28", "d29", "cc", "memory");
                                                           ^
/root/tensorflow_exp/tensorflow/contrib/makefile/downloads/gemmlowp/meta/streams_arm_32.h:4211:59: error: ‘asm’ operand has impossible constraints
make: *** [/root/tensorflow_exp/tensorflow/contrib/makefile/gen/obj/tensorflow/core/kernels/meta_support.o] Error 1
```

As far as I understand, it complains about too little registers, which wonders me, since ARM has 16 registers. If I remove ` -mfpu=neon` flag, everything works like a charm. 

I would greatly appreciated for any suggestions.

### Environment info
Ubuntu 14.04.4 LTS
`Linux sabresd 4.1.15-1.0.0+g3924425 #1 SMP PREEMPT Sun Mar 13 14:09:51 CST 2016 armv7l armv7l armv7l GNU/Linux`

Installed version of CUDA and cuDNN: 
No CUDA installed
TF commit hash: `1fcd6d1294564066c6f92b121a3aaf4ed186dc1a`


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Error compiling on ARM with -mfpu=neon: can’t find a register in class ‘LO_REGS’ while reloading ‘asm’ #5303

Environment info

8 remaining items

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Error compiling on ARM with -mfpu=neon: can’t find a register in class ‘LO_REGS’ while reloading ‘asm’ #5303

Description

Environment info

Activity

aselle commented on Oct 31, 2016

aselle commented on Oct 31, 2016

maciekcc commented on Oct 31, 2016

dprylipko commented on Nov 1, 2016

dprylipko commented on Nov 3, 2016

samjabrahams commented on Jan 16, 2017

dprylipko commented on Jan 16, 2017

samjabrahams commented on Jan 16, 2017

pczekalski commented on Apr 22, 2017

8 remaining items

eric-ibarra commented on Jun 22, 2017

godardt commented on Aug 2, 2017

maciekcc commented on Aug 2, 2017

godardt commented on Aug 3, 2017

zhonghuaxi commented on Aug 8, 2017

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Issue actions