bazel GPU build error with fatal error: external/nccl_archive/src/nccl.h: No such file or directory #327

Closed
@cheyang

Description

@cheyang
Contributor

We are trying to build TensorFlow Serving 0.5.1 with TensorFlow 1.0.0@07bb8ea, using CUDA 7.5, cuDNN 5, and Bazel 0.4.4.

cd serving && bazel build -c opt --config=cuda tensorflow_serving/...
ERROR: /root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c/external/org_tensorflow/tensorflow/contrib/nccl/BUILD:23:1: C++ compilation of rule '@org_tensorflow//tensorflow/contrib/nccl:python/ops/_nccl_ops.so' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter ... (remaining 76 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
In file included from external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.cc:15:0:
external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.h:23:44: fatal error: external/nccl_archive/src/nccl.h: No such file or directory
 #include "external/nccl_archive/src/nccl.h"
                                            ^
compilation terminated.
INFO: Elapsed time: 147.378s, Critical Path: 107.11s

I can find nccl.h on disk, but the compiler can't find it during the Bazel build. Any suggestions? Thanks in advance.

find / -name nccl.h
/root/.cache/bazel/_bazel_root/5071e8dca1385fb776f72b33971bf157/external/nccl_archive/src/nccl.h
/root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c/external/nccl_archive/src/nccl.h

Activity

tvkpz

tvkpz commented on Feb 19, 2017

@tvkpz

Same error here.

cuda 8.0
cudnn 5.1
bazel 4.2

ERROR: /root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c/external/org_tensorflow/tensorflow/contrib/nccl/BUILD:23:1: C++ compilation of rule '@org_tensorflow//tensorflow/contrib/nccl:python/ops/_nccl_ops.so' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter ... (remaining 77 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
In file included from external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.cc:15:0:
external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.h:23:44: fatal error: external/nccl_archive/src/nccl.h: No such file or directory
compilation terminated.

Any solutions?

cheyang

cheyang commented on Feb 23, 2017

@cheyang
ContributorAuthor

@kirilg, can you help take a quick look at this issue? Thank you.

kinhunt

kinhunt commented on Feb 24, 2017

@kinhunt

Same here.

jlertle

jlertle commented on Feb 24, 2017

@jlertle

To get around it, you can comment out the dependency on nccl in tensorflow/tensorflow/contrib/BUILD.

Line 42, IIRC.

cheyang

cheyang commented on Feb 25, 2017

@cheyang
ContributorAuthor

Thanks, @jlertle

sskgit

sskgit commented on Feb 25, 2017

@sskgit

Thanks @jlertle.

cosastro

cosastro commented on Mar 24, 2017

@cosastro

Which line in tensorflow/tensorflow/contrib/BUILD is the dependency on nccl? I can't find it. Thanks.

perdasilva

perdasilva commented on Mar 24, 2017

@perdasilva
Contributor

65: "//tensorflow/contrib/nccl:nccl_py",

I believe...
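A minimal sketch of this workaround, assuming the dependency sits on its own line exactly as quoted above. Because the thread reports two different line numbers (42 and 65), the sketch matches the line by its content rather than by position. The function name comment_out_nccl_dep is hypothetical, and the BUILD path in the usage comment is an assumption about your checkout.

```shell
#!/bin/sh
# Hypothetical helper: comment out the nccl_py dependency in a Bazel BUILD file.
# Assumes the dependency is on its own line, as quoted in this thread.
comment_out_nccl_dep() {
  build_file="$1"
  # Keep a backup copy (.bak), then prefix the matching line with '# '.
  # Indentation is preserved; already-commented lines do not match again.
  sed -i.bak 's|^\([[:space:]]*\)\("//tensorflow/contrib/nccl:nccl_py",\)|\1# \2|' "$build_file"
}

# Usage (path is an assumption; adjust to your checkout):
# comment_out_nccl_dep tensorflow/tensorflow/contrib/BUILD
```

Note that a later comment in this thread reports that examples importing nccl fail after the dependency is commented out, so this only sidesteps the compile error.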

jlertle

jlertle commented on Mar 24, 2017

@jlertle

It was moved into a Windows check, but the referenced path still fails to resolve during the Serving build on Ubuntu. Bazel stuff.

cosastro

cosastro commented on Mar 27, 2017

@cosastro

I tried the script provided in #318, and it works fine.

skonto

skonto commented on Apr 10, 2017

@skonto

If you comment it out, the examples fail. I managed to build it as well, but I get
ImportError: cannot import name nccl with an MNIST example.

Here is the task that fails:

>>>>> # @org_tensorflow//tensorflow/contrib/nccl:python/ops/_nccl_ops.so [action 'Compiling external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.cc']
(cd /root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c/execroot/serving && \
  exec env - \
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE 
  '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object
   -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections '-std=c++11' -MD 
   -MF bazel-out/local_linux-opt/bin/external/org_tensorflow/tensorflow/contrib/nccl/_objs/python/ops/_nccl_ops.so/external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.pic.d
    '-frandom-seed=bazel-out/local_linux-opt/bin/external/org_tensorflow/tensorflow/contrib/nccl/_objs/python/ops/_nccl_ops.so/external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.pic.o' -fPIC -DEIGEN_MPL2_ONLY 
  -iquote external/org_tensorflow -iquote bazel-out/local_linux-opt/genfiles/external/org_tensorflow -iquote external/bazel_tools 
  -iquote bazel-out/local_linux-opt/genfiles/external/bazel_tools -iquote external/nccl_archive 
  -iquote bazel-out/local_linux-opt/genfiles/external/nccl_archive -iquote external/local_config_cuda 
  -iquote bazel-out/local_linux-opt/genfiles/external/local_config_cuda -iquote external/protobuf 
  -iquote bazel-out/local_linux-opt/genfiles/external/protobuf -iquote external/eigen_archive 
  -iquote bazel-out/local_linux-opt/genfiles/external/eigen_archive -iquote external/local_config_sycl
   -iquote bazel-out/local_linux-opt/genfiles/external/local_config_sycl -isystem external/bazel_tools/tools/cpp/gcc3 
   -isystem external/local_config_cuda/cuda -isystem bazel-out/local_linux-opt/genfiles/external/local_config_cuda/cuda
    -isystem external/local_config_cuda/cuda/include -isystem bazel-out/local_linux-opt/genfiles/external/local_config_cuda/cuda/include 
    -isystem external/protobuf/src -isystem bazel-out/local_linux-opt/genfiles/external/protobuf/src -isystem external/eigen_archive 
    -isystem bazel-out/local_linux-opt/genfiles/external/eigen_archive -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare -fno-exceptions '-DGOOGLE_CUDA=1' -msse3 -pthread -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fno-canonical-system-headers -c external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.cc -o bazel-out/local_linux-opt/bin/external/org_tensorflow/tensorflow/contrib/nccl/_objs/python/ops/_nccl_ops.so/external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.pic.o)
ERROR: /root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c/external/org_tensorflow/tensorflow/contrib/nccl/BUILD:23:1: C++ compilation of rule '@org_tensorflow//tensorflow/contrib/nccl:python/ops/_nccl_ops.so' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter ... (remaining 77 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
In file included from external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.cc:15:0:
external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.h:23:44: fatal error: external/nccl_archive/src/nccl.h: No such file or directory
 #include "external/nccl_archive/src/nccl.h"
                                            ^

I verified that nccl_archive is fetched and unzipped correctly under the .cache directory. From what I can see,
-iquote external/nccl_archive should be enough to make the header visible to the compiler.

skonto

skonto commented on Apr 11, 2017

@skonto

I solved it by removing the prefix /external/nccl_archive.
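One way to apply this, sketched under assumptions: the include to change is the one shown in the error output (in tensorflow/contrib/nccl/kernels/nccl_manager.h), and removing the prefix leaves an include of "src/nccl.h", which the -iquote external/nccl_archive flag in the compile command above should then resolve. The helper name strip_nccl_prefix is hypothetical, and other files may carry the same include.

```shell
#!/bin/sh
# Hypothetical helper: drop the external/nccl_archive/ prefix from #include lines.
# Assumes the include is written with double quotes, as in the error output.
strip_nccl_prefix() {
  header="$1"
  # Keep a backup copy (.bak); only #include lines carrying the prefix are changed.
  sed -i.bak '/^[[:space:]]*#include/ s|"external/nccl_archive/|"|' "$header"
}

# Usage (path is an assumption; adjust to where the TensorFlow sources live):
# strip_nccl_prefix tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.h
```

After the edit, re-running the original bazel build command is the simplest check that the header now resolves.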
