
"The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations" in "Hello, TensorFlow!" program #7778

Closed
bingq opened this issue Feb 22, 2017 · 56 comments
Labels
stat:awaiting tensorflower Status - Awaiting response from tensorflower


@bingq

bingq commented Feb 22, 2017

Opening this with reference to #7500.

I installed TensorFlow 1.0 on Windows 10 following https://www.tensorflow.org/install/install_windows and hit the same issue discussed in #7500. After applying the solution suggested in that thread, the original issue disappeared, but I got these new warnings:

C:\Users\geldqb>python
Python 3.5.3 (v3.5.3:1880cb95a742, Jan 16 2017, 16:02:32) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2017-02-22 22:28:20.696929: W c:\tf_jenkins\home\workspace\nightly-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-02-22 22:28:20.698285: W c:\tf_jenkins\home\workspace\nightly-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-02-22 22:28:20.700143: W c:\tf_jenkins\home\workspace\nightly-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-02-22 22:28:20.700853: W c:\tf_jenkins\home\workspace\nightly-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-02-22 22:28:20.701498: W c:\tf_jenkins\home\workspace\nightly-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-02-22 22:28:20.702190: W c:\tf_jenkins\home\workspace\nightly-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-02-22 22:28:20.702837: W c:\tf_jenkins\home\workspace\nightly-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-02-22 22:28:20.703460: W c:\tf_jenkins\home\workspace\nightly-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
>>> print(sess.run(hello))
b'Hello, TensorFlow!'

@Carmezim
Contributor

Carmezim commented Feb 22, 2017

Those are simply warnings. They are just informing you that if you build TensorFlow from source it can be faster on your machine. Those instructions are not enabled by default on the available builds, I think to stay compatible with as many CPUs as possible.
If you have any other doubts regarding this please feel free to ask; otherwise this can be closed.

edit: To deactivate these warnings as @yaroslavvb suggested in another comment, do the following:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import tensorflow as tf

or if you're on a Unix system simply do export TF_CPP_MIN_LOG_LEVEL=2.

TF_CPP_MIN_LOG_LEVEL is the TensorFlow environment variable responsible for logging: set it to 1 to silence INFO logs, 2 to also filter out WARNING logs, and 3 to additionally silence ERROR logs (not recommended).
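One caveat worth making explicit: the variable has to be set before TensorFlow is imported, since the native runtime reads it at import time. A minimal sketch (the actual tensorflow import is left commented out here):

```python
import os

# 0 = show everything, 1 = filter INFO, 2 = also filter WARNING,
# 3 = also filter ERROR (not recommended)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# The import must come *after* the assignment above:
# import tensorflow as tf
print(os.environ['TF_CPP_MIN_LOG_LEVEL'])  # → 2
```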

@bingq
Author

bingq commented Feb 22, 2017

Thanks @Carmezim!
Any hint on how to kill those warnings?

@poxvoculi poxvoculi added the stat:community support Status - Community Support label Feb 22, 2017
@yaroslavvb
Contributor

@bingq the best way I found to kill unwanted TF output:

run your script as tf.sh myscript.py where tf.sh contains

#!/bin/sh
# Run python script, filtering out TensorFlow logging
# https://github.com/tensorflow/tensorflow/issues/566#issuecomment-259170351
python "$@" 3>&1 1>&2 2>&3 3>&- | grep -v ":\ I\ " | grep -v "WARNING:tensorflow" | grep -v ^pciBusID | grep -v ^major: | grep -v ^name: | grep -v ^Total\ memory: | grep -v ^Free\ memory:

You can add extra |grep -v parts to get rid of more things
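If grep isn't available (e.g. on Windows), the same idea can be sketched in Python. This is my own rough equivalent of the grep chain above, not anything from TF itself, and `filter_tf_noise` is a made-up helper name:

```python
import re

# Patterns mirroring the grep -v chain above: C++ INFO lines, Python-side
# TF warnings, and the GPU banner lines.
_NOISE = re.compile(
    r":\sI\s|WARNING:tensorflow|^pciBusID|^major:|^name:|^Total memory:|^Free memory:"
)

def filter_tf_noise(lines):
    """Keep only the lines that match none of the known noise patterns."""
    return [ln for ln in lines if not _NOISE.search(ln)]
```

You would feed this the interpreter's stderr, e.g. read line by line from a subprocess.Popen with stderr=subprocess.PIPE.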

@tomrunia

@Carmezim Any estimate of how much faster it is when compiling from source with advanced CPU instructions? Any reason to do this at all when running most of the graph on a GPU?

@Carmezim
Contributor

Carmezim commented Feb 24, 2017

@tomrunia I haven't tested it myself yet (I'm actually building it now with SSE), although I've heard 4-8x. @yaroslavvb may want to chime in, as he himself got a 3x speed improvement.

@yaroslavvb
Contributor

Yup, 3x for a large matrix multiply on Xeon-V3; I expect it's probably due to FMA/AVX rather than SSE.

@Carmezim
Contributor

Carmezim commented Feb 25, 2017

@tomrunia As @yaroslavvb well pointed out, it was not SSE specifically in his case, although those CPU instructions are expected to provide a performance improvement.

@astrojuanlu

It would be very nice to silence these warnings from the Python side; it's not so easy to use grep on Windows.

@yaroslavvb
Contributor

Try export TF_CPP_MIN_LOG_LEVEL=2

@astrojuanlu

Try export TF_CPP_MIN_LOG_LEVEL=2

Thanks @yaroslavvb, I haven't tried it yet but an environment variable is definitely more useful.

@e-freiman

It works, thank you

@hughsando

Is it possible that these warnings are coming from the fact that with MSVC on x64, SSE2 is implicit (all x64 chips have SSE2), but the __SSE2__ et al. defines are not explicitly set?
Perhaps the guards should be on EIGEN_VECTORIZE_SSE2 etc. instead.

@e-freiman

e-freiman commented Mar 9, 2017

Could you say what exactly I should do according to your idea? What does "guards on" mean?

@hughsando

The SSE warnings use code like this:

#ifndef __SSE__
    WarnIfFeatureUnused(CPUFeature::SSE, "SSE");
#endif  // __SSE__
#ifndef __SSE2__
    WarnIfFeatureUnused(CPUFeature::SSE2, "SSE2");
#endif  // __SSE2__

But the Eigen implementation (eigen/Eigen/Core) uses more complicated logic to work out whether to use SSE1/2:

#ifndef EIGEN_DONT_VECTORIZE
  #if defined (EIGEN_SSE2_ON_NON_MSVC_BUT_NOT_OLD_GCC) || defined(EIGEN_SSE2_ON_MSVC_2008_OR_LATER)

    // Defines symbols for compile-time detection of which instructions are
    // used.
    // EIGEN_VECTORIZE_YY is defined if and only if the instruction set YY is used
    #define EIGEN_VECTORIZE
    #define EIGEN_VECTORIZE_SSE
    #define EIGEN_VECTORIZE_SSE2

    // Detect sse3/ssse3/sse4:
    // gcc and icc defines __SSE3__, ...
    // there is no way to know about this on msvc. You can define EIGEN_VECTORIZE_SSE* if you
    // want to force the use of those instructions with msvc.
    #ifdef __SSE3__
      #define EIGEN_VECTORIZE_SSE3
    #endif
    #ifdef __SSSE3__
      #define EIGEN_VECTORIZE_SSSE3
    #endif
    #ifdef __SSE4_1__
      #define EIGEN_VECTORIZE_SSE4_1
    #endif
    #ifdef __SSE4_2__
      #define EIGEN_VECTORIZE_SSE4_2
    #endif
    #ifdef __AVX__
      #define EIGEN_VECTORIZE_AVX
      #define EIGEN_VECTORIZE_SSE3
      #define EIGEN_VECTORIZE_SSSE3
      #define EIGEN_VECTORIZE_SSE4_1
      #define EIGEN_VECTORIZE_SSE4_2
    #endif
    // ... (the excerpt ends here; the matching #endif's follow later in Eigen/Core)

This is due mainly to the fact that Visual Studio assumes SSE1/SSE2 when compiling for x64.
Also note that it depends on EIGEN_DONT_VECTORIZE, which is perhaps some user customization.

So one solution would be to #include eigen/Eigen/Core and use the "EIGEN_VECTORIZE_SSE" symbols in the conditional code-guard ("#ifndef EIGEN_VECTORIZE_SSE").
I'm not 100% sure about the build system, or whether Eigen is the only source of SSE operations, so I'm not 100% sure that this is the right answer.

I'm also not sure what is the right thing to do if building a binary for distribution. Do you include AVX and risk it not running, or do you not include it and risk the warning (and low performance)? Ideally you would build with full vectorization and let the software choose at runtime. I guess another possibility would be to build 2 dlls, and dynamically load the right one at runtime.
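On the runtime-choice point, the probing side at least is straightforward: on Linux you can read the flags line of /proc/cpuinfo (a rough sketch under that assumption; cpu_supports is a hypothetical helper, and the flag names use the kernel's spellings such as avx2, fma, sse4_1):

```python
def cpu_supports(feature, cpuinfo_text):
    """Return True if `feature` appears in the first 'flags' line of the
    given /proc/cpuinfo contents (kernel spellings: 'avx2', 'fma', ...)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return feature in line.split(":", 1)[1].split()
    return False

# On a real Linux box you would read the actual file:
# with open("/proc/cpuinfo") as f:
#     print(cpu_supports("avx2", f.read()))
```

A loader could use checks like this to decide which of the prebuilt DLLs/DSOs to load; the hard part is building and shipping the variants.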

@metorm

metorm commented Mar 13, 2017

You can try this post and tell me if it really becomes faster.

@Carmezim
Contributor

Carmezim commented Mar 16, 2017

@Juanlu001 Check this comment for how this variable works and @yaroslavvb's code below for a handy way to change it.

@RaviTezu

Thanks @Carmezim
Setting the env. variable TF_CPP_MIN_LOG_LEVEL=3 via the os package inside the code worked 👍

@aselle
Contributor

aselle commented Apr 1, 2017

@hughsando, these are in fact the debates we had internally. Ideally, in the future we'd be able to ship all compiled versions and choose at runtime, but logistically that's quite time-consuming and tricky to implement. We are looking into it, but this was the best solution we had for now. We wanted people to know that if they see that warning, things are working, but it could be faster if they build it themselves. I.e., if you're benchmarking our system, it's not a valid benchmark without compiling it with the best optimizations.

@yaroslavvb
Contributor

@hughsando PS people have been uploading wheels built for their favorite configuration to https://github.com/yaroslavvb/tensorflow-community-wheels

@hughsando

hughsando commented Apr 3, 2017 via email

@aselle
Contributor

aselle commented Apr 3, 2017

@hughsando, in spirit that is the idea we would like to pursue. However, it is more difficult than that, in that we use Eigen for a lot of the kernel implementations. Eigen would have to be compiled multiple ways without causing any symbol conflicts in the final binary, and we would probably also have to break modules up into more DSOs so as not to have too large a binary resident.

@hekimgil

hekimgil commented Apr 3, 2017

So after reading these, I went ahead and reinstalled, this time from source, following the instructions at https://www.tensorflow.org/install/install_sources. I still see the "The TensorFlow library wasn't compiled to use XXXX instructions..." warnings. So did I miss something, or is installing and building from source not what you meant by "building it yourself"?

@aselle
Contributor

aselle commented Apr 4, 2017

What did you put for the compiler optimization options ./configure asked you for? And what are the remaining warnings it shows you? Are you running the binary on the same machine you compiled it on?

@aselle aselle added stat:awaiting response Status - Awaiting response from author and removed stat:community support Status - Community Support labels Apr 4, 2017
@hekimgil

hekimgil commented Apr 4, 2017

Thank you aselle, my problem is solved now, but here is what happened:

  • I used all default options with ./configure except: 1) Y for CUDA support, and 2) compute capability
  • The warnings were about the SSE3, SSE4.1, SSE4.2, AVX, AVX2, and FMA capabilities of my machine (+ negative NUMA node read)
  • Yes, the same machine...

However, after reading your message, I changed my bazel build command from
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package to
bazel build -c opt --copt=-march=native --config=cuda //tensorflow/tools/pip_package:build_pip_package
with that extra --copt=-march=native in there, and that did the trick. The warnings about instructions disappeared (although the NUMA warning remains). So while my problem seems to be solved, I wonder whether -march=native is really not the default for the "optimization flags to use" question in ./configure?

@apacha

apacha commented May 11, 2017

Just as a matter of interest: how come these warnings were not present in TensorFlow 1.0.1 but only in the newer TensorFlow 1.1.0?

@akors

akors commented May 11, 2017

Just as a matter of interest: how come these warnings were not present in TensorFlow 1.0.1 but only in the newer TensorFlow 1.1.0?

Not quite sure what you mean; I am using the binaries from pip for TF 1.0.1, and the warnings were already there.

@apacha

apacha commented May 11, 2017

Aha... then maybe my cached wheel of TF 1.0.1 was somehow different. It displayed some warnings regarding OpKernels, but no warnings regarding SSE instructions.

@ProgramItUp

@jshin49 and others who have tried it
How much of a performance improvement are you seeing after recompiling with the compiler optimizations?

@apacha

apacha commented May 21, 2017

No performance improvements here when activating AVX, but probably because I was building and using the GPU version. I thought it would speed up some parts at least, but I didn't find it to make a big impact. It probably does make an impact when using the CPU version.

@lgalke

lgalke commented Jun 12, 2017

Single Instruction Multiple Data makes sense when you perform vector computations on the CPU. Thanks for the build-command examples.

@mphz

mphz commented Jun 15, 2017

@apacha you should uninstall TensorFlow and then reinstall it to kill the OpKernels warnings.

@neelkadia-zz

If I do export TF_CPP_MIN_LOG_LEVEL=2, then the console doesn't show results; it's just empty.
Only if I do export TF_CPP_MIN_LOG_LEVEL=0 does it pop up with all the warnings and results:

2017-06-24 07:02:26.650752: W tensorflow/core/framework/op_def_util.cc:332] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
2017-06-24 07:02:26.851332: I tensorflow/examples/label_image/main.cc:251] withoutshadow (0): 0.801019
2017-06-24 07:02:26.851356: I tensorflow/examples/label_image/main.cc:251] withshadow (1): 0.198981

Any thoughts on why this is happening?

@guruprasaad123

guruprasaad123 commented Aug 5, 2017

I am also getting this warning while using the TensorFlow APIs in Java. I have no idea why I am getting it. Any idea how to resolve this warning?

@ghost

ghost commented Aug 22, 2017

If you want to disable them, you may use the code below:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import tensorflow as tf

This should silence the warnings. TF_CPP_MIN_LOG_LEVEL is the TensorFlow environment variable responsible for logging. Also, if you are on Ubuntu, you may use the command below:

export TF_CPP_MIN_LOG_LEVEL=2 

I hope this helps.

@g10guang

@Carmezim If I compute on the GPU, will I still get this warning?
And how can I tell whether TensorFlow is running on the GPU or the CPU?
Thanks.

@scotthuang1989
Contributor

scotthuang1989 commented Sep 3, 2017 via email

@Carmezim
Contributor

Carmezim commented Sep 3, 2017

@g10guang Yeah, TF can use both the CPU and the GPU, but even if you're using the GPU only, it will inform you of the SIMD instructions available when you run the code.

To know which device TF is running on, you can set log_device_placement to True when creating the session, as in:
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

You can see more details on this under Logging Device placement in the documentation.

@sreeragh-ar

sreeragh-ar commented Sep 21, 2017

Follow the instructions here (just one instruction!).
It is amazing. The time taken for a training step is halved. #8037

@shanuka

shanuka commented Sep 23, 2017

Thanks @Carmezim !

@giker17

giker17 commented Dec 15, 2017

I am getting the same warnings while using tensorflow_gpu 1.1.0 on Windows 10; the Python version is 3.6.3, installed via Anaconda.
Since the GPU version of TF is being used, why does the CPU matter here?
I want to know what these warnings mean, and whether I should do something about them or just leave them.

Here is my warnings:

C:\DevTools\Anaconda3\envs\py36_tfg>python
Python 3.6.3 | packaged by conda-forge | (default, Dec  9 2017, 16:22:46) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> sess = tf.Session()
2017-12-15 09:59:27.506604: W c:\l\work\tensorflow-1.1.0\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-12-15 09:59:27.507839: W c:\l\work\tensorflow-1.1.0\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-15 09:59:27.509196: W c:\l\work\tensorflow-1.1.0\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-15 09:59:27.509641: W c:\l\work\tensorflow-1.1.0\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-15 09:59:27.510098: W c:\l\work\tensorflow-1.1.0\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-15 09:59:27.510475: W c:\l\work\tensorflow-1.1.0\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-12-15 09:59:27.512253: W c:\l\work\tensorflow-1.1.0\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-15 09:59:27.512821: W c:\l\work\tensorflow-1.1.0\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-12-15 09:59:27.824267: I c:\l\work\tensorflow-1.1.0\tensorflow\core\common_runtime\gpu\gpu_device.cc:887] Found device 0 with properties:
name: GeForce GTX 1060
major: 6 minor: 1 memoryClockRate (GHz) 1.6705
pciBusID 0000:01:00.0
Total memory: 3.00GiB
Free memory: 2.43GiB
2017-12-15 09:59:27.824508: I c:\l\work\tensorflow-1.1.0\tensorflow\core\common_runtime\gpu\gpu_device.cc:908] DMA: 0
2017-12-15 09:59:27.826709: I c:\l\work\tensorflow-1.1.0\tensorflow\core\common_runtime\gpu\gpu_device.cc:918] 0:   Y
2017-12-15 09:59:27.827566: I c:\l\work\tensorflow-1.1.0\tensorflow\core\common_runtime\gpu\gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0)

Another question:
How will I know whether TF is using the GPU or the CPU while running a program such as object detection on a video? Are there any suggested tools for monitoring my devices?

Thanks a lot.
: )

@gknight7

Instead of removing the warning, is there any way to use those SSE instructions to speed up the training?

@sreeragh-ar

@gknight7
Please check my comment above.
