
Error: (unix time) try if you are using GNU date #1993

Closed
ShijianTang opened this issue Feb 27, 2015 · 28 comments

Comments

@ShijianTang

Hi,

When I tried to train the bvlc_reference_caffenet model on my own dataset, I ran into the problem below. Can anyone tell me how to fix it?

[Screenshot taken 2015-02-26 at 11:18:01 PM]

@bunelr

bunelr commented Mar 4, 2015

If it helps, the same error can be reproduced just by running the tests. I recompiled Caffe to use cuDNN and ran the tests, which produced the following (I skipped all the passing tests that ran before it):

[----------] 1 test from GaussianFillerTest/1, where TypeParam = double
[ RUN      ] GaussianFillerTest/1.TestFill
*** Aborted at 1425466715 (unix time) try "date -d @1425466715" if you are using GNU date ***
PC: @           0x50efd4 caffe::GaussianFillerTest_TestFill_Test<>::TestBody()
*** SIGSEGV (@0x6e0a000) received by PID 6705 (TID 0x2b18bebb3900) from PID 115384320; stack trace: ***
    @     0x2b18c4e9fd40 (unknown)
    @           0x50efd4 caffe::GaussianFillerTest_TestFill_Test<>::TestBody()
    @           0x70aa63 testing::internal::HandleExceptionsInMethodIfSupported<>()
    @           0x7016a7 testing::Test::Run()
    @           0x70174e testing::TestInfo::Run()
    @           0x701855 testing::TestCase::Run()
    @           0x704b98 testing::internal::UnitTestImpl::RunAllTests()
    @           0x704e27 testing::UnitTest::Run()
    @           0x4449fa main
    @     0x2b18c4e8aec5 (unknown)
    @           0x449b19 (unknown)
    @                0x0 (unknown)
make: *** [runtest] Segmentation fault (core dumped)

Anything I can do to help?

@bunelr

bunelr commented Mar 4, 2015

I started bisecting to identify where this bug came from, but after a make clean and a reinstall I can no longer reproduce it.
@ShijianTang, you might want to try that if you're still having this problem.

@ShijianTang
Author

Hi,

Thanks for your help.

Now, I have solved this problem. The problem was that the train.txt file used to generate the LMDB was in an incorrect format.
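The thread never says what exactly was wrong with train.txt. As a hedged aid, here is a minimal Python sketch of a format check, assuming the usual Caffe list-file convention for convert_imageset and ImageDataLayer: one entry per line, an image path, a space, then an integer class label. check_list_file is a hypothetical helper, not part of Caffe.

```python
def check_list_file(lines):
    """Return (line_number, problem) pairs for an image list such as train.txt.

    Assumed format: one "<image path> <integer label>" entry per line, with the
    label after the last space. Line number 0 marks a whole-file problem.
    """
    problems = []
    valid = 0
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")  # tolerate Unix and Windows line endings
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        path, sep, label = line.rpartition(" ")
        if not sep or not path:
            problems.append((number, "expected '<path> <label>'"))
        elif not (label.isascii() and label.isdigit()):
            problems.append((number, f"label {label!r} is not a non-negative integer"))
        else:
            valid += 1
    if valid == 0:
        # No usable entries at all, e.g. an empty file.
        problems.append((0, "no valid entries"))
    return problems
```

An open file can be passed directly, e.g. check_list_file(open("train.txt")). An empty result only means every line matched the assumed format, not that the images themselves exist or decode.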

@lefromage

Having the same problem in the Caffe tests when running: make runtest

[----------] 1 test from LayerFactoryTest/0, where TypeParam = N5caffe8FloatCPUE
[ RUN ] LayerFactoryTest/0.TestCreateLayer
*** Aborted at 1426466145 (unix time) try "date -d @1426466145" if you are using GNU date ***
PC: @ 0x10af701b7 caffe::CuDNNConvolutionLayer<>::~CuDNNConvolutionLayer()
*** SIGSEGV (@0x0) received by PID 15366 (TID 0x7fff7eb7b300) stack trace: ***
@ 0x7fff93fcdf1a _sigtramp
@ 0x10b432b26 fatbinData
@ 0x10af7039f caffe::CuDNNConvolutionLayer<>::~CuDNNConvolutionLayer()
@ 0x10acfb314 caffe::LayerFactoryTest_TestCreateLayer_Test<>::TestBody()
@ 0x10aefe4fc testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x10aeeda9a testing::Test::Run()
@ 0x10aeee892 testing::TestInfo::Run()
@ 0x10aeeefa0 testing::TestCase::Run()
@ 0x10aef4b17 testing::internal::UnitTestImpl::RunAllTests()
@ 0x10aefed54 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x10aef4829 testing::UnitTest::Run()
@ 0x10ab8432d main
@ 0x7fff902365c9 start
@ 0x3 (unknown)
/bin/sh: line 1: 15366 Segmentation fault: 11 .build_release/test/test_all.testbin 0 --gtest_shuffle
make: *** [runtest] Error 139

@lefromage

I reran:
make clean
make test
make runtest

The previously failing test ran OK:
[----------] 3 tests from SplitLayerTest/0, where TypeParam = N5caffe8FloatCPUE
[ RUN ] SplitLayerTest/0.Test
[ OK ] SplitLayerTest/0.Test (0 ms)
[ RUN ] SplitLayerTest/0.TestSetup
[ OK ] SplitLayerTest/0.TestSetup (0 ms)
[ RUN ] SplitLayerTest/0.TestGradient
[ OK ] SplitLayerTest/0.TestGradient (5 ms)
[----------] 3 tests from SplitLayerTest/0 (5 ms total)

But this one failed:
[----------] 1 test from LayerFactoryTest/1, where TypeParam = N5caffe9DoubleCPUE
[ RUN ] LayerFactoryTest/1.TestCreateLayer
*** Aborted at 1426468417 (unix time) try "date -d @1426468417" if you are using GNU date ***
PC: @ 0x11479b13e cudnnDestroy
*** SIGSEGV (@0x30) received by PID 25063 (TID 0x7fff7eb7b300) stack trace: ***
@ 0x7fff93fcdf1a _sigtramp
@ 0x7fff5128dd62 (unknown)
@ 0x10ed611e8 caffe::CuDNNPoolingLayer<>::~CuDNNPoolingLayer()
@ 0x10ed6122f caffe::CuDNNPoolingLayer<>::~CuDNNPoolingLayer()
@ 0x10eae8cf4 caffe::LayerFactoryTest_TestCreateLayer_Test<>::TestBody()
@ 0x10ecec4fc testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x10ecdba9a testing::Test::Run()
@ 0x10ecdc892 testing::TestInfo::Run()
@ 0x10ecdcfa0 testing::TestCase::Run()
@ 0x10ece2b17 testing::internal::UnitTestImpl::RunAllTests()
@ 0x10ececd54 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x10ece2829 testing::UnitTest::Run()
@ 0x10e97232d main
@ 0x7fff902365c9 start
@ 0x3 (unknown)
/bin/sh: line 1: 25063 Segmentation fault: 11 .build_release/test/test_all.testbin 0 --gtest_shuffle
make: *** [runtest] Error 139

@gavinmh

gavinmh commented Apr 26, 2015

I am encountering the same problem when using IMAGE_DATA layers:

I0426 16:48:21.890173 23626 layer_factory.hpp:74] Creating layer data
I0426 16:48:21.890197 23626 net.cpp:84] Creating Layer data
I0426 16:48:21.890213 23626 net.cpp:338] data -> data
I0426 16:48:21.890239 23626 net.cpp:338] data -> label
I0426 16:48:21.890254 23626 net.cpp:113] Setting up data
I0426 16:48:21.890269 23626 image_data_layer.cpp:36] Opening file 
I0426 16:48:21.890297 23626 image_data_layer.cpp:51] A total of 0 images.
*** Aborted at 1430081301 (unix time) try "date -d @1430081301" if you are using GNU date ***
PC: @     0x7f70e1333090 (unknown)
*** SIGSEGV (@0x0) received by PID 23626 (TID 0x7f70e224aa40) from PID 0; stack trace: ***
    @     0x7f70e0cced40 (unknown)
    @     0x7f70e1333090 (unknown)
    @     0x7f70e1b1c95c std::operator+<>()
    @     0x7f70e1b7a465 caffe::ImageDataLayer<>::DataLayerSetUp()
    @     0x7f70e1b4e986 caffe::BaseDataLayer<>::LayerSetUp()
    @     0x7f70e1b4ea89 caffe::BasePrefetchingDataLayer<>::LayerSetUp()
    @     0x7f70e1b97432 caffe::Net<>::Init()
    @     0x7f70e1b98ef2 caffe::Net<>::Net()
    @     0x7f70e1bd8260 caffe::Solver<>::InitTrainNet()
    @     0x7f70e1bd9373 caffe::Solver<>::Init()
    @     0x7f70e1bd9546 caffe::Solver<>::Solver()
    @           0x40c4b0 caffe::GetSolver<>()
    @           0x406481 train()
    @           0x404a21 main
    @     0x7f70e0cb9ec5 (unknown)
    @           0x404fcd (unknown)
Segmentation fault (core dumped)

Compiling with USE_CUDNN commented out (#USE_CUDNN := 1) and with it enabled (USE_CUDNN := 1) both produced the error.

@RafaRuiz

RafaRuiz commented May 1, 2015

Has anyone found a workaround?

@spegoraro

+1 same issue here

@bunelr

bunelr commented May 3, 2015

I did a bit of digging, and it seems to me that the "Error: (unix time) try if you are using GNU date" message is unrelated to Caffe and to the problems you are encountering.

See here: this message appears to be printed by the logging library (glog) whenever a failure happens, so the problems posted here are unrelated to one another.

If you get this error, look at the stack trace printed after it rather than at the unix-time line.
This issue should be closed because it doesn't reflect an error in Caffe.
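To follow this advice mechanically, here is a rough Python sketch that pulls the timestamp, the signal, and the symbolized frames out of a glog failure dump. The regular expressions are written against the dumps quoted in this thread, not against any glog specification, and summarize_glog_crash is a hypothetical helper.

```python
import re
from datetime import datetime, timezone

ABORT_RE = re.compile(r"\*\*\* Aborted at (\d+) \(unix time\)")
SIGNAL_RE = re.compile(r"\*\*\* (SIG[A-Z]+) ")
FRAME_RE = re.compile(r"^\s*@\s+0x[0-9a-fA-F]+\s+(.+?)\s*$")


def summarize_glog_crash(text):
    """Extract (crash time in UTC, signal name, named stack frames) from a glog dump."""
    abort = ABORT_RE.search(text)
    # Same information as `date -d @<seconds>`, but always rendered in UTC.
    when = (datetime.fromtimestamp(int(abort.group(1)), tz=timezone.utc)
            if abort else None)
    signal = SIGNAL_RE.search(text)
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        # Skip frames glog could not symbolize; they carry no function name.
        if match and match.group(1) != "(unknown)":
            frames.append(match.group(1))
    return when, (signal.group(1) if signal else None), frames
```

On the first dump in this thread it reports SIGSEGV with caffe::GaussianFillerTest_TestFill_Test<>::TestBody() as the first named frame, which is where debugging should start; the timestamp only tells you when the crash happened.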

@acpn

acpn commented May 5, 2015

Hi guys, I have the same problem. Has anyone managed to solve it?

@longjon
Contributor

longjon commented May 8, 2015

@bunelr is quite right, there are many different unrelated errors here. The text which titles this issue is just a helpful hint for parsing the log message. You're welcome to open new tickets for specific, reproducible errors in master with DEBUG enabled.

@daerduoCarey

I think most of these problems are due to misusing cpu_data() and gpu_data(). I ran into this while debugging something and found that I should have used cpu_data() instead of gpu_data(). The crash makes sense: a pointer to a GPU memory location means nothing on the CPU, because the data lives on the GPU. Hope this helps.

@breezedeus

I got the same error, but after the following steps everything works:
make clean
make all
make test
make runtest

@wincle

wincle commented Dec 1, 2015

I got this problem because I was using OpenBLAS on CentOS 6.5; when I switched to ATLAS, make runtest succeeded.

@asanakoy

@ShijianTang, hi.
What was the issue? I have the same errors as in your screenshot.
What is train.txt? Maybe you meant train.prototxt?

@javadba

javadba commented Feb 15, 2016

I have this same issue, the SERIOUS one:

LayerFactoryTest/1.TestCreateLayer
test_all.testbin(5455,0x7fff7788a000) malloc: *** error for object 0x206800000000: pointer being freed was not allocated

I am on CUDA 7.5 on OS X.

@yanleirex

I have the same issue
*** Aborted at 1458527401 (unix time) try "date -d @1458527401" if you are using GNU date ***
PC: @ 0x7f15b49464b3 std::operator+<>()
*** SIGSEGV (@0x8) received by PID 3284 (TID 0x7f15b4f37740) from PID 8; stack trace: ***
    @ 0x7f15b30862f0 (unknown)
    @ 0x7f15b49464b3 std::operator+<>()
    @ 0x7f15b4a76936 caffe::ImageDataLayer<>::DataLayerSetUp()
    @ 0x7f15b49d15ea caffe::BasePrefetchingDataLayer<>::LayerSetUp()
    @ 0x7f15b49ab11c caffe::Net<>::Init()
    @ 0x7f15b49ac961 caffe::Net<>::Net()
    @ 0x7f15b494a0fa caffe::Solver<>::InitTrainNet()
    @ 0x7f15b494b477 caffe::Solver<>::Init()
    @ 0x7f15b494b81a caffe::Solver<>::Solver()
    @ 0x7f15b493fee3 caffe::Creator_SGDSolver<>()
    @ 0x40a028 train()
    @ 0x4070e8 main
    @ 0x7f15b3071a40 __libc_start_main
    @ 0x407859 _start
    @ 0x0 (unknown)
Segmentation fault (core dumped)

@thesby

thesby commented Jun 12, 2016

I hit this problem too. It's very confusing: with the same network, everything is OK if I use database1, but the error occurs with database2.

@sruthikesh-MU

I was facing the same issue, and recompiling did not fix it.
My system has multiple GPUs, and I fixed the problem by explicitly making only one GPU visible. You can do this by setting an environment variable (export CUDA_VISIBLE_DEVICES=0), giving the number of the GPU you want to use instead of 0.
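The same restriction can be applied from a Python driver script instead of the shell. A minimal sketch, assuming Python 3.5+ for subprocess.run; env_with_visible_gpus and run_with_visible_gpus are hypothetical helper names.

```python
import os
import subprocess


def env_with_visible_gpus(devices, base=None):
    """Return a copy of `base` (default: os.environ) exposing only `devices` to CUDA."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in devices)
    return env


def run_with_visible_gpus(command, devices):
    """Run `command` (a list of arguments) with CUDA restricted to `devices`."""
    return subprocess.run(command, env=env_with_visible_gpus(devices), check=True)
```

run_with_visible_gpus(["make", "runtest"], [0]) is the counterpart of CUDA_VISIBLE_DEVICES=0 make runtest. One caveat: by default CUDA enumerates devices fastest first, which need not match the PCI bus order that nvidia-smi prints; setting CUDA_DEVICE_ORDER=PCI_BUS_ID makes the two numberings agree.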

@weiweikong

@sruthikesh-MU is right. It seems that having multiple GPUs visible triggers the "GNU date" crash.

  • Checking the current GPU devices
$ nvidia-smi
Mon Sep 26 18:01:53 2016       
+------------------------------------------------------+                       
| NVIDIA-SMI 352.63     Driver Version: 352.63         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          Off  | 0000:02:00.0     Off |                    0 |
| 31%   69C    P0    74W / 235W |    150MiB / 11519MiB |     66%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GT 610      Off  | 0000:81:00.0     N/A |                  N/A |
| 40%   43C    P8    N/A /  N/A |    277MiB /  1023MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K40c          Off  | 0000:82:00.0     Off |                    0 |
| 30%   65C    P0    68W / 235W |    113MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
  • Selecting a subset of them works fine:
export CUDA_VISIBLE_DEVICES=0 # only one K40c
export CUDA_VISIBLE_DEVICES=0,1 # one K40c + GT610
export CUDA_VISIBLE_DEVICES=0,2 # two K40c together
  • Result
[----------] Global test environment tear-down
[==========] 2081 tests from 277 test cases ran. (577020 ms total)
[  PASSED  ] 2081 tests.
[100%] Built target runtest

@alexeystrakh

alexeystrakh commented Jul 8, 2017

Now, I have solved this problem. The problem was that the train.txt file used to generate the LMDB was in an incorrect format.

@ShijianTang what was the issue with the train.txt format in your case?

I fixed the same issue by fixing my LMDB, which I thought had been created correctly but was actually corrupted.

@wujiyoung

@alexeystrakh How did you check your LMDB file?
I hit this problem too, but I don't know whether my LMDB file is corrupted. It worked fine on another workstation, and I transferred it to my current workstation by FTP. Could something have gone wrong during the FTP transfer?
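A transfer problem can be ruled in or out without Caffe: compute a checksum of the LMDB files on both workstations and compare them. A minimal Python sketch, assuming the database is an LMDB directory containing data.mdb plus lock.mdb; lock.mdb is skipped because it only holds reader locks and may legitimately differ. lmdb_digests is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so a large data.mdb need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def lmdb_digests(directory):
    """Map each file in an LMDB directory (except lock.mdb) to its SHA-256."""
    return {
        entry.name: file_sha256(entry)
        for entry in sorted(Path(directory).iterdir())
        if entry.is_file() and entry.name != "lock.mdb"
    }
```

If the digests of data.mdb differ between the two machines, the copy is damaged; a classic cause is an FTP client transferring in ASCII mode instead of binary mode, which rewrites line-ending bytes.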

@ycsun19

ycsun19 commented Sep 6, 2017

For me, the problem was caused by not assigning a value to stepsize in solver.prototxt.
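Caffe's "step" learning-rate policy is documented in caffe.proto as base_lr * gamma ^ (floor(iter / step)), where step is the stepsize field, so it cannot work without a positive stepsize. A rough Python sketch that catches the missing value; it reads top-level key: value lines with a regular expression rather than a real protobuf text-format parser, and check_step_policy is a hypothetical helper.

```python
import re

# One "key: value" pair per line; values may be quoted. Lines starting with "#" never match.
FIELD_RE = re.compile(r'^[ \t]*(\w+)[ \t]*:[ \t]*"?([^"#\s]+)"?', re.MULTILINE)


def check_step_policy(solver_text):
    """Return a warning if lr_policy is "step" without a positive stepsize, else None."""
    fields = {}
    for key, value in FIELD_RE.findall(solver_text):
        fields.setdefault(key, value)  # keep the first occurrence of each key
    if fields.get("lr_policy") != "step":
        return None
    stepsize = fields.get("stepsize")
    if stepsize is None:
        return 'lr_policy is "step" but stepsize is not set'
    if not stepsize.isdigit() or int(stepsize) == 0:
        return f'lr_policy is "step" but stepsize is {stepsize}'
    return None
```

Other policies such as "multistep" use different fields (stepvalue), so this sketch deliberately checks only "step".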

@MrMYHuang

MrMYHuang commented Jan 4, 2018

My OS is CentOS 7.4.1708 and Python is Anaconda 3 5.0.1.
I got the same error when running make runtest while compiling Caffe:

make clean
make all
make test
make runtest

My problem seems to have been caused by the linker using the Boost libraries under the Anaconda installation, e.g., /opt/anaconda3/lib.
I solved the problem by removing the Anaconda paths from PYTHON_LIB in Makefile.config.
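To find every remaining Anaconda path, not just the one in PYTHON_LIB, here is a small Python sketch that scans Makefile.config. It assumes simple VAR := value style assignments, joins backslash-continued lines, drops # comments, and flags any path containing "anaconda" (case-insensitive, so $(ANACONDA_HOME) references are caught too); find_anaconda_paths is a hypothetical helper.

```python
import re

# Matches "NAME := value", "NAME = value", "NAME += value" and "NAME ?= value".
ASSIGN_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*[:+?]?=\s*(.*)$")


def find_anaconda_paths(config_text):
    """Return (variable, path) pairs whose path mentions 'anaconda'."""
    # Join backslash-continued lines so multi-line assignments become one logical line.
    logical = config_text.replace("\\\n", " ")
    hits = []
    for line in logical.splitlines():
        line = line.split("#", 1)[0]  # drop comments, as make does
        match = ASSIGN_RE.match(line)
        if not match:
            continue
        name, value = match.groups()
        for path in value.split():
            if "anaconda" in path.lower():
                hits.append((name, path))
    return hits
```

Anything it reports in PYTHON_LIB, PYTHON_INCLUDE, or LIBRARY_DIRS is a candidate for the kind of mismatched Boost linkage described above.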

@irazakharchenko

irazakharchenko commented Jul 4, 2018

Hi!
I have a similar error:
I0704 14:48:50.318994 27276 net.cpp:380] data -> label
terminate called after throwing an instance of 'boost::python::error_already_set'
*** Aborted at 1530708530 (unix time) try "date -d @1530708530" if you are using GNU date ***
PC: @ 0x7ff1636b9fcf gsignal
*** SIGABRT (@0x36b700006a8c) received by PID 27276 (TID 0x7ff1498ec9c0) from PID 27276; stack trace: ***
@ 0x7ff1854840c0 (unknown)
@ 0x7ff1636b9fcf gsignal
@ 0x7ff1636bb3fa abort
@ 0x7ff163ddbd6d __gnu_cxx::__verbose_terminate_handler()
@ 0x7ff163dd9d36 __cxxabiv1::__terminate()
@ 0x7ff163dd9d81 std::terminate()
@ 0x7ff163dd9f98 __cxa_throw
@ 0x7ff163ef8532 boost::python::throw_error_already_set()
@ 0x7ff11895ad11 boost::python::api::object_operators<>::operator()<>()
@ 0x7ff11895af28 caffe::PythonLayer<>::LayerSetUp()
@ 0x7ff185c2c828 caffe::Net<>::Init()
@ 0x7ff185c2deee caffe::Net<>::Net()
@ 0x7ff185c39c47 caffe::Solver<>::InitTrainNet()
@ 0x7ff185c3a215 caffe::Solver<>::Init()
@ 0x7ff185c3a4ff caffe::Solver<>::Solver()
@ 0x7ff185c57231 caffe::Creator_SGDSolver<>()
@ 0x40ca7a train()
@ 0x4087d3 main
@ 0x7ff1636a72b1 __libc_start_main
@ 0x4092ca _start
Aborted
Can you help me?

@zh583007354

I think most of these problems are due to misusing cpu_data() and gpu_data(). I ran into this while debugging something and found that I should have used cpu_data() instead of gpu_data(). The crash makes sense: a pointer to a GPU memory location means nothing on the CPU, because the data lives on the GPU. Hope this helps.

You are right! Variables used inside CUDA_KERNEL_LOOP must come from gpu_data(). However, if you write a plain for() {} loop yourself, the corresponding variables should come from cpu_data(). Exciting!

@hailiang-wang

hailiang-wang commented Jun 21, 2019

I ran into the same problem with the current version, 04ab089.

steps

make all
make runtest

trace

7ffe50bda000-7ffe50bdc000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
*** Aborted at 1561115206 (unix time) try "date -d @1561115206" if you are using GNU date ***
PC: @     0x7fd8cf719428 gsignal
*** SIGABRT (@0x65b8) received by PID 26040 (TID 0x7fd8d67a6740) from PID 26040; stack trace: ***
    @     0x7fd8cfabf390 (unknown)
    @     0x7fd8cf719428 gsignal
    @     0x7fd8cf71b02a abort
    @     0x7fd8cf75b7ea (unknown)
    @     0x7fd8cf76437a (unknown)
    @     0x7fd8cf76853c cfree
    @           0x652d17 caffe::MakeTempDir()
    @           0x687049 caffe::GradientBasedSolverTest<>::TestLeastSquaresUpdate()
    @           0x688b47 caffe::RMSPropSolverTest_TestRMSPropLeastSquaresUpdateWithRmsDecay_Test<>::TestBody()
    @           0x8eabb3 testing::internal::HandleExceptionsInMethodIfSupported<>()
    @           0x8e41ca testing::Test::Run()
    @           0x8e4318 testing::TestInfo::Run()
    @           0x8e43f5 testing::TestCase::Run()
    @           0x8e56cf testing::internal::UnitTestImpl::RunAllTests()
    @           0x8e59f3 testing::UnitTest::Run()
    @           0x46c1ed main
    @     0x7fd8cf704830 __libc_start_main
    @           0x473389 _start
    @                0x0 (unknown)
Makefile:542: recipe for target 'runtest' failed
make: *** [runtest] Aborted (core dumped)

With GPU enabled:

CUDA 8.0
NVIDIA-SMI 384.130
Tesla P100

@MarioCavero

MarioCavero commented Sep 21, 2022

Any workarounds so far? make runtest runs most tests correctly. From time to time I still get a few failing tests (due to the GPU device? It happens regardless of configuring CPU only), but it seems alright.

 [----------] 8 tests from RMSPropSolverTest/1, where TypeParam = caffe::CPUDevice<double>
[ RUN      ] RMSPropSolverTest/1.TestRMSPropLeastSquaresUpdateWithEverythingShare
unknown file: Failure
C++ exception with description "locale::facet::_S_create_c_locale name not valid" thrown in the test body.
[  FAILED  ] RMSPropSolverTest/1.TestRMSPropLeastSquaresUpdateWithEverythingShare, where TypeParam = caffe::CPUDevice<double> (1 ms)

Interestingly enough, the final error is:

[ RUN      ] ArgMaxLayerTest/0.TestCPUAxis
[       OK ] ArgMaxLayerTest/0.TestCPUAxis (28 ms)
[----------] 12 tests from ArgMaxLayerTest/0 (226 ms total)

[----------] 1 test from HDF5OutputLayerTest/1, where TypeParam = caffe::CPUDevice<double>
[ RUN      ] HDF5OutputLayerTest/1.TestForward
unknown file: Failure
C++ exception with description "locale::facet::_S_create_c_locale name not valid" thrown in the test fixture's constructor.
*** Aborted at 1663742486 (unix time) try "date -d @1663742486" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 5615 (TID 0xffff943c1010) from PID 0; stack trace: ***
    @     0xffffa216764c ([vdso]+0x64b)
    @     0xaaaad3930f50 (unknown)
    @     0xaaaad393643c (unknown)
    @     0xaaaad393063c (unknown)
    @     0xaaaad393073c (unknown)
    @     0xaaaad3930c08 (unknown)
    @     0xaaaad3930d0c (unknown)
    @     0xaaaad36a988c (unknown)
    @     0xffffa01cfd24 __libc_start_main
    @     0xaaaad36b1518 (unknown)
Segmentation fault
make: *** [Makefile:543: runtest] Error 139

System: Mendel GNU / Linux (Eagle), 10.0

My solution:
Afterwards I also ran sudo apt update and sudo apt upgrade, and it turned out I had a problem with my locales, even though I had generated them with sudo locale-gen. The LANGUAGE and LC_ALL variables also seemed to be unset, maybe because of an SSH connection issue. So:

sudo locale-gen en_US en_US.UTF-8
export LC_ALL="en_US.UTF-8"
make runtest

Worked unbelievably well. No more random failures in individual tests; all passed, and no more GNU date error.

[----------] Global test environment tear-down
[==========] 1162 tests from 152 test cases ran. (170439 ms total)
[  PASSED  ] 1162 tests.
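The exception text "locale::facet::_S_create_c_locale name not valid" appears to be what GNU libstdc++ throws when asked for a locale that is not installed, typically one named by LC_ALL or LANG. Python's locale.setlocale fails in the same situation (raising locale.Error), so it offers a quick pre-check before make runtest. A minimal sketch; locale_is_usable is a hypothetical helper.

```python
import locale


def locale_is_usable(name):
    """Return True if the C library accepts `name` as a locale; "" means use the environment."""
    previous = locale.setlocale(locale.LC_ALL)  # query the current setting without changing it
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_ALL, previous)  # leave the process locale as we found it
```

locale_is_usable("") tests whatever the current LC_ALL/LANG settings select, and locale_is_usable("en_US.UTF-8") tests that name directly; a False result suggests running locale-gen for it or exporting a locale that is installed.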
