Closed
Description
Hello. 4.0.32.Final, Ubuntu 14.04. I have a production system running for few weeks. Was reviewing logs and found this:
io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:101)
at io.netty.buffer.AbstractDerivedByteBuf.release(AbstractDerivedByteBuf.java:50)
at io.netty.buffer.CompositeByteBuf$Component.freeIfNecessary(CompositeByteBuf.java:1370)
at io.netty.buffer.CompositeByteBuf.deallocate(CompositeByteBuf.java:1625)
at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:106)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:252)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.timeout.ReadTimeoutHandler.channelRead(ReadTimeoutHandler.java:152)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:805)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:346)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
18:13:29.114 ERROR - Handler class : class cc.blynk.common.handlers.common.decoders.MessageDecoder. Name : MessageDecoder#0
Not sure is that netty bug or my own. Decoder code - https://github.com/blynkkk/blynk-server/blob/master/common/src/main/java/cc/blynk/common/handlers/common/decoders/MessageDecoder.java
Metadata
Metadata
Assignees
Type
Projects
Relationships
Development
No branches or pull requests
Activity
normanmaurer commentedon Nov 4, 2015
@doom369 can you show me all the handlers you have in your pipeline ?
doom369 commentedon Nov 4, 2015
@normanmaurer sure. pipeline initialization - https://github.com/blynkkk/blynk-server/blob/master/server/tcp-server/src/main/java/cc/blynk/server/core/hardware/HardwareServer.java#L31
handlers - https://github.com/blynkkk/blynk-server/tree/master/server/tcp-server/src/main/java/cc/blynk/server/handlers/hardware
ninja- commentedon Nov 4, 2015
@doom369 I am pretty sure you broke the refcnt somewhere in your handlers, somewhere near composite buffers...
doom369 commentedon Nov 4, 2015
@ninja- how could that be if in pipeline where error happen there is no my code that manages refcnt at any way. Netty code itself manages this. Also server was running for a weeks with load ~700 req/sec and this is only such error I found.
doom369 commentedon Nov 5, 2015
One more, but look like in another pipeline (with SSL Handler)
Scottmitch commentedon Nov 5, 2015
@doom369 - It seems that buffer which is owned by
ByteToMessageDecoder
is being "double" released. However the decoder you linked to (MessageDecoder) copies the input buffer into a string, and does not release the input buffer, so that shouldn't create problems.Is there any place where you don't copy the contents of the buffer and instead use the buffer (or a slice of the buffer) directly? For example writing a buffer you decode to a channel.
Can you reproduce the issue? If so it would be good to know what the reference count
cumulation.refCnt()
is on lineByteToMessageDecoder:244
before and after the call tocallDecode
.I don't see any SSL initialization in
HardwareServer
, was this just for testing, or is there other code that we should be aware of?doom369 commentedon Nov 5, 2015
@Scottmitch
No. I deal with netty buffers only in MessageDecoder and MessageEncoder. In all other handlers I work with MessageBase that those classes produce.
No. It is just appearing in production system.
I have 4 different pipelines. 3 of them using SSL handlers. 1 of them (HardwareServer) doesn't. First stack I posted in ticket is without SSLHandler. So that means that exception came from pipeline of HardwareServer.
But second stack I posted is came from another pipeline (not HardwareServer) which one of those 3 I don't know.
All pipelines using MessageEncoder and MessageDecoder.
normanmaurer commentedon Nov 6, 2015
very strange... Currently I have no idea what's wrong :/
Scottmitch commentedon Nov 7, 2015
It is interesting that the original leak report you posted stems from the condition if your application code consumes all data in the
culmination
buffer thenByteToMessageDecoder
will release the buffer ... that may explain why you only see it rarely. @doom369 - Can you verify how often you enter the below code block inByteToMessageDecoder
?doom369 commentedon Nov 7, 2015
@Scottmitch I'll do new deployment (that is planned) of 4.0.33 version. And will monitor it. If no error I'll close issue, otherwise will add more debug.
doom369 commentedon Nov 11, 2015
4.0.33 didn't help. But I was able to reproduce issue on QA server. Will try to prepare some simple setup for you.
normanmaurer commentedon Nov 11, 2015
@doom369 thanks!
doom369 commentedon Dec 20, 2015
@normanmaurer sorry for huge delay. So finally I had a time to play with that.
I'm currently using epoll (4.0.33.Final) + native open SSL via netty-tcnative (1.1.33.Fork8). I made a test that reproduces issue locally. Changing open SSL to JDK makes issue not reproducible, but I saw another error with JDK ssl (may be related) :
So regarding reproducing. Here is a server jar. To run please use :
Also please checkout this project. Find test SimplePerformanceTest.java remove @ignore annotation from method
testConnectAppAndHardware
and run it.Now in directory from where you where running jar file you'll find
logs
folder withblynk.log
file with above exception. If log didn't appear you may need to run test few times (issue is not 100% reproducible).I have this problem on 3 envs. All Ubuntu, 12.04, 14.04. Java version "1.8.0_40", "1.8.0_45"
Please let me know if you need more info.
normanmaurer commentedon Dec 20, 2015
@doom369 thanks will check!
9 remaining items
normanmaurer commentedon Dec 21, 2015
@doom369 I now know what happens and also why (yay). Working on a fix... to make it short it has to do with the register/deregister that you do in your handler.
doom369 commentedon Dec 21, 2015
@normanmaurer Hm... I had a feeling regarding that. Should tell you before. Glad you found problem.
normanmaurer commentedon Dec 23, 2015
@doom369 could you please verify this as well: #4621
doom369 commentedon Dec 23, 2015
@normanmaurer I was not able to build epoll module, maybe you have ready build somewhere?
normanmaurer commentedon Dec 23, 2015
Nope :( what is the error?
doom369 commentedon Dec 23, 2015
@normanmaurer It was issue with JDK. Fixed, made new build, tested with it, seems not reproducible any more. Thanks!
[#4435] Always invoke the actual deregisteration later in the EventLoop.
[#4435] Always invoke the actual deregisteration later in the EventLoop.
[#4435] Always invoke the actual deregisteration later in the EventLoop.
normanmaurer commentedon Dec 24, 2015
Fixed via #4621. @doom369 thanks again for all the help to reproduce it and track down 👍
[netty#4435] Always invoke the actual deregisteration later in the Ev…