Skip to content

Strange IllegalReferenceCountException #4435

Closed
@doom369

Description

@doom369
Contributor

Hello. 4.0.32.Final, Ubuntu 14.04. I have a production system running for few weeks. Was reviewing logs and found this:

io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
    at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:101)
    at io.netty.buffer.AbstractDerivedByteBuf.release(AbstractDerivedByteBuf.java:50)
    at io.netty.buffer.CompositeByteBuf$Component.freeIfNecessary(CompositeByteBuf.java:1370)
    at io.netty.buffer.CompositeByteBuf.deallocate(CompositeByteBuf.java:1625)
    at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:106)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:252)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.timeout.ReadTimeoutHandler.channelRead(ReadTimeoutHandler.java:152)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:805)
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:346)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
    at java.lang.Thread.run(Thread.java:745)
18:13:29.114 ERROR - Handler class : class cc.blynk.common.handlers.common.decoders.MessageDecoder. Name : MessageDecoder#0

Not sure is that netty bug or my own. Decoder code - https://github.com/blynkkk/blynk-server/blob/master/common/src/main/java/cc/blynk/common/handlers/common/decoders/MessageDecoder.java

Activity

normanmaurer

normanmaurer commented on Nov 4, 2015

@normanmaurer
Member

@doom369 can you show me all the handlers you have in your pipeline ?

ninja-

ninja- commented on Nov 4, 2015

@ninja-

@doom369 I am pretty sure you broke the refcnt somewhere in your handlers, somewhere near composite buffers...

doom369

doom369 commented on Nov 4, 2015

@doom369
ContributorAuthor

@ninja- how could that be if in pipeline where error happen there is no my code that manages refcnt at any way. Netty code itself manages this. Also server was running for a weeks with load ~700 req/sec and this is only such error I found.

doom369

doom369 commented on Nov 5, 2015

@doom369
ContributorAuthor

One more, but look like in another pipeline (with SSL Handler)

io.netty.handler.codec.DecoderException: io.netty.util.IllegalReferenceCountException: refCnt: 0
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:380)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.timeout.ReadTimeoutHandler.channelRead(ReadTimeoutHandler.java:152)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:805)
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:346)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
    at java.lang.Thread.run(Thread.java:745)
Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0
    at io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1177)
    at io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1128)
    at io.netty.buffer.UnpooledUnsafeDirectByteBuf.internalNioBuffer(UnpooledUnsafeDirectByteBuf.java:480)
    at io.netty.handler.ssl.SslHandler.toByteBuffer(SslHandler.java:1146)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1138)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1028)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:968)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:349)
    ... 13 more
Scottmitch

Scottmitch commented on Nov 5, 2015

@Scottmitch
Member

@doom369 - It seems that buffer which is owned by ByteToMessageDecoder is being "double" released. However the decoder you linked to (MessageDecoder) copies the input buffer into a string, and does not release the input buffer, so that shouldn't create problems.

Is there any place where you don't copy the contents of the buffer and instead use the buffer (or a slice of the buffer) directly? For example writing a buffer you decode to a channel.

Can you reproduce the issue? If so it would be good to know what the reference count cumulation.refCnt() is on line ByteToMessageDecoder:244 before and after the call to callDecode.

I don't see any SSL initialization in HardwareServer, was this just for testing, or is there other code that we should be aware of?

doom369

doom369 commented on Nov 5, 2015

@doom369
ContributorAuthor

@Scottmitch

Is there any place where you don't copy the contents of the buffer and instead use the buffer (or a slice of the buffer) directly?

No. I deal with netty buffers only in MessageDecoder and MessageEncoder. In all other handlers I work with MessageBase that those classes produce.

Can you reproduce the issue?

No. It is just appearing in production system.

I don't see any SSL initialization in HardwareServer, was this just for testing, or is there other code that we should be aware of?

I have 4 different pipelines. 3 of them using SSL handlers. 1 of them (HardwareServer) doesn't. First stack I posted in ticket is without SSLHandler. So that means that exception came from pipeline of HardwareServer.
But second stack I posted is came from another pipeline (not HardwareServer) which one of those 3 I don't know.

All pipelines using MessageEncoder and MessageDecoder.

normanmaurer

normanmaurer commented on Nov 6, 2015

@normanmaurer
Member

very strange... Currently I have no idea what's wrong :/

Scottmitch

Scottmitch commented on Nov 7, 2015

@Scottmitch
Member

It is interesting that the original leak report you posted stems from the condition if your application code consumes all data in the culmination buffer then ByteToMessageDecoder will release the buffer ... that may explain why you only see it rarely. @doom369 - Can you verify how often you enter the below code block in ByteToMessageDecoder?

if (cumulation != null && !cumulation.isReadable()) {
  numReads = 0;
  cumulation.release();
  cumulation = null;
}
doom369

doom369 commented on Nov 7, 2015

@doom369
ContributorAuthor

@Scottmitch I'll do new deployment (that is planned) of 4.0.33 version. And will monitor it. If no error I'll close issue, otherwise will add more debug.

doom369

doom369 commented on Nov 11, 2015

@doom369
ContributorAuthor

4.0.33 didn't help. But I was able to reproduce issue on QA server. Will try to prepare some simple setup for you.

normanmaurer

normanmaurer commented on Nov 11, 2015

@normanmaurer
Member

@doom369 thanks!

doom369

doom369 commented on Dec 20, 2015

@doom369
ContributorAuthor

@normanmaurer sorry for huge delay. So finally I had a time to play with that.
I'm currently using epoll (4.0.33.Final) + native open SSL via netty-tcnative (1.1.33.Fork8). I made a test that reproduces issue locally. Changing open SSL to JDK makes issue not reproducible, but I saw another error with JDK ssl (may be related) :

Reason : javax.net.ssl.SSLException: Tag mismatch!

So regarding reproducing. Here is a server jar. To run please use :

java -jar server-0.12.2-SNAPSHOT.jar -dataFolder /home/xxx/test &

Also please checkout this project. Find test SimplePerformanceTest.java remove @ignore annotation from method testConnectAppAndHardware and run it.

Now in directory from where you where running jar file you'll find logs folder with blynk.log file with above exception. If log didn't appear you may need to run test few times (issue is not 100% reproducible).

I have this problem on 3 envs. All Ubuntu, 12.04, 14.04. Java version "1.8.0_40", "1.8.0_45"

Please let me know if you need more info.

self-assigned this
on Dec 20, 2015
normanmaurer

normanmaurer commented on Dec 20, 2015

@normanmaurer
Member

@doom369 thanks will check!

9 remaining items

added this to the 4.0.34.Final milestone on Dec 21, 2015
normanmaurer

normanmaurer commented on Dec 21, 2015

@normanmaurer
Member

@doom369 I now know what happens and also why (yay). Working on a fix... to make it short it has to do with the register/deregister that you do in your handler.

doom369

doom369 commented on Dec 21, 2015

@doom369
ContributorAuthor

@normanmaurer Hm... I had a feeling regarding that. Should tell you before. Glad you found problem.

normanmaurer

normanmaurer commented on Dec 23, 2015

@normanmaurer
Member

@doom369 could you please verify this as well: #4621

doom369

doom369 commented on Dec 23, 2015

@doom369
ContributorAuthor

@normanmaurer I was not able to build epoll module, maybe you have ready build somewhere?

normanmaurer

normanmaurer commented on Dec 23, 2015

@normanmaurer
Member

Nope :( what is the error?

Am 23.12.2015 um 17:57 schrieb Dmitriy Dumanskiy notifications@github.com:

@normanmaurer I was not able to build epoll module, maybe you have ready build somewhere?


Reply to this email directly or view it on GitHub.

doom369

doom369 commented on Dec 23, 2015

@doom369
ContributorAuthor

@normanmaurer It was issue with JDK. Fixed, made new build, tested with it, seems not reproducible any more. Thanks!

added 3 commits that reference this issue on Dec 24, 2015

[#4435] Always invoke the actual deregisteration later in the EventLoop.

[#4435] Always invoke the actual deregisteration later in the EventLoop.

[#4435] Always invoke the actual deregisteration later in the EventLoop.

normanmaurer

normanmaurer commented on Dec 24, 2015

@normanmaurer
Member

Fixed via #4621. @doom369 thanks again for all the help to reproduce it and track down 👍

added a commit that references this issue on Oct 19, 2023

[netty#4435] Always invoke the actual deregisteration later in the Ev…

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Relationships

None yet

    Development

    No branches or pull requests

      Participants

      @normanmaurer@doom369@ninja-@Scottmitch

      Issue actions

        Strange IllegalReferenceCountException · Issue #4435 · netty/netty