Memory leak on Alias #22013

@xgwu

Description

Elasticsearch version: 5.0.0
Plugins installed: None
JVM version: 1.8.0_77-b03
OS version: CentOS release 6.4 (Final)
Description of the problem including expected versus actual behavior:

One of our data nodes suffered from high heap usage last night, and old-generation GC was not able to reclaim any heap space. At the time, both bulk indexing and query load were light, and all thread pools were nearly idle. The node is part of a cluster of 120 data nodes used for log analysis. Every night a maintenance job deletes and force-merges cold data and creates indices and aliases for the new day.

The node is configured with 31GB of heap and holds about 450 shards and 2k-2.5k segments. Over the past week, the segment count and segment memory stayed constant or even dropped, yet heap usage kept creeping up until I restarted the node last night. I looked at all the memory-related stats in our monitoring systems and could not find the cause of the growing heap usage.
[four images attached]

Before restarting the node, I took a heap dump and analyzed it with MAT. The huge number of org.elasticsearch.cluster.metadata.AliasOrIndex$Alias objects looks suspicious: they retained nearly 7GB of memory.
[screenshots: 2016-12-06 17 58 44, 2016-12-06 17 58 10]

We use aliases heavily, and there are 40k aliases in total across the whole cluster. After the node recovered, I took another heap dump. This time the number of org.elasticsearch.cluster.metadata.AliasOrIndex$Alias objects had dropped to 673,427 instances, which retained only 16MB of memory.

Does this suggest a memory leak in the alias metadata?

Activity

ywelsch (Contributor) commented on Dec 7, 2016

  • Can you expand the QueryShardContext in the class_references analysis above to see which objects are keeping a reference to this?
  • Would you be willing to share the heap dump? (e.g. upload to S3).

xgwu (Author) commented on Dec 7, 2016

@ywelsch

  • There are currently about 40,000 aliases created in the cluster.

  • Below is a screenshot of the expanded QueryShardContext:
    [image]

  • I am willing to share the heap dump but it's 30GB in size. It would be hard for me to upload it to S3 considering I'm located in China. :(

ywelsch (Contributor) commented on Dec 7, 2016

If I'm reading this correctly, the indices request cache (IndicesRequestCache) is holding onto a search context, which in turn holds onto the cluster state from when the request was started. This was fixed in 5.0.1; see #21284.
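
To make the retention chain described in this comment concrete, here is a minimal Java sketch. It is not Elasticsearch code and not the change made in #21284: every class and method name below (ClusterStateSnapshot, SearchContext, LeakyEntry, LeanEntry, simulate, aliasObjectsReachable) is a hypothetical stand-in. It only illustrates the general pattern: a long-lived cache entry that references a per-request context keeps alive whatever that context references, so each cluster state update can leave another full copy of the alias metadata reachable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class RequestCacheRetentionSketch {

    /** Hypothetical stand-in for one cluster state version and its alias metadata. */
    public static final class ClusterStateSnapshot {
        final long version;
        final List<String> aliases = new ArrayList<>();

        ClusterStateSnapshot(long version, int aliasCount) {
            this.version = version;
            for (int i = 0; i < aliasCount; i++) {
                aliases.add("alias-" + version + "-" + i);
            }
        }
    }

    /** Hypothetical per-request context that captures the state current at request start. */
    public static final class SearchContext {
        final ClusterStateSnapshot stateAtRequestStart;

        SearchContext(ClusterStateSnapshot state) {
            this.stateAtRequestStart = state;
        }
    }

    /** A cache entry reports which snapshot, if any, it keeps strongly reachable. */
    public interface CacheEntry {
        ClusterStateSnapshot pinnedState();
    }

    /** Leaky shape: the entry keeps the whole context, and through it the snapshot. */
    public static final class LeakyEntry implements CacheEntry {
        final SearchContext context;
        final byte[] cachedResult;

        LeakyEntry(SearchContext context, byte[] cachedResult) {
            this.context = context;
            this.cachedResult = cachedResult;
        }

        @Override
        public ClusterStateSnapshot pinnedState() {
            return context.stateAtRequestStart;
        }
    }

    /** Lean shape: the entry keeps only the cached bytes, so the context can be collected. */
    public static final class LeanEntry implements CacheEntry {
        final byte[] cachedResult;

        LeanEntry(byte[] cachedResult) {
            this.cachedResult = cachedResult;
        }

        @Override
        public ClusterStateSnapshot pinnedState() {
            return null;
        }
    }

    /** Caches one request per cluster state update, using the leaky or the lean entry shape. */
    public static List<CacheEntry> simulate(int stateUpdates, int aliasesPerState, boolean leaky) {
        List<CacheEntry> cache = new ArrayList<>();
        for (long version = 1; version <= stateUpdates; version++) {
            SearchContext context =
                    new SearchContext(new ClusterStateSnapshot(version, aliasesPerState));
            byte[] result = new byte[16];
            cache.add(leaky ? new LeakyEntry(context, result) : new LeanEntry(result));
        }
        return cache;
    }

    /** Counts alias objects still reachable through the cache, over distinct snapshots. */
    public static long aliasObjectsReachable(List<CacheEntry> cache) {
        Set<ClusterStateSnapshot> distinct = Collections.newSetFromMap(new IdentityHashMap<>());
        for (CacheEntry entry : cache) {
            ClusterStateSnapshot state = entry.pinnedState();
            if (state != null) {
                distinct.add(state);
            }
        }
        long total = 0;
        for (ClusterStateSnapshot state : distinct) {
            total += state.aliases.size();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + aliasObjectsReachable(simulate(5, 1000, true)));
        System.out.println("lean:  " + aliasObjectsReachable(simulate(5, 1000, false)));
    }
}
```

With five simulated state updates of 1,000 aliases each, the leaky shape leaves 5,000 alias objects reachable from the cache and the lean shape leaves none. The sketch counts reachability explicitly rather than relying on garbage collector behavior, so the result is deterministic.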

          Memory leak on Alias · Issue #22013 · elastic/elasticsearch