Memory leak upon partial TransportShardBulkAction failure

**Elasticsearch version** (`bin/elasticsearch --version`):
5.3.2 -  5.6.3

**Plugins installed**: [None]

**JVM version** (`java -version`):
1.8.0_77-b03

**OS version** (`uname -a` if on a Unix-like system):
Linux SVR14982HW1288 2.6.32-642.6.2.el6.x86_64 #1 SMP Wed Oct 26 06:52:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

**Description of the problem including expected versus actual behavior**:
A production cluster running version 5.3.2 showed very high heap usage after a bulk update of some documents. The search load was very light and the cluster was almost idle, yet old-generation GC could not reclaim the memory even after the bulk requests had finished. After some investigation, Log4j appears to be the culprit: it holds a strong reference to the `BulkShardRequest` object whenever an exception is thrown during bulk request execution.

The problem looks quite similar to [#23798](https://github.com/elastic/elasticsearch/issues/23798), but I can reproduce it on the latest stable version, 5.6.3, so the root cause may be different.


**Steps to reproduce**:
Update a batch of documents with the bulk API, but deliberately trigger some exceptions, for example by including a couple of requests with a nonexistent document ID or a wrong field type. DEBUG messages then appear in the ES logs reporting document-missing or mapper-parsing exceptions. Heap usage increases significantly, by an amount that depends on the size of a single bulk request. Dumping the heap and analyzing it with MAT shows that the `BulkShardRequest` object is referenced by Log4j's `ParameterizedMessage`.
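
For anyone trying to reproduce this, a minimal sketch of building such a bulk body is below. It only assembles the NDJSON payload for `POST /_bulk`; the index name, type name, field name, and document IDs are placeholders I made up, and the padding size is just a way to make the request large enough to notice in the heap.

```python
import json


def build_bulk_update_body(index, doc_type, existing_ids, missing_ids, padding_bytes=1024):
    """Return an NDJSON body for POST /_bulk that mixes valid and failing updates.

    All names passed in are hypothetical; adapt them to a real index.
    """
    lines = []
    for doc_id in list(existing_ids) + list(missing_ids):
        # Action line: a partial update of a single document.
        lines.append(json.dumps({"update": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        # Source line: a padded field so that the bulk request is large.
        lines.append(json.dumps({"doc": {"note": "x" * padding_bytes}}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Three updates that should succeed (if the documents exist) and two that
# should fail because the document IDs are assumed not to exist.
body = build_bulk_update_body("test-index", "doc", ["1", "2", "3"], ["missing-1", "missing-2"])
```

The body can then be sent with any HTTP client using the `Content-Type: application/x-ndjson` header. Increasing `padding_bytes` or the number of documents should make the retained heap easier to spot in a dump.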

The only way I have found to reclaim the memory is to issue a small bulk request that triggers another exception, in which case the `ParameterizedMessage` object references the new request, which has a small memory footprint.
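
The behavior described above can be modeled with a toy example. This is not Log4j's or Elasticsearch's code, only a simplified sketch of the retention pattern I believe I am seeing; `FakeBulkRequest` and `LastMessageHolder` are made-up names standing in for the request and for whatever keeps the last message alive.

```python
import gc
import weakref


class FakeBulkRequest:
    """Hypothetical stand-in for a large shard-level bulk request."""

    def __init__(self, size):
        self.payload = bytearray(size)


class LastMessageHolder:
    """Toy logger that keeps only the most recent message and its arguments."""

    def __init__(self):
        self.last = None

    def debug(self, template, *args):
        # The arguments are stored as-is, so they stay strongly reachable
        # until the next call overwrites this slot.
        self.last = (template, args)


holder = LastMessageHolder()


def run_failing_bulk(size):
    request = FakeBulkRequest(size)
    holder.debug("{} failed to execute bulk item", request)
    # Return only a weak reference, so the caller does not keep the request alive.
    return weakref.ref(request)


big = run_failing_bulk(8 * 1024 * 1024)
gc.collect()
big_retained = big() is not None  # the large request is still reachable via the holder

small = run_failing_bulk(1024)
gc.collect()
big_released = big() is None  # the small request replaced it, so it can be collected
```

In this model the large request becomes collectable only once a later failing request overwrites the stored message, which matches what I observe on the cluster.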

Below are sample heap dump stats from our production cluster:
![screen shot 2017-11-07 at 17 41 46](https://user-images.githubusercontent.com/10510416/32488441-3772ac12-c372-11e7-9061-36103a77dab8.png)
![screen shot 2017-11-07 at 17 42 29](https://user-images.githubusercontent.com/10510416/32488448-3c4ae2ea-c372-11e7-8b4f-c9ac2504d03c.png)

Memory leak upon partial TransportShardBulkAction failure #27300

xgwu commented on Nov 8, 2017