Closed
Description
I have a cluster with 2 nodes and approximately 10000 indices. Each index has one replica. Creating an index with a preset mapping on this cluster is painfully slow (about 1 minute to create an index with 1 replica). There is sufficient memory, Java heap, CPU, and disk when creating the index. I used the hot_threads API and found that 95% of the time is spent running the following code on the master node:
at com.google.common.collect.Iterators$3.hasNext(Iterators.java:164)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyDeletedShards(IndicesClusterStateService.java:256)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:167)
- locked <0x00000000f96ea8b0> (a java.lang.Object)
at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Is this a bug? Can I avoid this through some configuration setting?
Elasticsearch version:
2.3.3
JVM version:
1.8
OS version:
debian7
Activity
jasontedor commented on Jun 8, 2016
When you create an index, that causes a change to the routing table. A cluster state update task is submitted to the nodes in the cluster. When that cluster state update task arrives, each node must process the new routing table to see if they need to remove indices, delete shards, start shards, etc. Currently, applying deleted shards is O(number of indices * number of shards). I opened #18788 to address this.
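To illustrate why a cost of O(number of indices * number of shards) hurts at this scale, here is a toy cost model. It is not Elasticsearch's code and does not claim to show what #18788 changed; the function names and the tuple-based routing table are hypothetical. It only counts how many checks a naive "for every index, walk every shard entry" loop performs compared with grouping the shard entries by index once.

```python
from collections import defaultdict

def build_routing(num_indices, shards_per_index):
    """Hypothetical routing table: a flat list of (index name, shard number)."""
    return [(f"index-{i}", s)
            for i in range(num_indices)
            for s in range(shards_per_index)]

def checks_nested(index_names, routing):
    """For every index, walk every shard entry: indices * shards checks."""
    checks = 0
    for _name in index_names:
        for _index_name, _shard in routing:
            checks += 1  # one comparison per (index, shard entry) pair
    return checks

def checks_grouped(index_names, routing):
    """Group shard entries by index once, then do one lookup per index."""
    by_index = defaultdict(list)
    checks = 0
    for index_name, shard in routing:
        by_index[index_name].append(shard)
        checks += 1  # one pass over the shard entries
    for name in index_names:
        by_index.get(name, [])
        checks += 1  # one dictionary lookup per index
    return checks

routing = build_routing(num_indices=100, shards_per_index=2)
names = [f"index-{i}" for i in range(100)]
nested = checks_nested(names, routing)
grouped = checks_grouped(names, routing)
print(nested, grouped)
```

In this model the nested count grows with the product of indices and shard entries, while the grouped count grows with their sum, which is why the gap widens quickly as indices are added.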
However, you're still going to be hurting here. Having 10000 indices on two nodes with one replica is asking for pain. This means that you have at a minimum 10000 shards on each node if you have one shard per index, and maybe 50000 shards on each node if you're using the default number of shards per index. Either way, this is way too many shards. So #18788 is not meant to address your issue directly, just improve performance for the general case. You'll still need to do something about how many indices and shards that you have.
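The shard counts above follow from simple arithmetic, sketched below. The helper name is hypothetical, the calculation assumes shards are spread evenly across nodes, and the figure of 5 primaries is the default number of shards per index in Elasticsearch 2.x.

```python
def shards_per_node(num_indices, primaries_per_index, replicas, num_nodes):
    """Average shard copies per node, assuming an even spread across nodes."""
    total_copies = num_indices * primaries_per_index * (1 + replicas)
    return total_copies // num_nodes

# One primary per index, one replica, two nodes.
minimum = shards_per_node(10000, primaries_per_index=1, replicas=1, num_nodes=2)

# Five primaries per index (the 2.x default), one replica, two nodes.
default = shards_per_node(10000, primaries_per_index=5, replicas=1, num_nodes=2)

print(minimum, default)
```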
No.
jilen commented on Oct 8, 2016
@jasontedor I am suffering from this too. Is there any way to improve index creation speed?
jasontedor commented on Oct 10, 2016
@jilen Creating an index requires a cluster state update which can be a slow thing indeed. The issue here was about the degradation in index-creation speed as the number of indices increased; that's what #18788 addressed. I'd say that if you rely on index creation being fast, you probably have an architecture that needs to be reconsidered.
bleskes commented on Oct 10, 2016
@jilen to quantify what @jasontedor said (which is very true) - index creation is slow when compared to data level operations like indexing and search. You should expect it to run within a couple of seconds. Also note that we now wait (since 5.0) for the primaries to be fully allocated before responding to the call.
jilen commented on Oct 11, 2016
@jasontedor @bleskes I am currently using the one-index-per-user pattern, and there are actually more than 20k shards. Automatic index creation in parallel (via the bulk or update API) makes the cluster unresponsive.
What do you suggest for my situation? Should I disable automatic index creation?
nik9000 commented on Oct 11, 2016
Don't have an index per user.
NelsonBurton commented on Jul 29, 2021
An alternative to an index per user is to put all customers in one index and use each customer's identifier as the Elasticsearch routing value: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-routing-field.html . This works well in our system with millions of users and 1 index.
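A minimal sketch of what that could look like, assuming a recent Elasticsearch version with the `_doc` endpoint. The index name `customers`, the field `customer_id`, the base URL, and the helper functions are all hypothetical; only the `routing` query parameter and the `_routing` mapping option come from the Elasticsearch REST API. The code only builds the requests and does not contact a cluster.

```python
from urllib.parse import quote, urlencode

BASE = "http://localhost:9200"  # assumed local cluster address

# Mapping that rejects writes which omit the routing value.
CREATE_INDEX_BODY = {"mappings": {"_routing": {"required": True}}}

def index_url(base, index, doc_id, customer_id):
    """URL for indexing one document, routed by the customer identifier."""
    query = urlencode({"routing": customer_id})
    return f"{base}/{quote(index)}/_doc/{quote(doc_id)}?{query}"

def search_request(base, index, customer_id):
    """URL and body for a search limited to one customer's shard and documents."""
    url = f"{base}/{quote(index)}/_search?{urlencode({'routing': customer_id})}"
    # Several customers can share a shard, so routing alone does not isolate
    # a customer's data; the term filter on customer_id is still required.
    body = {"query": {"bool": {"filter": [{"term": {"customer_id": customer_id}}]}}}
    return url, body

put_url = index_url(BASE, "customers", "1", "cust-42")
search_url, search_body = search_request(BASE, "customers", "cust-42")
print(put_url)
print(search_url)
```

With this layout, adding a user means indexing documents rather than creating an index, so no cluster state update is needed per user.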