Creating Index painfully slow on cluster with large indices  #18776

@puuzll

I have a cluster with 2 nodes and approximately 10000 indices. Each index has one replica. Creating an index with a preset mapping on this cluster is painfully slow (about 1 minute to create an index with 1 replica). There is sufficient memory, Java heap, CPU, and disk available when creating the index. Using the hot_threads API, I found that 95% of the time is spent running the following code on the master node:

    at com.google.common.collect.Iterators$3.hasNext(Iterators.java:164)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyDeletedShards(IndicesClusterStateService.java:256)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:167)
    - locked <0x00000000f96ea8b0> (a java.lang.Object)
    at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Is this a bug? Can I avoid it by changing any configuration?
Elasticsearch version:
2.3.3
JVM version:
1.8
OS version:
debian7

jasontedor commented on Jun 8, 2016

@jasontedor
Member

> is this a bug?

When you create an index, that causes a change to the routing table. A cluster state update task is submitted to the nodes in the cluster. When that cluster state update task arrives, each node must process the new routing table to see if they need to remove indices, delete shards, start shards, etc. Currently, applying deleted shards is O(number of indices * number of shards). I opened #18788 to address this.
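To illustrate where a cost of that shape can come from, here is a minimal Python sketch. It is not the Elasticsearch code and not necessarily what #18788 changed; the function names and the dictionary-based routing table are assumptions made for the example. The first function rescans every index for every local shard, while the second uses a direct lookup per shard.

```python
# Illustrative sketch only; names and data shapes are hypothetical,
# not taken from the Elasticsearch source.

def stale_shards_nested(local_shards, routing_table):
    """For each local shard, scan every index in the routing table.

    Cost grows with (number of local shards) * (number of indices).
    """
    stale = []
    for index_name, shard_id in local_shards:
        found = False
        for candidate_index, shard_ids in routing_table.items():
            if candidate_index == index_name and shard_id in shard_ids:
                found = True
                break
        if not found:
            stale.append((index_name, shard_id))
    return stale


def stale_shards_lookup(local_shards, routing_table):
    """Same result, but one dictionary lookup per local shard."""
    return [
        (index_name, shard_id)
        for index_name, shard_id in local_shards
        if shard_id not in routing_table.get(index_name, ())
    ]


# Routing table: index name -> set of shard ids assigned to this node.
routing_table = {"a": {0, 1}, "b": {0}}
local_shards = [("a", 0), ("b", 1), ("c", 0)]
print(stale_shards_nested(local_shards, routing_table))
print(stale_shards_lookup(local_shards, routing_table))
```

With around 10000 indices, the nested version performs on the order of 10000 comparisons per local shard on every cluster state update, which is consistent with the hot thread shown in the report.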

However, you're still going to be hurting here. Having 10000 indices on two nodes with one replica is asking for pain. This means that you have at a minimum 10000 shards on each node if you have one shard per index, and maybe 50000 shards on each node if you're using the default number of shards per index. Either way, this is way too many shards. So #18788 is not meant to address your issue directly, just improve performance for the general case. You'll still need to do something about how many indices and shards that you have.
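The shard counts above follow from simple arithmetic, sketched below in Python. The helper name is made up for this example, and it assumes shards are spread evenly across nodes and that the default is five primaries per index, as in the 2.x series.

```python
def shards_per_node(indices, primaries_per_index, replicas, nodes):
    """Average shard copies per node, assuming an even spread across nodes."""
    total_copies = indices * primaries_per_index * (1 + replicas)
    return total_copies // nodes


# One primary per index, one replica, two nodes.
print(shards_per_node(10000, 1, 1, 2))
# Five primaries per index (assumed 2.x default), one replica, two nodes.
print(shards_per_node(10000, 5, 1, 2))
```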

> Can I avoid this by setting any configuration?

No.

jilen commented on Oct 8, 2016

@jilen

@jasontedor I have run into this as well. Is there any way to improve index creation speed?

jasontedor commented on Oct 10, 2016

@jasontedor
Member

@jilen Creating an index requires a cluster state update which can be a slow thing indeed. The issue here was about the degradation in index-creation speed as the number of indices increased; that's what #18788 addressed. I'd say that if you rely on index creation being fast, you probably have an architecture that needs to be reconsidered.

bleskes commented on Oct 10, 2016

@bleskes
Contributor

@jilen to quantify what @jasontedor said (which is very true) - index creation is slow when compared to data level operations like indexing and search. You should expect it to run within a couple of seconds. Also note that we now wait (since 5.0) for the primaries to be fully allocated before responding to the call.

jilen commented on Oct 11, 2016

@jilen

@jasontedor @bleskes I am currently using a one-index-per-user pattern, and there are more than 20k shards.

Parallel automatic index creation (via the bulk or update API) makes the cluster unresponsive.

What do you suggest for my situation? Should I disable automatic index creation?

nik9000 commented on Oct 11, 2016

@nik9000
Member

Don't have an index per user.

NelsonBurton commented on Jul 29, 2021

@NelsonBurton

An alternative to an index per user is to put all customers in one index and use each customer's identifier as the Elasticsearch routing value: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-routing-field.html . This works well in our system, with millions of users and 1 index.
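As a rough sketch of what that can look like, the Python below builds the request URL and search body for a shared index. The index name `customers`, the field name `customer_id`, and the helper functions are assumptions for illustration; only the `routing` query parameter and the `_doc` and `_search` endpoints come from the Elasticsearch REST API. Routing only selects the shard, and several customers can land on the same shard, so the search also filters on the customer field.

```python
from urllib.parse import quote


def index_url(base, index, doc_id, customer_id):
    """URL for indexing one document, routed by the customer identifier."""
    return (
        f"{base}/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}"
        f"?routing={quote(customer_id, safe='')}"
    )


def search_request(base, index, customer_id, query):
    """URL and body for a search limited to one customer's shard and documents."""
    url = f"{base}/{quote(index, safe='')}/_search?routing={quote(customer_id, safe='')}"
    body = {
        "query": {
            "bool": {
                "must": [query],
                # Hypothetical field holding the customer identifier.
                "filter": [{"term": {"customer_id": customer_id}}],
            }
        }
    }
    return url, body


print(index_url("http://localhost:9200", "customers", "42", "acme"))
url, body = search_request(
    "http://localhost:9200", "customers", "acme", {"match": {"title": "invoice"}}
)
print(url)
print(body)
```

The same routing value has to be supplied when reading, updating, or deleting a document by ID, otherwise the request may be sent to a different shard.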

Issue #18776 · elastic/elasticsearch