[Bug] Unexpected replica id and be id Mapping in Specific Scenarios #50963

Ryan19929 · 2025-05-16T02:06:22Z

Search before asking

I have searched the issues and found no similar issues.

Version

2.1.8

What's Wrong?

During a full-sync with CCR, most of the traffic from the source cluster is concentrated on a single backend node.

I analyzed the relevant source code and found the following cause.
Doris uses a round-robin algorithm to distribute replicas across backend nodes. Under certain conditions, this algorithm produces a fixed, predictable mapping between replica IDs and backend IDs.

Example:
Assume a partition has 3 replicas and the cluster has 3 backend nodes (BE1, BE2, BE3). The mapping is as follows:

TabletId	Replica ID	Backend ID
22907575	22907576	BE1
22907575	22907577	BE2
22907575	22907578	BE3
22907579	22907580	BE1
22907579	22907581	BE2
22907579	22907582	BE3
22907583	22907584	BE1
22907583	22907585	BE2
22907583	22907586	BE3

In this mapping, the smallest replica ID of every tablet is always on the same backend node (BE1).
When Doris performs a backup, the chooseReplica function always tries to select the replica with the smallest ID. Consequently, during restoration, the destination cluster consistently sends download requests to a single backend node.

/*
 * Choose a replica order by replica id.
 * This is to expect to choose the same replica at each backup job.
 */
private Replica chooseReplica(Tablet tablet, long visibleVersion) {
    List<Long> replicaIds = Lists.newArrayList();
    for (Replica replica : tablet.getReplicas()) {
        replicaIds.add(replica.getId());
    }

    Collections.sort(replicaIds);
    for (Long replicaId : replicaIds) {
        Replica replica = tablet.getReplicaById(replicaId);
        if (replica.getLastFailedVersion() < 0 && replica.getVersion() >= visibleVersion) {
            return replica;
        }
    }
    return null;
}
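To make the effect concrete, here is a minimal sketch that replays the example table above and counts which backend would serve each tablet if the smallest replica ID is always chosen. The class and helper names (SmallestReplicaDemo, backendOfSmallestReplica, countChosenBackends, exampleTablets) are hypothetical and not part of Doris, and the sketch assumes every replica is healthy, so it skips the version checks that chooseReplica performs.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class SmallestReplicaDemo {
    // Hypothetical helper: returns the backend hosting the smallest replica ID of one tablet.
    // Assumes all replicas are healthy, so no version checks are applied.
    static String backendOfSmallestReplica(Map<Long, String> replicaToBackend) {
        long smallest = Collections.min(replicaToBackend.keySet());
        return replicaToBackend.get(smallest);
    }

    // Counts how many tablets would be downloaded from each backend.
    static Map<String, Integer> countChosenBackends(Map<Long, Map<Long, String>> tablets) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<Long, String> replicas : tablets.values()) {
            counts.merge(backendOfSmallestReplica(replicas), 1, Integer::sum);
        }
        return counts;
    }

    // Rebuilds the tablet -> (replica ID -> backend) mapping from the example table.
    static Map<Long, Map<Long, String>> exampleTablets() {
        Map<Long, Map<Long, String>> tablets = new LinkedHashMap<>();
        long[] tabletIds = {22907575L, 22907579L, 22907583L};
        String[] backends = {"BE1", "BE2", "BE3"};
        for (long tabletId : tabletIds) {
            Map<Long, String> replicas = new LinkedHashMap<>();
            for (int i = 0; i < backends.length; i++) {
                // In the table, replica IDs directly follow the tablet ID.
                replicas.put(tabletId + 1 + i, backends[i]);
            }
            tablets.put(tabletId, replicas);
        }
        return tablets;
    }

    public static void main(String[] args) {
        // All three tablets resolve to BE1.
        System.out.println(countChosenBackends(exampleTablets())); // prints {BE1=3}
    }
}
```

With the example mapping, BE2 and BE3 never appear in the result, which matches the traffic concentration observed during the full sync.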

What You Expected?

The mapping between replica IDs and backend IDs should be more random.
I analyzed the createTablets code and found that when the number of replicas equals the number of backend nodes, realIndex remains constant: nextRoundRobinIndex is set to realIndex + result.size(), and since result.size() equals candidates.size(), the next modulo yields the same realIndex again.
To address this, would it be possible to shuffle the result? This might help break the predictability.

// SystemInfoService.java
if (policy.enableRoundRobin) {
    if (!policy.allowOnSameHost && hasSameHost) {
        // not allow same host and has same host,
        // then we compare them with their host
        Collections.sort(candidates, new BeHostComparator());
    } else {
        Collections.sort(candidates, new BeIdComparator());
    }

    if (policy.nextRoundRobinIndex < 0) {
        policy.nextRoundRobinIndex = new SecureRandom().nextInt(candidates.size());
    }

    int realIndex = policy.nextRoundRobinIndex % candidates.size();
    List<Long> partialOrderList = new ArrayList<Long>();
    partialOrderList.addAll(candidates.subList(realIndex, candidates.size())
            .stream().map(Backend::getId).collect(Collectors.toList()));
    partialOrderList.addAll(candidates.subList(0, realIndex)
            .stream().map(Backend::getId).collect(Collectors.toList()));

    List<Long> result = number == -1 ? partialOrderList : partialOrderList.subList(0, number);
    policy.nextRoundRobinIndex = realIndex + result.size();

    // Shuffle the result to increase randomness
    // Collections.shuffle(result);
    return result;
}
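The index arithmetic in the snippet above can be replayed in isolation to see why realIndex gets stuck. This is only a sketch: the class RoundRobinIndexDemo and the method realIndices are hypothetical names, and only the two lines that compute realIndex and update nextRoundRobinIndex are modeled, with the starting index passed in instead of being drawn from SecureRandom.

```java
import java.util.ArrayList;
import java.util.List;

public class RoundRobinIndexDemo {
    // Replays only the realIndex / nextRoundRobinIndex arithmetic from the snippet above.
    // candidateCount stands in for candidates.size(), number for the requested replica count.
    static List<Integer> realIndices(int candidateCount, int number, int startIndex, int calls) {
        List<Integer> seen = new ArrayList<>();
        int nextRoundRobinIndex = startIndex;
        for (int i = 0; i < calls; i++) {
            int realIndex = nextRoundRobinIndex % candidateCount;
            seen.add(realIndex);
            // result.size() is candidateCount when number == -1, otherwise number.
            int resultSize = number == -1 ? candidateCount : number;
            nextRoundRobinIndex = realIndex + resultSize;
        }
        return seen;
    }

    public static void main(String[] args) {
        // 3 backends, 3 replicas: the start position never moves.
        System.out.println(realIndices(3, 3, 1, 4)); // prints [1, 1, 1, 1]
        // 4 backends, 3 replicas: the start position rotates.
        System.out.println(realIndices(4, 3, 1, 4)); // prints [1, 0, 3, 2]
    }
}
```

When the start position never moves, every tablet receives its backends in the same order, which would explain why the replica IDs line up with the same backends as in the table above.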

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

Yes I am willing to submit a PR!

Code of Conduct

I agree to follow this project's Code of Conduct
