[Bug] Unexpected replica id and be id Mapping in Specific Scenarios #50963

Ryan19929 opened this issue May 16, 2025 · 0 comments

Search before asking

  • I had searched in the issues and found no similar issues.

Version

2.1.8

What's Wrong?

During a full-sync with CCR, most of the traffic from the source cluster is concentrated on a single backend node.

I analyzed the relevant source code and found that the causes are as follows:
In the Doris system, the Round Robin algorithm is used to distribute replicas across backend nodes. However, under certain conditions, this algorithm may lead to unexpected mappings between Replica IDs and Backend IDs.

Example:
Assume a partition has 3 replicas and the cluster has 3 backend nodes (BE1, BE2, BE3). The replica-to-backend mapping is as follows:

TabletId    Replica ID    Backend ID
22907575    22907576      BE1
22907575    22907577      BE2
22907575    22907578      BE3
22907579    22907580      BE1
22907579    22907581      BE2
22907579    22907582      BE3
22907583    22907584      BE1
22907583    22907585      BE2
22907583    22907586      BE3

In this mapping, the smallest Replica ID of every tablet is always on the same backend node (BE1).
When Doris performs a backup, the chooseReplica function sorts the replicas by ID and returns the first one that has no failed version and has caught up to the visible version, so in a healthy cluster it always picks the replica with the smallest ID. Consequently, during restoration, the destination cluster consistently sends its download requests to a single backend node.

/*
 * Choose a replica order by replica id.
 * This is to expect to choose the same replica at each backup job.
 */
private Replica chooseReplica(Tablet tablet, long visibleVersion) {
    List<Long> replicaIds = Lists.newArrayList();
    for (Replica replica : tablet.getReplicas()) {
        replicaIds.add(replica.getId());
    }

    Collections.sort(replicaIds);
    for (Long replicaId : replicaIds) {
        Replica replica = tablet.getReplicaById(replicaId);
        if (replica.getLastFailedVersion() < 0 && replica.getVersion() >= visibleVersion) {
            return replica;
        }
    }
    return null;
}
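
To make the effect concrete, here is a minimal, self-contained sketch that rebuilds the mapping from the table above and applies the same "smallest replica ID wins" ordering. It is not Doris code: SimReplica, chooseSmallest, and countChosenBackends are hypothetical names, the version checks are omitted, and it assumes that replica IDs are allocated sequentially in the order the backends are returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SmallestReplicaSkew {
    /** Hypothetical stand-in for a replica: only an ID and the backend that hosts it. */
    static final class SimReplica {
        final long id;
        final String backend;

        SimReplica(long id, String backend) {
            this.id = id;
            this.backend = backend;
        }
    }

    /** Same ordering as chooseReplica (smallest ID first), without the version checks. */
    static SimReplica chooseSmallest(List<SimReplica> replicas) {
        return Collections.min(replicas, Comparator.comparingLong((SimReplica r) -> r.id));
    }

    /**
     * Creates tablets whose replica IDs are assigned sequentially (assumption) and
     * placed on the backends in a fixed order, then counts which backend hosts the
     * replica that would be chosen for backup.
     */
    static Map<String, Integer> countChosenBackends(int tabletCount, List<String> backends) {
        Map<String, Integer> hits = new TreeMap<>();
        long nextId = 22907575L;
        for (int t = 0; t < tabletCount; t++) {
            nextId++; // the tablet itself consumes one ID
            List<SimReplica> replicas = new ArrayList<>();
            for (String backend : backends) {
                replicas.add(new SimReplica(nextId++, backend));
            }
            hits.merge(chooseSmallest(replicas).backend, 1, Integer::sum);
        }
        return hits;
    }

    public static void main(String[] args) {
        // prints {BE1=3}
        System.out.println(countChosenBackends(3, Arrays.asList("BE1", "BE2", "BE3")));
    }
}
```

Every tablet selects its replica on BE1, which matches the traffic concentration observed during the full sync.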

What You Expected?

The mapping between Replica IDs and Backend IDs should be less predictable.
I analyzed the createTablets code and discovered that when the number of replicas is equal to the number of backend nodes, realIndex remains constant: nextRoundRobinIndex is set to realIndex + result.size(), and since result.size() equals candidates.size(), the next modulo yields the same realIndex again. Every tablet therefore receives the backends in the same order.
To address this issue, would it be possible to shuffle the result before returning it? This would break the fixed ordering.

// SystemInfoService.java
if (policy.enableRoundRobin) {
    if (!policy.allowOnSameHost && hasSameHost) {
        // not allow same host and has same host,
        // then we compare them with their host
        Collections.sort(candidates, new BeHostComparator());
    } else {
        Collections.sort(candidates, new BeIdComparator());
    }

    if (policy.nextRoundRobinIndex < 0) {
        policy.nextRoundRobinIndex = new SecureRandom().nextInt(candidates.size());
    }

    int realIndex = policy.nextRoundRobinIndex % candidates.size();
    List<Long> partialOrderList = new ArrayList<Long>();
    partialOrderList.addAll(candidates.subList(realIndex, candidates.size())
            .stream().map(Backend::getId).collect(Collectors.toList()));
    partialOrderList.addAll(candidates.subList(0, realIndex)
            .stream().map(Backend::getId).collect(Collectors.toList()));

    List<Long> result = number == -1 ? partialOrderList : partialOrderList.subList(0, number);
    policy.nextRoundRobinIndex = realIndex + result.size();

    // Shuffle the result to increase randomness
    // Collections.shuffle(result);
    return result;
}
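
The following is a rough simulation of the rotation logic above, meant only to illustrate why realIndex gets stuck and what the proposed shuffle would change. It is a simplified sketch, not the SystemInfoService implementation: Cursor, select, and firstBackendCounts are hypothetical names, the cursor starts at 0 instead of a SecureRandom value, the number == -1 case is left out, and it assumes the first backend in the returned list receives the smallest replica ID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class RoundRobinSkewDemo {
    /** Hypothetical stand-in for the policy's round-robin state. */
    static final class Cursor {
        int nextRoundRobinIndex;

        Cursor(int start) {
            this.nextRoundRobinIndex = start;
        }
    }

    /** Rotates the sorted backend IDs by the cursor and takes the first `number` of them. */
    static List<Long> select(Cursor cursor, List<Long> sortedBackendIds, int number, Random shuffleRng) {
        int realIndex = cursor.nextRoundRobinIndex % sortedBackendIds.size();
        List<Long> rotated = new ArrayList<>(sortedBackendIds.subList(realIndex, sortedBackendIds.size()));
        rotated.addAll(sortedBackendIds.subList(0, realIndex));
        List<Long> result = new ArrayList<>(rotated.subList(0, number));
        cursor.nextRoundRobinIndex = realIndex + result.size();
        if (shuffleRng != null) {
            Collections.shuffle(result, shuffleRng); // the change proposed in this issue
        }
        return result;
    }

    /** Counts how often each backend comes first across `tablets` consecutive selections. */
    static Map<Long, Integer> firstBackendCounts(List<Long> backendIds, int replicas, int tablets, Random shuffleRng) {
        Cursor cursor = new Cursor(0);
        Map<Long, Integer> counts = new TreeMap<>();
        for (int i = 0; i < tablets; i++) {
            counts.merge(select(cursor, backendIds, replicas, shuffleRng).get(0), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Long> three = Arrays.asList(1L, 2L, 3L);
        List<Long> four = Arrays.asList(1L, 2L, 3L, 4L);
        // 3 replicas on 3 backends: realIndex never moves, prints {1=300}
        System.out.println(firstBackendCounts(three, 3, 300, null));
        // 3 replicas on 4 backends: realIndex cycles, prints {1=75, 2=75, 3=75, 4=75}
        System.out.println(firstBackendCounts(four, 3, 300, null));
        // 3 replicas on 3 backends with the shuffle: counts spread across all three
        System.out.println(firstBackendCounts(three, 3, 300, new Random(42)));
    }
}
```

In this model the skew only appears when the replica count equals the backend count (more generally, when it is a multiple of it); with 4 backends the rotation already spreads the first position evenly. The shuffle removes the dependence on that ratio, at the cost of making the returned order non-deterministic.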

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
