Search before asking
I had searched in the issues and found no similar issues.
Version
2.1.8
What's Wrong?
During a full-sync with CCR, most of the traffic from the source cluster is concentrated on a single backend node.
I analyzed the relevant source code and found that the causes are as follows:
In the Doris system, the Round Robin algorithm is used to distribute replicas across backend nodes. However, under certain conditions, this algorithm may lead to unexpected mappings between Replica IDs and Backend IDs.
Example:
Assume a partition has 3 replicas and there are 3 backend nodes (BE1, BE2, BE3). The mapping is as follows:
| TabletId | Replica ID | Backend ID |
|----------|------------|------------|
| 22907575 | 22907576 | BE1 |
| 22907575 | 22907577 | BE2 |
| 22907575 | 22907578 | BE3 |
| 22907579 | 22907580 | BE1 |
| 22907579 | 22907581 | BE2 |
| 22907579 | 22907582 | BE3 |
| 22907583 | 22907584 | BE1 |
| 22907583 | 22907585 | BE2 |
| 22907583 | 22907586 | BE3 |
Note that the smallest Replica ID of every tablet always lands on the same backend node (BE1 in this example).
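This pattern can be reproduced with a small standalone model. This is a simplified sketch, not Doris code: it assumes tablet and replica IDs are drawn from one monotonically increasing counter, and that every tablet's round-robin pass over the backends starts at the same index (which, as explained below, is what happens when the replica count equals the backend count).

```java
// Simplified model of the observed placement (not actual Doris code).
// Assumes IDs come from one monotonically increasing counter and that
// every tablet's round-robin pass starts at the same backend index.
public class PlacementModel {
    public static void main(String[] args) {
        String[] backends = {"BE1", "BE2", "BE3"};
        long nextId = 22907575L;  // IDs taken from the table above
        int startIndex = 0;       // never advances when replicas == backends
        for (int t = 0; t < 3; t++) {
            long tabletId = nextId++;
            for (int i = 0; i < backends.length; i++) {
                long replicaId = nextId++;
                String be = backends[(startIndex + i) % backends.length];
                System.out.printf("tablet %d, replica %d -> %s%n", tabletId, replicaId, be);
            }
        }
        // Output matches the table: the smallest replica ID of every
        // tablet lands on BE1.
    }
}
```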
When Doris performs a backup, the chooseReplica function always tries to select the replica with the smallest ID. Consequently, during restoration, the destination cluster consistently sends its download requests to a single backend node.
```java
/*
 * Choose a replica order by replica id.
 * This is to expect to choose the same replica at each backup job.
 */
private Replica chooseReplica(Tablet tablet, long visibleVersion) {
    List<Long> replicaIds = Lists.newArrayList();
    for (Replica replica : tablet.getReplicas()) {
        replicaIds.add(replica.getId());
    }
    Collections.sort(replicaIds);
    for (Long replicaId : replicaIds) {
        Replica replica = tablet.getReplicaById(replicaId);
        if (replica.getLastFailedVersion() < 0 && replica.getVersion() >= visibleVersion) {
            return replica;
        }
    }
    return null;
}
```
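Combined with the placement above, the effect on restore is easy to see. The following is a hypothetical tally (the class and map layout are mine, not Doris code), using the replica-to-backend mapping from the table:

```java
import java.util.*;

// Hypothetical illustration: with the mapping from the table above,
// picking the smallest healthy replica ID (as chooseReplica does)
// selects the BE1 replica for every tablet, so all download requests
// during restore go to one backend.
public class DownloadTally {
    public static void main(String[] args) {
        // tablet -> (replicaId -> backend), taken from the table above
        Map<Long, Map<Long, String>> tablets = new LinkedHashMap<>();
        tablets.put(22907575L, Map.of(22907576L, "BE1", 22907577L, "BE2", 22907578L, "BE3"));
        tablets.put(22907579L, Map.of(22907580L, "BE1", 22907581L, "BE2", 22907582L, "BE3"));
        tablets.put(22907583L, Map.of(22907584L, "BE1", 22907585L, "BE2", 22907586L, "BE3"));

        Map<String, Integer> downloads = new TreeMap<>();
        for (Map<Long, String> replicas : tablets.values()) {
            long chosen = Collections.min(replicas.keySet()); // mimics chooseReplica
            downloads.merge(replicas.get(chosen), 1, Integer::sum);
        }
        System.out.println(downloads); // {BE1=3} -- every request hits one node
    }
}
```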
What You Expected?
To enhance the randomness of the mapping between Replica IDs and Backend IDs.
I analyzed the createTablets code and discovered that when the number of replicas equals the number of backend nodes, realIndex remains constant: nextRoundRobinIndex is advanced by result.size(), which in this case equals candidates.size(), so realIndex = nextRoundRobinIndex % candidates.size() is the same for every tablet (a standalone sketch after the code block below demonstrates this).
To address this issue, would it be possible to shuffle the result to enhance the randomness of the mapping? This might help break the predictability.
```java
// SystemInfoService.java
if (policy.enableRoundRobin) {
    if (!policy.allowOnSameHost && hasSameHost) {
        // not allow same host and has same host,
        // then we compare them with their host
        Collections.sort(candidates, new BeHostComparator());
    } else {
        Collections.sort(candidates, new BeIdComparator());
    }

    if (policy.nextRoundRobinIndex < 0) {
        policy.nextRoundRobinIndex = new SecureRandom().nextInt(candidates.size());
    }

    int realIndex = policy.nextRoundRobinIndex % candidates.size();
    List<Long> partialOrderList = new ArrayList<Long>();
    partialOrderList.addAll(candidates.subList(realIndex, candidates.size())
            .stream().map(Backend::getId).collect(Collectors.toList()));
    partialOrderList.addAll(candidates.subList(0, realIndex)
            .stream().map(Backend::getId).collect(Collectors.toList()));
    List<Long> result = number == -1 ? partialOrderList : partialOrderList.subList(0, number);
    policy.nextRoundRobinIndex = realIndex + result.size();
    // Shuffle the result to increase randomness
    // Collections.shuffle(result);
    return result;
}
```
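Here is a standalone sketch of the invariant, with hypothetical backend IDs: when the replica count equals the backend count, realIndex is identical on every call, so every tablet sees the same backend order; enabling the commented-out shuffle (the proposed fix) would break that determinism.

```java
import java.security.SecureRandom;
import java.util.*;

// Standalone sketch (hypothetical backend IDs): when replicas == backends,
// realIndex never changes, so every tablet gets the identical backend order.
public class RoundRobinSketch {
    public static void main(String[] args) {
        List<Long> candidates = Arrays.asList(10001L, 10002L, 10003L); // backend IDs
        int nextRoundRobinIndex = new SecureRandom().nextInt(candidates.size());
        for (int tablet = 0; tablet < 3; tablet++) {
            int realIndex = nextRoundRobinIndex % candidates.size();
            List<Long> result = new ArrayList<>(candidates.subList(realIndex, candidates.size()));
            result.addAll(candidates.subList(0, realIndex));
            // realIndex + candidates.size() is congruent to realIndex modulo
            // candidates.size(), so realIndex is the same on every iteration.
            nextRoundRobinIndex = realIndex + result.size();
            // Collections.shuffle(result);  // proposed fix
            System.out.println("tablet " + tablet + ": " + result);
        }
    }
}
```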
How to Reproduce?
No response
Anything Else?
No response