
add configurable default routing group, add indicative errors, use ex… #687

Open. Wants to merge 2 commits into base: main

shanigeva

Improve external routing with error propagation and configurable defaults

Description

This PR resolves GitHub issue #658.
This PR enhances Trino Gateway's external routing capabilities with three key improvements:

  1. Error Propagation: Ensures errors from the external routing API are passed to clients instead of being silently ignored, addressing issue #658 (Errors from External Routing Service Are Not Propagated to Clients).

  2. Configurable Default Routing: Adds the ability to configure a default routing group that is used when the external router fails or is unavailable, improving system resilience.

  3. Meaningful Error Messages: Replaces generic failures with messages that state the cause, for example when no backends can be found for the requested routing group.

These changes improve both the reliability and usability of Trino Gateway when using external routing services.

Additional context and related issues

The main changes include:

  • Modified the ExternalRoutingGroupSelector to capture and propagate errors from the external API
  • Added configuration options for specifying a default routing group to use as fallback
  • Improved error message formatting to include specific details about the failure cause
  • Added appropriate HTTP status code mapping based on error conditions
  • Included request tracing information for better debugging of routing failures
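A minimal sketch of the propagate-or-fall-back decision described above, assuming simplified stand-ins rather than the PR's actual types: `ExternalRouterResponse` is reduced to a record holding a routing group, an error list, and headers; `RoutingErrorException` is hypothetical; and the `propagateErrors` flag mirrors the `isPropagateErrors()` setting that appears in the review below. The HTTP status mapping is not modeled.

```java
import java.util.List;
import java.util.Map;

// Simplified stand-in for the gateway's external router response (assumption).
record ExternalRouterResponse(String routingGroup, List<String> errors, Map<String, String> externalHeaders) {}

// Hypothetical exception used to surface external routing errors to the client.
class RoutingErrorException extends RuntimeException {
    RoutingErrorException(String message) {
        super(message);
    }
}

class ExternalRoutingSketch {
    private final boolean propagateErrors;
    private final String defaultRoutingGroup;

    ExternalRoutingSketch(boolean propagateErrors, String defaultRoutingGroup) {
        this.propagateErrors = propagateErrors;
        this.defaultRoutingGroup = defaultRoutingGroup;
    }

    String resolveRoutingGroup(ExternalRouterResponse response) {
        List<String> errors = response.errors();
        if (errors != null && !errors.isEmpty()) {
            if (propagateErrors) {
                // Pass the external service's errors to the client instead of hiding them.
                throw new RoutingErrorException("External routing failed: " + String.join("; ", errors));
            }
            // Propagation disabled: fall back to the configured default group.
            return defaultRoutingGroup;
        }
        // No group chosen by the external service: use the default group.
        String group = response.routingGroup();
        return group != null ? group : defaultRoutingGroup;
    }
}
```

With propagation disabled, an error from the external service is not fatal; the request simply lands on the default group.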

Related issues:

These enhancements make external routing more robust while giving administrators better visibility into routing problems, supporting more advanced multi-cluster deployment scenarios.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

* Improve external routing with three key enhancements:
  * Fix issue where errors from external routing services were not propagated to clients (#658)
  * Add support for configurable default routing groups when external routing fails
  * Enhance error messages with detailed information about routing failures

cla-bot (bot) commented May 21, 2025

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Chaho12 (Member) commented May 21, 2025

Please resolve conflicts first :)

@@ -100,6 +104,10 @@ public String provideBackendForRoutingGroup(String routingGroup, String user)
{
List<ProxyBackendConfiguration> backends =
gatewayBackendManager.getActiveBackends(routingGroup);
// Check if any backends exist for the routing group (even before filtering unhealthy ones)
if (backends.isEmpty()) {
throw new NotFoundException("Routing group does not exist: " + routingGroup);
Member

The error message here does not reflect what this code is doing. I think something like "Cannot find any backends for routing group: " + routingGroup is better.

Contributor

Modified
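For illustration, the empty-backends check with the reviewer's suggested message might look like this. It is a sketch under assumptions: `NotFoundException` is a local stand-in for the JAX-RS exception used in the diff, and a plain map replaces `gatewayBackendManager.getActiveBackends(routingGroup)`.

```java
import java.util.List;
import java.util.Map;

// Local stand-in for jakarta.ws.rs.NotFoundException, so the sketch runs without JAX-RS.
class NotFoundException extends RuntimeException {
    NotFoundException(String message) {
        super(message);
    }
}

class BackendLookupSketch {
    // Hypothetical registry: routing group -> active backend URLs.
    private final Map<String, List<String>> activeBackends;

    BackendLookupSketch(Map<String, List<String>> activeBackends) {
        this.activeBackends = activeBackends;
    }

    String provideBackendForRoutingGroup(String routingGroup) {
        List<String> backends = activeBackends.getOrDefault(routingGroup, List.of());
        if (backends.isEmpty()) {
            // The message describes what was checked: no backends, not a missing group.
            throw new NotFoundException("Cannot find any backends for routing group: " + routingGroup);
        }
        // A real router selects by load or at random; the sketch takes the first backend.
        return backends.get(0);
    }
}
```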

EdenKik force-pushed the issue-658-external-routing-errors branch from ce3c49b to 3def10a on June 5, 2025 20:15

.entity(e.getMessage())
.build());
}
catch (WebApplicationException e) {
Contributor

Why is this 'catch' needed? You just rethrow the exact same error.

Contributor

Removed

Contributor

Resolved

@@ -67,6 +71,7 @@ public RoutingTargetHandler(
{
this.routingManager = requireNonNull(routingManager);
this.routingGroupSelector = requireNonNull(routingGroupSelector);
this.defaultRoutingGroup = haGatewayConfiguration.getRouting().getDefaultRoutingGroup();
Contributor

NIT: rename getRouting() to getRoutingConfiguration()

EdenKik (Contributor) commented Jun 6, 2025

For some reason, this seems to be the naming convention for configuration getters and setters within RoutingTargetHandler.

Contributor

@vishalya would love your response

Member

I think it's OK to stay with getRouting.

}
else {
routingGroup = defaultRoutingGroup;
clusterHost = routingManager.provideDefaultCluster(user);
Contributor

Why not use provideClusterForRoutingGroup?

Contributor

Modified to use provideClusterForRoutingGroup, fixed fallback to default cluster within provideClusterForRoutingGroup.
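A hypothetical sketch of falling back to the default cluster inside provideClusterForRoutingGroup. The map of healthy backends, the class name, and the exception type are assumptions, not the PR's code; the point is the lookup order: a null group is treated as the default group (as the replaced else branch did), and a group with no healthy backends falls back to the default group before failing.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class ClusterFallbackSketch {
    // Hypothetical snapshot of healthy backend hosts per routing group.
    private final Map<String, List<String>> healthyBackends;
    private final String defaultRoutingGroup;

    ClusterFallbackSketch(Map<String, List<String>> healthyBackends, String defaultRoutingGroup) {
        this.healthyBackends = healthyBackends;
        this.defaultRoutingGroup = defaultRoutingGroup;
    }

    private Optional<String> firstHealthy(String routingGroup) {
        return healthyBackends.getOrDefault(routingGroup, List.of()).stream().findFirst();
    }

    String provideClusterForRoutingGroup(String routingGroup) {
        // A missing group means "use the default group".
        String requested = routingGroup != null ? routingGroup : defaultRoutingGroup;
        // Try the requested group first, then fall back to the default group.
        return firstHealthy(requested)
                .or(() -> firstHealthy(defaultRoutingGroup))
                .orElseThrow(() -> new IllegalStateException(
                        "Cannot find any backends for routing group: " + requested));
    }
}
```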

Contributor

Resolved

@@ -119,6 +129,10 @@ else if (response.errors() != null && !response.errors().isEmpty()) {
}
return new RoutingSelectorResponse(response.routingGroup(), filteredHeaders);
}
catch (WebApplicationException e) {
Contributor

Similar to the comment above: is this catch really needed?

Contributor

This catch rethrows the errors from the external REST service as is (without building a response model); any other errors are just logged.
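The reasoning in this reply is that a rethrowing catch only matters when a broader catch follows it. A minimal sketch, assuming a local stand-in for the JAX-RS `WebApplicationException` and a hypothetical `callExternalRouter` supplier: errors reported by the external service are rethrown unchanged, while any other failure is logged and answered with `null` so the caller can use the default group.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

// Local stand-in for jakarta.ws.rs.WebApplicationException (assumption, for a runnable sketch).
class WebApplicationException extends RuntimeException {
    WebApplicationException(String message) {
        super(message);
    }
}

class CatchOrderingSketch {
    private static final Logger log = Logger.getLogger(CatchOrderingSketch.class.getName());

    // Returns the routing group from the external router, or null to signal "use the default group".
    static String select(Supplier<String> callExternalRouter) {
        try {
            return callExternalRouter.get();
        }
        catch (WebApplicationException e) {
            // Must come first: without it, the broader catch below would swallow the client-facing error.
            throw e;
        }
        catch (RuntimeException e) {
            log.warning("External routing failed, falling back to default group: " + e.getMessage());
            return null;
        }
    }
}
```

Without the second `catch`, the first one would be a no-op, which is the case the earlier review comment objected to.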

Contributor

Resolved

RoeyoOgen (Contributor)

The CLA has been signed and sent by mail; it is awaiting approval.

EdenKik (Contributor) commented Jun 9, 2025

Hey,
I updated the docs and tests.
The PR is ready for review @Chaho12 @oneonestar @vishalya :)

Chaho12 (Member) commented Jun 11, 2025

@shanigeva Have you sent CLA doc?

RoeyoOgen (Contributor)

@shanigeva Have you sent CLA doc?

@Chaho12, yes, the CLA has been signed and sent by mail and is awaiting approval. Maybe @mosabua can help with this.

EdenKik force-pushed the issue-658-external-routing-errors branch from ef4ecaa to 7d175a6 on June 17, 2025 05:34
cla-bot added the cla-signed label on Jun 17, 2025
EdenKik (Contributor) commented Jun 17, 2025

Rebased & squashed commits

import io.trino.gateway.ha.router.QueryCountBasedRouter;
import io.trino.gateway.ha.router.RoutingManager;

public class QueryCountBasedRouterProvider
extends RouterBaseModule
{
private final HaGatewayConfiguration configuration;
Member

Don't put mutable objects in fields.

Contributor

Removed
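One way to follow this rule, sketched under assumptions: the provider reads what it needs from the configuration in its constructor and keeps only the immutable result. `MutableRoutingConfig` is a hypothetical getter/setter bean standing in for a configuration object; it is not the gateway's actual class.

```java
// Hypothetical mutable configuration bean (getter/setter style).
class MutableRoutingConfig {
    private String defaultRoutingGroup = "adhoc";

    String getDefaultRoutingGroup() {
        return defaultRoutingGroup;
    }

    void setDefaultRoutingGroup(String defaultRoutingGroup) {
        this.defaultRoutingGroup = defaultRoutingGroup;
    }
}

class RouterProviderSketch {
    // Only the immutable value is kept; the mutable config object is not stored in a field.
    private final String defaultRoutingGroup;

    RouterProviderSketch(MutableRoutingConfig config) {
        // No parameter shadows the field, so "this." is not needed here.
        defaultRoutingGroup = config.getDefaultRoutingGroup();
    }

    String defaultRoutingGroup() {
        return defaultRoutingGroup;
    }
}
```

Because the provider copies the `String` value, later calls to the setter do not change the default it already holds.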

import io.trino.gateway.ha.router.RoutingManager;
import io.trino.gateway.ha.router.StochasticRoutingManager;

public class StochasticRoutingManagerProvider
extends RouterBaseModule
{
private final HaGatewayConfiguration configuration;
Member

Don't put mutable objects in fields.

Contributor

Removed

@@ -67,6 +70,7 @@ public class ExternalRoutingGroupSelector
.add("Content-Length")
.addAll(rulesExternalConfiguration.getExcludeHeaders())
.build();
this.propagateErrors = rulesExternalConfiguration.isPropagateErrors();
Member

The "this." qualifier is redundant here.

Contributor

Done

{
dao = requireNonNull(jdbi, "jdbi is null").onDemand(GatewayBackendDao.class);
this.defaultRoutingGroup = requireNonNull(routingConfiguration, "routingConfiguration is null").getDefaultRoutingGroup();
Member

We don't use requireNonNull for config classes.

Contributor

removed

public StochasticRoutingManager(
GatewayBackendManager gatewayBackendManager, QueryHistoryManager queryHistoryManager)
{
super(gatewayBackendManager, queryHistoryManager);
this(gatewayBackendManager, queryHistoryManager, null);
Member

I don't think we should pass null for RoutingConfiguration.

Contributor

Removed this constructor; it was unused.

Comment on lines 253 to 236
ExternalRouterResponse mockResponse = new ExternalRouterResponse(
"test-group", null, ImmutableMap.of());
Member

Unwrap. Same for other test methods.

Contributor

What do you mean?

Member

Meaning: you can put this code on a single line; the line is not long enough to need wrapping.

Contributor

done

import static io.trino.gateway.ha.TestingJdbcConnectionManager.createTestingJdbcConnectionManager;
import static org.assertj.core.api.Assertions.assertThatThrownBy;

@TestInstance(Lifecycle.PER_CLASS)
Member

Do we really need this annotation?

Contributor

It is required for using the @BeforeAll annotation, which initializes fields once per class instead of before each test.

Member

No need to use @BeforeAll. Please replace it with a constructor. See https://trino.io/docs/current/develop/tests.html#use-simple-resource-initialization

Contributor

done

RoeyoOgen (Contributor)

@mosabua LGTM please :)

EdenKik force-pushed the issue-658-external-routing-errors branch 4 times, most recently from f6e18b3 to 34d2ca0 on June 18, 2025 17:41
EdenKik (Contributor) commented Jun 18, 2025

@vishalya
Any idea why CI crashes? It seems like a resource issue:
'The forked VM terminated without properly saying goodbye. VM crash or System.exit called?'

The build passes locally.

EdenKik force-pushed the issue-658-external-routing-errors branch from 34d2ca0 to c11934c on June 19, 2025 06:41
EdenKik (Contributor) commented Jun 19, 2025

@vishalya
The issue was the configuration binding in some classes, introduced by the change made for this comment: #687 (comment)
I moved the configuration object to RouterBaseModule, which makes more sense.

RoeyoOgen (Contributor)

@vishalya LGTM please

RoeyoOgen (Contributor)

@vishalya @Chaho12 @andythsu LGTM

Comment on lines 6 to 10
route requests. If this header is not specified, requests are sent to the default routing group.

You can configure a default routing group by setting `defaultRoutingGroup` in the `routing` section.
This group will be used whenever routing information is unavailable or the external routing service fails.
If not configured, the fallback remains the built-in group `adhoc`.
Member

Please wrap at 80 characters.

Contributor

Minimized.
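The documented precedence (header first, then the configured default, then `adhoc`) can be expressed as a small resolver. This is a sketch, not the gateway's code: `RoutingGroupResolver` is hypothetical, the header name `X-Trino-Routing-Group` is assumed from the Trino Gateway documentation, and a plain map stands in for request headers, so lookup here is case-sensitive, unlike real HTTP headers. It covers only the missing-header case, not external routing failures.

```java
import java.util.Map;

class RoutingGroupResolver {
    // Header name assumed from the Trino Gateway docs.
    static final String ROUTING_GROUP_HEADER = "X-Trino-Routing-Group";
    // Built-in fallback when no default group is configured.
    static final String BUILT_IN_DEFAULT = "adhoc";

    private final String defaultRoutingGroup;

    RoutingGroupResolver(String configuredDefault) {
        defaultRoutingGroup = (configuredDefault == null || configuredDefault.isBlank())
                ? BUILT_IN_DEFAULT
                : configuredDefault;
    }

    // Exact-match map lookup for brevity; real HTTP header names are case-insensitive.
    String resolve(Map<String, String> headers) {
        String fromHeader = headers.get(ROUTING_GROUP_HEADER);
        return (fromHeader != null && !fromHeader.isBlank()) ? fromHeader : defaultRoutingGroup;
    }
}
```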

logRewrite(routingTargetResponse.routingDestination().clusterHost(), request);
return routingTargetResponse;
}
catch (NotFoundException e) {
Member

Which code throws NotFoundException in this try block?

/**
* @deprecated Use {@link #findActiveBackendByRoutingGroup(String)} with the configured default routing group
*/
@Deprecated
Member

Remove the unused method instead of marking it as @Deprecated.

Contributor

done

Comment on lines 116 to 117
RoutingGroupSelector.byRoutingExternal(httpClient, config.getRoutingRules().getRulesExternalConfiguration(),
config.getRequestAnalyzerConfig()),
Member

Unwrap.

Contributor

done

Comment on lines 312 to 313
RoutingGroupSelector.byRoutingExternal(httpClient, config.getRoutingRules().getRulesExternalConfiguration(),
config.getRequestAnalyzerConfig()),
Member

Unwrap.

Contributor

done

assertThat(routingSelectorResponse.routingGroup()).isNull();
assertThat(routingSelectorResponse.externalHeaders().get(headerKey)).isNull();
assertThat(routingSelectorResponse.externalHeaders().get(ROUTING_GROUP_HEADER)).isNull();
assertThat(routingSelectorResponse.externalHeaders().get(headerKey)).isEqualTo("should-be-null");
Member

        assertThat(routingSelectorResponse.externalHeaders())
                .doesNotContainKey(ROUTING_GROUP_HEADER)
                .containsEntry(headerKey, "should-be-null");

Contributor

done

ebyhr (Member) commented Jun 24, 2025

Please rebase on main to resolve conflicts.

…ternal routing url to pass errors to client

Fix configurable default routing group, add indicative errors, use external routing url to pass errors to client

Fix CR comments

Updated docs with PR changes

Added PR changes testing

Added PR testing

Apply suggestions from code review

Co-authored-by: Jaeho Yoo <[email protected]>

Fixed review comments

Fixed configuration binding
EdenKik force-pushed the issue-658-external-routing-errors branch from c11934c to 99d5d71 on June 25, 2025 07:27
EdenKik force-pushed the issue-658-external-routing-errors branch from 99d5d71 to e072b5c on June 25, 2025 07:32

7 participants