Skip to content

Increase initial timeout and add retry loop in gae-interop-testing#12868

Open
kannanjgithub wants to merge 1 commit into
grpc:masterfrom
kannanjgithub:gae_interop_test_startup_time_bump
Open

Increase initial timeout and add retry loop in gae-interop-testing#12868
kannanjgithub wants to merge 1 commit into
grpc:masterfrom
kannanjgithub:gae_interop_test_startup_time_bump

Conversation

@kannanjgithub

@kannanjgithub kannanjgithub commented Jun 17, 2026

Copy link
Copy Markdown
Contributor
  1. Allow the newly deployed App Engine instance up to 2 minutes to complete its JVM initialization and handle the cold start, to avoid the SocketTimeoutException seen in GAE interop testing action (to fix failure logs).
  2. Wait and Retry Loop: Added a loop that attempts the check up to 5 times with a 2-second sleep between attempts. This will gracefully absorb any transient warmup or App Engine routing latency (to fix failure logs even with initial 2 min delay).
  3. Detailed Failure Logs: If all attempts fail, we now retrieve the response bodies (result.body().string()) and include them along with the status codes in the final error message, allowing for straightforward debugging of any persistent downstream issues. (failure logs even after retry failures)

The 3rd logs above show grpc-test.sandbox.googleapis.com must be experiencing a backend outage.

@kannanjgithub kannanjgithub force-pushed the gae_interop_test_startup_time_bump branch from 6bd7810 to 26aab7f Compare June 17, 2026 13:28
@kannanjgithub kannanjgithub changed the title Increase timeout in gae-interop-testing Increase initial timeout and add retry loop in gae-interop-testing Jun 17, 2026
@kannanjgithub kannanjgithub force-pushed the gae_interop_test_startup_time_bump branch from 26aab7f to fbeb6e1 Compare June 17, 2026 13:32
…omplete its JVM initialization and handle the cold start, to avoid the SocketTimeoutException seen in GAE interop testing action http://shortn/_WEppvnsGXs.

   2. Wait and Retry Loop: Added a loop that attempts the check up to 5 times with a 2-second sleep between attempts. This will gracefully absorb any transient warmup or App Engine routing latency (http://shortn/_ONTi0mqEdB)
   3. Detailed Failure Logs: If all attempts fail, we now retrieve the response bodies (result.body().string()) and include them along with the status codes in the final error message, allowing for straightforward debugging of any persistent downstream issues.
@kannanjgithub kannanjgithub force-pushed the gae_interop_test_startup_time_bump branch from fbeb6e1 to a506170 Compare June 17, 2026 13:37
def result2 = null
String result1Body = ""
String result2Body = ""
for (int attempt = 0; attempt < maxChannelReuseRetries; attempt++) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since OkHttp relies on the response body being closed to recycle connections, failing to close the bodies inside the loop on a non-200 response (or when an exception is thrown) will leak connections. If result1 succeeds but result2 throws an exception, the reference to result1 is overwritten in the next iteration and its connection is leaked.

Consider adding a finally block or explicitly closing the bodies before retrying ??

}
if (result1 == null || result2 == null || result1.code() != 200 || result2.code() != 200) {
if (result1 != null) {
result1Body = result1.body().string()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result.body().string() can throw an IOException (e.g., if the body is too large or the stream breaks). If this happens, the script will throw that IOException instead of your detailed GradleException, masking the actual HTTP codes you're trying to surface.
You might want to wrap the body parsing in a safe try-catch block

Comment on lines +143 to +145
} catch (Throwable t) {
logger.log(LogLevel.WARN, "Channel reuse attempt ${attempt + 1} caught exception: ${t.message}. Retrying...", t)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catching Throwable might be a bit too broad as it catches things like OutOfMemoryError or StackOverflowError which usually shouldn't be retried. Consider changing catch (Throwable t) to catch (Exception e) ??

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants