Skip to content

Conversation

@jbertram
Copy link
Contributor

No description provided.

@gemmellr
Copy link
Member

gemmellr commented Jan 30, 2026

Garys idea seems reasonable, in some respects it is a bit weird to warn about something not happening yet, when being explicitly instructed not to wait for it to happen. Could possibly drop it to info level if feeling strongly that it should still be noted, though its not clear why this bit would be noted when it seems the prior/existing usage of that config wasnt causing any warnings.

That said, I also dont actually know that we really want this set to 0 all the time in the test suite? Until yesterday the related bit previously waited for as long as needed during the entire test suite, and now because of reusing an existing configuration value it will never wait at all. Feels a bit weird.

@jbertram
Copy link
Contributor Author

Garys idea seems reasonable...

Agreed. I implemented his suggestion.

...I also dont actually know that we really want this set to 0 all the time in the test suite?

I don't know if we do or not. There's not much detail on https://issues.apache.org/jira/browse/ARTEMIS-2428 where this change originated.

Until yesterday the related bit previously waited for as long as needed during the entire test suite...

The problem, as outlined on the Jira, is that the call to awaitUninterruptibly() can apparently hang forever so a timeout is needed for these calls. Rather than create and document a new parameter I simply re-used the existing, but undocumented, shutdownTimeout parameter.

I certainly could create a new parameter specifically for closing the Netty ChannelGroup instances. I could name it something like channelGroupShutdownTimeout, but then that would introduce a naming asymmetry with shutdownTimeout which is specifically aimed at the Netty EventLoopGroup instance. Since shutdownTimeout was undocumented I could potentially just rename it to eventLoopGroupShutdownTimeout and then document both new parameters, hoping that nobody was actually using shutdownTimeout, or I could deprecate shutdownTimeout and let it live alongside the new parameter. I'd probably need to do the same with quietPeriod as well.

Ultimately we just need a timeout here so these calls can't hang indefinitely. Using shutdownTimeout seems the simplest path forward to me.

@gemmellr
Copy link
Member

gemmellr commented Feb 2, 2026

...I also dont actually know that we really want this set to 0 all the time in the test suite?

I don't know if we do or not. There's not much detail on https://issues.apache.org/jira/browse/ARTEMIS-2428 where this change originated.

I meant the specific change from this PR rather than the change in ARTEMIS-2428 ~ 7 years ago. FWIW the PR that added that change shows it to be a related follow-up along with changes around ARTEMIS-2408, and suggests that timeout=0 change was made after the initial 2428 changes had first caused the test suite to hang on one specific test, and then once its changes were updated and committed again later caused the test suite to take considerably longer, so it was set to 0 to never wait to get back to previous run times.

Until yesterday the related bit previously waited for as long as needed during the entire test suite...

The problem, as outlined on the Jira, is that the call to awaitUninterruptibly() can apparently hang forever so a timeout is needed for these calls. Rather than create and document a new parameter I simply re-used the existing, but undocumented, shutdownTimeout parameter.

Yep I realise. Though from the PR itsnt clear you knew that change also meant it would then never wait for channel [group] shutdown at all during most of the tests given it wasnt mentioned.

I certainly could create a new parameter specifically for closing the Netty ChannelGroup instances. I could name it something like channelGroupShutdownTimeout, but then that would introduce a naming asymmetry with shutdownTimeout which is specifically aimed at the Netty EventLoopGroup instance. Since shutdownTimeout was undocumented I could potentially just rename it to eventLoopGroupShutdownTimeout and then document both new parameters, hoping that nobody was actually using shutdownTimeout, or I could deprecate shutdownTimeout and let it live alongside the new parameter. I'd probably need to do the same with quietPeriod as well.

Ultimately we just need a timeout here so these calls can't hang indefinitely. Using shutdownTimeout seems the simplest path forward to me.

Its certainly simple, yes, though I do wonder on what the full implications may be of it not waiting during the tests.

*/
package org.apache.activemq.artemis.tests.integration.cluster.warnings;

import static org.junit.jupiter.api.Assertions.assertFalse;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are no changes on this file.. can you remove these changes here.. just to take it out of the diff? easier for eventual cherry-picks.

@jbertram jbertram force-pushed the ARTEMIS-5861 branch 2 times, most recently from 3d01381 to 75719bc Compare February 3, 2026 15:26
@clebertsuconic
Copy link
Contributor

a good thing about setting this to Zero, being inifinite.. is that if we ever have a deadlock, we would rather just freeze the testsuite instead of hiding the issue.

I prefer we keep the default as Zero, and that meaning infinite.

@jbertram jbertram force-pushed the ARTEMIS-5861 branch 2 times, most recently from 2ea0a3f to e960bf4 Compare February 3, 2026 16:01
@clebertsuconic
Copy link
Contributor

I understand what Robbie was saying now.. we should have the pom.xml putting a very large defaut... at least we would rather block an eventual deadlock as opposed to just ignore it.

@clebertsuconic
Copy link
Contributor

@jbertram can you add an explanation on the comments for the large timeout.. on why we are doing that?

Or if you don't like to add it there, add the explanation on the commit message.. so if someone ever wonder why this timeout is that large, they can read the comment on either the commit or on the code..

or do both :) ... just wanted to have this information somewhere in the commit and code

@jbertram
Copy link
Contributor Author

jbertram commented Feb 3, 2026

The full test-suite finished 100% green with no discernible increase in run time.

There's not really a good place to add a comment in the pom.xml, but I'll add something to the commit message and push soon.

This commit also increases the DEFAULT_SHUTDOWN_TIMEOUT used on the
test-suite from 0 to 7200000 (i.e. 2 hours) in order to ensure the
timeout code is exercised and to expose any issues with closing the
Netty ChannelGroups or EventLoopGroup.
@clebertsuconic clebertsuconic merged commit 175d32d into apache:main Feb 3, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants