Skip to content

[security] Redact sensitive config values in startup logs#3486

Open
litiliu wants to merge 1 commit into
apache:mainfrom
litiliu:codex/3485-redact-sensitive-config-logs
Open

[security] Redact sensitive config values in startup logs#3486
litiliu wants to merge 1 commit into
apache:mainfrom
litiliu:codex/3485-redact-sensitive-config-logs

Conversation

@litiliu

@litiliu litiliu commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Redact sensitive configuration values when GlobalConfiguration logs startup properties.
  • Match sensitive keys case-insensitively using Flink-style substring matching.
  • Cover Fluss access-key variants such as fs.s3a.access.key, s3.access-key, and fs.oss.accessKeyId.

Fixes #3485

Test Plan

  • mvn -pl fluss-common -Dtest=ConfigurationTest,GlobalConfigurationTest test

@litiliu litiliu marked this pull request as ready for review June 15, 2026 07:16
@litiliu

litiliu commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

@Prajwal-banakar PTAL

@Prajwal-banakar

Copy link
Copy Markdown
Contributor

HI @litiliu Thanks for the fix, this looks good to me and i've one small question, the implementation uses substring matching (contains) for all sensitive key parts. Is this intentionally aligned with Flink's behavior? I'm asking because patterns such as token and secret may also match non-sensitive configuration keys.

@litiliu

litiliu commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

HI @litiliu Thanks for the fix, this looks good to me and i've one small question, the implementation uses substring matching (contains) for all sensitive key parts. Is this intentionally aligned with Flink's behavior? I'm asking because patterns such as token and secret may also match non-sensitive configuration keys.

@Prajwal-banakar Yes, this is intentional and aligned with Flink's existing behavior. The implementation follows the same semantics as GlobalConfiguration.isSensitive(...), which lowercases the key and checks whether it contains any of the configured sensitive key parts.

This can lead to conservative masking for keys containing terms like token or secret, even if they are not actually sensitive. I think that is acceptable here because it is safer to over-mask than to accidentally expose credentials, and it keeps the behavior consistent with the rest of Flink's configuration/logging handling. If we want stricter matching in the future, it should probably be changed centrally in Flink's sensitive-key detection logic rather than only at this call site.

@litiliu

litiliu commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

@luoyuxia please help review

@litiliu

litiliu commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

@fresh-borzoni PTAL

@fresh-borzoni fresh-borzoni left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@litiliu Thank you for the PR, left some comments, PTAL

/** Utility class for {@link Configuration} related helper functions. */
public class ConfigurationUtils {

private static final String[] SENSITIVE_KEY_PARTS = {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also #3526 adds a second S3-cred list that already diverges, is it worth unifying?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I agree it is worth unifying the S3 credential detection, but I would prefer to handle that in #3526 or a focused follow-up.

The matching in this PR is intentionally generic and conservative for log redaction, following Flink-style sensitive-key substring matching. The #3526 logic is S3-specific and should probably use a more precise matcher for supported S3 config prefixes/suffixes, such as s3., s3a., fs.s3a. plus credential-bearing suffixes.

I will keep this PR scoped to the generic redaction path and avoid folding a filesystem config refactor into it. We can extract a shared S3 credential matcher separately so #3526 and the S3 filesystem code do not drift.
What do you think?

@litiliu litiliu force-pushed the codex/3485-redact-sensitive-config-logs branch from 20682dc to b45b839 Compare June 26, 2026 05:50
@litiliu

litiliu commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

@fresh-borzoni Thanks for the careful review. I pushed updates for the actionable comments:

  • Added private.key / private-key to the sensitive-key matching and covered the GCS private key cases in tests.
  • Reused ConfigurationUtils.hideSensitiveValue(s) for dynamic reconfiguration logs, so datalake.* credential changes are redacted as well.
  • Added an exact non-sensitive allowlist for the token-renewal configs using existing ConfigOptions, normalized case-insensitively, with lowercase/uppercase test coverage.

For the S3 credential matcher unification with #3526, I’d prefer to keep that as a focused follow-up since that path is S3-specific while this PR keeps the generic log-redaction behavior.

@fresh-borzoni fresh-borzoni left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@litiliu Thank you for the changes, LGTM overall, one comment:
ZooKeeperClient.upsertEntityConfigs seems to also log secrets by the glance, shall we fix it as well, WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[security] Redact sensitive configuration values in startup logs

3 participants