Skip to content

Revert "Use private Thrift members (#6430)"#6455

Open
dlmarion wants to merge 1 commit into
apache:mainfrom
dlmarion:revert-thrift-private-members
Open

Revert "Use private Thrift members (#6430)"#6455
dlmarion wants to merge 1 commit into
apache:mainfrom
dlmarion:revert-thrift-private-members

Conversation

@dlmarion

Copy link
Copy Markdown
Contributor

GarbageCollectorIT.gcLotsOfCandidatesIT has consistently failed on a server since #6430 was merged. The failure happens when the garbage collector process is started with 32 MB of memory. This PR branch exists so that I can test this on that server.

This reverts commit bdcd798.

@dlmarion dlmarion self-assigned this Jun 30, 2026
@dlmarion

Copy link
Copy Markdown
Contributor Author

I think I found a real issue with #6430 and we might want to actually revert it. The original code would access a ByteBuffer directly. #6430 made those member variables private and the methods for accessing the ByteBuffer's make a copy before returning them.

@dlmarion

Copy link
Copy Markdown
Contributor Author

Here's a good example of the impact of this change.

ArrayList<Entry<Key,Value>> entries = new ArrayList<>(scanResult.getResults().size());
for (TKeyValue kv : scanResult.getResults()) {
entries.add(new SimpleImmutableEntry<>(new Key(kv.getKey()), new Value(kv.getValue())));
}

The current version of Key(TKey) looks like:

public Key(TKey tkey) {
this.row = ByteBufferUtil.toBytes(tkey.bufferForRow());
this.colFamily = ByteBufferUtil.toBytes(tkey.bufferForColFamily());
this.colQualifier = ByteBufferUtil.toBytes(tkey.bufferForColQualifier());
this.colVisibility = ByteBufferUtil.toBytes(tkey.bufferForColVisibility());
this.timestamp = tkey.getTimestamp();
this.deleted = false;
if (row == null) {
throw new IllegalArgumentException("null row");
}
if (colFamily == null) {
throw new IllegalArgumentException("null column family");
}
if (colQualifier == null) {
throw new IllegalArgumentException("null column qualifier");
}
if (colVisibility == null) {
throw new IllegalArgumentException("null column visibility");
}
}

The same code prior to #6430 looks like:

public Key(TKey tkey) {
this.row = toBytes(tkey.row);
this.colFamily = toBytes(tkey.colFamily);
this.colQualifier = toBytes(tkey.colQualifier);
this.colVisibility = toBytes(tkey.colVisibility);
this.timestamp = tkey.timestamp;
this.deleted = false;
if (row == null) {
throw new IllegalArgumentException("null row");
}
if (colFamily == null) {
throw new IllegalArgumentException("null column family");
}
if (colQualifier == null) {
throw new IllegalArgumentException("null column qualifier");
}
if (colVisibility == null) {
throw new IllegalArgumentException("null column visibility");
}

The calls to bufferForX create a copy of the ByteBuffer.

@dlmarion dlmarion marked this pull request as ready for review June 30, 2026 18:48
@dlmarion dlmarion requested review from DomGarguilo and ctubbsii June 30, 2026 18:49
@ctubbsii

Copy link
Copy Markdown
Member

I'm currently looking into alternatives. I think the private thrift members is a nice change, but I also recognize the problem with the extra protective copy. Please hold on merging this for now, while I investigate.

@ctubbsii

Copy link
Copy Markdown
Member

I created #6457 as an alternative to reverting all this. It will remove the unwanted protective copies that are likely causing the performance problem.

@ctubbsii ctubbsii left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to resolve the issue with #6457 instead, if works.

@dlmarion

dlmarion commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

GarbageCollectorIT is now passing again with these changes. Will probably end up closing this in favor of #6457 though if that also fixes the issue.

@ctubbsii

ctubbsii commented Jul 1, 2026

Copy link
Copy Markdown
Member

GarbageCollectorIT is now passing again with these changes. Will probably end up closing this in favor of #6457 though if that also fixes the issue.

I saw it fail due to the same timeout on the first pass, and then pass on the second attempt. I think that test is flaky already, and a revert will not entirely fix the issue. It's possible the changes in #6430 contributed to a performance issue, but I do not think the performance of that particular test is a reliable enough metric to determine whether #6430 is a good change or not.

Even though I am reluctant to revert #6430, for fear of reversing progress, I do think there's more that probably needs to be done here. As it turns out, the getters that return a byte array is not even the code path that creates redundant protective copies. It's the versions of the methods that look like .bufferForValue() that are the ones that are doing unnecessary copies. The method that returns a byte array only makes a copy when it needs to truncate a byte buffer's backing array to size, and that is done at most once (and it's also probably important to do that, because the original backing array is a buffer in the TFramedTransport, and I'm not entirely convinced that it isn't reusable and unsafe as a backing array at this layer; I'd need to dig more into the deserialization code to understand that better).

So really, we should be avoiding the .bufferForValue() style methods completely. In the vast majority of cases, we actually want the byte array anyway, and were just doing our own array extraction from the ByteBuffer. But we don't even need to do that if we use the correct getter that returns an array. Previously, I was thinking that the problem might be that we need to use the getter that returns the ByteBuffer more... but I now understand that is the opposite of what we need to do.

I have abandoned #6457

@ctubbsii

ctubbsii commented Jul 1, 2026

Copy link
Copy Markdown
Member

My next attempt to address the issue is #6459

@dlmarion

dlmarion commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

As it turns out, the getters that return a byte array is not even the code path that creates redundant protective copies. It's the versions of the methods that look like .bufferForValue() that are the ones that are doing unnecessary copies.

I don't think that's correct. It appears to me that the code currently in main performs a copy for both calls to getX and bufferForX. Here's what I'm seeing using TKey as an example:

public byte[] getRow() {
setRow(org.apache.thrift.TBaseHelper.rightSize(row));
return row == null ? null : row.array();
}
public java.nio.ByteBuffer bufferForRow() {
return org.apache.thrift.TBaseHelper.copyBinary(row);
}

The call to bufferForRow calls TBaseHelper.copyBinary. The call to getRow calls TBaseHelper.rightSize, then calls setRow(below) which also calls TBaseHelper.copyBinary.

public TRowRange setStartRow(@org.apache.thrift.annotation.Nullable java.nio.ByteBuffer startRow) {
this.startRow = org.apache.thrift.TBaseHelper.copyBinary(startRow);
return this;
}

Your change in #6457 to use the unsafe_binaries Thrift option removed the copy from the bufferForX methods but not the getX methods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants