Define bounded lists#640

Open
Felix-El wants to merge 4 commits into
WebAssembly:mainfrom
Felix-El:main

Conversation

@Felix-El

Add bounded lists (list<T, ..N>)

Closes #385.

@cpetig cpetig left a comment

mostly indentation doubts

Comment thread design/mvp/canonical-abi/definitions.py Outdated
Comment thread design/mvp/canonical-abi/definitions.py Outdated

@lukewagner lukewagner left a comment

Member

Thanks! This is looking generally good, a few comments:

Comment thread design/mvp/canonical-abi/definitions.py Outdated
Comment thread design/mvp/canonical-abi/definitions.py Outdated
Comment thread design/mvp/canonical-abi/definitions.py Outdated
Comment thread design/mvp/canonical-abi/definitions.py Outdated
Comment thread design/mvp/canonical-abi/definitions.py Outdated
Comment thread design/mvp/canonical-abi/definitions.py Outdated
Comment thread design/mvp/canonical-abi/definitions.py Outdated
def flatten_list(elem_type, maybe_length, maybe_variable, opts):
  if maybe_length is not None:
    if maybe_variable:
      return flatten_type(varint_type(maybe_length), opts) + flatten_type(elem_type, opts) * maybe_length

Member

This is pre-existing, but a question: for flattening bounded- or fixed-length lists: should we have some low (e.g., 4) per-value maximum flattening length, beyond which the list gets passed via pointer? I'm mostly thinking of the case where there is some list that has a large bound, and it gets used as a function parameter, and it "blows" the MAX_FLAT budget, causing all the parameters to go into the heap when instead you probably wanted the bounded-/fixed-length list to go in the heap.
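
To make the budget concern concrete, here is a minimal sketch of how one large fixed-length list can push every parameter into memory. It is a toy model, not code from `definitions.py`: `flat_count` and `params_spill_to_memory` are hypothetical helpers, and the only value taken from the Canonical ABI is the flat-parameter limit of 16.

```python
MAX_FLAT_PARAMS = 16  # flat-parameter budget of the Canonical ABI


def flat_count(param):
    """Number of core values a parameter flattens to in this toy model.

    A parameter is either the string 'u32' (one core value) or a tuple
    ('fixed_list', n) standing for a fixed-length list of n u32 values
    that is flattened element by element.
    """
    if param == 'u32':
        return 1
    kind, n = param
    assert kind == 'fixed_list'
    return n


def params_spill_to_memory(params):
    """True if the combined flat count exceeds the budget for all params."""
    return sum(flat_count(p) for p in params) > MAX_FLAT_PARAMS


small = ['u32', ('fixed_list', 4), 'u32']    # 6 flat values in total
large = ['u32', ('fixed_list', 32), 'u32']   # 34 flat values in total
```

With `large`, the two plain `u32` parameters are forced into memory along with the list, which is the situation the question above is about.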

@Felix-El Felix-El Apr 22, 2026

Author

From my perspective (one who wants to reduce/avoid allocations) this would make the data type harder to reason about. Fixed and bounded lists should be guaranteed to be "static".
TBH I'd rather say that some datatypes (like these special lists) should enforce* the use of a param or return area for passing - because lists are indexed dynamically unlike tuples (-> indirect addressing works only on memory locations and compilers would otherwise have to compensate for such lists split across registers - pretty terrible).

(*enforce: such lists should definitely be placed into the param/return area, while other types could, if possible, continue to be passed as args. But yes, this obviously complicates the flattening logic.)

Member

Ah, so then could we say that fixed- and bounded-length lists never get flattened, and are always passed as pointer (and maybe length if fixed)?

Yes this is what I mean (except: length if bounded, that could still be in a register).
I will explore how that logic extension could look (in Python) in the next commit.
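
One possible shape of the rule discussed here, as a hedged sketch rather than the code from the follow-up commit: inline lists are never flattened element by element, and only a pointer (plus the actual length for bounded lists) travels in core values. The function name `flatten_inline_list` is hypothetical; `maybe_length` and `maybe_variable` mirror the parameter names of `flatten_list` in the diff, and 32-bit pointers are assumed.

```python
def flatten_inline_list(maybe_length, maybe_variable):
    """Toy flattening rule for the 'always pass by pointer' idea.

    maybe_length:   None for an ordinary list, else the fixed length or bound.
    maybe_variable: True when maybe_length is an upper bound (bounded list).
    Returns the list of core value types, assuming 32-bit pointers.
    """
    if maybe_length is None:
        return ['i32', 'i32']   # ordinary list: pointer + length
    if maybe_variable:
        return ['i32', 'i32']   # bounded list: pointer + actual length
    return ['i32']              # fixed-length list: pointer only
```

Under this rule the flat cost of a list no longer depends on its bound, so a large bound cannot exhaust the flat-parameter budget by itself.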

Author

Just pushed a proposal for hybrid lifting/lowering where register and memory passing can coexist.

Author

@lukewagner,

  • when lowering: force all params/results to fall back to using the in-memory param/result region (perhaps by adding a contains_inline_list predicate styled on contains_borrow that is tested first in flatten_functype)

This part clearly sounds like "let's just enforce param/return area use if any fixed/bounded list exists outside of unbounded lists".

However, kindly elaborate on

  • when lifting: work just like regular lists (flatten to i32 pointer and, if bounded, i32 length)

which I just can't follow. Are you suggesting that the host would get this inlined from the caller (import side) but then passes as flat (ptr, len) to the callee (export side)? And the reverse for the result path?
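
For reference, a predicate of the kind named in the first quoted bullet could look roughly like the sketch below. Everything here is an assumption for illustration: the type classes are simplified stand-ins, not the classes in `definitions.py`, and the traversal only covers lists and records. It follows the reading given above, where an inline list nested inside an ordinary unbounded list does not count, because those elements already live in memory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class U32Type:
    pass


@dataclass
class ListType:
    t: object
    maybe_length: Optional[int] = None  # None means an ordinary unbounded list
    maybe_variable: bool = False        # True means maybe_length is a bound


@dataclass
class RecordType:
    fields: list  # field types only; names are omitted in this sketch


def contains_inline_list(t):
    """True if a fixed- or bounded-length list occurs outside any unbounded list."""
    if isinstance(t, ListType):
        # An inline list counts; an ordinary list hides whatever it contains.
        return t.maybe_length is not None
    if isinstance(t, RecordType):
        return any(contains_inline_list(f) for f in t.fields)
    return False
```

A function-type flattening routine could test this predicate on each parameter and result type before deciding whether to fall back to the in-memory region.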

Member

@cpetig Several parts of that match what I'd expect, but a few questions:

The caller of the function (guest or host side) allocates b: [u32; 66] (32+34)*4 bytes somewhere

Just to clarify: while the caller could allocate the param and result buffers in a single contiguous allocation, because the core function signature of the import is (addr, len, integer, retarea), the caller could also allocate disjoint regions for addr and retarea, yes? I.e., there's not a requirement of contiguity.

The function implementation can handle the list argument like a normal list, but must not free the memory.

(I believe we're talking about the core function signature of guest exports here.) So a normal list will call cabi_realloc for the list elements of each list parameter, but I'm guessing that's not what you mean here (b/c of the "must not free memory" clause). Now what I suggested in the second bullet of my last comment was that all parameters go into the heap as a single contiguous tuple (just like when > MAX_FLAT_PARAMS), and so there is a single cabi_realloc call (containing r with the bounded list inline). But it sounds like you mean something different; given the current eager ABI, I think a cabi_realloc has to happen somewhere for bounded-/fixed-length list parameters of core wasm exports, so what are you thinking?

Because multi-value is not implemented yet, the result can't be flattened and thus is stored in the return value pointer provided by the caller (b[32] = length, b[33..64]= data, b[65]=integer).

If we're still talking about the core function signature of a guest export, I don't know where this "return value pointer provided by the caller" comes from... is it allocated at the end of the cabi_realloc (for params) mentioned above? That seems like kind of a bigger departure from the rest of the current ABI. The way it works today if you return a list (which never fits in MAX_FLAT_RESULTS=1) is you have to return an i32 pointer to a record {ptr: u32, len:u32} in memory. Instead, for bounded-/fixed-length lists I would imagine that we'd say that the return value would be an i32 pointer to r in memory (which contains the bounded list and len inline, avoiding an indirection).

@Felix-El Sorry, what I wrote about "lifting" was imprecise and only considered the "guest lifting params to import call" case (which matches what @cpetig wrote above, iiuc).

@cpetig cpetig Jul 2, 2026

@lukewagner You caught me misremembering details about the argument/result representation (I keep pruning the number of calling convention variants in my mind over time, likely because I added another one)

Guest calling guest imported function of above type:
Caller reserves 34 u32 somewhere (return area, e.g. on linear stack), passes (addr, len, integer, retarea) calling into the host. Host stores result in retarea and returns.

Host calling guest exported function of above type:
Host finds len u32 in linear memory from somewhere (e.g. a previously remembered cabi_realloc), places the active members of the argument in there, and calls the guest with (addr, len, integer). Guest returns 34 u32 in an arbitrary location (e.g. static return area); host reads from there and then calls cabi_post_X (preferably before calling into the function again).

The host can reuse the argument storage buffer many times.

If the number of flat elements grows beyond 16, guest calling host will simplify into void f(R const*input, R*retarea) and host calling guest into R const* f(R const*input). (All R point to u32[34].)

Lazy lifting will likely replace the pointers with a handle.

PS: Yes, I meant that the two buffers wouldn't need to be contiguous, but this is moot as I was mixing up import and export allocation schemes.
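
As an illustration of the 34-word return area described above, the sketch below stores and loads a record laid out as length, 32 data slots, then the integer, matching the `b[32] = length, b[33..64] = data, b[65] = integer` description quoted earlier. It is a toy model under stated assumptions: the length is stored as a full `u32` (as in that example), the byte order is little-endian, and `store_r` / `load_r` are hypothetical helper names.

```python
import struct

BOUND = 32
# One u32 length, BOUND u32 data slots, one u32 integer: 34 words in total.
R_FORMAT = '<I' + str(BOUND) + 'I' + 'I'
R_SIZE = struct.calcsize(R_FORMAT)


def store_r(memory, ptr, values, integer):
    """Write the record at ptr; unused data slots are zero-filled."""
    if len(values) > BOUND:
        raise ValueError('list exceeds its bound')
    padded = list(values) + [0] * (BOUND - len(values))
    struct.pack_into(R_FORMAT, memory, ptr, len(values), *padded, integer)


def load_r(memory, ptr):
    """Read the record at ptr and return (active elements, integer)."""
    fields = struct.unpack_from(R_FORMAT, memory, ptr)
    length = fields[0]
    if length > BOUND:
        raise ValueError('stored length exceeds the bound')
    return list(fields[1:1 + length]), fields[1 + BOUND]
```

Because the list storage is inline, the reader needs a single pointer to reach both the length and the elements, with no second indirection.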

PPS: I tend to propose to change the representation of flattened fixed length lists to align with this new convention (passing pointer) - at the expense of becoming different from tuples (which are deeply flattened)

@cpetig cpetig Jul 2, 2026

P3S: This should be identical to what currently happens, but with the new proposal in flattened form these arrays would pass a pointer to some arbitrary (!) place where a tuple would pass the values themselves.

Comment thread design/mvp/Binary.md
`none` case of an optional immediate.)
* 🔧 for fixed-sized lists the length of the list must be larger than 0 to pass
validation.
* 🔧 for fixed-sized lists (`0x67`) the length of the list must be larger than

Member

Pre-existing, but it looks like the grammar already covers this twice: once by using <u32> (unsigned) and once with the (if maxlen > 0). We could also remove the (if maxlen > 0). But should we specify a maximum for maxlen?

Author

For u32, zero is valid and the maxlen > 0 check forbids zero. What would be a legitimate upper bound... i32::MAX?

Member

There's already (just recently added) a MAX_LIST_BYTE_LENGTH = 2^28 - 1. That's just bytes, but even still, it seems like a reasonable upper bound.

Author

Ok, defined that bound in Binary.md, but where would we check this in definitions.py?
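
A sketch of what such a check could look like, wherever it ends up living. This is only an assumption for discussion: `maxlen_is_valid` is a hypothetical helper, the constant is taken to be 2^28 - 1, and the thread has not settled whether the limit should apply to the element count or to the byte size, so the element size is left as a parameter that defaults to 1.

```python
MAX_LIST_BYTE_LENGTH = (1 << 28) - 1  # assumed value, i.e. 2^28 - 1


def maxlen_is_valid(maxlen, elem_byte_size=1):
    """Hypothetical validation helper, not part of definitions.py.

    Rejects a zero bound and any bound whose full storage would exceed
    MAX_LIST_BYTE_LENGTH. With elem_byte_size=1 this degenerates to a
    plain limit on the element count.
    """
    if maxlen <= 0:
        return False
    return maxlen * elem_byte_size <= MAX_LIST_BYTE_LENGTH
```

For example, a bound of 2^27 is accepted for one-byte elements but rejected for four-byte elements, since the full storage would then be 2^29 bytes.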

Felix-El and others added 4 commits June 12, 2026 12:45
Cherry-picked the essence of cpetig's commit d2874eb
from https://github.com/cpetig/component-model/tree/bounded-lists, adapted to
the current codebase (ptr_type/opts threading, updated class names).

Bounded strings are intentionally excluded.

Co-authored-by: Christof Petig <christof.petig@arcor.de>
- Add trap_if(actual_len > maybe_length) to lift_flat_list, mirroring
  the existing trap in load_list's heap path
- Add over-length trap tests for both flat and heap lifting
- Add alignment test for bounded list of U32 (verifies 3-byte padding
  after U8 length prefix)
- fix memory bounds checking
- improve integration with existing list load/store code
- fix indentation
- avoid default argument
- more readable load/store recipe
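
The over-length trap and the alignment case from the commit notes above can be pictured with the following toy loader. It is a sketch under stated assumptions, not the code in `definitions.py`: it handles only a bounded list of `u32` whose bound fits a one-byte length prefix, uses little-endian byte order, and the names `Trap`, `align_to` and `load_bounded_u32_list` are hypothetical.

```python
import struct


class Trap(Exception):
    """Stand-in for a Canonical ABI trap in this sketch."""


def align_to(ptr, alignment):
    """Round ptr up to the next multiple of alignment."""
    return (ptr + alignment - 1) // alignment * alignment


def load_bounded_u32_list(memory, ptr, bound):
    """Load a bounded list of u32 stored inline with a u8 length prefix."""
    assert bound <= 0xFF                # a u8 prefix is assumed here
    if ptr + 1 > len(memory):
        raise Trap('length prefix out of bounds')
    actual_len = memory[ptr]
    if actual_len > bound:
        raise Trap('actual length exceeds the bound')
    # With ptr 4-byte aligned, this skips 3 bytes of padding after the prefix.
    data_ptr = align_to(ptr + 1, 4)
    # The whole reserved storage (bound elements) must lie inside memory.
    if data_ptr + 4 * bound > len(memory):
        raise Trap('list storage out of bounds')
    return list(struct.unpack_from('<%dI' % actual_len, memory, data_ptr))
```

With the record at offset 8, the prefix sits at byte 8 and the first element at byte 12, which is the 3-byte padding the alignment test is described as verifying.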
@cpetig

cpetig commented Jul 2, 2026

Associated with Implement WIT Fixed-Length-Lists

Development

Successfully merging this pull request may close these issues.

Bounded lists and strings

4 participants