*Thanks to Guillaume Ballet, Gajinder Singh and Dankrad Feist for their review and feedback.*

- Preamble
- Refresher on the motivation for Verkle Trees
- How would a stateless client work?
- Technical preliminaries
- Minimal cryptography concepts to understand
- Proof of presence and absence
- Case 1: Proof of presence for a single key
- Case 2: Proof of absence - empty
- Case 3: Proof of absence - other
- Taking a step back and thinking about multiple key values
- Witness format definition
- StateDiff definition
- VerkleProof definition
- Witness generation and verification
- StateDiff field
- VerkleProof field
- Group A fields - Allow the verifier to rebuild a partial view of the tree
- Group B fields - Prove to the verifier the rebuilt tree is correct
- Conclusion
- Appendices
- Cryptography and implementation optimizations
- Witness completeness & soundness
- Point 2: Multiproof/IPA soundness
- Point 3: Non-injective mapping between Point→Fr
- Witness post-state values

The proposed Verkle Tree change in the Ethereum protocol includes a new concept of a *block witness*. In this article, I'll dive deep into it since I believe it's a surface area of this EIP that is still quite opaque to many people (i.e: even to client teams working on this change).

As you might already realize, this is a long article - not because I believe this topic is rocket science but because I'll try to dive deep and go slow so this can be useful for many people: client team developers implementing this change, auditors, security researchers, and testing teams getting a deep insight into the potential risks in implementations, etc. Of course, I hope this article is also helpful for protocol enthusiasts, dapp developers, or anyone really, since Verkle proofs, due to their small size, can be used for many other use cases: on-chain contract state proofs, other protocols (e.g: Portal Network), etc.

I've (tried to) structure this article from high-level explanations to more detailed ones, so you can stop reading at any time if you don't want to dive into more details and hopefully leave with some coherent knowledge. Also, it should be fine to skip some sections if you feel you already know that topic.

# Preamble

Apart from helping people understand how Verkle witnesses/proofs work (i.e: I'll clarify these terms soon), I hope this article can motivate some people to help us with Verkle efforts, like:

- Have more implementations of all this logic.
- Find important test vectors for clients to be included in testing efforts.
- Help with (differential) fuzzing libraries.
- Help with documentation and spec efforts.

Any help is appreciated!

## Refresher on the motivation for Verkle Trees

The goal of this article isn't to explain the usefulness of Verkle Trees since it has already been well explained in multiple videos and articles you can find on verkle.info.

As a TL;DR: the main goal of Verkle Trees is to allow stateless clients. That means clients can join the network, not sync the full state, and start validating blocks immediately. This means that as a chain validator, you don't need a 2TiB disk to validate the chain, nor do you need to trust external parties giving you partial views of the Ethereum state.

For this to work as a stateless client, apart from receiving the next block, you'd also receive an *execution witness*. An *execution witness* is a subset of the Ethereum state that allows you to execute a particular block. This *witness* also contains a cryptographic proof verifying that the (untrusted) provided data is correct.

Note you can already have stateless clients using the Merkle Patricia Tree (MPT). However, the problem is that proving this data is correct requires a proof which is too big to be sent over the network fast enough to be validated by all nodes in the 12s window of a slot. One of the main goals of Verkle Trees is to prove to these clients that the *witness* is correct with a proof small enough to be sent over the network.

*Block witnesses* that allow stateless execution are just a single use-case for these proofs, but you should think about Verkle Proofs as general *state proofs*. You can prove any arbitrary set of key values from the Ethereum state. This can be useful in a wide range of use cases. For example, proving the balance of arbitrary addresses at some block height in the chain can be useful for bridges or escape hatches. Considering that these proofs are small, you could send them on-chain by leveraging a *proposed* precompile to allow verifying these proofs in a future-proof way. Moreover, these smaller proofs can be useful for other protocols, such as the Portal Network, which might need to prove the Ethereum state between nodes.

As a quick (related) tangent, Péter Szilágyi proposed an interesting idea to improve EL client diversity. If you read that article, you can confirm how stateless execution is quite independent of Verkle Trees. In his proposal, the witness size problem is mostly irrelevant *only* because the witness is transmitted between processes in the same machine. Verkle Trees are needed to share the witness over the network, where the size **is** relevant. Whenever Verkle Trees are deployed to mainnet, you can switch from the MPT witness to the VKT witness - so this proposal would become even more practical with Verkle Trees.

## How would a stateless client work?

I think it's helpful to dive into more details on how a stateless client would work - since it is the primary motivation of the EIP and the main target use-case for *block witnesses*.

Let's say that you're a client that has just started; you don't know anything about the current state of the network, and you want to start validating blocks. After you do some warmup of network bootstrapping and connecting with other peers, you're ready to receive the next block.

At some point, your CL client will send you the next block (i.e: the first block you'll validate). If you tried to execute the block at this point, without the state, the execution would fail. At some point in the block execution, you'll need to read the current state of some account data or storage slot, and you don't have any data.

For example, if you need to execute a transaction where I send 100 ETH to your address, your node will try checking if I have, at a minimum, 100 ETH. But since you don't have any state, you don't know if that's true. (Note: I'm simplifying things here since your block execution will probably fail before even trying to execute the first transaction for other reasons - but we can ignore that).

Now, it should be clear that we need something extra apart from the block - **some** data. You don't need *all* the data; you need enough to execute this block. If this block only contains the transaction I've mentioned before, then there's no reason to know how much balance is in Vitalik's account since I'm not sending ETH to him (unless you're Vitalik - in that case, my example is broken :).

Let's say that apart from this first block we received, we also get all this needed data. Technically, we can execute the block without failing, which is good. But there's an important catch: you must ensure this data is *correct*. By *correct*, I mean it's the true data from the Ethereum state and not some random data.

The transaction should fail if I don't have 100 ETH but 90 ETH. The *witness* you received came from some untrusted random client in the network. If this client is malicious, it can give you wrong data, and although you can execute the block, it will lead to the wrong result. This is similar to a known saying in Machine Learning: *garbage in, garbage out*. If you train a model with garbage data, even if you use a super-advanced model, you'll get garbage predictions.

Now, we're getting to the gist of the problem: we need some way to validate that this data is correct. This is an old problem that most people know can be solved with Merkle Proofs. As in, a proof that a data set belongs to a tree root.

So, this *witness* should contain, apart from the raw data from the state, a proof that this data is correct. Note that this proof proves the following claim: "This data is part of a tree with state root X". This *root X* should be the tree root of the Ethereum state precisely after the previous block was executed (i.e: the current state that the block we want to execute will act on).

Since you've just joined the network, this is a problem: how do you know/verify the state root of the previous block without getting into a recursive problem (i.e: verifying the chain all the way from genesis)? I won't go into details since this isn't strictly related to stateless clients. This is an *old* bootstrapping problem. You can rely on the sync committee or any trusted checkpoint.

Note that this is only a bootstrapping problem to get the state root of the block previous to the first one you want to verify. For subsequent blocks, the *stateless client* calculated each last block state root in the previous step, assuming it is correct (i.e: you can trust your own computations).

In summary, a stateless client's end goal is the following:

1. I have the previous block `N-1` (*block A*). This parent block is assumed to be correct (e.g: all the fields, particularly the state root, are valid).
2. Receive block `N` (*block B*). This block provides inputs for the execution we'll attempt, in particular, the ordered list of transactions - and we want to verify some fields to check this block is correct (e.g: the state root, gas used, etc).
3. Receive a *witness* needed for the execution of *block B*.
4. Verify that the witness proof verifies against the state root of *block A*. Remember that *block A* fields are assumed to be correct, thus its state root is our source of truth for a valid Ethereum state.
5. Execute *block B* transactions using the provided witness data, and verify the state root, gas used, and other results against the claimed values in *block B* (e.g: the state root check).
6. Go to step 2.
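The steps above can be condensed into a small Python sketch. Everything here is hypothetical scaffolding: the names (`Block`, `Witness`, `verify_witness`, `execute_block`) and data shapes are made up for illustration, and the two stub functions stand in for real proof verification and EVM execution.

```python
from dataclasses import dataclass

@dataclass
class Block:
    number: int
    state_root: bytes      # claimed post-state root
    transactions: list

@dataclass
class Witness:
    state_diff: dict       # key -> pre-state value (or None if absent)
    proof: bytes           # opaque cryptographic proof

def verify_witness(witness: Witness, trusted_root: bytes) -> bool:
    """Stub: a real client checks the proof against the parent state root."""
    return True

def execute_block(block: Block, pre_state: dict) -> bytes:
    """Stub: a real client runs the EVM over the witness data and
    returns the post-state root it computed."""
    return block.state_root

def stateless_validate(parent: Block, block: Block, witness: Witness) -> bytes:
    # 1. The witness must open against the *parent* block's state root.
    if not verify_witness(witness, parent.state_root):
        raise ValueError("witness does not match parent state root")
    # 2. Execute the block using only the witness data as state.
    computed_root = execute_block(block, witness.state_diff)
    # 3. The computed post-state root must match the block's claim.
    if computed_root != block.state_root:
        raise ValueError("post-state root mismatch")
    # The validated block becomes the parent for the next iteration.
    return computed_root
```

The returned root becomes the trusted root for the next block, which is why only the very first parent block requires an external bootstrapping mechanism.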

Note that the witness proof is what connects/validates the auxiliary data provided to execute the block with the previous block's expected state.

# Technical preliminaries

If you haven't read the article describing the verkle tree structure, it's highly recommended you do so before continuing. I'll assume that you at least understand *Figure 1* of that link. I'll be quite detailed in this article, so I'll still be refreshing that knowledge, but you can always go back to that article too.

Most of this article sees a Verkle Tree as a way to store a set of key-value pairs. In the Ethereum context, these key values are account information and smart-contract data, but this fact is irrelevant to this article.

Similarly, as in a Merkle Patricia Tree (or a Merkle tree), building a proof consists of collecting information from the tree to convince the verifier that some data is part of it. The only cryptography tool used in those trees is a hashing function. For Verkle trees, a new type of cryptography is used. Since not everyone is interested in cryptography details, in this article I try to refer to the cryptography part as a black box with some API. I leave some of the details of the cryptography in an appendix if you're interested.

## Minimal cryptography concepts to understand

I'll only explain some basic concepts needed to understand most of this article. I won't be super strict with the explanation since the idea is to describe the minimum stuff possible to continue with the rest of the article.

In this article's diagrams, you'll see that nodes in the tree are vectors with 256 items. Each item is a *scalar field element* of a curve, which you can imagine as a 253-bit integer. If you have one of these vectors, you can compute a *vector commitment*, a succinct vector representation (i.e: you can think of this as a "hash", but it's an elliptic curve point).

This *commitment* allows you to do a *vector opening*: proving to somebody that an item of this committed vector has a particular value. This proof is shorter than giving the verifier all 256 items and letting her compute the commitment again, which would be a *naive* proof. This construct is at the crux of what makes Verkle Trees useful compared with any Merkle Tree. A vector opening can be thought of as `OpenVector(vector, position_index, value) -> Proof`, which later a verifier can verify with `VerifyVectorOpening(vector_commitment, position_index, value, Proof) -> bool`.

The last thing to know is that we have a magical function `MapToScalarField : P -> x` that can transform a *vector commitment* (i.e: an elliptic curve point) into a *scalar field element* (a vector item). This allows us to "put" vector commitments into vector items.
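To make the API shape concrete, here is a deliberately insecure Python stand-in for this black box: it fakes the commitment with a hash and ships the whole vector as the "proof". Real Verkle trees use Pedersen commitments over the Bandersnatch curve with succinct IPA openings; only the function signatures below mirror the concepts above, and the modulus is a placeholder.

```python
import hashlib

SCALAR_MODULUS = 2**253 - 1  # placeholder; the real scalar field is Bandersnatch's

def commit(vector: list[int]) -> bytes:
    """Insecure stand-in for a vector commitment over 256 scalars."""
    assert len(vector) == 256
    h = hashlib.sha256()
    for item in vector:
        h.update(item.to_bytes(32, "little"))
    return h.digest()

def open_vector(vector: list[int], index: int) -> list[int]:
    """'Proof' that vector[index] has some value. A real IPA opening is
    succinct; this naive one just ships the whole vector."""
    return list(vector)

def verify_vector_opening(commitment: bytes, index: int, value: int,
                          proof: list[int]) -> bool:
    # Recompute the commitment from the 'proof' and check the claimed item.
    return commit(proof) == commitment and proof[index] == value

def map_to_scalar_field(commitment: bytes) -> int:
    """Maps a commitment to a scalar field element, so a parent vector
    item can 'point' at a child node."""
    return int.from_bytes(commitment, "little") % SCALAR_MODULUS
```

The whole point of the real construction is that the opening proof is *much smaller* than the vector, which this naive version intentionally fails to achieve.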

That's what you need to know for 95% of this article. If things were confusing, I wouldn't worry too much. Here's a diagram that tries to tie together many of the above concepts and how they're used in a Verkle Tree:

When you see an *arrow* from a vector item to another vector, this reference is a *scalar field element*, a ~"hashed" version of the vector commitment.

# Proof of presence and absence

Before jumping into technical details about the *witness* format or how it's built, let's try to get a better understanding at a high level.

Recall that we have a set of key values we want to prove are part of a tree (i.e, the previous block's state tree). These key values contain keys that will be read by the block execution logic when this block gets executed by the EVM, and we can separate them into two groups:

- Keys that are present in the tree.
- Keys that are missing from the tree.

If my balance is 100 ETH, that means that if I read my balance in this tree, I must find a corresponding leaf node containing the value 100. As in, this key value is part of the tree.

On the contrary, if I try to read the balance of an address I just "created" in a wallet, the leaf node corresponding to this balance won't exist in the tree since this account was never used, so the account's balance is implicitly set to 0. Note, this is different from the case of my account having 100 ETH, and then me sending 100 ETH to another account and having a balance of 0 ETH. In this case, my account balance **is** in the tree, and the value is explicitly 0.

It's normal for some transactions to access the state for addresses that don't exist in the tree (i.e: be it EOAs or contract storage slots).

The point here is to be aware that proving the key values that we have in the witness means that for some keys, we might need to prove their presence in the tree (with their corresponding value), and for other keys, we need to prove their absence (thus, the usual interpretation is the zero value). This distinction is important since proving each case is conceptually different.

Since I want this article to be comprehensive, let's explore each case of proof of presence or absence for a single key value that we want to prove. When proving multiple keys, we need to understand some extra cases, but they will become more obvious after we know how to prove only one key.

## Case 1: Proof of presence for a single key

Let's start with the case most people think of first when trying to understand how to prove *something*.

Imagine that I want to prove that my balance is 100 ETH. To start talking more concretely, imagine my account balance maps to the following key in the tree: `0x01fe...01`. I want to prove that this key's value is `100`.

Letâ€™s have a look at the following tree:

*Note: this diagram shows a partial view of the whole tree*

Assuming you've read the suggested blog post to understand the basics of the verkle tree structure, this image (hopefully) shouldn't be surprising.

My balance lives in a Verkle Tree where:

- Level 0 always contains a single internal node, the tree's root node. Since it's an internal node, it has a vector of 256 entries. The *commitment* to this vector is what we call the tree's *state root*.
- Level 1: to simplify things, I'm only showing a single internal node at this level, but other internal nodes can be required for other keys. Note that the second entry of the root node vector is the commitment to this internal node (which also contains 256 commitment elements).
- Level 2: is the leaf node that contains the key for my account information. We came here from the vector element at index 0xfe (254) of the Level 1 node.

Note how each step in this path corresponds to the bytes of `0x01fe...`.

Let's unpack this *leaf node* in the dotted rectangle. As you might already know, the first vector of four elements is called the *extension-level commitment*. Then, each *C1* and *C2* commitment "points" to a vector of 256 elements (called the *suffix level commitment*) to represent the first and second half of the 256 values stored in this leaf node.

Since our balance corresponds to the key offset `01` (i.e: the last byte of the key), we'll find the value in the vector corresponding to *C1*. Moreover, since each value is represented by two elements in these vectors, our value of 100 (ETH) is encoded in the (0-indexed) elements 2 and 3. Hopefully, all this makes sense to you since I'm not describing anything new compared with what was explored in the referenced blog post.
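The indexing described above is simple arithmetic, sketched below. The function name is mine; it only computes positions inside the leaf node, not the actual two-element value encoding:

```python
def locate_value(key: bytes) -> tuple[str, int, int]:
    """Given a 32-byte tree key, return which suffix commitment (C1/C2)
    holds its value and the two (0-indexed) vector slots that encode it."""
    assert len(key) == 32
    suffix = key[31]                       # the key offset: its last byte
    commitment = "C1" if suffix < 128 else "C2"
    pos = suffix % 128                     # position within that half
    # Each value occupies two consecutive scalar field elements.
    return commitment, 2 * pos, 2 * pos + 1

# The balance key from the example, with offset 0x01:
balance_key = bytes.fromhex("01fe") + bytes(29) + bytes([0x01])
```

For the example key (offset `0x01`), this lands on *C1*, slots 2 and 3, matching the paragraph above.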

Now comes a critical part of understanding how the proof works, which should be naturally intuitive, but it's important to understand the details since some things might not be entirely apparent.

To prove that my balance is 100 ETH, we need to provide a verifier the path of the tree from the root to the leaf node - similar to a Merkle tree proof. Instead of doing hashes, we need to do *vector openings*. Each vector opening at index *n* proves that a node is indeed the *n-th* child of its parent. All these chained relationships check that the opened root node values are indeed openings for the state root we expect (i.e: the previous block state).

For internal nodes, the vector openings prove the "arrows", connecting the internal node with a claimed child node (i.e: child vector). For leaf nodes, the vector openings prove the *extension level* values (i.e: stem, C1/C2, etc) against the leaf node commitment, and the *suffix level* values (i.e: real values expressed by pairs of vector elements) against C1/C2, respectively.

Note that by doing this, *vector openings* don't depend on the vector size, which is the main benefit compared with Merkle Patricia Tree proofs, where you need to provide all siblings.

Let's see which openings we need to prove my balance (the shadowed vector items in the nodes):

These openings are:

- Each internal node vector item value corresponding to the path to our target leaf node. This shouldn't be surprising since it's similar to a Merkle proof, with the difference that the stride is a byte and not a nibble.
- At the leaf node level:
    - At the *extension level* section, we need to open the following:
        - The first item: an *extension marker* with a value of 1. (Note: today there are only leaf nodes with this value, but in the future, we could have other formats for leaf nodes, thus other *extension marker* values).
        - The second item: the *stem* of the key.
        - The third item: the *C1* commitment, since our balance lives in the first 128 values of this leaf node.
    - At the *suffix level*, we need to open the items at indexes 2 and 3, since our 100 value lives in these two slots (i.e: remember that values are encoded as two *scalar field* values).

Note that we need to do the *extension marker* (i.e: the 1 value) opening to be sure we're describing a leaf node *extension level* of the right type. We also have to prove the *stem* since, if we don't do that, then you'd simply be proving that there's a leaf node here with the expected value - but you also want to prove that this leaf node corresponds to the expected stem and not any other. (i.e: if our key has prefix `0x01fe00...01`, in this same leaf node position we could have values for `0x01fe01...01`, or `0x01fe02...01`, etc, so we need to prove that isn't the case).

You might be surprised that we need to do many vector openings to prove a single key value, and that's true. But note that if we need to open more than one key in this leaf node, much of this opening cost gets amortized. For example, if a smart contract accesses many storage slots close to each other, or we need to prove the *nonce* and *balance* of an EOA - all quite common cases. Also, note that when we do more openings (potentially in different leaf nodes), they'll share openings in the top layers of the tree - a similar effect to what happens in Merkle trees.

In summary, proving the presence of a key value involves:

- Providing all internal nodes in the path with (only!) their respective item element pointing to the right child.
- At the leaf node level, proving that this is a leaf node corresponding to the expected stem, and proving the expected values by opening C1 or C2 depending on whether the value index is below 128 or not (i.e: whether it is in the first or second suffix commitment).

As a last note, I want to point out two special cases that don't change anything about what I explained before, but I think are worth mentioning:

- What if I want to prove a key that is part of a leaf node but was never written before? For example, let's say only storage slot 0 was written in an SC - this means that storage slot 1 lives on the same leaf node but has the default zero value. This is an interesting case that might seem like a proof of absence since, technically, the value was never written, but the value explicitly exists in the tree. The prover will prove that for this value, the two corresponding slots in the *suffix tree* are the zero value (for the corresponding C1 or C2).
- What if the values in the leaf node in the index range [128, 255] are all empty, and I want to prove one of those values? In this case, *C2* will be the empty commitment (i.e: more technically, the identity element of the group). Again, it's fine since you can still do an opening of the zero values in this vector. You *could* avoid doing the opening since this empty commitment is a special one, considering that if a commitment is the empty element, that already proves that all values in the vector are zero.

If any of the above points are confusing, they can be safely ignored since they don't change the general logic explained before. I just wanted to point them out since these are some border cases that I'd like implementers to at least have in the back of their minds. Also, the latter point could be an optimization opportunity to avoid some vector openings, but this isn't done in the design to avoid extra complexity and special cases.

Now, let's jump into proofs of absence since it's important to understand their logic.

## Case 2: Proof of absence - empty

If the block execution reads the value of a key that isn't present in the tree, it can find two situations while looking for the corresponding leaf node:

- Not finding any leaf node.
- Finding a leaf node, but one that corresponds to another key.

In this section, we'll focus on the former, and in the next, we'll focus on the latter.

Let's say that I'm looking for the key `0x0100...` (i.e: the key has this prefix; you can imagine whatever suffix you want) in the following tree:

What happened here is that at the second level, the branch at index 0 is empty. There's no child internal node or leaf node - it's empty. This means that to prove to a verifier that this key doesn't exist in the tree, we need to prove that the 0-th element in this internal node has a defined value - the empty commitment. An empty commitment value indicates there is no value for this branch of the tree; the verifier then knows this key doesn't exist (and thus interprets the value as zero or whatever was defined). This proves that any key with the prefix `0x0100...` is absent.

In summary, the proof of absence for this case involves proving internal node openings down the key path until an internal node proves the opening of the *empty commitment* value, and stopping there. This proves there isn't a leaf node where it should be if the key existed in the tree.

## Case 3: Proof of absence - other

As mentioned before, another situation can happen when we try to read a key that doesn't exist in the tree. Our ending step can be a *leaf node* whose *stem* value doesn't match our key stem.

Let's imagine we're trying to prove the absence of `0x010011...` again, but in the following tree:

In this case, when we walk down the `0x010011...` path, we end up in a leaf node that doesn't correspond to this key but to one for the stem `0x010042...`.

To prove this case, we need to do the following openings:

- All openings of internal nodes in the path, as usual.
- At the leaf node level, we need to open:
    - The extension marker with value 1, which indicates this is a leaf node.
    - The *stem*. Note that this *stem* won't be the `0x010011...` stem, so this proves to the verifier that this key doesn't exist in the tree. If it existed, at this level of the tree, an internal node should have "forked" into two leaf nodes - one for our key and the other for the leaf node we found.
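To tie the three cases together, here is a toy Python model of walking the tree. The dict-based node representation is invented for illustration and has nothing to do with the real node encoding; it only shows how a lookup terminates in one of the three outcomes:

```python
# A toy model of tree traversal: internal nodes are dicts keyed by a
# path byte; a leaf stores its full 31-byte stem plus a {suffix: value}
# map. This is invented for illustration, not the real node encoding.
class Leaf:
    def __init__(self, stem: bytes, values: dict):
        assert len(stem) == 31
        self.stem = stem
        self.values = values

def walk(root: dict, key: bytes):
    """Classify a key lookup as presence, absence-empty, or absence-other."""
    stem, suffix = key[:31], key[31]
    node = root
    for depth, path_byte in enumerate(stem):
        child = node.get(path_byte)
        if child is None:
            # Case 2: the branch is empty, so we'd open the empty
            # commitment at this internal node and stop here.
            return ("absent-empty", depth)
        if isinstance(child, Leaf):
            if child.stem != stem:
                # Case 3: a leaf sits on our path, but for another stem;
                # opening its marker and stem proves our key is absent.
                return ("absent-other", child.stem)
            # Case 1: presence (the value may still be an explicit zero).
            return ("present", child.values.get(suffix, 0))
        node = child
    raise AssertionError("a stem cannot be longer than 31 path bytes")
```

The return value in each branch corresponds to which openings the prover must provide: a chain of internal node openings, plus either an empty-commitment opening, a foreign leaf's marker and stem, or the full presence openings.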

## Taking a step back and thinking about multiple key values

Hopefully, now you have a better intuition on how we can prove the presence or absence of a key.

For a verifier to be convinced some key value is or isn't part of a tree with a known state root, it asks the prover to provide:

- A list of tree nodes with partially filled vector items.
- Vector openings that prove the parent-child relationships by connecting the i-th parent vector item with the commitment of the child node.
- A careful selection of which vector items get opened in the leaf node vector(s) to prove the presence or absence of the key.

Another way to think about these proofs is as providing a partial view of the tree. With this proof information, the verifier can build a partial view of the tree. **Assuming this partial view is consistent with the expected tree**, if it reads the corresponding key in this partial view, it can be sure about its presence (and corresponding value) or its absence (and corresponding default zero-value).

Proving multiple key values simultaneously means providing more paths to build a bigger partial view of the tree. Again, in this tree, whenever the verifier tries to read any of the claimed keys, it can get the corresponding value claimed by the prover.

Most of the logic of creating the Verkle witness/proof and its format is a way to serialize all this information that the verifier needs to build and verify this partial view of the tree.

You can already imagine that proving multiple keys creates some interesting situations. For example, let's look again at the following diagram:

and imagine we want to prove the following keys:

- `0x010001...12`: this is a proof of absence.
- `0x010042...03`: this is a proof of presence.
- `0x010042...ff`: this is a proof of presence.

Note that these three keys all end up in the same leaf node. In each case, the intention and the *vector openings* that we have to do are different:

- `0x010001...12`: wants to prove the absence, thus only wants to open the first two items of the *extension level commitment* vector.
- `0x010042...03`: wants to prove the presence in "the first half" of the vector values. Thus opening the first two items of the *extension level commitment*, plus an opening for C1 and two items of the *suffix level commitment*.
- `0x010042...ff`: same as the previous case, but for C2.

The interesting part is that the values these three keys want to open overlap, so the proof format should provide an efficient way to include all this information without being wasteful.
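One way to picture this deduplication is to compute, per leaf node, the union of vector items that all terminating keys require. This is an illustrative sketch only: the labels like `"marker"` and `("C1", 6)` are my own shorthand, not part of the actual proof format.

```python
def leaf_openings(leaf_stem: bytes, keys: list[bytes]) -> set:
    """Union of vector items to open on one leaf node, for a set of keys
    that all terminate at it. Keys whose stem differs from leaf_stem are
    proofs of absence and only need the extension-level items."""
    opens = set()
    for key in keys:
        # Every key touching this leaf needs the marker and stem opened.
        opens |= {"marker", "stem"}
        if key[:31] == leaf_stem:
            # Proof of presence: also open C1 or C2 and the two value slots.
            suffix = key[31]
            half = "C1" if suffix < 128 else "C2"
            pos = suffix % 128
            opens |= {half, (half, 2 * pos), (half, 2 * pos + 1)}
    return opens
```

Running this on the three example keys above yields a single `marker`/`stem` opening shared by all of them, plus the C1 and C2 slot openings for the two present keys.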

You can also imagine another case where two keys are absent "empty", meaning both end up in an empty internal node vector item value. Note that providing the verifier with this partial view of the tree indirectly proves both cases, or any key that would end up in this place in the tree.

This is a good stopping point if you're not interested in getting into the fine details of how the proof format works and how it's built/verified. Most of the conceptual explanation of how a prover constructs the proof, which allows the verifier to have a partial view of the tree, has been covered by now.

# Witness format definition

Now that we know *what* to do, I'll explain how all this information is packed into bytes to be sent to a verifier - or, said differently, the witness format definition.

Please be aware that I'll describe the currently proposed format, but this can slightly change in the future.

A *block/execution witness* (i.e: the verkle proof required to execute a block statelessly) is an SSZ-encoded serialization of the following *ExecutionWitness* structure:

```
class ExecutionWitness(Container):
    state_diff: StateDiff
    verkle_proof: VerkleProof
```

This structure is composed of two parts:

- The *StateDiff* is a flat description of key values - all the key-values that the EVM will need when trying to execute this block.
- The *VerkleProof* allows the client to verify that the *StateDiff* data is part of a tree with a state root equal to the previous block's. Said differently, it contains all the extra information we explored in the earlier sections (internal node commitments, auxiliary data, etc) to allow the verifier to rebuild a partial view of the tree containing the key-values of *StateDiff*.

## *StateDiff* definition

Let's look at the *StateDiff* definition:

```
MAX_STEMS = 2**16
VERKLE_WIDTH = 256

class SuffixStateDiff(Container):
    suffix: Byte
    # Null means not currently present
    current_value: Union[Null, Bytes32]
    # Null means value not updated
    new_value: Union[Null, Bytes32]

class StemStateDiff(Container):
    stem: Stem
    # Valid only if list is sorted by suffixes
    suffix_diffs: List[SuffixStateDiff, VERKLE_WIDTH]

# Valid only if list is sorted by stems
StateDiff = List[StemStateDiff, MAX_STEMS]
```

The above definition is self-descriptive. It's a list of key values grouped by *stem*. Recall that a key is composed of a *stem*, the first 31 bytes of the key, and an *offset*, which is the last byte. This grouping is done since, by design, it's very common for multiple accessed values to share the same stem.

For example, a particular EOA's nonce and balance keys in the tree will share the same *stem* (i.e: 31 bytes):

- With an address, we can calculate the *stem*, say: `0x42000000000000000000000000000000000000000000000000000000000011` (note this is a hex-encoded 31-byte string).
- To read the account balance in the tree, you look up the value in the tree with a key of `stem || 0x01` (i.e: `0x4200000000000000000000000000000000000000000000000000000000001101`).
- To read the account nonce in the tree, you look up the value in the tree with a key of `stem || 0x02` (i.e: `0x4200000000000000000000000000000000000000000000000000000000001102`).
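A tiny sketch of this key construction, using the example stem above (the helper name `tree_key` is mine, and the offsets follow the balance/nonce values used in this example):

```python
def tree_key(stem: bytes, offset: int) -> bytes:
    """A full 32-byte tree key: the 31-byte stem plus a 1-byte offset."""
    assert len(stem) == 31 and 0 <= offset <= 255
    return stem + bytes([offset])

# The example stem from above:
stem = bytes.fromhex(
    "42000000000000000000000000000000000000000000000000000000000011")
balance_key = tree_key(stem, 0x01)  # balance offset in this example
nonce_key = tree_key(stem, 0x02)    # nonce offset in this example
```

Both keys share their first 31 bytes, which is exactly why *StateDiff* groups suffix diffs under a single stem.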

There are other relevant offset values for EOA accounts that you can find here.

The point of explaining this is that we expect the keys to be proven to be naturally grouped, so to reduce the size of serializing them in the *StateDiff*, it makes sense to group them by *stem* and provide the values per *offset*.

Note that we provide two values for each full key:

- `current_value`: this is what we refer to as the *values* a stateless client will use to execute the block (i.e: pre-state values). As in, the value of this key before the block is executed.
- `new_value`: if this value isn't `null`, the block execution changed the value of this key to the described value. This information isn't required for stateless clients since they will calculate it themselves when executing the block anyway. I'll touch on these *post-state* values in an appendix - from now on, you can assume this `new_value` doesn't exist (i.e: ignore it).

Thatâ€™s mostly it for *StateDiff*. In a nutshell, it is a data structure that lists a list of key values. Nothing fancy.

### *VerkleProof* definition

Here is where things start to get a bit more interesting:

```
BandersnatchGroupElement = Bytes32
BandersnatchFieldElement = Bytes32
MAX_COMMITMENTS_PER_STEM = 33  # = 31 for inner nodes + 2 (C1/C2)
IPA_PROOF_DEPTH = 8  # = log2(VERKLE_WIDTH)

class IpaProof(Container):
    C_L = Vector[BandersnatchGroupElement, IPA_PROOF_DEPTH]
    C_R = Vector[BandersnatchGroupElement, IPA_PROOF_DEPTH]
    final_evaluation = BandersnatchFieldElement

class VerkleProof(Container):
    # [Group A]
    other_stems: List[Bytes32, MAX_STEMS]
    depth_extension_present: List[uint8, MAX_STEMS]
    commitments_by_path: List[BandersnatchGroupElement, MAX_STEMS * MAX_COMMITMENTS_PER_STEM]
    # [Group B]
    D: BandersnatchGroupElement
    ipa_proof: IpaProof
```

Let's try to unpack this since it can feel a bit overwhelming.

The *VerkleProof* data structure fields can be separated into two groups:

- *Group A*: the `other_stems + depth_extension_present + commitments_by_path` fields.
- *Group B*: the `D + ipa_proof` fields.

The *Group A* fields contain data the verifier will use to build the partial view of the pre-state tree containing the values previously explained in *StateDiff*.

Up to this point, the *Group A* fields allow you to build a partial view of a tree. But to prove this partial view is correct, you need to verify the *vector openings* proving the correct relationship between parent and child nodes. That's where the *Group B* fields come in.

The *Group B* fields contain the cryptographic proof that this rebuilt tree **is** correct. Verifying this proof requires the client to provide the expected root node commitment of this partial tree (i.e: trusted from somebody, or previously calculated by itself in a previous block execution).

These `D + ipa_proof` fields are the output of a *multiproof*, a proving scheme used to do multiple openings of vector commitments using *inner product arguments*. If you don't know what all this means, that's fine; I'll get into more details about this cryptographic proof later. In summary, this is the information the verifier needs to *magically* verify all the required *vector openings* (i.e: partial tree correctness).

Hopefully, now things are a bit clearer, since the steps for the verifier are as follows:

- Using the *Group A* fields, reconstruct the pre-state partial tree.
- Using this rebuilt partial tree and the *Group B* fields, verify that this rebuilt tree **is** a correct/valid partial view of the tree for the root node that is our source of truth (i.e: assumed to be valid).

After you do both things, you can assume that the *StateDiff* (i.e: pre-state key-values) is correct and thus proceed with executing the block using this data. Done!

In the next sections, I'll try to describe how the prover and verifier algorithms work for both field groups.

# Witness generation and verification

Finally, we're jumping into the gist of this story.

Let's recall for a moment that the *ExecutionWitness* was composed of:

- *StateDiff*: the list of key values.
- *VerkleProof*: auxiliary data and a cryptographic proof that the *StateDiff* is part of the tree of the previous block's state root.

Generating the witness means generating these two sub-structures. The actor generating the witness is the block builder, who has already decided which ordered set of transactions will be included in the block and is now about to execute this (to be proposed) block.

### *StateDiff* field

Generating the *StateDiff* is quite simple. The actual implementation in the client isn't rocket science, but it might depend on each client's strategy.

When the block builder executes this (to be proposed) block, it needs to record every key that was read or written during the block execution; the resulting list is saved in *StateDiff*.

This clearly shows why this is sufficient info for future stateless clients. If a stateless client executes that same block, since block execution is deterministic, it will try reading and writing the same keys. Since these are part of the *StateDiff* field of the witness, their values will always be found when (re)executing the block.

This witness should be considered invalid if a stateless client does not find a required key in the *StateDiff* structure. If the witness has missing data, the partial view of the tree is incomplete and insufficient to allow stateless clients to re-execute the block (e.g: I provide you all the required keys but one; thus, you'll be unable to execute the block).
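To make this concrete, here's a minimal sketch of both sides of this contract (all names are illustrative, not taken from any client implementation): the builder records every touched key, and the stateless client's lookup treats a miss as an invalid witness:

```python
# Hypothetical sketch of witness bookkeeping - not from any client codebase.

class AccessRecorder:
    """Block builder side: records every tree key touched during execution."""
    def __init__(self):
        self.accessed = {}  # stem (31 bytes) -> set of suffix bytes

    def touch(self, key: bytes):
        assert len(key) == 32
        self.accessed.setdefault(key[:31], set()).add(key[31])

def lookup_prestate(state_diff: dict, key: bytes):
    """Stateless client side: read a pre-state value from the StateDiff.

    `state_diff` maps stem -> {suffix: current_value}. If the key isn't
    there at all, the witness is invalid: the block can't be re-executed."""
    suffixes = state_diff.get(key[:31])
    if suffixes is None or key[31] not in suffixes:
        raise ValueError("invalid witness: missing key")
    return suffixes[key[31]]  # may be None (key absent in the pre-state tree)
```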

But note that a missing key is not the only reason a witness can be considered invalid. Imagine a witness which:

- Deserializes correctly to the *ExecutionWitness* structure.
- Has valid internal values in all fields (e.g: the elliptic curve points are valid, the stems have the right size, the expected lengths between lists match, etc).
- Has list values ordered following the rules of the encoding.
- Allows a partial tree to be reconstructed correctly.
- Has a cryptographic proof that verifies and gets the green check.

And… this witness is invalid. Can you spot why?

The reason is that if the witness contains an extra key never used by the block execution, it should also be considered invalid. This isn't currently explicitly defined in the spec, but it can be required if we want deterministic/canonical proofs for each block, which I believe is a good idea.

So, apart from satisfying the bullet points mentioned above, a valid *StateDiff* should contain the minimum number of key values required to execute the block successfully. I want to mention this fact since a cryptographic proof that verifies isn't sufficient to claim that the whole witness is valid. I'm pointing this out to provide more (pedantic?) detail so implementors don't lose perspective of what we're trying to do as verifiers.
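The resulting check could be sketched as follows (a hypothetical helper; as said, the "no extra keys" half isn't explicitly mandated by the spec today):

```python
# Sketch: the witness key set must match exactly the keys the block touches.

def witness_keys_valid(witness_keys: set, accessed_keys: set):
    missing = accessed_keys - witness_keys  # block cannot be executed
    extra = witness_keys - accessed_keys    # breaks proof canonicity
    return (not missing and not extra, missing, extra)
```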

This is most of what you need to know about generating the *StateDiff*.

### *VerkleProof* field

Now, we're getting closer to the *fun* part. Recall that I mentioned this can be separated into:

- *Group A*: the `other_stems + depth_extension_present + commitments_by_path` fields provide auxiliary data to rebuild a partial view of the state tree containing the *StateDiff* key-values.
- *Group B*: the `D + ipa_proof` fields provide a cryptographic proof that the partial tree nodes are "linked together" correctly (and thus, all connected to the verifier's expected state root).

Let's set up a theoretical Verkle Tree that I'll use to guide the explanation:

A quick explanation of what we're seeing here:

- I've used a "thunder-arrow" symbol to show that some nodes contain references to nodes that exist in the tree but aren't shown in the diagram. The diagram is a partial view of the whole tree.
- I used an `E` value for tree node references that don't have children. e.g: the left-most node at level 1 at position `FF` has no children.
- For cases where we show children, I've named the commitment/reference `CM{X}`, since we'll need a name for these commitments in our explanations.
- By *stem* values in the format `XXYY{ZZ}^29`, I mean the 31-byte hexadecimal value `XXYYZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ`. Note that this format is only for this example, where the last 29 bytes have the same value. In reality, stems can have 31 bytes of different values!
- Dotted squares mark conceptual boundaries for leaf nodes. In the case of `FF00[22]^29`, I'm only showing the *extension level commitment*.

We'll be exploring how to build the proof when trying to prove the following keys in the tree **in a single witness**:

- `0x01020000000000000000000000000000000000000000000000000000000000FF`: a present key with value `210`.
- `0x0102030000000000000000000000000000000000000000000000000000000005`: an absent key.
- `0x01FF035000000000000000000000000000000000000000000000000000000005`: an absent key.
- `0xFF00210000000000000000000000000000000000000000000000000000000006`: an absent key.

These keys were carefully selected to have decent coverage of all the important cases.

If you're interested in how proof generation works in Python, I'd recommend looking at this research implementation: generating the proof and verifying the proof. What I'll do next is some hand-holding so you don't have to figure out how this works by reading the code.

### *Group A* fields - Allow the verifier to rebuild a partial view of the tree

In these fields, we must pack all the needed data so the verifier can build a partial view of the tree containing all the key values from *StateDiff*.

Recall that our intention as a prover is to allow the verifier to build this partial view of the tree, so it can extract the appropriate result whenever it tries to read the target keys. If you think about it for a moment, this means we need to provide enough information for each of the relevant paths in the tree that cover all the keys.

For the target keys mentioned above, let's look at which tree paths they correspond to:

- `0x01020000000000000000000000000000000000000000000000000000000000FF` and `0x0102030000000000000000000000000000000000000000000000000000000005` correspond to the same path, which ends in the left-most leaf node. Note that the latter isn't part of the tree, but if you try looking it up, you'll end up in this leaf node, which has a different *stem*, thus proving it is absent.
- `0x01FF035000000000000000000000000000000000000000000000000000000005` corresponds to a path that doesn't end in a leaf node. This path has two nodes: the root and the left-most node at level 1. The "path end" is the empty commitment showing there aren't further nodes, thus proving this key is absent.
- `0xFF00210000000000000000000000000000000000000000000000000000000006` corresponds to the path that ends in the right-most leaf node, a similar case to `0x0102030000000000000000000000000000000000000000000000000000000005` mentioned before.

Note that although we have four keys we want to prove, we're primarily interested in providing enough information to the verifier to build three paths.

I'll show what the *Group A* fields for this proof look like and explain afterward why that's the case:

```
depth_extension_present = [2<<3|2, 2<<3|1, 1<<3|0, 2<<3|1]
other_stems = [0xFF002222222222222222222222222222222222222222222222222222222222]
commitments_by_path = [CM1, CM3, CM5, CM2, CM4]
```

(Note that the `other_stems` entry is the stem actually present in the right-most leaf node, `FF00[22]^29`, not the absent stem we're proving.)

Much of the above probably won't make sense to you, so let's explain this in parts.

The `depth_extension_present` field is the most important one to understand. It is a list of bytes, each of which is defined as `(PATH_DEPTH << 3) | EXTENSION_PRESENCE_TYPE`, where `EXTENSION_PRESENCE_TYPE` can have three values:

```
EXTENSION_PRESENCE_EMPTY = 0
EXTENSION_PRESENCE_OTHER = 1
EXTENSION_PRESENCE_PRESENT = 2
```
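A sketch of encoding/decoding one such byte (function names are mine, not from the spec):

```python
# depth_extension_present byte layout: (depth << 3) | extension_status.

EXTENSION_PRESENCE_EMPTY = 0
EXTENSION_PRESENCE_OTHER = 1
EXTENSION_PRESENCE_PRESENT = 2

def encode_ext_status(depth: int, ext_type: int) -> int:
    assert 0 <= ext_type <= 2 and 0 <= depth <= 31  # must fit in one byte
    return depth << 3 | ext_type

def decode_ext_status(b: int) -> tuple:
    return b >> 3, b & 0x07  # (depth, extension status)
```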

*Note: the EXTENSION_PRESENCE_ names are not used in the referenced Python code, nor in the current geth implementation. They might be the names used in the specs. In any case, it's just a name; the concept is the same.*

The idea is to have one of these *extension statuses* per stem in our key list. For each stem, the *extension status* indicates:

- Whether all the keys associated with this *stem* (remember that we can have multiple keys per stem) are present or absent. You'll note that each *extension status* value corresponds to one of the cases described in the *Proof of presence and absence* section before (i.e: *Case 2*, *Case 3*, and *Case 1*, respectively).
- At which depth in the tree we find the corresponding leaf node or the last internal node.

Let's go through our example, and this will become clearer. First, let's list all the (lexicographically ordered) stems in our key list:

```
0x01020000000000000000000000000000000000000000000000000000000000
0x01020300000000000000000000000000000000000000000000000000000000
0x01FF0350000000000000000000000000000000000000000000000000000000
0xFF002100000000000000000000000000000000000000000000000000000000
```

Now let's match these with the `depth_extension_present` values I provided before:

```
0x01020000000000000000000000000000000000000000000000000000000000 -> 2<<3|2 = 2<<3|EXTENSION_PRESENCE_PRESENT ~= Present at depth 2
0x01020300000000000000000000000000000000000000000000000000000000 -> 2<<3|1 = 2<<3|EXTENSION_PRESENCE_OTHER   ~= Absent (other) at depth 2
0x01FF0350000000000000000000000000000000000000000000000000000000 -> 1<<3|0 = 1<<3|EXTENSION_PRESENCE_EMPTY   ~= Absent (empty) at depth 1
0xFF002100000000000000000000000000000000000000000000000000000000 -> 2<<3|1 = 2<<3|EXTENSION_PRESENCE_OTHER   ~= Absent (other) at depth 2
```

(Note that in this example, we have the same number of stems as keys, but this doesn't have to be the case.)

This already provides the verifier with much information to start building the tree. For each stem, it knows what that path looks like in the tree.

Now comes the gist of this tree rebuild algorithm. **I'd recommend reading these points while looking at our partial tree diagram above, so you can convince yourself you can build that diagram by interpreting the *extension statuses* as follows.**

Following the same order as the list above, each *extension status* provides the following information:

1. This *extension status* shows that the key `0x01020000000000000000000000000000000000000000000000000000000000FF` lives in a leaf node (since it has the `EXTENSION_PRESENCE_PRESENT` flag) at depth 2. So we know we have a path walking internal nodes at `0x01` and `0x02`, and then a final leaf node with stem `0x01020000000000000000000000000000000000000000000000000000000000` holding the value `210` (i.e: we got this from `current_value` in *StateDiff*). So, we start building this path corresponding to all the keys with **path prefix** `0x0102`.
2. This *extension status* provides similar information describing the same path as point 1. So, technically speaking, it is redundant information. (Note: it's very tempting to come up with clever twists in the algorithm to avoid this extra byte, but they make the algorithm more complex, and it isn't clear if it is worth saving a few bytes for this corner case.)
3. This *extension status* signals that this stem is "empty" at depth 1: we have an empty commitment at the second level at position `0xFF`. Thus, after walking the root node and jumping to the reference at `0x01`, we'll be in an internal node with an empty reference at `0xFF`.
4. This *extension status* signals that this stem is "absent other", meaning that we reach a leaf node, but this leaf node corresponds to another stem. Again, since the described depth is `2`, we know we'll walk two internal nodes and find a leaf node with another stem. Now comes an interesting part: how do we know which is the *stem* of this leaf node? It isn't the stem of this *extension status*, since the flag says this leaf node corresponds to **another** stem. That's where the `other_stems` field in *Group A* comes in: it contains the auxiliary stems that let us fill in this information. Each time you need to find **another stem**, you must look for it in the `other_stems` list. Since the absent stem is `0xFF002100000000000000000000000000000000000000000000000000000000` and the depth is `2`, we need to look for the item of `other_stems` with prefix `0xFF00`.
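The `other_stems` lookup described in point 4 can be sketched as a prefix search (a hypothetical helper; a well-formed witness should yield exactly one match):

```python
def find_other_stem(other_stems, absent_stem: bytes, depth: int) -> bytes:
    """Return the stem actually living in the leaf node found at `depth`
    while walking toward `absent_stem` (they share the first `depth` bytes)."""
    matches = [s for s in other_stems if s[:depth] == absent_stem[:depth]]
    assert len(matches) == 1, "malformed witness"
    return matches[0]
```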

The above points are pretty dense, so hopefully you didn't panic reading them. If you're left feeling this is quite complex, remember that this example was intentionally crafted to cover much of the complexity with the least number of keys.

An avid reader might wonder why point 2 didn't add an extra value to `other_stems`. In point 4, I mentioned that `EXTENSION_PRESENCE_OTHER` uses `other_stems` to tell the tree builder which stem lives in that leaf node. But note that, here, the leaf node's *stem* overlaps with a proof of presence in the same leaf node, so adding this auxiliary stem to `other_stems` is technically unnecessary. While processing the *extension status* of that proof of presence, the verifier already fills in the *stem* of the leaf node sharing the same two-byte prefix, so we don't need the auxiliary `other_stems` item.

With all this, we know what `other_stems` and `depth_extension_present` are used for. Now, note that when we rebuild these paths in the partial view of the tree, we can build *almost* the same tree shown in the diagram from scratch (i.e: internal nodes, at which depth the leaf nodes are, the *stem* values of leaf nodes, values of leaf nodes, *extension markers*, which are now hardcoded to `1`, etc). But we're still missing something: filling in the commitments.

On each *internal node*, we need to fill in the *commitments* at each vector position in the paths. We also need to do the same for the *C1/C2* items in the leaf nodes' *extension level commitments*. As you can imagine, this is where `commitments_by_path` comes in.

The interpretation of this `commitments_by_path` list is simple. If you walk the tree in a depth-first search (DFS) style, each required *commitment* is found in that same order in `commitments_by_path`. Look at our partial tree diagram above again, walk it DFS, and see how the required commitments appear in precisely that order in `commitments_by_path`. Remember that the values in the vectors aren't commitments but field elements: you use the `MapToScalarField(...)` function explained before to map an elliptic curve point (*commitment*) to a scalar field element (vector item).

And… that's it! We've seen how, with `other_stems + depth_extension_present + commitments_by_path`, we can provide the verifier with all the information needed to build a partial view of a Verkle Tree. Note that the verifier still has to do some extra work to check that all these commitments and vector item values match each other, up to the final link with the root node commitment (provided by the verifier), tying it all together.

Doing that is explained in the next section, but it isn't hard. You've already passed the most complex part of this article.

### Group B fields - Prove to the verifier the rebuilt tree is correct

At this point, the verifier could rebuild a partial view of the tree with all the *Group A* fields. The remaining part is checking that this partial view of the tree indeed has a root node with a commitment equal to the expected one (e.g: the state root of the previous block).

In a Merkle Tree, you do this check indirectly by recomputing the hashes of each node from bottom to top, expecting to get a matching hash for the expected root at the top. In Verkle Trees, this is a bit different.

Let's focus on a particular section of the partial tree diagram we've been working with:

The verifier has this "chunk" of the reconstructed partial tree, but remember, the prover provided all these vector item values, so how do you know they're correct? (e.g: you were provided with `CM1` and "some vector" `[_, _, CM3, ..., E]` that *in theory* has commitment `CM1`, but you need to check it).

That means:

- You must check that `CM1` is the second item in a vector with commitment `RootCM`.
- Check that `CM2` is the 255-th vector element of `RootCM`. At this point, we know this top-level rebuilt node is correct with respect to our source of truth, `RootCM`.
- Check that `CM3` and `E` are the third and 255-th elements of a vector with commitment `CM1`.

When I say "check", I mean that the verifier should verify a proof of a *vector opening*. As I mentioned in the *Minimal cryptography concepts to understand* section before, this is a short proof that a vector with commitment *C* has an element of value *V* at a particular index in the vector.

In summary, the verifier should check that each provided value in the node vectors cryptographically verifies against its corresponding vector commitment. The diagram above shows an example for internal nodes, but for leaf node vectors, it's the same.

In this reconstructed tree, we can deterministically describe all the *vector openings* we should check as verifiers. I'll describe them in the format `(CM, index, value)`:

```
(CMRoot, 1, CM1)
(CMRoot, 255, CM2)
(CM1, 2, CM3)
(CM1, 255, E)
(CM3, 0, 1)
(CM3, 1, 0102[00]^29)
(CM3, 2, CM5)
(CM5, 254, 210_low)
(CM5, 255, 210_high)
(CM2, 0, CM4)
(CM4, 0, 1)
(CM4, 1, FF00[22]^29)
```

The deterministic ordering comes from walking the tree DFS; every time we're in a tree node, we append to an accumulator list all the required openings in that node.
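That accumulator walk can be sketched like this (the node layout here is invented for illustration; real implementations walk their own tree structures):

```python
def collect_openings(node, acc):
    # node = (commitment, {vector_index: value}); a value is either a child
    # node tuple (an internal reference) or a scalar (marker, stem, E, ...).
    cm, entries = node
    children = []
    for idx in sorted(entries):
        val = entries[idx]
        if isinstance(val, tuple):          # child node reference:
            acc.append((cm, idx, val[0]))   # open the child's commitment...
            children.append(val)            # ...and queue it for recursion
        else:
            acc.append((cm, idx, val))
    for child in children:                  # then DFS into children, in order
        collect_openings(child, acc)
    return acc
```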

The fields `D + ipa_proof` contain the cryptographic information the verifier can use to verify all these *vector openings* efficiently. The total size of these fields is independent of the number of openings!

At this point, I won't get into details of how this magic works, but I'll dive more into that in an appendix if you're interested. If you're not into cryptography details, you can imagine the cryptography libs having an API where you provide all the openings you want verified, plus `D + ipa_proof`, and receive a `Yes` or `No` result.

The main work for the verifier is walking this rebuilt tree to collect all these openings (i.e: the list described before), then calling this cryptographic API and concluding that the rebuilt partial tree is correct.

# Conclusion

Let's take another step back and summarize what we described before.

With the *Group A* fields in the witness, the prover provides all the needed information to build a partial view of a tree, which the prover claims is the correct one for the previous state root.

With the *Group B* fields in the witness, the prover proves to the verifier that all the provided values in the vectors correspond with their commitments. This chained verification connects every child with its parent, up to the final root commitment the verifier wants to check. All this is done efficiently thanks to a proof aggregation scheme for additively homomorphic commitments.

At this point, the verifier is convinced it has a **correct** partial view of a tree with the expected root. In the context of stateless execution, the client can now execute the block. Whenever the EVM asks for some state, it can pull it from this tree or *StateDiff*. After (or during) the block execution, it can use this tree to update all the commitments and get the new tree root, which should match the one claimed in the executed block.

This concludes the journey of understanding how a Verkle Proof works.

In the next section, I'll dive deeper into cryptography and other satellite stuff.

# Appendices

I mention here a set of topics that I find relevant but orthogonal to the article's main goal. I expect not many people to be interested in these details, which is fine.

Apart from that, I'm a paranoid person regarding how things can go wrong, so I'm also sharing some dark corners that I haven't seen many people discussing, or that I've only touched on with very few people; I believe more people should keep an eye on them, just in case.

## Cryptography and implementation optimizations

Some articles explain most of what you need to know to understand how the cryptography part of this story works, but they might be based on an old design of Verkle Trees. Also, to get the full picture, you would have to read multiple articles. I've found some people confused by this, so what I'll try to do here is give a clearer view and hint at how the external links should be read.

The cryptographic proof serialized in *Group B* fields mentioned before is called *Multiproofs*. This is a generic proving scheme to aggregate multiple polynomial commitment openings if the commitment is additively homomorphic. This is the case for Verkle Trees since we use *Pedersen Commitments*.
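To make "additively homomorphic" concrete, here's a toy sketch in a multiplicative group with tiny, insecure parameters (purely illustrative; real Verkle commitments are Pedersen commitments over Bandersnatch):

```python
P = 1000003                # a small prime modulus (toy group, NOT secure)
GENERATORS = [2, 3, 5, 7]  # one toy "generator" per vector slot

def commit(vector):
    """Toy Pedersen-style vector commitment: C = prod g_i^{v_i} mod P."""
    c = 1
    for g, v in zip(GENERATORS, vector):
        c = c * pow(g, v, P) % P
    return c

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
# Homomorphism: combining the commitments commits to the slot-wise sum,
# which is the structural property Multiproofs relies on for aggregation.
```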

To understand how *Multiproofs* work, I recommend reading Dankrad's article. The important part is the *Proofs* section, with the listed bullets. His article uses KZG as the polynomial commitment scheme, so that part can be confusing. This is due to historical reasons: initially, the Verkle Tree design planned to use KZG and not IPAs. You can ignore the KZG references and imagine it says *inner product argument*.

Since *Multiproofs* sees the polynomial opening part as a black box (assuming it is additively homomorphic), in Verkle Trees we use *inner product arguments*. The *inner product argument* is a well-known construct for vector commitments, which can also be used for polynomial commitments. You can refer to this paper or Dankrad's article to understand *inner product arguments*.

This is a good general view of how things work:

- IPAs for polynomial commitments/openings.
- Multiproofs to aggregate multiple openings into a single one, to have shorter proofs.

If you want to go a step further, you should know that the vectors we use in the tree nodes are interpreted as polynomials in evaluation form. Contrary to what you might imagine, the evaluation domain isn't the roots of unity but the range `[0, 255]`. This is explained in Dankrad's article, but if you want to dive into even more details for an implementation/audit, I'd suggest reading this great HackMD doc from Kevaundray. This is quite relevant since, when doing *Multiproofs*, you'll reach a division-by-zero situation that needs to be handled; plus, for an efficient implementation, it provides hints on how some coefficients of the barycentric formula can be precomputed.
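As a small illustration of working in evaluation form (toy assumptions: an 8-point domain `[0..7]` instead of `[0..255]`, and exact rationals instead of field arithmetic), here is barycentric evaluation with precomputed `A'(x_i)` weights:

```python
from fractions import Fraction

DOMAIN = list(range(8))

# Precomputable weights: A'(x_i) = prod_{j != i} (x_i - x_j).
WEIGHTS = []
for i, xi in enumerate(DOMAIN):
    w = Fraction(1)
    for j, xj in enumerate(DOMAIN):
        if j != i:
            w *= xi - xj
    WEIGHTS.append(w)

def eval_outside_domain(evals, z):
    """f(z) = A(z) * sum_i y_i / (A'(x_i) * (z - x_i)), for z not in DOMAIN
    (evaluating inside the domain would hit the division-by-zero case)."""
    a_z = Fraction(1)
    for xi in DOMAIN:
        a_z *= z - xi
    return a_z * sum(Fraction(y) / (WEIGHTS[i] * (z - DOMAIN[i]))
                     for i, y in enumerate(evals))

# f(x) = x^2 + 1, stored only by its evaluations on the domain.
evals = [x * x + 1 for x in DOMAIN]
```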

If you want even more details about efficiently implementing all this, I wrote a HackMD doc a while ago that goes through many tricks. Some of these are mentioned in the articles above, and others are only explained in my doc. I wrote it because we have many good ideas in our implementation, and explaining them further lets others leverage them.

## Witness completeness & soundness

By *witness soundness*, I refer to how easy it is for a malicious prover to create a verkle witness that would convince a perfectly implemented verifier (i.e: no implementation bugs) that a wrong witness is correct.

There are three dimensions here:

- The witness generation/verification algorithm (i.e: is there a border case where a set of keys makes the algorithm do the wrong thing?).
- The cryptography proving the relationship between tree nodes isn't sound.
- A flaw in the tree design.

I think all three points are important to analyze correctly, but in this appendix, I'll mostly focus on points 2 and 3. This appendix contains some thoughts about those points but __is not an exhaustive analysis__.

### Point 2: Multiproof/IPA soundness

The TL;DR is that we shouldn't worry, since the EF research team has at least reviewed these proving schemes. Of course, getting more cryptographers to do their own analysis is always good.

The *inner product argument* is a widely used construct, and papers have formal soundness proofs, so diving into soundness details here doesn't make much sense since there are detailed proofs available.

For *Multiproofs*, probably the best (and, for now, only) place you can see a soundness argument is Dankrad's blog post. In the future, other cryptographers may write an even more formal soundness proof.

The main argument is that if the provided value openings aren't correct, then at least one term of the aggregated polynomial would be a rational function instead of a polynomial. Since we use IPAs as a polynomial commitment scheme, you can't commit to a rational function; the malicious prover won't be able to generate an IPA proof for it. At the end of the day, the IPA proof opens a polynomial in evaluation form, so the chance this "works by pure luck" against a rational function is negligible.

### Point 3: Non-injective mapping between Point→Fr

I realized this a while ago while improving the implementation of the witness verifier for geth. I'm unaware of any public discussion on this topic, which doesn't mean it wasn't accounted for.

After thinking about this for a while, I had a coffee talk (thus, not super formal) with Gottfried Herold, which helped me formalize some aspects. The conclusion is that this shouldn't be a problem, but it's still fun to share.

Remember, `MapToScalarField(P) -> Fr` allows us to transform *commitments* into vector item values.

The way `MapToScalarField(P)` works is:

- `P` is a point with coordinates `(x, y)` (i.e: a representative point in the Banderwagon abstraction, but this is irrelevant here; think of an EC point).
- We compute `a = x/y`, so `a` is a *base field* element.
- We interpret the bytes of `a` as a *scalar field* element.

If `MapToScalarField(P)` were injective (i.e: one `P` per `Fr` value), then we'd know there can only be a single point mapping to a given `Fr` value, which doesn't give a malicious prover room to provide *another* commitment that would map to the same scalar element.

You can prove that the `x/y` step is an injective map between `P` and `Fp`. Thanks, Gottfried, for explaining this fact. It isn't entirely trivial, but if you play with the Bandersnatch equation using `x = C * y` and have the right subgroup considerations, you can't have more than one solution.

But the last bullet point above motivated me to surface this topic. We do an `Fp -> Fr` transformation. Since `Fr` is smaller than `Fp`, we can have wrapping here (i.e: the `Fr` field is ~four times smaller than `Fp`). This clearly shows that you could have multiple `Fp` values mapping to the same `Fr`, so `MapToScalarField(P)` isn't injective. As in, there could be multiple `x/y` values (for different `(x, y)`s) that map to the same scalar field element, so you can't distinguish which is the "correct" one.

This means you *could* map multiple commitments to the same scalar field element, which sounds *interesting*. In fact, given some `x/y` value, finding the (potentially) corresponding point for that value shouldn't be hard (thanks, Gottfried, again for mentioning that fact).

So technically, you *could* find some point that maps to the same scalar field element as "the real" commitment for some tree node. This point might or might not be in the correct subgroup, but more importantly, the malicious prover doesn't know the vector corresponding to this "crafted commitment".

The bottom line is that you should be careful comparing the output of `MapToScalarField(P)` against an expected value. To be completely sure the provided commitment is correct, you should always ask for a *vector opening*, proving that the prover indeed knows the underlying vector of this commitment. If you don't, you can't tell whether this *commitment* point is correct, where "correct" means the one used in the tree; you might get convinced that some parent node points to a *commitment* that *seems* to be the child, but isn't.

In the context of our witness proofs, we always ask for at least one vector opening for every commitment. This forces the prover to know the whole vector, so it can't inject a *specially crafted* commitment that exploits this non-injective mapping without failing the proof verification. But if you're building another algorithm/protocol, be careful about this fact.

## Witness post-state values

When we explored the *StateDiff* struct, I glossed over the `new_value` field:

```
class SuffixStateDiff(Container):
    suffix: Byte
    # Null means not currently present
    current_value: Union[Null, Bytes32]
    # Null means value not updated
    new_value: Union[Null, Bytes32]
```

The key value we always referred to in this article is `current_value`, which gives you the key's value before the block is executed (i.e: pre-state). The `new_value` is `null` if the key still has the same value after the block execution, or the new key value otherwise.

As mentioned before, the `new_value` field isn't useful for stateless clients since they will execute the block to validate it, which means they'll calculate the `new_value`s themselves. These `new_value`s are usually called *post-state*.

Be aware that including the post-state in the block witness is still being explored, and it isn't clear yet if it will be part of the final design. These values are part of the Kaustinen testnet we're using for testing Verkle Tree implementations, so you can experiment with them.

If stateless clients calculate these values themselves, why was this considered useful? The answer is quite simple: these values allow a client that isn't validating blocks to inspect which keys have changed values, and thus react faster to whatever it intends to do.

For example, if you're a block builder, you might receive a block, see the post-state in the witness, update your state directly, and try to react immediately to build the next block. In the background, you could verify the block and roll back all your operations if those values are incorrect. So, it is a kind of optimistic computation where you can parallelize the validation.
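This optimistic apply-then-maybe-rollback pattern can be sketched as follows (hypothetical names; state is modeled as a flat key/value dict, and `None` as the Null meaning "unchanged"):

```python
# Hedged sketch of the optimistic pattern: apply new_values immediately,
# verify the block in the background, and roll back if verification fails.
def apply_post_state(state: dict, state_diff: list) -> dict:
    """Apply new_values and return a snapshot of the touched keys."""
    snapshot = {}
    for key, new_value in state_diff:
        if new_value is None:            # Null: key unchanged by the block
            continue
        snapshot[key] = state.get(key)   # remember the pre-state value
        state[key] = new_value           # optimistically apply
    return snapshot

def rollback(state: dict, snapshot: dict) -> None:
    """Undo an optimistic apply, e.g. because block validation failed."""
    for key, old_value in snapshot.items():
        if old_value is None:            # the key did not exist before
            del state[key]
        else:
            state[key] = old_value

state = {"k1": b"a"}
snap = apply_post_state(state, [("k1", b"b"), ("k2", b"c"), ("k3", None)])
assert state == {"k1": b"b", "k2": b"c"}
rollback(state, snap)                    # background validation failed
assert state == {"k1": b"a"}
```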

Another interesting example might be wallets, which can detect whether some block changed the balance of a particular EOA address. Since they don't have to execute the block to know what changed during the execution, they can observe that effect directly.

A final example is chain syncing. Using the post-state values, you can quickly compute the updated state root of the chain while you execute the blocks in the background. As with the other examples, this can be useful to act optimistically and roll back later if needed.

But there's a crucial catch here, and it's the main reason I wanted to write this appendix. The witness proof verification isn't verifying that the `new_value`s are correct, only that the `current_value`s are. The cryptographic proof only proves that the partial view of the pre-state (i.e: using `current_value`s) is correct.

This should seem apparent since, in all my explanations of how the proof generation or verification works, I've never mentioned using `new_value`. (In reality, a stateless client should check that the provided `new_value`s are correct, but they aren't used for block execution; they just need to be checked for witness validity rules.)

You might wonder why we can't also create a proof for these `new_value`s, since the post-state tree is still a Verkle Tree, and thus we could create a proof as we did with the pre-state tree.

The answer is that, indeed, the prover could do that; that's no problem. It can prove that the provided `new_value`s are certainly part of the tree with the state root of the provided block. So, as a verifier, you at least know those values are part of the resulting tree.

Unfortunately, even though this rules out some tricks, it isn't enough to be confident about their validity, since:

- As a verifier, to verify this post-state proof, you are taking for granted that the state root of this post-state tree (i.e: the claimed state root of the received block) is correct. If the party that gave you the block and witness forged the block state root, then you're screwed. Remember that you're not verifying the block. (If you were, you wouldn't be using these `new_value`s.)
- Even if the state root is correct, the proof could be missing keys mutated during the block execution. If the block execution changed *key1* and *key2*, and I give you only the `new_value` for *key1* with a corresponding proof, that's fine: you'll be convinced that *key1*'s new value is the one claimed (even assuming the state root wasn't forged). But you'll never know that *key2* was also mutated.

In summary, since you're receiving a block with a "claimed" state root that you're not validating, it doesn't make sense to generate or verify a proof, because the underlying assumption of the proof verification is that the state root you're verifying against is correct. The crux of this point is that the witness gives you a *data validity* proof for the pre-state, not an *execution validity* proof of the claimed state root. That's precisely what zkEVMs are for, since they prove the execution, but we're not there yet at L1.

This means you must be careful about using the `new_value`s if you're getting them from an untrusted source. Regarding the examples mentioned before:

- This situation isn't problematic for the optimistic block-building example, since in the background you're still verifying the block execution and validating it, only exploiting this extra information optimistically. If the validation fails, you roll back your actions.
- For the wallet example, you have to be very careful. If you're never validating the block and are taking irreversible actions just because you inferred an event from a post-state value, then you can get into trouble. The recommendation here would be to get witnesses from trusted sources. Or, if the source is not fully trusted, treat any conclusion as a hint and warn users about drawing definitive conclusions (e.g: instead of "Your wallet balance has changed", say "It looks like your wallet balance has changed"). Of course, this is terrible UX and might mean that it is a bad idea to use this in wallets, which is precisely why I put it as an example.

Apart from all the validity discussion, adding the `new_value`s increases the witness size, which might be seen as a drawback. But considering that writing state is more costly than reading state, it's expected that most `new_value`s would be `null`. So maybe it doesn't significantly impact size in the grand scheme of things.