MPT tree
Merkle Patricia Tree Introduction
A Merkle Patricia Trie is a cryptographically authenticated data structure used to store all (key, value)
pairs.
The Merkle Patricia Trie is fully deterministic, meaning that dictionaries with the same (key, value)
pairs will always be exactly the same, even down to the last byte. This means they share the same root hash, making insertions, lookups, and deletions extremely efficient at O(log(n))
complexity. Furthermore, compared to more complex, comparison-based dictionaries like red-black trees, Merkle Patricia Tries are easier to understand and code.
prerequisites
To better understand this document, having the following foundational knowledge will be helpful:Hashing、Merkle Tree、Trie (Prefix Tree)和Serialization。
merkle-patricia-trees
A radix tree has a significant limitation: inefficiency. If you want to store a (path, value)
pair at a position with a path length of 64 characters (half-bytes32)
, you would need over a thousand bytes of additional space to store each character at a level, and each query or deletion would require executing a complete 64 steps. The Patricia dictionary tree described below can address this problem.
optimization
The nodes in a Merkel Patricia Trie can be any of the following types:
NULL
(represents an empty string)branch
,a 17-element node[ v0 ... v15, vt ]
leaf
,a two-element node[ encodedPath, value ]
extension
,a two-element node[ encodedPath, key ]
When traversing the prefix tree in a 64-character path, after traversing the initial layers, you will reach a node where there are no more branching paths downstream, or at least not many. To avoid creating up to 15 sparse NULL
nodes in the path, we streamline the traversal down by setting an extension
node in the form [encodedPath, key]
. The encodedPath
contains the "partial path" to skip (using the compressed encoding described below), and the key
is used for the next database query.
For leaf
nodes, you can use a flag in the first half-byte of the encodedPath
to mark that the path encodes all path segments of previous nodes, and you can query the value
directly.
However, this optimization introduces ambiguity. When traversing the path by half-bytes, you may end up having to traverse an odd number of half-bytes, but all data must be stored in bytes
format. There is no distinction between them, for example, a half-byte 1
and a half-byte 01
(both must be stored as <01>
). To specify the length of an odd number of half-bytes, this part of the path should use a flag as a prefix.
specification:Compressed encoding of hexadecimal sequences with optional terminators
As described above, the flag for the remaining part of the path length being odd vs. even and whether it's a leaf node vs. an extension node is found in the first nibble (half-byte) of the partial path in any two-element node, resulting in the following:
For the case where the remaining path length is even (0
or 2
), a 0
"placeholder" nibble is always appended.
example:
Here is the code to retrieve nodes in a Merkle Patricia Tree:
tries-in-MAPO
All the Merkle trees in MAPO
's execution layer use the same Merkle Patricia Trie as Ethereum.
In the block header, there are three roots from three prefix trees.
stateRoot
transactionsRoot
receiptsRoot
state-trie
There is a global state dictionary tree that gets updated each time a client processes a block. In this tree, the path
is always: keccak256(ethereumAddress)
, and the value
is always: rlp(ethereumAccount)
. More specifically, a MAPO account
is an array containing four elements: [nonce, balance, storageRoot, codeHash]
. It's worth noting that the storageRoot
is the root of another Patricia Trie dictionary tree.
storage-trie
The storage tree is where all contract data
is stored. Each account has its own separate storage tree. To retrieve a value at a specific storage position with a given address, you'll need the address, the integer position in storage where the data is stored, and the block ID. Afterward, you can use these parameters in the JSON-RPC application interface's eth_getStorageAt
function, for example, to retrieve data in storage slot 0 of address 0x295a70b2de5e3953354a6a8344e616ed314d7251
:
To retrieve other elements from storage, you first need to calculate the position in the storage tree. This position is calculated as the keccak256
hash of the address and the storage location, both left-padded with zeros to a length of 32 bytes. For example, the position of the data for address 0x391694e7e0b0cce554cb130d723a9d27458f9298
at storage slot 1 is:
You can calculate it in the console as follows::
so,path
is keccak256(<6661e9d6d8b923d5bbaab1b96e1dd51ff6ea2a93520fdc9eb75d059238b8c5e9>)
。 As before, this address can now be used to retrieve data from the storage tree.:
transaction-trie
每个区块都有一个独立的交易字典树,也用于存储 (key, value)
对。 路径为:rlp(transactionIndex)
,代表了对应一个值的键,值由以下决定:
Each block has a separate transaction trie, which is also used to store (key, value)
pairs. The path is formed by using rlp(transactionIndex)
, representing the key corresponding to a value. The value is determined by the following:
receipts-trie
Each block has its own receipt trie. Here, the path
is rlp(transactionIndex)
, where the transactionIndex
is its index within the mined block. The receipt trie is never updated. Similar to the transaction trie, it contains both current and previous receipts. To query a specific receipt in the receipt trie, you need to provide the index of the transaction within the block, the receipt payload, and the transaction type. The returned receipt can be of the Receipt type
, defined as a concatenation of transaction type
and transaction payload
, or it can be of the LegacyReceipt type, defined as rlp([status, cumulativeGasUsed, logsBloom, logs])
.
Last updated