Detecting Ethereum RPC censorship programmatically.
UPDATE: I was wrong, and Infura doesn’t censor RPC reads! This post caught Infura’s eye, and they wrote a blog post in response, which you should check out. The party isn’t over yet, though: the option to censor is still there unless we use secure RPC. That’s why I ended up building this; check it out at https://github.com/liamzebedee/eth-verifiable-rpc
This is a spec and a request for either (1) grants or (2) builders. Please reach out on the Twitter thread or over DMs if you’re interested in either.
Introduction.
Recently, Ethereum node providers like Infura and Alchemy started censoring parts of the Ethereum database from being read via their JSON-RPC APIs.
This proposal is to detect this programmatically, by building a local EVM shim that verifiably loads state from a remote node during execution.
Problem.
Example: the ENS entry for tornadocash.eth.
On a censoring provider like Infura, the contenthash key for tornadocash.eth returns 0, when in fact we know it to be nonzero.
You can verify this simply using cast from the Foundry toolbelt:
ETH_RPC_URL=https://mainnet.infura.io/v3/84842078b09946638c03157f83405213 cast call 0x226159d592E2b063810a10Ebf6dcbADA94Ed68b8 "contenthash(bytes32 node)" tornadocash.eth
0x00000000000000000000000000000000000000000000000000000000000000200000000000000000000000000000000000000000000000000000000000000000
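To see what “returns 0” means here, the response can be decoded by hand. A minimal Python sketch, assuming the function returns a single ABI-encoded `bytes` value (as contenthash does); `decode_abi_bytes` is an illustrative helper, not part of any library. The response is an offset word (0x20) followed by a length word of 0, i.e. an empty byte string rather than a literal zero.

```python
def decode_abi_bytes(ret_hex: str) -> bytes:
    """Decode eth_call return data for a function that returns a single `bytes`.

    ABI layout: word 0 is the offset of the dynamic value, the word at that
    offset is its length, and the raw bytes follow (right-padded to 32 bytes).
    """
    if ret_hex.startswith("0x"):
        ret_hex = ret_hex[2:]
    raw = bytes.fromhex(ret_hex)
    offset = int.from_bytes(raw[0:32], "big")
    length = int.from_bytes(raw[offset:offset + 32], "big")
    start = offset + 32
    if len(raw) < start + length:
        raise ValueError("return data shorter than the encoded length")
    return raw[start:start + length]


# The response shown above: offset 0x20, then a length word of 0.
response = "0x" + "00" * 31 + "20" + "00" * 32
print(decode_abi_bytes(response))  # prints b''
```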
As part of my work on Dappnet, I know that it’s being censored. But this isn’t being talked about.
What’s worse is that we don’t know how to detect it. I’m implementing an RPC provider marketplace, and I’m unable to tell which providers will, at a moment’s notice, block users from accessing their money and dapps.
How can we detect censorship?
This section outlines (1) how Ethereum works and then (2) how we can detect censorship.
(1) How Ethereum works.
What is happening when we call cast call 0x222... "contenthash(bytes32 node)" tornadocash.eth?
- Ethereum is a database with a microservices layer called smart contracts, which run on the EVM.
- To write to the database, we send transactions. To read from the database, we call these smart contracts and get data.
- The read/write messages are sent to an Ethereum node over an RPC protocol called JSON-RPC.
- The Ethereum node tracks two things - consensus (the hash of the latest block of transactions in the database) and execution (the world state and processing of txs).
cast call translates to an eth_call RPC, which translates to running the EVM with the following message (as the EVM is a message-passing model):
- ENS is Ethereum’s domain name system, mapping (name => (key => value)). To track this, we call a contract called the resolver. The resolver’s address is 0x226159d592E2b063810a10Ebf6dcbADA94Ed68b8, which we’ll call ENS_RESOLVER.
- We are calling contenthash(bytes32 node) (impl), a function on the resolver contract.
- Our call data is encoded according to the EVM calling convention, wherein we concatenate the 4-byte function selector with its ABI-encoded arguments. The bytes32 node argument is not the raw name but its ENS namehash:
cast calldata "contenthash(bytes32)" $(cast namehash tornadocash.eth)
- This creates our message for the EVM to execute:
Message(from=0x0, to=$ENS_RESOLVER, data=selector("contenthash(bytes32)") ++ namehash("tornadocash.eth"), value=0 ether)
- When eth_call is run, the EVM executes the bytecode of the contract and returns data from its storage.
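The encoding steps above can be reproduced without Foundry. A self-contained Python sketch using only the standard library: it carries a small pure-Python Keccak-256, because Ethereum uses the original Keccak padding and `hashlib.sha3_256` gives different digests. `namehash` follows EIP-137 but skips name normalization (it assumes an already-normalized name), and `eth_call_body` is a hypothetical helper that builds roughly the JSON-RPC request `cast call` would send.

```python
import json

# Keccak-f[1600] constants: round constants, rho offsets, pi lane order.
_RC = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]
_ROTC = [1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14, 27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44]
_PILN = [10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4, 15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1]
_MASK = (1 << 64) - 1


def _rotl(x: int, n: int) -> int:
    return ((x << n) | (x >> (64 - n))) & _MASK


def _keccak_f(st: list) -> None:
    """Keccak-f[1600] over 25 64-bit lanes (index = x + 5*y), in place."""
    for rc in _RC:
        bc = [st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20] for i in range(5)]
        for i in range(5):  # theta
            t = bc[(i + 4) % 5] ^ _rotl(bc[(i + 1) % 5], 1)
            for j in range(0, 25, 5):
                st[j + i] ^= t
        t = st[1]  # rho and pi
        for i in range(24):
            j = _PILN[i]
            st[j], t = _rotl(t, _ROTC[i]), st[j]
        for j in range(0, 25, 5):  # chi
            row = st[j:j + 5]
            for i in range(5):
                st[j + i] ^= (~row[(i + 1) % 5]) & row[(i + 2) % 5]
        st[0] ^= rc  # iota


def keccak256(data: bytes) -> bytes:
    """Ethereum's Keccak-256 (padding byte 0x01, not SHA3's 0x06)."""
    rate = 136  # bytes absorbed per permutation for a 256-bit output
    buf = bytearray(data) + b"\x01"
    buf += b"\x00" * (-len(buf) % rate)
    buf[-1] |= 0x80
    st = [0] * 25
    for off in range(0, len(buf), rate):
        for i in range(rate // 8):
            st[i] ^= int.from_bytes(buf[off + 8 * i:off + 8 * i + 8], "little")
        _keccak_f(st)
    return b"".join(lane.to_bytes(8, "little") for lane in st[:4])


def namehash(name: str) -> bytes:
    """EIP-137 namehash: fold labels right to left (no normalization here)."""
    node = b"\x00" * 32
    if name:
        for label in reversed(name.split(".")):
            node = keccak256(node + keccak256(label.encode("utf-8")))
    return node


def selector(signature: str) -> bytes:
    """First 4 bytes of keccak256 over the canonical signature (types only)."""
    return keccak256(signature.encode("ascii"))[:4]


def eth_call_body(resolver: str, name: str, block: str = "latest") -> dict:
    """JSON-RPC eth_call request for contenthash(namehash(name))."""
    data = selector("contenthash(bytes32)") + namehash(name)
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_call",
        "params": [{"to": resolver, "data": "0x" + data.hex()}, block],
    }


print(json.dumps(eth_call_body("0x226159d592E2b063810a10Ebf6dcbADA94Ed68b8", "tornadocash.eth")))
```

The resulting data field is 36 bytes: the 4-byte selector followed by the 32-byte namehash, matching the Message above.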
How does storage work?
- Traditional databases use SQL: data lives in relational tables, and we write SQL to read and write it. In Ethereum, the language is EVM bytecode, which operates purely on a key-value basis (the sstore and sload opcodes), with no relational model (joins, etc.). Smart contracts are like programs that natively use the database for storing their data structures.
- The EVM has two notions of memory locations: memory (aka RAM) and storage (aka disk).
- Every contract has its own private namespace for storage. Other contracts cannot read it directly; they must use contract calls to interface with each other.
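The key-value model can be made concrete with a toy in-memory model of storage. A sketch only: `ContractStorage` is an illustrative name and the addresses are placeholders, not real contracts. It shows 256-bit word keys and values, one namespace per contract, and the EVM rule that slots never written read as zero.

```python
class ContractStorage:
    """Toy model of EVM storage: one private key-value namespace per contract.

    Keys and values are 256-bit words. There is no query language and no joins;
    a contract's SSTORE/SLOAD only ever touch its own namespace.
    """

    WORD = 2**256

    def __init__(self) -> None:
        self._slots: dict = {}  # contract address -> {slot: value}

    def sstore(self, contract: str, slot: int, value: int) -> None:
        self._slots.setdefault(contract, {})[slot % self.WORD] = value % self.WORD

    def sload(self, contract: str, slot: int) -> int:
        # A slot that was never written reads as zero, as in the EVM.
        return self._slots.get(contract, {}).get(slot % self.WORD, 0)


storage = ContractStorage()
storage.sstore("0xResolver", 0, 42)
print(storage.sload("0xResolver", 0), storage.sload("0xOther", 0))  # 42 0
```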
How does consensus work?
- Ethereum is a blockchain, meaning the latest block hash represents the state of your entire system - all of the transactions it has processed, the latest state of the database, the balances, the smart contract programs, etc.
- An easy way to think about it - each block represents a tick of the system, and the block hash is like the time.
- In Ethereum 1.0, the clock was based on proof-of-work. Since the Merge in September 2022, it runs on a new protocol, proof-of-stake.
- You can track the clock without tracking the rest of the database. This is called a light node; since Eth 2.0 split the consensus layer from the execution layer, it just means running a “consensus node”.
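The “block hash as a clock” idea rests on hash-linking: each header commits to its parent’s hash and to a state root, so trusting the newest hash pins down everything before it, including the state. A toy sketch under simplifying assumptions: SHA-256 over three fields stands in for Ethereum’s Keccak-256 over the full RLP-encoded header, and `Header` and `verify_chain` are illustrative names, not a client API. How a light client decides which head hash to trust in the first place is not modelled.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    """Simplified block header; real Ethereum headers carry many more fields."""
    number: int
    parent_hash: bytes
    state_root: bytes

    def block_hash(self) -> bytes:
        # Stand-in hash: Ethereum uses keccak256 over the RLP-encoded header.
        return hashlib.sha256(
            self.number.to_bytes(8, "big") + self.parent_hash + self.state_root
        ).digest()


def verify_chain(trusted_head: bytes, headers: list) -> bool:
    """Check that headers (newest first) link back from a block hash we trust."""
    expected = trusted_head
    for header in headers:
        if header.block_hash() != expected:
            return False
        expected = header.parent_hash
    return True


genesis = Header(0, b"\x00" * 32, hashlib.sha256(b"state 0").digest())
block1 = Header(1, genesis.block_hash(), hashlib.sha256(b"state 1").digest())
forged = Header(1, genesis.block_hash(), hashlib.sha256(b"censored").digest())
print(verify_chain(block1.block_hash(), [block1, genesis]))  # True
print(verify_chain(block1.block_hash(), [forged, genesis]))  # False
```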
What does the block hash represent?
- The block hash represents the cryptographically authenticated state of Ethereum - which is a fancy way of saying, it’s a big fat merkle tree, and you can prove anything in the database by revealing a path from the root to the leaf.
- The seminal diagram for the Ethereum world state is here. Seriously, this was made in 2018 and is just that fucking good.
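- Simply put, Ethereum’s world state can be modelled as three maps: accounts, code, and storage. (Strictly, there is one trie of accounts; each account commits to its code by hash and to its storage via the root of its own storage trie.)
- This looks like:
state => (Accounts(address => balance), Code(address => bytes), Storage(contract_address => (bytes32 => bytes32)))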
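“Revealing a path from the root to the leaf” can be sketched with a binary Merkle tree. This is a simplified stand-in: Ethereum actually uses a hexary Merkle Patricia Trie with RLP-encoded nodes and Keccak-256, and proofs are fetched from a node rather than built locally. The verification idea is the same: rehash from the leaf up and compare with a root you already trust. The leaf strings are made-up examples.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _parent_level(level: list) -> list:
    # Pair nodes left to right; an odd node out is paired with itself.
    return [h(level[i] + (level[i + 1] if i + 1 < len(level) else level[i]))
            for i in range(0, len(level), 2)]


def root_and_proof(leaves: list, index: int):
    """Build the root and the sibling path for leaves[index]."""
    level = [h(leaf) for leaf in leaves]
    proof = []  # list of (sibling_hash, sibling_is_left)
    while len(level) > 1:
        sib = index ^ 1
        sibling = level[sib] if sib < len(level) else level[index]
        proof.append((sibling, index % 2 == 1))
        level = _parent_level(level)
        index //= 2
    return level[0], proof


def verify(root: bytes, leaf: bytes, proof: list) -> bool:
    """Rehash from the leaf up; a tampered leaf or sibling changes the root."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


leaves = [b"balance:alice=5", b"balance:bob=7", b"code:resolver", b"slot:hashes[node]"]
root, proof = root_and_proof(leaves, 3)
print(verify(root, leaves[3], proof), verify(root, b"slot:hashes[node]=0", proof))  # True False
```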
(2) How we can detect censorship.
Concept:
- If we have a consensus node, we know the block hash.
- If we know the block hash, we can verify a proof of anything in the database.
- Looking up the contenthash for tornadocash.eth is simply running a very small amount of EVM code that interacts with a very small amount of state:
  state.Code[ENSResolver]
  state.Code[ContentHashResolver]
  state.Storage[ContentHashResolver][hashes][tornadocash.eth]
- If we ask the RPC node for this state using eth_getStorageAt, we can trivially verify whether it was censored. How?
  - By requesting a Merkle proof of the path (block_hash, storage, ContentHashResolver, hashes, tornadocash.eth), for example with eth_getProof.
  - If the hash check fails, then we know the state leaf isn’t authentic.
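To ask eth_getStorageAt or eth_getProof for Storage[ContentHashResolver][hashes][tornadocash.eth], the path first has to become a 32-byte storage key. A sketch assuming Solidity’s storage layout for a `mapping(bytes32 => bytes)`: the entry for key k in a mapping at slot p lives at keccak256(k . p). `HASHES_SLOT` is a hypothetical placeholder, since the real base slot depends on the resolver’s compiled layout, and a resolver that nests the mapping (for example under a record version) would need `mapping_slot` applied once per level. The request bodies follow the standard eth_getStorageAt and eth_getProof (EIP-1186) parameter shapes; the Keccak helper is repeated so the block runs on its own.

```python
import json

_RC = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]
_ROTC = [1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14, 27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44]
_PILN = [10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4, 15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1]
_MASK = (1 << 64) - 1


def _rotl(x: int, n: int) -> int:
    return ((x << n) | (x >> (64 - n))) & _MASK


def keccak256(data: bytes) -> bytes:
    """Ethereum's Keccak-256 (padding byte 0x01), Keccak-f[1600] inlined."""
    buf = bytearray(data) + b"\x01"
    buf += b"\x00" * (-len(buf) % 136)
    buf[-1] |= 0x80
    st = [0] * 25
    for off in range(0, len(buf), 136):
        for i in range(17):
            st[i] ^= int.from_bytes(buf[off + 8 * i:off + 8 * i + 8], "little")
        for rc in _RC:
            bc = [st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20] for i in range(5)]
            for i in range(5):  # theta
                t = bc[(i + 4) % 5] ^ _rotl(bc[(i + 1) % 5], 1)
                for j in range(0, 25, 5):
                    st[j + i] ^= t
            t = st[1]  # rho and pi
            for i in range(24):
                st[_PILN[i]], t = _rotl(t, _ROTC[i]), st[_PILN[i]]
            for j in range(0, 25, 5):  # chi
                row = st[j:j + 5]
                for i in range(5):
                    st[j + i] ^= (~row[(i + 1) % 5]) & row[(i + 2) % 5]
            st[0] ^= rc  # iota
    return b"".join(lane.to_bytes(8, "little") for lane in st[:4])


def namehash(name: str) -> bytes:
    node = b"\x00" * 32
    if name:
        for label in reversed(name.split(".")):
            node = keccak256(node + keccak256(label.encode("utf-8")))
    return node


HASHES_SLOT = 0  # hypothetical base slot of the `hashes` mapping


def mapping_slot(key32: bytes, base_slot: int) -> bytes:
    """Solidity layout: value for key k of a mapping at slot p is at keccak256(k . p)."""
    if len(key32) != 32:
        raise ValueError("expected a 32-byte key")
    return keccak256(key32 + base_slot.to_bytes(32, "big"))


def storage_requests(resolver: str, name: str, block: str = "latest") -> list:
    slot = mapping_slot(namehash(name), HASHES_SLOT)
    return [
        # Plain read: whatever value the node claims.
        {"jsonrpc": "2.0", "id": 1, "method": "eth_getStorageAt",
         "params": [resolver, hex(int.from_bytes(slot, "big")), block]},
        # Same key, plus a Merkle proof against the block's state root.
        {"jsonrpc": "2.0", "id": 2, "method": "eth_getProof",
         "params": [resolver, ["0x" + slot.hex()], block]},
    ]


print(json.dumps(storage_requests("0x226159d592E2b063810a10Ebf6dcbADA94Ed68b8", "tornadocash.eth"), indent=2))
```

One caveat for `bytes` values: per Solidity’s layout rules, a value of 32 bytes or more stores `length * 2 + 1` in this slot and the data itself starting at keccak256(slot), so a complete check needs those slots as well.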
Ideation:
- A lightweight consensus node like Helios.
- Requesting state directly from the RPC node using eth_getStorageAt.
- Executing eth_call in a client-side EVM (i.e. something like Wei/FUCory’s work), and lazily loading the storage from the remote execution node:
  - Load the msg.to contract’s code.
  - Execute a local EVM.
  - When we encounter CALL, load the corresponding contract’s code.
  - When we encounter SLOAD, load the corresponding storage key.
  - Verify both of these through Merkle proofs, so we can detect inauthentic state.
  - Return the value of the call like normal, i.e. contenthash(xx).
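The lazy-loading loop above can be sketched as the state backend a local EVM interpreter would call into. A minimal Python sketch, assuming three hypothetical hooks supplied by the caller: `fetch_code` and `fetch_slot` (in practice wrapping RPC calls such as eth_getCode and eth_getProof) and `verify` (a Merkle-proof check against the state root from the consensus node). The interpreter itself is out of scope; the class only shows the fetch, verify, cache, or fail pattern.

```python
from typing import Any, Callable, Dict, Tuple


class InauthenticState(Exception):
    """Raised when a node's answer does not match its proof."""


class LazyVerifiedState:
    """State backend for a local EVM: fetch on first use, verify, then cache.

    fetch_code(addr) -> (code, proof) and fetch_slot(addr, slot) -> (value, proof)
    are hypothetical hooks around the remote RPC; verify(path, value, proof) is a
    hypothetical Merkle check against a state root from a consensus node.
    """

    def __init__(self,
                 fetch_code: Callable[[str], Tuple[bytes, Any]],
                 fetch_slot: Callable[[str, int], Tuple[int, Any]],
                 verify: Callable[[tuple, Any, Any], bool]) -> None:
        self._fetch_code = fetch_code
        self._fetch_slot = fetch_slot
        self._verify = verify
        self._code: Dict[str, bytes] = {}
        self._storage: Dict[Tuple[str, int], int] = {}

    def code(self, addr: str) -> bytes:
        # Used for msg.to before execution and for the target of every CALL.
        if addr not in self._code:
            code, proof = self._fetch_code(addr)
            if not self._verify(("code", addr), code, proof):
                raise InauthenticState(f"code of {addr}")
            self._code[addr] = code
        return self._code[addr]

    def sload(self, addr: str, slot: int) -> int:
        # Used whenever the interpreter executes SLOAD in contract `addr`.
        key = (addr, slot)
        if key not in self._storage:
            value, proof = self._fetch_slot(addr, slot)
            if not self._verify(("storage", addr, slot), value, proof):
                raise InauthenticState(f"storage {addr}[{slot}]")
            self._storage[key] = value
        return self._storage[key]


# Usage with in-memory fakes standing in for an honest node:
honest = LazyVerifiedState(
    fetch_code=lambda a: (b"\x60\x00", "ok"),
    fetch_slot=lambda a, s: (42, "ok"),
    verify=lambda path, value, proof: proof == "ok",
)
print(honest.sload("0xResolver", 7))  # 42
```

Raising instead of returning a default is deliberate: a censoring node should surface as a detectable failure, not as a silent zero.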
Next steps:
- Sanity check this could work.
- Build this.
- Run it against every publicly available RPC provider: Infura, QuickNode, Alchemy, POKT.
Why? Because while we know which nodes censor transactions to Tornado, we don’t know which nodes censor read-access to Tornado.