How do contract creation and constructors work in the EVM?

This is a simple explainer of initcode, contract bytecode, and constructor code in the context of EVM smart contract deployment.

To begin, the EVM executes smart contracts. Smart contracts are usually written in Solidity, which compiles down to EVM bytecode.

To deploy a smart contract, there are two approaches:

1. Implicitly. Sending a transaction with an empty (null) to field. This is not the same as sending to the zero address 0x0, which is an ordinary account. The empty to field triggers an implicit EVM routine that creates the contract.
2. Explicitly. Creating a contract within the EVM itself, using the create or create2 opcode. This usually happens within the context of a “contract factory”.

Today I’ll be explaining concepts relevant to both, but particularly focusing on (1).

Deployment and transaction

When a contract is deployed to the Ethereum blockchain using approach (1), a special transaction is created. This transaction doesn’t have a to address because it’s not being sent to any existing account on the network. Instead, this transaction includes the contract’s initcode in its data field. When this transaction is mined, the Ethereum network runs the contract’s initcode, and if a constructor is present, it gets executed.
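
To make the "no to address" point concrete, here is a minimal Python sketch. The Tx shape and the helper name is_contract_creation are assumptions made up for this illustration; they do not come from any client or library.

```python
from typing import Optional, TypedDict

# An ordinary 20-byte address. Sending here does NOT create a contract.
ZERO_ADDRESS = "0x" + "00" * 20


class Tx(TypedDict):
    # Hypothetical, simplified transaction shape for illustration only.
    to: Optional[str]  # None models the empty `to` field
    data: bytes        # initcode for creations, calldata otherwise


def is_contract_creation(tx: Tx) -> bool:
    """A transaction creates a contract only when `to` is absent."""
    return tx["to"] is None


deploy_tx: Tx = {"to": None, "data": bytes.fromhex("600a600c600039600a6000f3")}
zero_tx: Tx = {"to": ZERO_ADDRESS, "data": b""}

print(is_contract_creation(deploy_tx))  # True
print(is_contract_creation(zero_tx))    # False
```

The second transaction is just a normal message to the account at 0x0, so no initcode is run for it.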

Upon successful execution, a new address for the contract is created on the blockchain, and the transaction receipt includes this contract address. This new contract then becomes a permanent part of the Ethereum blockchain. The contract’s bytecode, along with any initial state set by the constructor, is stored on the blockchain at this address.

Initcode and bytecode

When you compile a Solidity contract, you get two important pieces of code:

Runtime bytecode is the EVM (Ethereum Virtual Machine) code that is executed every time a function in the contract is called. It includes the logic for all the functions defined in the contract except for the constructor. This is the code that gets stored on the blockchain when the contract is deployed.
Initcode (or Initialization code) is a piece of code that is only run once, during contract deployment. It includes the runtime bytecode and the constructor function. When a contract is deployed, it’s the initcode that gets run. The constructor logic inside the initcode is used to set the initial state of the contract. After the contract is deployed, the initcode is discarded, and only the bytecode remains.

The Ethereum Yellow Paper specifies that a contract creation transaction should be processed as follows:

The initcode is executed.
After execution, the data returned by the initcode (the memory region handed to RETURN, not anything left on the stack) is saved as the contract’s bytecode.
This bytecode is what gets executed when you interact with the contract in the future.

What does `initcode` look like?

It begins with the constructor code, which can include various operations, like PUSH, MSTORE, CALLVALUE, etc. depending on the contract’s needs.
The constructor code typically finishes by copying the runtime bytecode into memory with CODECOPY and then executing the RETURN operation, which instructs the EVM to take the specified memory content (in this case, the runtime bytecode) and use it as the output of the contract creation, which is stored as the contract code.
Then comes the contract’s runtime bytecode, which is simply appended after the deployment bytecode. During deployment it is treated as data: it is copied, not executed.
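
A hand-assembled example may help here. The sketch below builds a tiny initcode (a 12-byte deployment prefix followed by a 10-byte runtime) and runs it through a toy interpreter. The interpreter is an assumption written for this post: it supports only four opcodes, uses a fixed 64-byte memory, and ignores gas, so treat it as an illustration rather than a real EVM.

```python
# Opcode values from the EVM instruction set.
PUSH1, CODECOPY, MSTORE, RETURN = 0x60, 0x39, 0x52, 0xF3

# Runtime code: store 42 at memory[0:32] and return those 32 bytes.
RUNTIME = bytes([PUSH1, 0x2A, PUSH1, 0x00, MSTORE, PUSH1, 0x20, PUSH1, 0x00, RETURN])

# Deployment prefix (12 bytes): copy the runtime code into memory and return it.
PREFIX = bytes([
    PUSH1, len(RUNTIME),  # size of the runtime code
    PUSH1, 0x0C,          # offset of the runtime code inside the initcode (12)
    PUSH1, 0x00,          # destination offset in memory
    CODECOPY,
    PUSH1, len(RUNTIME),  # size to return
    PUSH1, 0x00,          # memory offset to return from
    RETURN,
])
INITCODE = PREFIX + RUNTIME


def run(code: bytes) -> bytes:
    """Toy interpreter covering only the four opcodes used above."""
    stack = []
    memory = bytearray(64)  # fixed-size memory is enough for this sketch
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == PUSH1:
            stack.append(code[pc + 1])
            pc += 2
        elif op == CODECOPY:
            dest, offset, size = stack.pop(), stack.pop(), stack.pop()
            memory[dest:dest + size] = code[offset:offset + size]
            pc += 1
        elif op == MSTORE:
            offset, value = stack.pop(), stack.pop()
            memory[offset:offset + 32] = value.to_bytes(32, "big")
            pc += 1
        elif op == RETURN:
            offset, size = stack.pop(), stack.pop()
            return bytes(memory[offset:offset + size])
        else:
            raise ValueError(f"unsupported opcode {op:#x}")
    return b""


deployed = run(INITCODE)                        # what would be stored as contract code
result = int.from_bytes(run(deployed), "big")   # "calling" the deployed code
print(deployed == RUNTIME, result)  # True 42
```

Running the initcode yields exactly the runtime bytes, and running those bytes yields the value the runtime was written to return.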

What is the `return` opcode?

The RETURN opcode is a way for a contract to output data. When the EVM encounters the RETURN opcode during the execution of a contract, it halts the execution and returns the specified data.

The RETURN opcode takes two arguments from the stack: start and length. These specify a segment of memory from which data will be returned. When the EVM sees RETURN, it stops executing the contract, takes length bytes of memory starting from start, and returns them as the output data of the contract execution.
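
The stack and memory interaction can be modelled in a few lines of Python. This is a sketch of the semantics only; the function name op_return and the list/bytearray representation are assumptions for illustration.

```python
def op_return(stack: list, memory: bytearray) -> bytes:
    """Model of RETURN: pop start, then length, and hand back that memory slice."""
    start = stack.pop()   # top of the stack
    length = stack.pop()  # next item down
    return bytes(memory[start:start + length])


memory = bytearray(64)
memory[0:32] = (7).to_bytes(32, "big")  # pretend earlier code stored the value 7 here

stack = [32, 0]  # pushed in the order length, start, so start ends up on top
output = op_return(stack, memory)
print(int.from_bytes(output, "big"))  # 7
```

Note the push order: the length is pushed first so that the start offset sits on top of the stack when RETURN executes.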

Here are a couple of common scenarios where RETURN is used:

Returning data from function calls: When you call a function that has a return value, the contract will use RETURN to provide that return value. For instance, in a function like function get() public view returns (uint256) { return x; }, the return x; statement is translated to EVM bytecode that involves loading x into memory and then using RETURN to output it.
Specifying a contract’s runtime bytecode during deployment: During contract creation, RETURN is used to specify which bytes should be saved as the deployed contract’s runtime bytecode. The constructor of the contract, part of the initcode, does its work (like initializing state variables), the runtime code is copied into memory, and then RETURN hands that region of memory to the EVM, which ends execution of the initcode.

Wait so it doesn’t use `create` or `create2`?

Correct. The EVM includes an implicit routine to create a contract when a transaction is sent with an empty to field. This is the same routine that is used when the create or create2 opcodes are used.

Here is the sequence of events during a contract creation transaction initiated by a user:

Transaction Creation: The user creates a transaction with the to field empty (or null) and includes the contract’s initcode in the data field of the transaction. The initcode consists of the contract bytecode, including constructor logic and runtime bytecode.
Transaction Execution: The EVM sees that the to field is null and recognizes this as a contract creation transaction. It runs the initcode included in the data field.
Running the initcode: The EVM executes the initcode. This typically involves running the constructor logic (which can include SSTORE operations to initialize contract storage) and ends with a RETURN operation, which specifies what segment of memory should be saved as the contract’s runtime bytecode.
Contract Creation: The EVM takes the memory specified by the RETURN opcode and saves it as the contract’s bytecode. The contract now exists on the blockchain at a certain address, and this address is determined by the sender’s address and nonce.
Transaction Receipt: After execution, a transaction receipt is created. This receipt includes the contract’s address and other details about the transaction, like the amount of gas used.
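
For the address step in particular: with this kind of creation, the new address is the last 20 bytes of keccak256(rlp([sender, nonce])). The sketch below builds the RLP preimage by hand. Python's standard library has no Keccak-256 (hashlib.sha3_256 uses different padding and gives different results), so the hash is passed in as a parameter here; the function names and the sample sender address are assumptions for illustration.

```python
from typing import Callable


def rlp_bytes(item: bytes) -> bytes:
    """RLP-encode a byte string shorter than 56 bytes (enough for this case)."""
    if len(item) == 1 and item[0] < 0x80:
        return item  # a single small byte encodes as itself
    if len(item) >= 56:
        raise ValueError("long strings are outside the scope of this sketch")
    return bytes([0x80 + len(item)]) + item


def creation_preimage(sender: bytes, nonce: int) -> bytes:
    """RLP([sender, nonce]): the data that is hashed to derive the address."""
    # Integers are encoded as minimal big-endian bytes; zero becomes b"".
    nonce_bytes = nonce.to_bytes((nonce.bit_length() + 7) // 8, "big")
    payload = rlp_bytes(sender) + rlp_bytes(nonce_bytes)
    if len(payload) >= 56:
        raise ValueError("long lists are outside the scope of this sketch")
    return bytes([0xC0 + len(payload)]) + payload


def contract_address(sender: bytes, nonce: int,
                     keccak256: Callable[[bytes], bytes]) -> bytes:
    """Last 20 bytes of keccak256(rlp([sender, nonce]))."""
    return keccak256(creation_preimage(sender, nonce))[12:]


# Arbitrary sample sender, used only to show the shape of the preimage.
sender = bytes.fromhex("6ac7ea33f8831ea9dcc53393aaa88b25a785dbf0")
print(creation_preimage(sender, 0).hex())
```

To get a real address, pass a Keccak-256 implementation from a third-party library as the keccak256 argument.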

So in essence, there is an implicit CREATE-like operation when a user sends a transaction with a null to field, but it’s not an explicit CREATE opcode within the EVM’s execution of the transaction’s data. The EVM simply recognizes from the transaction format that it needs to create a contract, and it runs the included initcode to do so.

Where does the constructor code come from?

The solc compiler will take Solidity code and produce two outputs - the runtime code and the initcode.

The runtime code is the code that will be executed every time the contract is called. It includes the logic for all the functions defined in the contract except for the constructor.

What does runtime code actually do? It processes EVM messages. It reads the first 4 bytes of msg.data as a function selector, and then uses a jump table to jump to the function that corresponds to that selector. Each function is denoted by a jumpdest. This is the ABI, the calling convention of EVM contracts, a common standard that allows Solidity and other EVM languages like Vyper to interoperate. ABIs are not a new idea - see the format of ELF, Mach-O and PE for other examples.
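
Here is a rough Python model of that dispatch step. A Python dict stands in for the jump table and for contract storage, which is an assumption of this sketch; the two selectors are the first 4 bytes of the keccak256 hash of the signatures "get()" and "set(uint256)".

```python
# Selectors: first 4 bytes of keccak256 of the function signature.
GET_SELECTOR = bytes.fromhex("6d4ce63c")  # get()
SET_SELECTOR = bytes.fromhex("60fe47b1")  # set(uint256)

storage = {"x": 0}  # stand-in for contract storage


def get(args: bytes) -> bytes:
    # Return x as one 32-byte ABI word.
    return storage["x"].to_bytes(32, "big")


def set_(args: bytes) -> bytes:
    # The single uint256 argument is the first 32 bytes after the selector.
    storage["x"] = int.from_bytes(args[:32], "big")
    return b""


DISPATCH = {GET_SELECTOR: get, SET_SELECTOR: set_}


def handle_call(calldata: bytes) -> bytes:
    """Split calldata into selector and arguments, then jump to the handler."""
    selector, args = calldata[:4], calldata[4:]
    handler = DISPATCH.get(selector)
    if handler is None:
        # A real contract would run its fallback function or revert here.
        raise ValueError("no function matches this selector")
    return handler(args)


handle_call(SET_SELECTOR + (5).to_bytes(32, "big"))
print(int.from_bytes(handle_call(GET_SELECTOR), "big"))  # 5
```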

The initcode, by comparison, is the code that will be executed only once, when the contract is created. The constructor is compiled into constructor code. The pseudocode for the init function looks like so:

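def init():
  # Run the compiled Solidity constructor (sets the initial storage).
  constructor(args)
  # Push the CODECOPY arguments: length, offset in the code, memory destination.
  push(len(runtime_code))
  push(runtime_code_offset)
  push(0)
  # Copy the runtime code out of the initcode and into memory.
  CODECOPY
  # Return the memory region (start 0, length of the runtime code) that holds it.
  push(len(runtime_code))
  push(0)
  RETURN

def constructor(args):
  # Your Solidity constructor is compiled into this.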
The initcode is the concatenation of the constructor code, the runtime code, and then the ABI-encoded constructor arguments.
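
As a small sketch of that concatenation, assuming a constructor that takes a single uint256: the helper names below are made up for this example, and the two hex strings are placeholder bytes rather than real compiler output.

```python
def encode_uint256(value: int) -> bytes:
    """ABI encoding of a single uint256: one 32-byte big-endian word."""
    return value.to_bytes(32, "big")


def build_initcode(constructor_code: bytes, runtime_code: bytes,
                   encoded_args: bytes) -> bytes:
    """Concatenate the three parts in the order they appear in the initcode."""
    return constructor_code + runtime_code + encoded_args


def read_args(initcode: bytes, args_length: int) -> bytes:
    """The arguments sit at the very end of the initcode."""
    return initcode[len(initcode) - args_length:]


constructor_code = bytes.fromhex("600a600c600039600a6000f3")  # placeholder, 12 bytes
runtime_code = bytes.fromhex("602a60005260206000f3")          # placeholder, 10 bytes
initcode = build_initcode(constructor_code, runtime_code, encode_uint256(1234))

print(len(initcode))                                    # 12 + 10 + 32 = 54
print(int.from_bytes(read_args(initcode, 32), "big"))   # 1234
```

Because the arguments are appended after the compiled code, the same compiled output can be deployed with different constructor arguments just by changing the tail.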

Summary

A list of concepts from this study:

initcode
init function
constructor code
runtime code
implicit contract creation via transactions with null to fields, and return
explicit contract creation via transactions that run create/create2
how constructors actually work