Writing Ethereum Contracts the Hard Way - Part 2

August 8, 2022


Yul and deployment

In this article, we switch to the Yul language to write contracts. Yul is a low-level intermediate language supported by the Solidity compiler (solc must be installed). It is still very close to assembly, but it helps to avoid some repetitive tasks. For example, PUSH instructions can be replaced with direct parameters: instead of PUSH1 0x01, PUSH1 0x02, ADD, in Yul we can write add(0x01,0x02).
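To make the PUSH1/ADD sequence concrete, here is a minimal sketch of a stack machine in Python that understands only these two opcodes. It is an illustration of the stack model, not a real EVM, and the function name run is made up for this example.

```python
def run(code: bytes) -> list:
    """Execute a tiny subset of EVM bytecode (PUSH1 and ADD only); return the final stack."""
    stack = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0x60:      # PUSH1: push the byte that follows the opcode
            stack.append(code[pc + 1])
            pc += 2
        elif op == 0x01:    # ADD: pop two values, push their sum modulo 2**256
            a, b = stack.pop(), stack.pop()
            stack.append((a + b) % 2**256)
            pc += 1
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    return stack


# PUSH1 0x01, PUSH1 0x02, ADD  (what add(0x01,0x02) boils down to)
result = run(bytes.fromhex("6001600201"))
print(result)
```

The Yul expression add(0x01,0x02) hides exactly this bookkeeping: the compiler emits the two PUSH instructions for us.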

Another feature we need is handling multiple code blocks together. In the previous post we deployed our code using PUSH13, but that doesn't work with longer code, as PUSH32 is the maximum (remember: the stack slot size is 32 bytes). Therefore, we need a more dynamic approach to deployment. The idea is to combine the deployment code and the contract code:

image-20220401141102004

CODECOPY can access the code of the executing contract, and it can copy the bytes from a certain offset to the memory:

image-20220401141151695

In assembly it would look something like this:

image-20220401141520444

This code can be compiled (600d600c600039600d6000f3), and the real contract code can be appended to it. The only problem is that this code depends on the size of the appended contract (0x0d) and on the size of the deployment code itself (0x0c), which is the offset where the contract code starts.
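The dependency on the two sizes can be sketched in Python. The helper below (make_deployer is a name invented for this example) builds the same 12-byte prefix for a runtime of any length up to 255 bytes, since the sketch only uses PUSH1.

```python
def make_deployer(runtime: bytes) -> bytes:
    """Prefix runtime code with a CODECOPY/RETURN deployer (sketch, PUSH1 only)."""
    size = len(runtime)
    if size > 0xFF:
        raise ValueError("this sketch uses PUSH1, so the runtime must be at most 255 bytes")
    prefix_len = 12  # the deployer below is always 12 bytes long
    prefix = bytes([
        0x60, size,        # PUSH1 <size of runtime>
        0x60, prefix_len,  # PUSH1 <offset of runtime inside the whole code>
        0x60, 0x00,        # PUSH1 0 (destination offset in memory)
        0x39,              # CODECOPY
        0x60, size,        # PUSH1 <size of runtime>
        0x60, 0x00,        # PUSH1 0 (memory offset to return from)
        0xF3,              # RETURN
    ])
    assert len(prefix) == prefix_len
    return prefix + runtime


# A 13-byte placeholder runtime reproduces the prefix quoted in the text.
print(make_deployer(bytes(13))[:12].hex())
```

For a 13-byte runtime the prefix comes out as 600d600c600039600d6000f3, the same bytes as above. Every change in the runtime size would require patching the prefix by hand, which is the chore the Yul helpers take over.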

Yul provides helper functions to get the offset and length of any binary block (or compiled source code). This abstraction still doesn't hide the simplicity of the underlying EVM instructions.

The simple ADD contract in Yul can be written as the following:

object "Contract1" {
   code {
      mstore(0x0,add(0x01,0x01))
      return(0,0x20)
   }
}

Which can be extended with the deployer code:

object "Contract1" {
   code {
      datacopy(0, dataoffset("runtime"), datasize("runtime"))
      return(0, datasize("runtime"))
   }
   object "runtime" {
      code {
         mstore(0x0,add(0x01,0x01))
         return(0,0x20)
      }
   }
}

Here we use the CODECOPY instruction (called datacopy in Yul) together with some magic helpers (dataoffset, datasize) to access the offset and size of the code in the second part (runtime).

This code can be compiled with solc --strict-assembly deployable.yul, which prints the code to the standard output. cethacea has a helper to parse the output and save the code to a file:

cethacea evm compile --compiler=yulc deployable.yul

Parse parameters

Now we have just enough abstraction to continue and enhance the ADD contract with parsing parameters. We can call the contract with one binary array (--data). Therefore we need to decide how it should be parsed. Let's use the following structure:

image-20220401142331854

To support the two distinct operations (ADD and SUB) we should check the first 4 bytes. It can be done with the Yul switch statement, which uses the following EVM instructions under the hood:

  • EQ compares the top two elements of the stack and pushes 1 if they are equal, 0 otherwise
  • JUMPI conditional jump to an address in the code (based on the destination and condition on the stack, so it can use the result of EQ)
  • JUMPDEST must be the instruction at the address targeted by JUMPI (it only marks valid jump targets to avoid naughty behavior, and doesn't do anything else)

Final Yul code looks like this:

object "Contract1" {
   code {
      datacopy(0, dataoffset("runtime"), datasize("runtime"))
      return(0, datasize("runtime"))
   }
   object "runtime" {
      code {
         switch shr(0xe0,calldataload(0))
            case 0x01 {
               mstore(0x00,add(calldataload(0x04),calldataload(0x24)))
               return(0,0x20)
            }
            case 0x02 {
               mstore(0x00,sub(calldataload(0x04),calldataload(0x24)))
               return(0,0x20)
            }
      }
   }
}

The only tricky part is the shr (right shift). We need only the first 4 bytes to identify which operation should be called, but CALLDATALOAD loads 32 bytes. Therefore, we shift the value to the right by 0xe0 (224) bits, that is 28 bytes, to keep only the 4 bytes we need:

image-20220401143312117
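The same shift can be reproduced with plain Python integers. This is only a sketch of the arithmetic. The call data is the ADD request with arguments 1 and 2, laid out in the structure described above.

```python
# 4-byte operation id (1 = ADD) followed by two 32-byte arguments (1 and 2)
calldata = bytes.fromhex("00000001" + "00" * 31 + "01" + "00" * 31 + "02")

# calldataload(0): read 32 bytes starting at offset 0 as one big-endian word
word = int.from_bytes(calldata[0:32], "big")

# shr(0xe0, word): drop the lowest 224 bits (28 bytes), keeping the top 4 bytes
operation = word >> 0xE0

# calldataload(0x04) and calldataload(0x24): the two arguments
a = int.from_bytes(calldata[0x04:0x24], "big")
b = int.from_bytes(calldata[0x24:0x44], "big")

print(operation, a, b)
```

Without the shift, the loaded word would also contain the first 28 bytes of the first argument, so a comparison with 0x01 could never succeed.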

Let's try it out:

ceth evm compile --compiler=yulc math.yul
ceth tx submit --account=key1 --data=$(cat math.yul.bin)
04350360005260206000f35b50","time":"2022-04-01T14:34:39+02:00","message":"SendingTransaction"}
hash:       0x53d5597e79bf7469f4cfdbe57234932d06a87f414ce0eb0b44dedda76db35a0d
to:         nil
value:      0
...
data:       6039600d60003960396000f3fe60003560e01c60018114601757600281146027576037565b6024356004350160005260206000f35b6024356004350360005260206000f35b50
status:     1
block:      0xb03548abbf74171e364b4562b025680daf76f71d9b7966de461b5d508267faea
contract:   0xAf1006a590F16A027153acE89956Cb8D772ae425
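To see the EQ, JUMPI and JUMPDEST instructions behind the switch, the runtime part of the data field above (everything after the fe byte) can be fed to a tiny disassembler. The sketch below knows only the opcodes that occur in this particular bytecode and is not a general tool.

```python
# Only the opcodes that occur in this contract's runtime code
OPCODES = {
    0x01: "ADD", 0x03: "SUB", 0x14: "EQ", 0x1C: "SHR", 0x35: "CALLDATALOAD",
    0x50: "POP", 0x52: "MSTORE", 0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST",
    0x60: "PUSH1", 0x81: "DUP2", 0xF3: "RETURN",
}


def disassemble(code: bytes) -> list:
    """Return (offset, mnemonic, argument) tuples; argument is None except for PUSH1."""
    listing = []
    pc = 0
    while pc < len(code):
        name = OPCODES[code[pc]]
        if name == "PUSH1":
            listing.append((pc, name, code[pc + 1]))
            pc += 2
        else:
            listing.append((pc, name, None))
            pc += 1
    return listing


RUNTIME = bytes.fromhex(
    "60003560e01c60018114601757600281146027576037565b"
    "6024356004350160005260206000f35b"
    "6024356004350360005260206000f35b50"
)

listing = disassemble(RUNTIME)
jumpdests = [pc for pc, name, _ in listing if name == "JUMPDEST"]
for pc, name, arg in listing:
    print(f"{pc:#04x} {name}" + (f" {arg:#04x}" if arg is not None else ""))
```

Each case becomes a PUSH1/DUP2/EQ/PUSH1/JUMPI group, and the pushed destinations (0x17 and 0x27) are exactly the offsets of the JUMPDEST instructions that start the ADD and SUB branches. The third JUMPDEST (0x37) is the fall-through target used when no case matches.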

And call the contract:

ceth tx submit --account=key1 --to=0xAf1006a590F16A027153acE89956Cb8D772ae425 --data=0000000100000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000002
hash:       0x7fb8052cdcdabda555d7eb82e26a38caebf82f9708f34736623b98b48e89a736
to:         0xAf1006a590F16A027153acE89956Cb8D772ae425
value:      0
...
data:       0000000100000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000002
status:     1
block:      0x7baed23f164c7291fbcbdf1b4b2c77f551c7c9b81565303df34aeb5d3d13ddcf
contract:   0x0000000000000000000000000000000000000000

ceth tx debug 0x7fb8052cdcdabda555d7eb82e26a38caebf82f9708f34736623b98b48e89a736 | grep Ret
Ret:       0000000000000000000000000000000000000000000000000000000000000003

Using Ethereum contract interface

While our previous contract works perfectly and can be deployed and used, it's not very convenient to type so many zeros to call it. The easiest way to solve this is to follow the convention of the Contract Application Binary Interface (ABI), which defines an encoding followed by the majority of smart contract tools (Solidity and others). Technically we could use any other encoding, but following the ABI lets us call our contract from other libraries, MetaMask or anything else (and luckily cethacea also supports the encoding).

For our simple use case the encoding is very simple:

  1. The first 4 bytes are the first 4 bytes of the Keccak-256 (pre-standard SHA3) hash of the method signature (e.g. keccak("add(uint256,uint256)"))
  2. Each uint256 input is represented with 32 bytes (ABI encoding is based on 32-byte words, and this type fits exactly)
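The two rules can be sketched in a few lines of Python. The helper name encode_call is invented for this example and handles only uint256 arguments. The selector is passed in precomputed, because Python's standard library has no Keccak-256 (hashlib.sha3_256 is the later standardized variant and produces different output). The value 771602f7 is taken from the hash of add(uint256,uint256) listed below.

```python
def encode_call(selector_hex: str, *args: int) -> str:
    """ABI-encode a call whose arguments are all uint256 (sketch)."""
    selector = bytes.fromhex(selector_hex)
    if len(selector) != 4:
        raise ValueError("a selector is exactly 4 bytes")
    data = selector
    for value in args:
        # each uint256 is one 32-byte big-endian word
        data += value.to_bytes(32, "big")
    return data.hex()


# add(uint256,uint256) called with 1 and 2
encoded = encode_call("771602f7", 1, 2)
print(encoded)
```

The result is 4 + 32 + 32 = 68 bytes, the same layout we typed by hand before, only with a different 4-byte prefix.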

Let's first check the hashes of our methods:

ceth util keccak 'add(uint256,uint256)'
771602f7f25ce61b0d4f2430f7e4789bfd9e6e4029613fda01b7f2c89fbf44ad

ceth util keccak 'sub(uint256,uint256)'                                                 
b67d77c567a0f5dc1f8c1e290b73aecb90edaef786ca7a0dc99b82d23316ed1e

The first 4 bytes of each hash are the method identifiers, so we replace the conditions in our switch with them:

object "Contract1" {
   code {
      datacopy(0, dataoffset("runtime"), datasize("runtime"))
      return(0, datasize("runtime"))
   }
   object "runtime" {
      code {
         switch shr(0xe0,calldataload(0)) 
            case 0x771602f7 {
               mstore(0x00,add(calldataload(0x04),calldataload(0x24)))
               return(0,0x20)
            }
            case 0xb67d77c5 {
               mstore(0x00,sub(calldataload(0x04),calldataload(0x24)))
               return(0,0x20)
            }
      }
   }
}

After a new deployment, we can use the convention with cethacea contract call instead of cethacea tx submit. With the --debug option we can see that it calls exactly the same API, but the input parameters are encoded based on the convention.

ceth contract call --debug --contract=0x9d575731D91516bbdF097e84fE9F440Ce3014A63 'add(uint256,uint256)' 1 2
{"level":"debug","from":"6fa89198e8b58f2715e691be8ce70da4edbb4a7c","to":"0x9d575731D91516bbdF097e84fE9F440Ce3014A63","value":"0","data":"771602f700000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000002","time":"2022-04-01T15:35:54+02:00","message":"eth_sendRawTransaction"}
hash:       0x2de0e9684d0f6937e3d4864244b9c6328285d49ab25e3b7881e6ca2d94760e4f
to:         0x9d575731D91516bbdF097e84fE9F440Ce3014A63
value:      0
...
data:       771602f700000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000002
status:     1
block:      0x6813bfe5d06dc8e2c9b3dbf0ac3051d073462bc7f9cdc724148cb218010e6e3a
contract:   0x0000000000000000000000000000000000000000


ceth tx debug 0x2de0e9684d0f6937e3d4864244b9c6328285d49ab25e3b7881e6ca2d94760e4f | grep Ret
Ret:       0000000000000000000000000000000000000000000000000000000000000003

Creating ABI json file

We can further simplify the call by defining the input/output parameters in a JSON file. Let's create a new file (mathabi.abi):

[
    {
        "constant": false,
        "inputs": [
            {
                "name": "a",
                "type": "uint256"
            },
            {
                "name": "b",
                "type": "uint256"
            }
        ],
        "name": "add",
        "outputs": [
            {
                "name": "",
                "type": "uint256"
            }
        ],
        "payable": false,
        "stateMutability": "pure",
        "type": "function"
    },
    {
        "constant": false,
        "inputs": [
            {
                "name": "a",
                "type": "uint256"
            },
            {
                "name": "b",
                "type": "uint256"
            }
        ],
        "name": "sub",
        "outputs": [
            {
                "name": "",
                "type": "uint256"
            }
        ],
        "payable": false,
        "stateMutability": "pure",
        "type": "function"
    }
]

It looks very long, but it's nothing more than the input/output type definitions of our methods, which make it possible to call them with the right tooling.

Now we don't need to add the parameter types. It's enough to specify the ABI file, and it works as before:

ceth contract call --contract=0x9d575731D91516bbdF097e84fE9F440Ce3014A63 --abi=mathabi.abi 'add' 1 2

Using persistent store

So far we always submitted new transactions to execute code in our contract, but it's not always what is intended. Contracts run in an isolated environment which has ephemeral stack and memory assigned to it (they are dropped after contract execution). But contracts can also use a persistent store which is committed to the blockchain (see SSTORE instruction).

image-20220401153758956

As a result, we can have two types of contract calls:

  1. If we plan to change the persistent state of a contract, we need to create a new transaction, as we have done until now. This is where we use the eth_send[Raw]Transaction API calls. A good example is an ERC-20 transfer, which requires storing the new balances of the source and destination wallets.
  2. But there is another type of contract execution, which doesn't change the state of the contract. This can be done without any transaction, as the persistent state of the contract is always available on the nodes and the code can be executed at any time. This is done with the eth_call RPC call, which takes essentially the same parameters as eth_sendTransaction.
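The similarity of the two RPC calls is easier to see side by side. The sketch below only builds the JSON-RPC request bodies and does not send anything. The addresses and call data are the ones from the add example earlier in this post, and rpc_request is a helper name made up for this illustration. Note that eth_sendTransaction expects the node to hold the sender's key, while eth_sendRawTransaction carries an already signed transaction instead of this object.

```python
import json


def rpc_request(method: str, params: list, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request body (nothing is sent over the network)."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


CONTRACT = "0x9d575731D91516bbdF097e84fE9F440Ce3014A63"
SENDER = "0x6fa89198e8b58f2715e691be8ce70da4edbb4a7c"
# add(uint256,uint256) with arguments 1 and 2
DATA = "0x771602f7" + (1).to_bytes(32, "big").hex() + (2).to_bytes(32, "big").hex()

call_object = {"from": SENDER, "to": CONTRACT, "data": DATA}

# Read-only execution against the latest block: no transaction, no gas paid
read_only = rpc_request("eth_call", [call_object, "latest"])

# State-changing execution: the same call object, but it becomes a transaction
state_changing = rpc_request("eth_sendTransaction", [call_object])

print(json.dumps(read_only, indent=2))
print(json.dumps(state_changing, indent=2))
```

The only differences are the method name and the block parameter of eth_call, which selects the state the code is executed against.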

Our helper tool supports both of them. Transactions can be created with cethacea contract call, and read-only calls can be initiated with cethacea contract query.

Let's try it with a simple contract which stores (uint256) values associated with (uint256) keys:

object "Contract1" {
   code {
      datacopy(0, dataoffset("runtime"), datasize("runtime"))
      return(0, datasize("runtime"))
   }
   object "runtime" {
      code {
         switch shr(0xe0,calldataload(0)) 
            case 0x541aea0f {
               sstore(calldataload(0x04),calldataload(0x24))
               return(0,0x0)
            }
            case 0x9507d39a {
               mstore(0x00, sload(calldataload(0x04)))
               return(0,0x20)
            }
      }
   }
}

This is almost the same as the previous contract, but instead of add/sub we store the value (SSTORE) or get the value (SLOAD). The selectors 0x541aea0f and 0x9507d39a belong to put(uint256,uint256) and get(uint256).

ceth evm compile --compiler=yulc map.yul

ceth contract deploy --account=key1 map.yul.bin 
Contract: 0x467Ba501dE54a9Fc844BE9598E239F06359A2d33
Transaction: 0x2ec4e6743cfcedaac8b4b2162beb9c96849ddc63c1d08b97d44ce7cadf342c13

And let's try to use the query first:

export CETH_CONTRACT=0x467Ba501dE54a9Fc844BE9598E239F06359A2d33

cethacea contract query --debug 'get(uint256)uint256' 1
{"level":"debug","data":"9507d39a0000000000000000000000000000000000000000000000000000000000000001","to":"467ba501de54a9fc844be9598e239f06359a2d33","from":"6fa89198e8b58f2715e691be8ce70da4edbb4a7c","response":"0000000000000000000000000000000000000000000000000000000000000000","time":"2022-04-01T15:09:11+02:00","message":"eth_call"}
0

cethacea contract query --debug 'put(uint256,uint256)' 1 10
{"level":"debug","data":"541aea0f0000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000a","to":"467ba501de54a9fc844be9598e239f06359a2d33","from":"6fa89198e8b58f2715e691be8ce70da4edbb4a7c","response":"","time":"2022-04-01T15:10:03+02:00","message":"eth_call"}

cethacea contract query --debug 'get(uint256)uint256' 1
{"level":"debug","data":"9507d39a0000000000000000000000000000000000000000000000000000000000000001","to":"467ba501de54a9fc844be9598e239f06359a2d33","from":"6fa89198e8b58f2715e691be8ce70da4edbb4a7c","response":"0000000000000000000000000000000000000000000000000000000000000000","time":"2022-04-01T15:10:44+02:00","message":"eth_call"}
0

Please note that the last call (get) returns 0 even though we tried to set the value to 10 in the previous call. That call was not a transaction, so the persistent state was not committed to the blockchain. Let's try to change the value with a real transaction:

cethacea contract call 'put(uint256,uint256)' 1 10
hash:       0x10f6c722a1dcbf000e858ef2754536603441bac6192b2dcd9f9b0a3b6b4c8954
to:         0x467Ba501dE54a9Fc844BE9598E239F06359A2d33
value:      0
...
data:       541aea0f0000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000a
status:     1
block:      0x987b16db7decea7d70c509cf9276e4a70ebd95820c650192c197d8a74ba33109


ceth contract query 'get(uint256)uint256' 1 
10
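The behavior seen in this session can be modeled with a toy example in Python. This is a simplified mental model, not how a node is implemented: a query executes the code against a throwaway copy of the storage, while a transaction executes it against the real storage. ToyChain and its method names are invented for this sketch.

```python
class ToyChain:
    """Toy model of one contract's storage with query (eth_call) and call (transaction)."""

    def __init__(self):
        self.storage = {}  # committed state: slot -> value

    @staticmethod
    def _execute(storage: dict, method: str, *args: int):
        if method == "put":
            key, value = args
            storage[key] = value            # SSTORE
            return None
        if method == "get":
            (key,) = args
            return storage.get(key, 0)      # SLOAD: unset slots read as 0
        raise ValueError(f"unknown method {method}")

    def query(self, method: str, *args: int):
        # Read-only call: run against a copy, so every write is discarded
        return self._execute(dict(self.storage), method, *args)

    def call(self, method: str, *args: int):
        # Transaction: run against the committed state, so writes persist
        return self._execute(self.storage, method, *args)


chain = ToyChain()
first = chain.query("get", 1)
chain.query("put", 1, 10)      # executes, but the write is thrown away
second = chain.query("get", 1)
chain.call("put", 1, 10)       # committed
third = chain.query("get", 1)
print(first, second, third)
```

The three reads mirror the session above: the value stays 0 after the queried put and becomes 10 only after the put sent as a transaction.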

It's very important to know which method calls may have side effects. Therefore, this information can also be added to the ABI JSON:

[
  {
    "inputs": [
      {
        "internalType": "uint256",
        "name": "a",
        "type": "uint256"
      }
    ],
    "name": "get",
    "outputs": [
      {
        "internalType": "uint256",
        "name": "",
        "type": "uint256"
      }
    ],
    "stateMutability": "view",
    "type": "function"
  },
  {
    "inputs": [
      {
        "internalType": "uint256",
        "name": "a",
        "type": "uint256"
      },
      {
        "internalType": "uint256",
        "name": "b",
        "type": "uint256"
      }
    ],
    "name": "put",
    "outputs": [],
    "stateMutability": "nonpayable",
    "type": "function"
  }
]

Here view means that the method reads the internal state but does not modify it (it was pure in our ADD contract, which doesn't need to read the internal state at all). For put we use nonpayable, which means that a transaction is required, but it must not carry any value.
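A tool can use the stateMutability field to decide between a read-only call and a transaction. The sketch below shows one way such a decision could look in Python. It is an illustration, not a description of how cethacea actually works, and needs_transaction is a hypothetical helper name. The ABI is a shortened version of the JSON above.

```python
import json

# Shortened version of the ABI above (internalType fields omitted)
ABI_JSON = """
[
  {"name": "get", "type": "function", "stateMutability": "view",
   "inputs": [{"name": "a", "type": "uint256"}],
   "outputs": [{"name": "", "type": "uint256"}]},
  {"name": "put", "type": "function", "stateMutability": "nonpayable",
   "inputs": [{"name": "a", "type": "uint256"}, {"name": "b", "type": "uint256"}],
   "outputs": []}
]
"""


def needs_transaction(abi: list, name: str) -> bool:
    """True if calling the named function requires a transaction (sketch)."""
    for entry in abi:
        if entry.get("type") == "function" and entry.get("name") == name:
            # view and pure functions never modify state, so eth_call is enough
            return entry["stateMutability"] not in ("view", "pure")
    raise KeyError(f"function {name} not found in ABI")


abi = json.loads(ABI_JSON)
for fn in ("get", "put"):
    kind = "transaction" if needs_transaction(abi, fn) else "eth_call"
    print(fn, "->", kind)
```

With the ABI available, the caller no longer has to remember which methods are safe to query.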

Gas cost

When we use SSTORE, new state must be stored on the blockchain, which means that all Ethereum nodes will save it. That's an expensive operation, as Ethereum chain data already requires a lot of space. Therefore, it is priced to be less tempting to use.

During the execution of the contract, each step has a specific gas cost. The sum of the costs (plus a base amount of 21000 for each transaction) is multiplied by the current gas price (the cost of one unit of gas in ETH) and paid together with the transaction (a query/read-only call is free).
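As a worked example of this multiplication, here is a small Python sketch. The gas amount is the first measurement from the experiment below. The gas price of 20 gwei is an arbitrary assumption for illustration, and the sketch ignores the split into base fee and tip introduced by EIP-1559.

```python
from decimal import Decimal

GWEI = 10**9          # 1 gwei = 10^9 wei
WEI_PER_ETH = 10**18  # 1 ETH  = 10^18 wei


def transaction_fee_wei(gas_used: int, gas_price_wei: int) -> int:
    """Fee paid for a transaction: gas used multiplied by the price of one gas unit."""
    return gas_used * gas_price_wei


# 43497 gas at an assumed (hypothetical) gas price of 20 gwei
fee_wei = transaction_fee_wei(43497, 20 * GWEI)
fee_eth = Decimal(fee_wei) / Decimal(WEI_PER_ETH)
print(fee_wei, "wei =", fee_eth, "ETH")
```

Integer arithmetic in wei avoids rounding problems. The conversion to ETH is done only for display.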

Executing an ADD operation is not a big deal, but using SSTORE is, because it asks all nodes to store some data forever. Let's try out multiple scenarios:

ceth contract call 'put(uint256,uint256)' 11 10 | grep gasUsed
gasUsed:    43497

ceth contract call 'put(uint256,uint256)' 11 10 | grep gasUsed
gasUsed:    23597

ceth contract call 'put(uint256,uint256)' 11 11 | grep gasUsed
gasUsed:    26397

ceth contract call 'put(uint256,uint256)' 11 0 | grep gasUsed
gasUsed:    21585

The first call is very expensive, as each node has to store one more piece of data, and they hate to do it.

The second transaction is cheap, as it doesn't really change the data that nodes have already stored. Not good, but not terrible either.

The third is slightly more expensive, as nodes have to change the stored data.

The last one is the cheapest. It requires touching the persistent store, but afterwards nodes can keep a smaller amount of data (as 0 is the default for all slots, it doesn't need to be stored).

This is the reason why ERC-20 transfers can be more expensive if the target wallet has not been used yet.
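The four measurements can be reproduced with a short calculation. The sketch below is a simplified model under stated assumptions: the constants are the storage and call data prices of the Berlin/London gas schedule (EIP-2028, EIP-2929, EIP-3529) as I understand them, the slot is touched only once per transaction, and the 53 gas for the remaining opcodes (selector check, argument loading, return) is inferred from the measurements rather than taken from any specification. The network used for the experiment may apply different rules.

```python
TX_BASE = 21000        # base cost of every transaction
ZERO_BYTE = 4          # call data cost per zero byte
NONZERO_BYTE = 16      # call data cost per non-zero byte
COLD_SLOT = 2100       # first access to a storage slot in a transaction
SSTORE_NOOP = 100      # writing the value the slot already holds
SSTORE_SET = 20000     # zero -> non-zero
SSTORE_RESET = 2900    # non-zero -> a different value (including zero)
CLEAR_REFUND = 4800    # refund for setting a non-zero slot to zero
OTHER_OPCODES = 53     # assumption: all remaining instructions of put()

PUT_SELECTOR = bytes.fromhex("541aea0f")


def put_gas(key: int, old: int, new: int) -> int:
    """Estimate gasUsed of put(key, new) when the slot currently holds old."""
    calldata = PUT_SELECTOR + key.to_bytes(32, "big") + new.to_bytes(32, "big")
    intrinsic = TX_BASE + sum(NONZERO_BYTE if b else ZERO_BYTE for b in calldata)
    if new == old:
        sstore = SSTORE_NOOP
    elif old == 0:
        sstore = SSTORE_SET
    else:
        sstore = SSTORE_RESET
    total = intrinsic + OTHER_OPCODES + COLD_SLOT + sstore
    refund = CLEAR_REFUND if (old != 0 and new == 0) else 0
    return total - min(refund, total // 5)  # refunds are capped at 1/5 of the gas used


# The four calls above, with the slot value before each call
estimates = [put_gas(11, 0, 10), put_gas(11, 10, 10), put_gas(11, 10, 11), put_gas(11, 11, 0)]
print(estimates)
```

Under these assumptions the model gives 43497, 23597, 26397 and 21585, matching the measured values. The last call is cheap only because of the refund: before it is applied, clearing the slot costs about as much as the third call.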

Conclusion

To sum up, in the previous blog post we created a very simple smart contract, together with the deployer code which deployed our ADD contract to the blockchain. In this post we continued on the hard way and enhanced the smart contract with parameter parsing, ABI-compatible method selectors and persistent storage.

For production use cases a higher-level contract language (such as Solidity) might be a better choice. But we hope that writing contracts with low-level tools helps in understanding the internals of the EVM, and shows the simplicity behind all the complexity.
