Detailed explanation of Kakarot zkEVM: Starknet's journey towards EVM compatibility.

TL;DR

A virtual machine is a software simulation of a computer system that provides an execution environment for programs. It can simulate various hardware devices, allowing programs to run in a controlled and compatible environment. The Ethereum Virtual Machine (EVM) is a stack-based virtual machine used to execute Ethereum smart contracts.
zkEVM is an EVM that integrates zero-knowledge proof/validity proof technology. It allows for the verification of EVM execution using zero-knowledge proofs without requiring all verifiers to re-execute the EVM. There are various zkEVM products in the market, each with its own methods and designs.
The need for zkEVM arises from the demand for a virtual machine that supports smart contract execution on Layer 2. Additionally, some projects choose to use zkEVM to leverage the extensive user ecosystem of the EVM and design instruction sets that are more friendly to zero-knowledge proofs.
Kakarot is a zkEVM implemented on Starknet using the Cairo language. It simulates the stack, memory, execution, and other aspects of the EVM in the form of Cairo smart contracts. Kakarot faces challenges in compatibility with the Starknet account system, in cost optimization, and in stability, since the Cairo language is still at an experimental stage.
Warp is a converter that translates Solidity code into Cairo code, providing compatibility at the high-level language level. On the other hand, Kakarot provides compatibility at the EVM level by implementing EVM opcodes and precompiles.

What is a virtual machine?

To understand what a virtual machine is, we must first understand the execution process of computers under the mainstream von Neumann architecture. Various programs running on computers are usually written in high-level languages and undergo multiple transformations to generate machine-readable machine code for execution. Depending on the method of transformation into machine code, high-level languages can be roughly divided into compiled languages and interpreted languages.

Compiled languages are those whose source code must be processed by a compiler, which converts it into machine code and produces an executable file. Once compiled, the program can be run any number of times without being compiled again. The advantages are fast execution, because the code is already machine code by the time it runs, and easy distribution, because users can run the program without installing a compiler or other additional software. Common compiled languages include C, C++, and Go.

In contrast, interpreted languages are those whose code is read and executed line by line by an interpreter, so the translation step is repeated every time the program runs. The advantages of interpreted languages are high development efficiency and easy debugging, but execution is relatively slow. Common interpreted languages include Python, JavaScript, and Ruby.

It is important to note that being compiled or interpreted is not a fundamental property of a language; it is only a tendency in its original design. C/C++ is compiled in most cases, but it can also be interpreted (Cint, Cling). Many traditionally interpreted languages are now compiled into intermediate code and executed on a virtual machine (Python, Lua).
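
As a small illustration of that last point, CPython compiles source text into bytecode before its own virtual machine executes it. The sketch below uses only the built-in compile function and the standard dis module; the expression and the variable names are made up for the example.

```python
import dis

# Compile a source string into a code object that holds CPython bytecode.
source = "result = (2 + 3) * x"
code = compile(source, "<example>", "exec")

# The bytecode is a plain bytes object that the CPython VM interprets.
print(type(code.co_code))

# dis lists the VM instructions; exact opcode names vary by Python version.
dis.dis(code)

# Running the code object executes that bytecode on the VM.
namespace = {"x": 4}
exec(code, namespace)
print(namespace["result"])  # 20
```

The same source could equally be fed to the interpreter line by line; the intermediate bytecode step is an implementation choice, not a property of the language.
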
Now that we understand the execution process of physical machines, let's talk about virtual machines.

A virtual machine typically provides a virtual computing environment by simulating different hardware devices. Different virtual machines can simulate different hardware devices, but they usually include a CPU, memory, hard disk, network interface, etc.

Taking the Ethereum Virtual Machine (EVM) as an example, the EVM is a stack-based virtual machine used to execute Ethereum smart contracts. The EVM provides a virtual computing environment by simulating CPU, memory, storage, and stack hardware devices.

Specifically, the EVM uses a stack to hold data while it executes instructions. Its instruction set consists of opcodes for arithmetic operations, logical operations, storage operations, jump operations, and so on. These instructions operate on the EVM's stack to carry out the execution of smart contracts.

The memory and storage simulated by the EVM are devices used to store the state and data of smart contracts. The EVM treats memory and storage as two different areas and can access the state and data of smart contracts by reading and writing to memory and storage.
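
The difference between the two areas can be sketched in a few lines of Python. This is a simplified model, not the EVM specification: the class and method names are chosen for the example, gas is ignored, and memory grows only as far as each access needs. What it does reflect is that memory is a temporary byte array that starts empty on every call, while storage is a word-to-word mapping that persists between calls.

```python
WORD = 2**256  # EVM words are 256 bits wide


class Memory:
    """Volatile byte array; a fresh one is created for every call."""

    def __init__(self):
        self.data = bytearray()

    def _grow(self, end):
        # Zero-extend memory on demand so that [0, end) is addressable.
        if len(self.data) < end:
            self.data.extend(bytes(end - len(self.data)))

    def mstore(self, offset, value):
        # Write one 32-byte big-endian word at the given byte offset.
        self._grow(offset + 32)
        self.data[offset:offset + 32] = (value % WORD).to_bytes(32, "big")

    def mload(self, offset):
        # Read one 32-byte word; untouched memory reads as zero.
        self._grow(offset + 32)
        return int.from_bytes(self.data[offset:offset + 32], "big")


class Storage:
    """Persistent key-value store of 256-bit words; unset slots read as zero."""

    def __init__(self):
        self.slots = {}

    def sstore(self, key, value):
        self.slots[key % WORD] = value % WORD

    def sload(self, key):
        return self.slots.get(key % WORD, 0)


storage = Storage()  # lives as long as the contract does

# First call: write the same value to both areas.
memory = Memory()
memory.mstore(0, 42)
storage.sstore(1, 42)

# Second call: memory starts empty again, storage still holds the value.
memory = Memory()
print(memory.mload(0), storage.sload(1))  # 0 42
```
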

The stack simulated by the EVM is used to store the operands and results of instructions. Most of the instructions in the EVM's instruction set are stack-based, meaning they read operands from the stack and push results back to the stack.

In summary, the EVM provides a virtual computing environment by simulating CPU, memory, storage, and stack hardware devices. It can execute instructions of smart contracts and store the state and data of smart contracts. In actual operation, the EVM loads the bytecode of smart contracts into memory and executes the logic of smart contracts by executing the instruction set. The EVM effectively replaces the operating system + hardware part in the diagram.
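
The execution loop described above can be sketched as a toy interpreter. The opcode values used here (STOP 0x00, ADD 0x01, MUL 0x02, PUSH1 0x60) are the real EVM encodings, but everything else is a deliberately minimal assumption: only four opcodes are handled, and gas, the stack depth limit, memory, and storage are all left out.

```python
WORD = 2**256  # arithmetic wraps around at 256 bits

STOP, ADD, MUL, PUSH1 = 0x00, 0x01, 0x02, 0x60


def run(bytecode):
    """Execute bytecode and return the final stack (top is the last item)."""
    stack = []
    pc = 0  # program counter: index of the next byte to read
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if op == STOP:
            break
        elif op == PUSH1:
            # The byte following PUSH1 is the immediate value to push.
            stack.append(bytecode[pc])
            pc += 1
        elif op == ADD:
            a, b = stack.pop(), stack.pop()
            stack.append((a + b) % WORD)
        elif op == MUL:
            a, b = stack.pop(), stack.pop()
            stack.append((a * b) % WORD)
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    return stack


# PUSH1 2, PUSH1 3, ADD, PUSH1 4, MUL, STOP computes (2 + 3) * 4.
program = bytes.fromhex("6002600301600402" + "00")
print(run(program))  # [20]
```

Each arithmetic opcode pops its operands from the stack and pushes the result back, which is exactly the stack-based behaviour described earlier.
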

The design process of the EVM is clearly bottom-up: the simulated hardware environment (stack, memory) was fixed first, and a dedicated set of assembly instructions (opcodes) and a bytecode format were then designed for that environment. Although the assembly instructions are meant to be human-readable, using them requires a lot of low-level knowledge and places high demands on developers, which makes development cumbersome. A high-level language is therefore needed to hide the obscure low-level calls and give developers a better experience. Because the instruction set is custom-designed, traditional high-level languages are difficult to use directly for the EVM, so new high-level languages were created for this virtual machine. With EVM execution efficiency in mind, the Ethereum community has designed two compiled high-level languages for the EVM, Solidity and Vyper. Solidity is widely used, while Vyper, designed by Vitalik to address some of Solidity's shortcomings, has seen much lower adoption in the community.

What is zkEVM?

Simply put, zkEVM is an EVM that applies zero-knowledge proof/validity proof technology to efficiently and cost-effectively verify the execution process of the EVM using zero-knowledge proofs, without requiring all verifiers to re-execute the EVM.

There are many zkEVM products in the market, and the competition is fierce. The main players include Starknet, zkSync, Scroll, Taiko, Linea, Polygon zkEVM (formerly Polygon Hermez), etc., which Vitalik has classified into 5 types (1, 2, 2.5, 3, 4). For more details, please refer to Vitalik's blog.

Why do we need zkEVM?

This question needs to be viewed from two perspectives.

In the initial attempts of zk Rollup, only simple transfer and transaction functions could be implemented, such as zkSync Lite, Loopring, etc. However, people have become accustomed to the Turing-complete EVM on Ethereum and have started to demand a virtual machine on Layer 2 that can create diverse applications through programming. The need to write smart contracts is one reason.

Because some parts of the EVM's design are unfriendly to generating zero-knowledge proofs/validity proofs, some players use proof-friendly instruction sets at the lower level, such as Starknet's Cairo Assembly and zkSync's Zinc Instruction. At the same time, nobody wants to give up the EVM's extensive user ecosystem, so these players provide EVM compatibility at a higher level; this is the Type 3 and Type 4 zkEVM. Other players insist on keeping the EVM's traditional opcode instruction set and focus on generating proofs for those opcodes more efficiently; this is the Type 1 and Type 2 zkEVM. The extensive ecosystem of the EVM is the other reason.

Kakarot: A virtual machine on a virtual machine?

Why can one virtual machine be built on top of another? This is commonplace for computer professionals, but it may not be obvious to readers who are less familiar with computers. It is like building with blocks: as long as the lower layer is solid enough (it offers a Turing-complete execution environment), more blocks can be stacked on top indefinitely. However, no matter how many layers are built, execution is ultimately handed to the physical hardware at the bottom, so every additional layer reduces efficiency. And because different blocks (virtual machines) are designed differently, the higher the stack, the greater the chance of collapse (runtime errors), which demands more technical expertise to prevent.

Kakarot is an EVM implemented on Starknet using the Cairo language. It simulates the stack, memory, execution, and other aspects of the EVM in the form of Cairo smart contracts. Implementing an EVM is not, in itself, a difficult task: implementations already exist in many languages, including Go-Ethereum (written in Golang) and others in Python, Java, JavaScript, and Rust.

The technical challenges of Kakarot zkEVM stem from the fact that the protocol exists as a contract on Starknet, which brings three key issues.

Compatibility: Starknet uses a completely different account system from Ethereum. In Ethereum, accounts are divided into EOA (Externally Owned Accounts) and CA (Contract Accounts), while Starknet supports native account abstraction, where all accounts are contract accounts. Additionally, due to the use of different cryptographic algorithms, users cannot generate the same addresses in Starknet using the same entropy as in Ethereum.
Cost: Since Kakarot zkEVM exists as a contract on the chain, there are high requirements for code implementation. It needs to be optimized for Gas to reduce interaction costs.
Stability: Unlike traditional high-level languages such as Golang, Rust, and Python, the Cairo language is still at an experimental stage. From Cairo 0 to Cairo 1 and now Cairo 2 (or Cairo 1 version 2, if you prefer), the official team is still modifying the language's features. The Cairo VM has also not undergone sufficient testing, so large-scale rewrites may be needed in the future.

The Kakarot protocol consists of five main components (the GitHub documentation mentions four, excluding EOA, but this article has been adjusted for better understanding):

Kakarot (Core): Responsible for executing Ethereum-style transactions and providing corresponding Starknet accounts for Ethereum users.
Contract Accounts: Equivalent to CA in Ethereum, responsible for storing the bytecode of contracts and the state of variables in contracts.
Externally Owned Accounts: Equivalent to EOA in Ethereum, responsible for forwarding Ethereum transactions to Kakarot Core.
Account Registry: Stores the mapping between Ethereum accounts and Starknet accounts.
Blockhash Registry: As a special opcode, Blockhash requires past block data, which Kakarot cannot directly obtain on-chain. This component stores the mapping between block_number and block_hash, which is written by the administrator and provided to Kakarot Core.
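
To make the role of the Blockhash Registry concrete, here is a rough Python model of the idea. It is not Kakarot's Cairo code: the class, the method names, and the op_blockhash helper are hypothetical, and returning zero for an unknown block is an assumption borrowed from how the EVM's BLOCKHASH opcode treats blocks it cannot serve.

```python
class BlockhashRegistry:
    """Hypothetical model: block_number -> block_hash, written by one admin."""

    def __init__(self, admin):
        self.admin = admin
        self._hashes = {}

    def set_blockhashes(self, caller, entries):
        # Only the administrator may write, as described in the list above.
        if caller != self.admin:
            raise PermissionError("only the administrator can write block hashes")
        self._hashes.update(entries)

    def get_blockhash(self, block_number):
        # Assumption: unknown blocks read as zero.
        return self._hashes.get(block_number, 0)


def op_blockhash(stack, registry):
    """Hypothetical BLOCKHASH handler: pop a block number, push its hash."""
    block_number = stack.pop()
    stack.append(registry.get_blockhash(block_number))


registry = BlockhashRegistry(admin="admin")
registry.set_blockhashes("admin", {100: 0xABC})

stack = [100]
op_blockhash(stack, registry)
print(hex(stack[-1]))  # 0xabc
```

The point of the indirection is that the core interpreter never needs direct access to past block data; it only reads whatever the registry has been given.
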

According to Elias Tazartes, CEO of Kakarot, the team's latest version abandons the Account Registry design and instead stores the correspondence in a mapping from a 31-byte Starknet address to a 20-byte EVM address. In the future, the Account Registry design may be reconsidered to improve interoperability and allow Starknet contracts to register their own EVM addresses.
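
A mapping of this shape can be modelled very simply. The sketch below is an illustration only: the class and method names are hypothetical, the 31-byte and 20-byte widths are taken from the description above, and rejecting a second registration for the same Starknet address is an assumption made for the example.

```python
EVM_ADDRESS_BITS = 160       # 20 bytes
STARKNET_ADDRESS_BITS = 248  # 31 bytes, following the figure quoted above


class AddressMapping:
    """Hypothetical model of a Starknet address -> EVM address mapping."""

    def __init__(self):
        self._evm_of = {}

    def register(self, starknet_address, evm_address):
        # Reject values that do not fit the stated address widths.
        if not 0 <= starknet_address < 2**STARKNET_ADDRESS_BITS:
            raise ValueError("Starknet address does not fit in 31 bytes")
        if not 0 <= evm_address < 2**EVM_ADDRESS_BITS:
            raise ValueError("EVM address does not fit in 20 bytes")
        # Assumption: an existing entry cannot be overwritten.
        if starknet_address in self._evm_of:
            raise ValueError("Starknet address is already registered")
        self._evm_of[starknet_address] = evm_address

    def evm_address_of(self, starknet_address):
        # Returns None when the Starknet address has no EVM counterpart.
        return self._evm_of.get(starknet_address)


mapping = AddressMapping()
mapping.register(0x1234, 0xDEADBEEF)
print(hex(mapping.evm_address_of(0x1234)))  # 0xdeadbeef
```
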

Compatibility with EVM on Starknet: What are the differences between Warp and Kakarot?

In terms of the zkEVM types defined by Vitalik, Warp belongs to Type 4, while Kakarot currently belongs to Type 2.5.

Warp is a transpiler that translates Solidity code into Cairo code. It allows Solidity developers to maintain their original development state without having to learn a new language like Cairo. For many projects, Warp lowers the entry barrier to the Starknet ecosystem, as they do not need to rewrite a large amount of code using Cairo.

The translation approach is simple, but it has the worst compatibility. Some Solidity code cannot be translated well into Cairo, and modifications to the source code are required to complete the migration, especially for code logic related to account systems and cryptographic algorithms. The Warp documentation provides specific unsupported features. For example, many projects differentiate between EOA accounts and contract accounts in terms of execution logic, but all accounts in Starknet are contract accounts, so this part of the code needs to be modified before translation.

Warp provides compatibility at the high-level language level, while Kakarot provides compatibility at the EVM level.

By completely rewriting the EVM, implementing opcodes and precompiles, Kakarot achieves higher native compatibility. After all, executing in the same virtual machine (EVM) is always more compatible than executing in different virtual machines (Cairo VM). The Account Registry and Blockhash Registry cleverly shield the differences between different systems, minimizing friction for user migration.

Kakarot Team

Thanks to the Kakarot team, especially Elias Tazartes, for their valuable feedback on this article. Thank you, sir!