Hello Arithmetization

We are going to delve into arithmetization, particularly focusing on the R1CS method widely used in SNARK proof systems. Our journey will reveal how R1CS transforms programs into polynomial forms.

In previous posts, we delved into the fundamental primitives of cryptography and mathematics. Now, it’s time to explore the Zero-Knowledge Proof (ZKP) systems we’ve previously superficially talked about.

Prerequisites

Before we dive deep, there are a few basic prerequisites you should be familiar with to fully grasp the concepts discussed in this text. You don’t need to be an expert. A basic understanding will be more than enough. Feel free to continue reading after a brief review of the definitions below.

Dot product: It’s a way to multiply matrices. Imagine this as lining up two sets of numbers, multiplying each pair, and then adding them all up.
Polynomial Interpolation: Construct a polynomial that passes through the given points. The two most known algorithms are Lagrange and FFT. FFT is way faster than Lagrange. (way, way faster)
Basic Programming Knowledge: In some of the parts, we will use Python to not deal with long arithmetic operations (dot product, polynomial division, etc.). We don’t need to understand these codes. Our purpose is not to write code; code is simply a tool.

You probably heard about SNARKs and STARKs (sometimes referred to as zk-SNARKs and zk-STARKs). In this and subsequent posts, you’re going to learn about the steps involved in these proof systems and understand how they function. I am going to refer to this proof system as a ZKP system, but they don’t have to ensure the zero-knowledge property.

If a proof system doesn’t reveal anything about hidden inputs it is zero-knowledge proof system. We are not going to focus on this property.

General Steps of Zero-knowledge Proof

Arithmetization (This post)
Polynomial Commitment
IOP

Arithmetization

Our focus is on Arithmetization for this text. Polynomials have great properties that are going to help us when building proof systems. All the ZKP systems you will see rely on polynomials. During this initial step, we’ll transform our program into a set of polynomials. There are different ways to achieve this goal. But we can classify them into 2 categories.

Circuit computations: This method reduces the program to the level of gate level. However, the gates we’re referring to here are not the same as the physical gates found in processors. We’re talking about conceptual gates.
Machines computations: This approach is similar to how computer works. There are set of registers and a transition function that modifies the values held in these registers.

We are going to learn about one of the most popular arithmetization method R1CS which is used by SNARK proof systems.

R1CS-Rank One Constraint System

R1CS falls under the Circuit Computation category. To convert a program to a polynomial using R1CS, we have three tasks to complete.

Flattening

1The first step involves converting our program into simple statements. These statements typically look like x = y or x = y (op) z , where op represents basic arithmetic operations such as addition, multiplication, division, or subtraction.

Consider a function f that takes input x and returns x^3 + x + 5 :

def f(x):
    y = x**3
    return x + y + 5

Since we’re limited to using only basic arithmetic operations, we must transform exponents into multiplications. Thus, the flattened version of this code could be written as:

sym_1 = x * x
y = sym_1 * x
sym_2 = y + x
~out = sym_2 + 5

This process simplifies the original function into a sequence of basic operations. This step is necessary for further steps in R1CS.

Gates to R1CS:

Now we have a flattened version of our program in hand. For this program, our variables are one, x, out, sym1, y and sym2. Our next step is to convert this into an R1CS matrix representation. R1CS utilizes three vectors a, b, c , along with a solution vector s . This solution vector s should satisfy the condition that s·a * s·b - s·c = 0 ( where · represents dot product ). Each vector’s length is equal to the number of variables in our program, plus an extra constant variable always equals 1.

To clarify the next steps, let’s assume we executed the function f with the input 3 . We can represent our flattened version as:

9  = 3 * 3
27 = 9 * 3
30 = 27 + 3
35 = 30 + 5

For instance, to construct a valid R1CS representation for the first row

9 = 3 * 3 , we could use the following configuration:

Why do we need three vectors: The reason for having three vectors is due to the nature of the constraints R1CS is designed to represent. Remember, in circuit computation, we have conceptual gates. These gates have a left input, a right input, and one output. So, basically, the A vector represents the left input, the B vector represents the right input, and the C vector represents the output.

By constructing valid R1CS representations for each row, you will eventually compile these into three matrices.

import numpy as np

# Define the matrices A, B, and C
A = np.array([[0, 1, 0, 0, 0, 0],
              [0, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 1, 0],
              [5, 0, 0, 0, 0, 1]])
B = np.array([[0, 1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0, 0]])
C = np.array([[0, 0, 0, 1, 0, 0],
              [0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 1],
              [0, 0, 1, 0, 0, 0]])

# Define the vector S
S = np.array([1, 3, 35, 9, 27, 30])

The s vector remains constant across all vectors since it represents our variables.

R1CS to QAP:

Having represented our program as matrices, our next goal is to transform these matrices into polynomials. This transformation is a necessary step to creating a succinct proof system. Thanks to the Schwartz-Zippel Lemma, we can build succinct SNARK-proof systems.

Schwartz-Zippel Lemma says that if the field over which the polynomials are defined is large enough, the probability that the two polynomials produce the same result at a randomly chosen point is very low. This probability is less than or equal to the degree of the polynomial divided by the size of the field. We can always compare the coefficients of these polynomials. However, it wouldn’t be succinct.

In this step, we are going to take a, b, c vectors and we will construct a Quadratic Arithmetic Program. This part is often the most challenging aspect of transitioning from R1CS. We are going to encode our vectors in this way:

Each column of the matrices will be encoded.
For example, matrix A’s first values to encode is going to be 0, 0, 0, 5 .
That means we will find a polynomial that equals tof(1) = 0, f(2) = 0, f(3) = 0, f(4) = 5

from scipy.interpolate import lagrange

# X values for Lagrange interpolation
x_vals = np.array([1, 2, 3, 4])

# Function to calculate Lagrange polynomials for all columns in a matrix
def calculate_lagrange(matrix):
    poly_list = []
    for column in range(matrix.shape[1]):
        y_vals = matrix[:, column]
        poly = lagrange(x_vals, y_vals)
        poly_list.append(poly.coefficients)
    return poly_list

lagrange_A = calculate_lagrange(A)
lagrange_B = calculate_lagrange(B)
lagrange_C = calculate_lagrange(C)

By applying this encoding process to each matrix, we’ll have 18 polynomials that represent our program. Let’s examine the polynomial created from the first column of matrix A:

0.833 · x³ — 5.000 · x² + 9.167 · x — 5.000

When we draw this curve, we see that it hits certain numbers 0, 0, 0, 5 at the points 1, 2, 3 and 4 along the x-axis.

Now for the last step, we’re going to use polynomials instead of matrices to show that our program is doing the right thing.

Over here, we need to apply the dot product to each matrix. This is a dot product of a 1x6 vector and a 6x4 vector, resulting in a 1x4 vector. Again, let’s use Python to make this easier.

A = np.dot(S, lagrange_A)
B = np.dot(S, lagrange_B)
C = np.dot(S, lagrange_C)

> [43, -73.33, 38.5, -5.16]
> [-3, 10.33, -5 ,0.66]
> [-41, 71.66, -24.5, 2.83]

Now that we have these values, we can calculate Z(x) = A(x) · B(x) — C(x). But we need to think of them as polynomials because they’re coefficients.

from numpy.polynomial.polynomial import polymul, polysub

T = polysub(polymul(A, B), C)

So coefficients of T are -88.0, 592.66, -1063.77, 805.83, -294.77, 51.5, -3.44 .

Coefficients are in ascending order

But we are not going to check the polynomial T directly. Instead, we’re going to divide it by another polynomial Z. This curve is the simplest polynomial that is equal to zero at all points for our program. So in our case, the simplest Z is (x-1) · (x-2) · (x-3) · (x-4).

Let’s find this curve using python:

p1 = [1, -1] #(x-1)
p2 = [1, -2] #(x-2)
p3 = [1, -3] #(x-3)
p4 = [1, -4] #(x-4)

Z = polymul(p1, p2) #(x-1) · (x-2)
Z = polymul(Z, p3)  #(x-1) · (x-2) · (x-3)
Z = polymul(Z, p4)  #(x-1) · (x-2) · (x-3) · (x-4)
Z

So our polynomial Z is 24x⁴ — 50x³ + 35x² — 10x + 1 . Finally, to find H(x) we need to do one last polynomial division.

H = polydiv(T, Z)

The coefficients of polynomial H are -3.66, 17.05, -3.44 . Remember, these are in ascending order, so our H is -3.44 · x^2 + 17.05 · x — 3.66 . There is no remainder. If we did any of the steps wrong or if we tried to do the same steps with variables that didn’t hold the constraints, this division would leave a remainder.

What if S is wrong?

Let’s say we change our vector S to 1, 4, 35, 9, 27, 30 , which is wrong because to get 35, our input should have been 3. If we redo everything with these wrong numbers, we would end up with polynomial H with coefficients -8, 28.916, -5.83 but different from the correct solution. We also have big remainders -22, 48.16, -22, 2.83 , showing us something went wrong.

And that’s the end of this text! Thank you so much for sticking with me through this text. Your thoughts and feedback are more than welcome; they make this content (and my knowledge as well). So, if you have a moment, please share your ideas or questions. Also, you can always contact me through Twitter or Telegram.

Until the next part, happy learning!

Emir Soyturk

Hello Arithmetization — R1CS