# on simple private information retrieval experiments

By [bt3gl's symposium](https://paragraph.com/@go-outside) · 2023-06-16

---

tl; dr
------

today i go over a tool i wrote to learn and run simple experiments on PIR (a fascinating research subject in cryptography).

[**here is the source code.**](https://github.com/go-outside-labs/blockchain-science-py/tree/main/magick-py)

this particular research is highly based on the work of [**alexandra henzinger**](https://github.com/ahenzinger/simplepir) on [**SimplePIR/DoublePIR**](https://eprint.iacr.org/2022/949) and [**janmajaya mall**](https://github.com/Janmajayamall)’s [**zuzalu demo**.](https://docs.google.com/presentation/d/1ESZ2xZeyBYyzc-AWvZwc8o4ioOGGZtK9KlnsplzsfQ0/edit)

if you are advanced in the subject already, here is an excellent presentation by alexandra (at the [**simons institute**](https://simons.berkeley.edu/homepage), a foundation from [**quant-father jim simons**](https://en.wikipedia.org/wiki/Jim_Simons_\(mathematician\)), which also sponsored part of my research in my phd in physics):

[![]({{DOMAIN}}/editor/youtube/play.png)](https://www.youtube.com/watch?v=dmUgkoKZT2I)

* * *

👾 today’s outline
------------------

    0000. introduction to pir and this work
    0001. magick-py, a tool for simple PIR 
    0010. setting up magick-py 
    0011. experiment I: linear key regev encryption with sampled error
    0100. experiment II: linear key regev encryption with a scaled message
    0101. experiment III: proving the scheme is additive homomorphic
    0110. experiment IV: proving the scheme supports plaintext inner product
    0111. experiment V: a very simple pir setup without encryption
    1000. experiment VI: a full secret key regev PIR experiment
    1001. concluding thoughts
    

* * *

🎶 today’s mood
---------------

[https://open.spotify.com/track/5jETivOoz9oNxVgNSHJmuU?si=2154f67fd5d7459b](https://open.spotify.com/track/5jETivOoz9oNxVgNSHJmuU?si=2154f67fd5d7459b)

* * *

0000\. introduction to private information retrieval, lattice-based cryptography, and fully homomorphic encryption schemes
--------------------------------------------------------------------------------------------------------------------------

### what’s PIR

**private information retrieval** (PIR) was first introduced in 1995 by [b. chor et al](http://www.wisdom.weizmann.ac.il/~oded/p_pir.html). and refers to the **ability to query a database without revealing which item is looked up or whether it exists,** by using cryptography primitives.

it’s a pretty cool technology. once PIR becomes less expensive or prohibitive (_i.e._, cheaper computation with a small cipher, as PIR inherently has a high cost for server-side computation), these are some of the possible applications that could utilize the protocol:

*   **searching IP databases**: when filing a new IP, the author must search the IP database to check that no previous entry significantly overlaps with their invention. PIR could allow the search to be performed without leaving search terms on the query log of the IP database.
    
*   **real-time asset quotes**: investors interested in a particular asset often monitor the market to determine when to purchase. PIR could allow their interest to be confidential.
    
*   **safe browsing and private oracles, checking passwords over breached databases (or any type of credentials), certificate transparency (CT) checks, certificate revocation checks,** among many others.
    

by the way, PIR schemes are generally divided into **single-server schemes** and **multiple-server schemes** (which allows you to remove the trust from a subset of the servers).

in this research, we will look at simple single-server PIR protocol setups, where a server holds an embedded database `D` represented by a `n x n` square matrix (whose elements are under a constant modulo), and a client wants to privately read the `ith` database item (`Di`, with `n` elements) without letting the server learn about `i`.

![](https://storage.googleapis.com/papyrus_images/fabccdecab9b34d2fdffd0443ff63407621043db2c76e879edfc1091746d52c3.png)

* * *

### interlude: lattice-based cryptography and group theory

the subject of PIR is also a subset of the broad topic of **lattice-based cryptography**, which refers to a series of **quantum-resistant** cryptographic primitives that involve lattices, either in their construction or in the security proof.

> 💡 _Over an n-dimensional vector space, a lattice is an infinite set of points represented by a collection of vectors._

if you need an introduction to quantum cryptography, check out my (uber-old and outdated) [**def con ‘16**](https://defcon.org/) talk:

[![]({{DOMAIN}}/editor/youtube/play.png)](https://www.youtube.com/watch?v=1Fp6ibfOQ4Y)

> 💡 _in group theory, a lattice in the_ `R^n` _is an infinite set of points in this space in which coordinate-wise addition or subtraction of two points produces another point, so every point in the space is within some maximum distance of any lattice point._
> 
> _a lattice can also be described as a free abelian (commutative) group of dimension_ `n`_, spanning the vector space_ `R^n`_; or the symmetry group of a discrete translation symmetry in_ `n` _directions._
> 
> _if you would like to learn more about group theory, especially in a language for physicists, here is a_ [**_free and open-source graduate book i wrote_**](http://www.astro.sunysb.edu/steinkirch/books/group.pdf) _back when i was a student._

![.an excerpt from my book, addressing homomorphism for covering groups in lie algebra.](https://storage.googleapis.com/papyrus_images/5a0cf338a490a7c63bbf4c521f7097ea0d032746f3a8e848f6ffb34ad70b491e.png)

.an excerpt from my book, addressing homomorphism for covering groups in lie algebra.

* * *

### homomorphic encryption schemes

before we start, we need to review **the concept of homomorphic encryption**.

suppose a server that can `XOR` some client’s data. the client would send their cipher `c0`, obtained from their plaintext data `m0` and their key `k0`:

    c = m0 ⌖ k0
    

**homomorphism** is the property that if a client sends two encrypted messages, `c1` and `c2` (from messages `m0` and `m1`, respectively), the server can return `c1 ⌖ c2` so the client can retrieve `m0 ⌖ m1`.

**additive homomorphism** occurs when, given two ciphertexts `(a0, c0)` and `(a1, c1)`, their sum `(a0 + a1, c0 + c1)` decrypts to the sum of the plaintexts (provided that the error remains sufficiently small).

**partially homomorphic encryption** can be easily achieved as it accepts the possibility that not all data is encrypted (or homomorphic) through other operations (such as multiplication).

**fully homomorphic encryption (FWE)**, which is much harder to achieve, would occur if a server operated on encrypted data **without seeing ANY of its content.**

> 💡 \*in a more formal definition, homomorphic encryption is a form of encryption with evaluation capability for computing over encrypted data without access to the secret key, i.e., supporting arbitrary computation on ciphers. fully homomorphic encryption could be defined as the evaluation of arbitrary circuits of multiple types of (unbounded depth) gates (relevant to zero-knowledge proof setups). vitalik’s talks about homomorphic encryption in [**this post**](https://vitalik.eth.limo/general/2020/07/20/homomorphic.html).\*

* * *

### learning with errors (LWE)

a subsequent important progress in the the field was a [**2005 seminal paper**](https://dl.acm.org/doi/10.1145/1060590.1060603)**,** where oded regev introduced **the first lattice-based public-key encryption scheme**, and the **learning with errors** (LWE) problem.

the LWE problem relies on the hardness of distinguishing between a message with added noise and a random sample. it can be thought of as a search in a (noisy) modular set of equations whose solutions can be very difficult to solve. in other words, given m samples of coefficients (bi, ai) in the linear equation `bi = <ai, s> + ei`, with the error `ei` sampled from a small range `[-bound, bound]`, finding the secret key s is "hard".

note, however, that LWE-based encryption schemes have a **significant drawback due to noise growth**. as the ciphertexts produced by these schemes are noisy encodings of the plaintext, **homomorphic operations between ciphertexts increase the magnitude of the noise**. if the noise exceeds a certain threshold, the correctness of the decryption may no longer hold. despite this problem, **regev encryption** can be very efficient for PIR as it is additively homomorphic.

in the past decades, regev's security proof and the LWE scheme's efficiency have been the subject of intense research among cryptographers, including [craig gentry's thesis on the first fully homomorphic encryption scheme (2009)](https://crypto.stanford.edu/craig/craig-thesis.pdf).

* * *

### a simple implementation of the PIR protocol

a PIR protocol aims to design **schemes that satisfy privacy and correctness constraints while achieving the minimum possible download cost**.

> 💡 _the_ **_download cost_** _of a PIR scheme is defined as_ **_the total number of bits downloaded by the user from all the databases, normalized by the message size_**_. the_ **_PIR rate_** _is defined as_ **_the reciprocal of the PIR download cost_**_._

one possible implementation approach is to choose a suitable polynomial and then have a single server preprocess the data. this preprocessing depends only on the database `D` and the public parameters of the regev encryption scheme, so that the server can reuse the work across many queries from many independent clients.

after the preprocessing step, to answer a client's query, the server must compute only roughly `N 32-bit` integer multiplications and additions on a database of `N bytes`. the catch is that the client must download a _hint_ matrix about the database contents after this preprocessing.

therefore, a simple serve PIR scheme would comprise two phases:

*   **the offline phase**, with pre-computations and the exchange of _hints_, and
    
*   **the online phase**, with the query processing on the server and response decoding on the client.
    

the practicality of PIR-based applications is primarily impacted by the query processing time and the hint exchange phase. the theoretical query size grows as the square root of the number of field elements representing the database. for example, the largest query size for a database of `32 GB` is around `600 KB`.

* * *

### why PIR is still not feasible

although modern PIR schemes require surprisingly little communication and the protocol works well enough at smaller scales, the time needed to scan it grows proportionally as the database grows. for bigger databases, the process becomes prohibitively inefficient (fetching a database record grows only polylogarithmically with the number of records, `N`).

after preprocessing the database, the server can answer a query in time sublinear in `N`. thus, the current hard limit on the throughput of PIR schemes is the ratio between the database size and the server time to answer a query (the speed with which the PIR server can read the database from memory).

finally, it's important to note that PIR protocols do not ensure data integrity or authentication. an authenticated PIR scheme could combine an unauthenticated multi-server PIR scheme with a standard integrity-protection mechanism, such as merkle trees.

in this approach, PIR servers download the data from the blockchain to construct PIR databases. for each database, the PIR server creates a description file (usually called a _manifest file_). the user collects all available block headers and fetches the manifest files from the PIR servers to query the PIR database later efficiently.

* * *

### ["simple and fast single-server private information retrieval", by alexandra henzinger et. al (2022)](https://eprint.iacr.org/2022/949)

*   this paper introduces a design for **SimplePIR**, **the fastest single-server PIR scheme known to date**.
    
*   the security is held under a **Learning with Errors scheme** that requires no polynomial arithmetic or fast fourier transforms. regev encryption gives a secret-key encryption scheme that is secure under the LWE assumption.
    
*   to answer a client’s query, the server performs fewer than **one 32-bit multiplication** and **one 32-bit addition** per **database byte**, achieving **10 GB/s/core server throughput**.
    
*   the first approach to **query a 1 GB database** demands the client to first download a **121 MB "hint" about the database contents**. then, the client can make any number of queries, each requiring **242 KB of communication**.
    
*   the second approach **shrinks the hint to 16 MB**. then, the following queries demand **345 KB of communication**.
    
*   finally, the scheme is applied, together with a novel data structure for approximate set membership, to **private auditing in certificate transparency**. the results can be compared to google chrome’s current approach, with **16 MB of downloads per month, and 150 bytes per TLS connection**.
    

* * *

### this work: my illustration of simplePIR

in our code, the single-server database is represented by a square matrix `(m x m)`, while a query is a vector filled by `0s` except at the asking row and column `(m x 1)`. any result should have the same dimension as the query vector (_i.e._, the space is reduced to the size of the column where the data is located).

the server retrieves the queried item by:

1.  looping over every column and multiplying their values to the value in the same row of the query vector, and
    
2.  adding the values found in each column in its own matrix.
    

a secret key regev encryption scheme using sampled errors to reproduce LWE is then built on top of the ideas above. privacy is guaranteed by checking that fully homomorphic encryption is held with respect to addition in this setup (_i.e._, additive homomorphism).

* * *

### the rest of this article

now that we have some context, we are ready to do some coding and math.

in the following sections, i will be illustrating a simple single-server PIR setup with LWE, built on several progressive experiments:

1.  running a simple linear key regev encryption experiment with sampled error
    
2.  running a simple linear key regev encryption experiment with a scaled message
    
3.  proving that the regev scheme is additive homomorphic
    
4.  proving that the regev scheme supports plaintext inner product
    
5.  running a very simple pir setup (without encryption)
    
6.  running the full secret key regev PIR experiment
    

* * *

0001\. magick-py, a CLI tool for simple PIR
-------------------------------------------

we will now be building the parts of the tool i wrote to help understand how PIR can work. i call it [**magick**](https://github.com/go-outside-labs/blockchain-science-py/tree/main/magick-py):

![](https://storage.googleapis.com/papyrus_images/258e946c3f219701119b4ce731ed5eca72d1e50f5dfa8fb94ce42d952c56ce12.png)

* * *

### primitive classes for message and regev encryption

before we jump into the experiments, let’s take a look at the main parts of [**magic-py**](https://github.com/go-outside-labs/blockchain-science-py/blob/main/magick-py/README.md)’s source code:

![](https://storage.googleapis.com/papyrus_images/0d1ca1d4e1107aaa7f512349a727a4aae49086009dfd3f1aa700e63a89f09c89.png)

more specifically, we want to look at the primitive classes.

this is how `Message()` in `message.py` looks:

    class Message:
    
        def __init__(self, mod=None, rows=None, cols=None, message=None):
            """Initialize a message vector"""
    
            self.mod = mod
            self.rows = rows
            self.cols = cols
            self.message = message
    
    
        ############################
        #      Private methods 
        ############################
    
        def _check_dimensions(self, other_msg) -> None:
            """Check the dimensions of two matrices"""
    
            if self.rows != other_msg.rows or self.cols != other_msg.cols:
                    log_error(f'Matrices have different dimensions:')
                    exit_with_error(f'{self.rows}x{self.cols} and {other_msg.rows}x{other_msg.cols}')
    
        def __add__(self, vector):
            """Add two matrices"""
    
            self._check_dimensions(vector)
            
            for i in range(len(self.message)):
                self.message[i] = (self.message[i] + vector.message[i]) % self.mod
    
            return self
    
        def __sub__(self, vector):
            """Subtract two matrices"""
    
            self._check_dimensions(vector)
    
            for index in range(len(self.message)):
                self.message[index] = (self.message[index] - vector.message[index]) % self.mod
            
            return self
    
        def __mul__(self, vector):
            """Multiply two matrices"""
    
            this_vector = [0] * (self.rows * vector.cols)
            for i in range(self.rows):
                for j in range(self.cols):
                    for k in range(vector.cols):
                        this_vector[i * vector.cols + k] = (this_vector[i * vector.cols + k] + \
                                                           (self.message[i * self.cols + j] * \
                                                           vector.message[j * vector.cols + k])) % self.mod
            
            return Message(self.mod, self.rows, vector.cols, this_vector)
        
        def __eq__(self, vector):
            """Check if two matrices are equal"""
    
            return (self.rows == vector.rows) and \
                   (self.cols == vector.cols) and \
                   (self.message == vector.message)
    
        def __repr__(self):
            """Print the message vector"""
    
            return f'\nRows: {self.rows}\nCols: {self.cols}\nVector: {self.message}\n'
    
    
        ############################
        #     Public methods 
        ############################
    
        def calculate_scaling(self, numerator, denominator, this_mod):
            """Calculate the scaled message vector"""
    
            this_vector = [0] * (self.rows * self.cols)
    
            for i in range(len(self.message)):
                this_vector[i] = round((numerator * self.message[i]) / denominator) % this_mod
    
            return Message(this_mod, self.rows, self.cols, this_vector)
    
        def set_query_element(self, row, col, value) -> None:
            """Set the value at a particular index"""
    
            self.message[row * self.cols + col] = value
            
        def get_query_element(self, row, col) -> int:
            """Get the value at a particular index"""
    
            return self.message[row * self.cols + col]
    
    
        ############################
        #     Static methods 
        ############################
    
        @staticmethod
        def create_random_message(mod, rows, cols): 
            """Create a random message vector"""
    
            return Message(mod, rows, cols, [random.randint(0, mod - 1) for _ in range(rows * cols)])
    
        @staticmethod
        def create_zero_message(mod, rows, cols): 
            """Create a zero message vector"""
    
            return Message(mod, rows, cols, [0 for _ in range(rows * cols)])
    
        @staticmethod
        def calculate_sample_error(bound, mod, rows, cols): 
            """Create a random error message vector"""
    
            return Message(mod, rows, cols, [sample_error(bound) % mod for _ in range(cols * rows)])
    

and this is how `Regev()` in `regev.py` looks:

    class Regev():
    
        def __init__(self):
    
            self.mod = None
            self.n = None
            self.m = None
            self.p = None
            self.bound = None
            self._load_env_parameters()
    
        ############################
        #      Private methods
        ############################
    
        def _load_env_parameters(self) -> None:
            """Load environment variables"""
    
            env_vars = load_config()
            self.mod = int(env_vars['mod'])
            self.n = int(env_vars['n'])
            self.m = int(env_vars['m'])
            self.p = int(env_vars['p'])
            self.bound = int(env_vars['bound'])
    
    
        ############################
        #      Public methods
        ############################
    
        def print_results(self, m0, m1, m0_string, m1_string) -> None:
            """Print the results of the experiment"""
    
            if m0 == m1:
                log_info(f'Original msg was successfully retrieved!\n')
            else:
                log_error(f'Original msg was not retrieved.')
    
            log_info(f'{m0_string}: {m0}\n')
            log_info(f'{m1_string}: {m1}\n')
            log_info(f'Parameters: \nmod: {self.mod} \nn: {self.n} \nm: {self.m} \np: {self.p} \nbound: [-{self.bound}, {self.bound}] \n')
    
        def print_noise_growth(self, m0, m1, noise_growth) -> None:
            """Print the noise growth"""
    
            log_info(f'Correct decryption for Delta / 2: {(self.mod / self.p) / 2}? {m0 == m1}')
            log_info(f'Noise growth: {noise_growth.message[0]}')
    
        def create_secret_key(self, this_mod=None, msg_n=1):
            """Create a secret key vector"""
    
            if this_mod is None:
                this_mod = self.mod
    
            return  Message.create_random_message(this_mod, self.n, msg_n)
    
        def create_message_setup(self, this_m=None, this_n=None, this_mod=None, msg_n=None):
            """Create a message vector setup"""
            
            if this_mod is None:
                this_mod = self.mod
            
            if this_m is None:
                this_m = self.m
            
            if this_n is None:
                this_n = self.n
            
            if msg_n is None:
                msg_n = 1
    
            # message vector of size `m`, where each element has a modulus `mod`
            m0= Message.create_random_message(this_mod, self.m, msg_n)
    
            # public    
            A = Message.create_random_message(self.mod, self.m, self.n)
    
            # error vector
            e = Message.calculate_sample_error(self.bound, self.mod, self.m, msg_n)
    
            return m0, A, e
    
        ############################
        #      Static methods
        ############################
    
        @staticmethod
        def calculate_encryption(A, s, e, m0):
            """
                Encrypt this message with a simple `B = A * s + e + m0`, 
                where `s` is the secret and `e` is the error vector.
                Set the cipher as the tuple c = (B, A).
            """
    
            B = (A * s) + e + m0
            return (B, A)
    
        @staticmethod
        def calculate_decryption(s, c):
            """ 
                Calculate the decryption of a ciphertext, given c
                and a secret, such that m1 = m0 + e.
            """
    
            B = c[0]
            A = c[1]
            return B - (A * s)
    

in the subsequent sections, we review each experiment's code and results. but first, let me show you how you can set up the tool.

* * *

0010\. setting up magick-py
---------------------------

install the requirements in a virtual environment with:

    python3 -m venv venv
    source venv/bin/activate
    make install_deps
    

create a `.env` file and set the environment variables and parameters.

LWE parameters are:

*   size of msg vector, `m` and `n`
    
*   message’s modulo `mod` and `p`
    
*   a work around the sampling errors (**i.e.,** the standard variation `sigma` of a Gaussian distribution with zero mean `sigma`) by setting a `bound` range for them
    

you should also take a look at [lattice-estimator](https://github.com/malb/lattice-estimator) to learn how to make security estimates for LWE parameters.

finally, install magick with:

    make install
    

* * *

0011\. experiment I: simple linear encryption and decryption of a msg vector with a sampled error vector
--------------------------------------------------------------------------------------------------------

in this simple experiment of learning with error (LWE), we operate our message vector over a ring modulo `mod`, so some information is lost. luckily, **gaussian elimination** can still be used to recover the original message vector as it works over a ring modulo `mod`.

the method for this experiment can be found at `experiments/simple_encryption.py`:

    def linear_secret_key_regev_encryption_with_error() -> None:
        """ 
            This method runs a secret key Regev encryption and decryption 
            experiment for a msg vector with a sampled error vector.
    
            In this simple example of learning with error (LWE), we operate
            our message vector over a ring modulo mod, such that some
            information is lost. This is not a problem since gaussian elimination
            can be used to recover the original message vector (i.e., it works
            over a ring modulo mod).
    
            We represent the message vector m0 of size m where each element is
            modulus mod. The cipertext c is B = A * s + e + m0, which can be
            decrypted as c = (B, A).
        """
    
        ########################################################################
        # 1. Key generation
        ########################################################################
        regev = Regev()
        m0, A, e = regev.create_message_setup()
        s = regev.create_secret_key()
    
        ########################################################################
        # 2. Encryption by calculating B and ciphertext c
        ########################################################################
        c = regev.calculate_encryption(A, s, e, m0)
    
        ########################################################################
        # 3. Calculate the decryption of the ciphertext c
        ########################################################################
        m1 = regev.calculate_decryption(s, c)
    
        ########################################################################
        # 4. The message vector m1 should be equal to m0 plus the error vector e
        ########################################################################
        regev.print_results(m0, m0 + e, 'm0', 'm0 + e')
    

simply put, these are the steps of this experiment:

1.  represent a message vector `m0` of size `m`, where each element has modulo `mod`.
    
2.  encrypt this message with a simple `B = A * s + e + m0`, where `s` is the secret and `e` is an error vector.
    
3.  set the ciphertext as the tuple `c = (B, A)`.
    
4.  decrypt `c = (B, A)` for a given `s`, such that `m1 = m0 + e`.
    

here is an example of an output:

![](https://storage.googleapis.com/papyrus_images/7786229d6757fe9673507dbac96ee139856078c24b628ec7c2fa9beab3d2a2d2.png)

* * *

0100\. experiment II: secret key regev encryption by scaling a message vector
-----------------------------------------------------------------------------

in this another simple example of learning with error (LWE), we lose information on the least significant bits by adding noise, _i.e._, by scaling the message vector (before adding it to encryption) with:

    delta = mod / p
    

then, during the decryption, we scale the message vector back by:

    1 / delta
    

the scaling ensures that `m` is in the highest bits of the message vector, without losing information by adding the error vector `e`.

consequently, the message `m0` vector has each element modulo `p` (not `mod`), where `p < q`.

the scaled message is

    m0_scaled = m0 * delta = m0 * mod / p
    

the ciphertext `c` is

    B = A * s + e + m0_scaled
    

which can be decrypted as

     c = (B, A)
    

here is the source code:

    def linear_secret_key_regev_encryption_scaled() -> None:
        """ 
            This method runs a secret key regev encryption and decryption experiment
            for a msg vector with a scaled msg vector.
    
            In this another simple example of learning with error (LWE), we loose
            information on least significant bits by adding noise, i.e., by scaling 
            the message vector by delta = mod / p before adding it to encryption. 
            Then, during the decryption, we scale the message vector by 1 / delta.
    
            The scaling ensures that m is in the highest bits of the message vector,
            without losing information with the addition of the error vector e.
    
            Now, the message m0 vector has each element module p (not mod), where
            p < q. The scaled message is now m0_scaled = m0 * delta = m0 * mod / p.
            The cipertext c is B = A * s + e + m0_scaled, which can be decrypted as
            c = (B, A), i.e., m0 = (B - A * s) / delta = (delta * m0 + e) / delta.
        """
    
        ########################################################################
        # 1. Key generation
        ########################################################################
        regev = Regev()
        m0, A, e = regev.create_message_setup(this_mod = regev.p)
        s = regev.create_secret_key()
    
        ########################################################################
        # 2. Scale message vector by delta = mod / p
        ########################################################################
        scaled_m0 = m0.calculate_scaling(regev.mod, regev.p, regev.mod)
    
        ########################################################################
        # 3. Encryption by calculating B and ciphertext c
        ########################################################################
        c = regev.calculate_encryption(A, s, e, scaled_m0)
    
        ########################################################################
        # 4. Calculate the decryption of the ciphertext c
        ########################################################################
        m1 = regev.calculate_decryption(s, c)
    
        ########################################################################
        # 5. Scale m1 vector by 1/ delta = p / mod
        ########################################################################
        scaled_m1 = m1.calculate_scaling(regev.p, regev.mod, regev.p)
    
        ########################################################################
        # 6. The message vector m0 should be equal to m1
        ########################################################################
        regev.print_results(m0, scaled_m1, 'm0', 'scaled m1')
    

here is an example of an output:

![](https://storage.googleapis.com/papyrus_images/56a52ace860a758398b06253958f4b8515078de7ebb8542dd7d68055c0d5da7b.png)

* * *

0101\. experiment III: proving that the secret key regev encryption scheme supports additive homomorphism
---------------------------------------------------------------------------------------------------------

we saw that **additive homomorphism** means that if `c0` is the encryption of `m1` under secret key `s` and `c2` is the encryption of `m2` under the same secret key `s`, then `c0 + c1` is the encryption of `m0 + m1` under `s`.

for a large number of `ci`, noise can be introduced from error, so the correctness of the results will depend on the values of `m`, `n`, `mod`, and `p`, such that:

    |sum ei| < mod / (2 * p)
    

here is the source code for this experiment:

    def additive_homomorphism() -> None:
        """ 
            This method proves that the secret key Regev encryption scheme is
            additive homomorphic, i.e., if c0 encrypts m0 and c1 encrypts m1,
            both under s, then c0 + c1 decrypts to m0 + m1. 
        """
    
        ########################################################################
        # 1. Key generation for two independent messages m0 and m1
        ########################################################################
        
        r0 = Regev()
        m0, A0, e0 = r0.create_message_setup(this_mod = r0.p)
    
        r1 = Regev()
        m1, A1, e1 = r1.create_message_setup(this_mod = r1.p)
    
        s = r0.create_secret_key()
    
        ########################################################################
        # 3. Scale message vectors by delta = mod / p
        ########################################################################
        scaled_m0 = m0.calculate_scaling(r0.mod, r0.p, r0.mod)
        scaled_m1 = m1.calculate_scaling(r1.mod, r1.p, r1.mod)
    
        ########################################################################
        # 4. Encryption by calculating B and ciphertext c for each message
        ########################################################################
        c0 = r0.calculate_encryption(A0, s, e0, scaled_m0)
        c1 = r1.calculate_encryption(A1, s, e1, scaled_m1)
    
        ########################################################################
        # 5. Add the ciphertexts, with c2 = c0 + c1
        ########################################################################
        c2 = (c0[0] + c1[0], c0[1] + c1[1])
    
        ########################################################################
        # 6. Decrypt the sum of the ciphertexts
        ########################################################################
        r2 = Regev()
        m2 = r2.calculate_decryption(s, c2)
    
        ########################################################################
        # 5. Scale m1 vector by 1/ delta = p / mod
        ########################################################################
        scaled_m2 = m2.calculate_scaling(r2.p, r2.mod, r2.p)
    
        ########################################################################
        # 6. The sum of the message vectors m0 and m1 should be equal to m2
        ########################################################################
        r2.print_results(m0 + m1, scaled_m2, 'm0 + m1', 'm2')
    

here is an example of an output:

![](https://storage.googleapis.com/papyrus_images/451b782ea623827759a8bc2fe4ea28d9241b2fcb03f8f8946b701f89e3d10e6e.png)

* * *

0110\. experiment IV: proving that the secret key regev encryption scheme supports plaintext inner product
----------------------------------------------------------------------------------------------------------

this experiment shows that given a cipher `c` and a message vector `m0`, `c -> c1` can be transformed such that it also encrypts the **inner product** of `m0` with a plaintext vector `k` of size `m` and element modulo `p`.

because of **noise growth** with the vector `k`, fine-tuning the initial parameters is crucial for the message to be successfully retrieved. as you will see in the snippet below, to guarantee correct decryption, the following must hold:

    k * e0 < mod / (2 * p)
    

here is the source code:

    def plaintext_inner_product() -> None:
        """ 
            This method proves that the secret key Regev encryption scheme is
            supports plaintext inner product, i.e., if c0 encrypts m0 and c1
            encrypts m1, both under s, then c0 * c1 decrypts to m0 * m1.
        """
    
        ########################################################################
        # 1. Key generation
        ########################################################################
        r0 = Regev()
        m0, A, e = r0.create_message_setup(this_mod = r0.p)
        s = r0.create_secret_key(this_mod = r0.p)
    
        ########################################################################
        # 2. Scale message vector by delta = mod / p
        ########################################################################
        scaled_m0 = m0.calculate_scaling(r0.mod, r0.p, r0.mod)
    
        ########################################################################
        # 3. Encryption by calculating B and ciphertext c
        ########################################################################
        c = r0.calculate_encryption(A, s, e, scaled_m0)
    
        ########################################################################
        # 4. Calculate a plaintext vector transposed k and then scale it by
        #    delta = mod / p
        ########################################################################
        rk = Regev()
        k = m0.create_random_message(rk.p, 1, rk.m )
        scaled_k = m0.calculate_scaling(1, 1, rk.mod)
    
        ########################################################################
        # 5. Calculate the noise growth 
        ########################################################################
        noise_growth = scaled_k * e
    
        ########################################################################
        # 6. Define the ciphertext of the inner product of m0 and k
        ########################################################################
        c1 = (scaled_k * c[0], scaled_k * c[1])
    
        ########################################################################
        # 7. Decrypt the ciphertext of the inner product of m0 and k
        ########################################################################
        m1 = r0.calculate_decryption(s, c1)
    
        ########################################################################
        # 8. Scale m1 vector by 1/ delta = p / mod
        ########################################################################
        m1_scaled = m1.calculate_scaling(r0.p, r0.mod, r0.p)
    
        ########################################################################
        # 9. Scale back the plaintext vector k by 1/ delta = p / mod
        ########################################################################
        scaled_scaled_k = scaled_k.calculate_scaling(1, 1, rk.p)
    
        ########################################################################
        # 10. The message vector m1 scaled should be equal scaled k * m0
        ########################################################################
        r0.print_results(m1_scaled, scaled_scaled_k * m0, 'scaled m1', 'scaled k * m0')
    
        ########################################################################
        # 11. Print results on noise, decryption fails when noise > delta / 2 
        ########################################################################
        rk.print_noise_growth(m1_scaled, scaled_scaled_k * m0, noise_growth)
    

here is an example of a successful decryption:

![](https://storage.googleapis.com/papyrus_images/c92046dafff2a8ba2e150ecfd793f4ef5365ca4cd42369df8e40401c42128779.png)

now, changing `MOD_P` from 10 to 100, we see a failed decryption case, showing how noise growth can interfere with the results:

![](https://storage.googleapis.com/papyrus_images/e7131630c16e3245a622d3201887e0ade1c05623a7352f11cbffe30c546e8b60.png)

* * *

0111\. experiment V: run an intro tutorial on how PIR should work (without encryption)
--------------------------------------------------------------------------------------

in this experiment, we get the first taste of how PIR works, but without encryption yet.

to do that, we define our server’s database by a square vector of size `m x m`, with each entry modulo `p`.

we query a value at a specific row `r` and col `c` in plaintext, by creating a query vector of size `m x 1` that is filled with `0`, except for the desired column index `c`.

we then show that computing the **dot produc**t of the database vector to the query vector will give a result vector with all rows in the column index `c`, where you can retrieve the row `r`.

here is the source code, located at `experiments/simple_pir.py`:

    def no_encryption_example() -> None:
        """
            Run a tutorial presenting the logic of a PIR experiment 
            without encryption.
        """
    
        ########################################################################
        # 1. Represent a database as a square matrix, where the columns are 
        #    the database entries and the rows are the database attributes
        ########################################################################
        log_debug('In this PIR tutorial, we represent a database as a square matrix, ' + 
            'where columns are the database entries and rows are the database attributes.')
        
        log_debug('We start the class Message(), creating a random database ' +
                        'with mod 500, and 20 entries and 20 attributes.\n')
    
        msg = Message()
        db = msg.create_random_message(500, 20, 20)
        
        log_debug(f'db: {db}\n')
    
        ########################################################################
        # 2. Create some random query value for row and column
        ########################################################################
        log_debug('Now, let\'s create a random query value for row and column. ' +
                                                'Say, row 10 and column 10.')
        
        query_row = 10
        query_col = 10
    
        log_debug(f'query_row: {query_row}, query_col: {query_col}\n')
    
        ########################################################################
        # 3. Create a message that is 5 at the query column and 0 elsewhere
        ########################################################################
        log_debug('Let\'s create a query message vector, of size 500, that is 1 at ' +
                                                'the query column and 0 elsewhere.')
        query = msg.create_zero_message(500, 20, 1)
        query.set_query_element(query_col, 0, 1)
    
        log_debug(f'query vector: {query.message}')
    
        ########################################################################
        # 4. Compute resulting message vector
        ########################################################################
        log_debug('Let\'s compute the resulting message vector, which is the ' +
                                   'dot product of the database and the query.')
        
        result = db * query
        log_debug(f'result = db * query: {result}\n')
    
        ########################################################################
        # 5. Compute msg retrieved from the database
        ########################################################################
        log_debug('Finally, let\'s compute the message retrieved from the database, ' + 
                        'by getting the element at the query row and column.')
        log_debug(f'db.get_query_element({query_row}, {query_col}): {db.get_query_element(query_row, query_col)}\n')
    
        log_debug('This should be the same as the result message vector element at the query row.')
        log_debug(f'result.get_query_element({query_row}, 0): {result.get_query_element(query_row, 0)}\n')
    
        correct_retrieval = result.get_query_element(query_row, 0) == \
                            db.get_query_element(query_row, query_col)
    
        log_info(f'Are they the same? Did we get a correct retrieval? {correct_retrieval}')
    

here is an example of an output:

![](https://storage.googleapis.com/papyrus_images/40bf587c7202146c7b01f5ee9126f6e8399288c0163241277caa6e0b4961b75d.png)

* * *

1000\. experiment VI: run a simple PIR experiment with secret key regev encryption
----------------------------------------------------------------------------------

we are ready to run our first full PIR experiment, where we build a query vector as in the previous experiment, but encrypt it using the secret key `s` from the regev encryption scheme.

here is the source code:

    def secret_key_regev_example() -> None:
        """Run a secret key Regev encryption and decryption PIR experiment."""
        ########################################################################
        # 1. Represent a database as a square matrix, where the columns are 
        #    the database entries and the rows are the database attributes
        ########################################################################
        
        regev = Regev()
        msg = Message()
    
        log_debug('1. We start creating a random message vector ' + 
                                     'as a square m x m database with mod p')
        
        db = msg.create_random_message(regev.p, regev.m, regev.m)
        log_debug(f'db: {db}\n')
    
        ########################################################################
        # 2. Create some random query value for row and column
        ########################################################################
        log_debug('2. Now, let\'s create a random query value for row and column.')
       
        query_row = 5
        query_col = 5
    
        log_debug(f'query_row: {query_row}, query_col: {query_col}\n')
    
        ########################################################################
        # 3. Create query message vector
        ########################################################################
        log_debug('3. Let\'s create a query message vector, of size m, that is 1 at ' +
                                                'the query column and 0 elsewhere.')                
    
        query = msg.create_zero_message(regev.mod, regev.m, 1)
        query.set_query_element(query_col, 0, 1)
    
        log_debug(f'query vector: {query.message}\n')
    
        ########################################################################
        # 4. Encrypt query message vector
        ########################################################################
        log_debug('4. Let\'s encrypt the query message vector, calculating A and e.')
       
        _, A, e = regev.create_message_setup()
    
        # Here we could either use mod or p as the scaling factor.
        s = regev.create_secret_key()
    
        log_debug(f'The secret key s: {s}')
    
        ########################################################################
        # 5. Scale query vector by delta = mod / p and db vector from p to mod
        ########################################################################
        log_debug('5. We scale the query vector by delta=mod/p and db vector to 1/p')
    
        scaled_query = query.calculate_scaling(regev.mod, regev.p, regev.mod)
        scaled_db = db.calculate_scaling(1, 1, regev.mod)
    
        log_debug(f'scaled_query: {scaled_query}')
        log_debug(f'scaled_db: {scaled_db}\n')
    
        ########################################################################
        # 6. Encryption by calculating B and ciphertext c
        ########################################################################
        log_debug('6. Let\'s encrypt the query vector by calculating B and ciphertext c.')
        c_query = regev.calculate_encryption(A, s, e, scaled_query)
    
        log_debug(f'c_query: {c_query}\n')
    
        ########################################################################
        # 7. Compute encrypted result
        ########################################################################
        log_debug('7. Let\'s compute the encrypted result by calculating the dot ' +
                     'product of the encrypted query and the encrypted database.') 
    
        c_result = (scaled_db * c_query[0], scaled_db * c_query[1])
    
        log_debug(f'c_result: {c_result}\n')
    
        ########################################################################
        # 8. Calculate the decryption of the ciphertext c_result to find the
        #    result of the PIR query at the query_col th column
        ########################################################################
        log_debug('8. Let\'s calculate the decryption of the ciphertext c_result')                 
        m1 = regev.calculate_decryption(s, c_result)
    
        log_debug(f'm1: {m1}\n') 
    
        ########################################################################
        # 9. Scale the result by p / mod
        ########################################################################
        log_debug('9. Let\'s scale the result by p / mod.')
        m1_scaled = m1.calculate_scaling(regev.p, regev.mod, regev.p)
    
        log_debug(f'm1_scaled: {m1_scaled}\n')
    
        ########################################################################    
        # 10. The message vector m1_scaled should be equal to the db at the 
        # query vector query_row, query_col, showing that PIR works.
        ########################################################################
        log_debug('10. The message vector m1_scaled should be equal to the db at ' +
                   'the query vector query_row, query_col, showing that PIR works.')  
    
        log_debug(f'db.get_query_element({query_row}, {query_col}): {db.get_query_element(query_row, query_col)}') 
        log_debug(f'm1_scaled.get_query_element({query_row}, 0): {m1_scaled.get_query_element(query_row, 0)}\n')            
    
        correct_retrieval = m1_scaled.get_query_element(query_row, 0) == \
                            scaled_db.get_query_element(query_row, query_col)
    
        log_info(f'Are they the same? Did we get a correct retrieval? {correct_retrieval}\n')
    

here is an example of an output when you run this experiment:

![](https://storage.googleapis.com/papyrus_images/d1440b64f061fa6eb24fedece23a5409a0500f981b5d17362edcf3337b3927c7.png)

### did i convince you that PIR (or, more properly, math) feels just like magick?

* * *

1001\. concluding thoughts
--------------------------

if you would like to learn more about PIR, here are some canonical papers:

*   [private information retrieval and its applications, sajani vithana et al.](https://arxiv.org/pdf/2304.14397.pdf)
    
*   [practical private information retrieval, femi george olumofin](https://uwspace.uwaterloo.ca/bitstream/handle/10012/6142/Olumofin_Femi.pdf?sequence=1&isAllowed=y)
    
*   [how practical is single-server private information retrieval?, sophia artioli](https://ethz.ch/content/dam/ethz/special-interest/infk/inst-infsec/appliedcrypto/education/theses/How_practical_is_single_server_private_information_retrieval_corrected.pdf)
    
*   [applying private information retrieval to lightweight bitcoin clients, kaihua oin et al.](https://www.computer.org/csdl/proceedings-article/cvcbt/2019/366900a060/1cdOwKCMqXK)
    
*   [computationally PIR with polylogarithmic communication, c. cachin et al.](https://link.springer.com/content/pdf/10.1007/3-540-48910-X_28.pdf)
    
*   [single-database PIR with constant communication rate, c. gentry et al.](https://eprint.iacr.org/2023/452.pdf)
    
*   [evaluating s-dnf formulas on ciphertexts, d. boneh et al.](https://crypto.stanford.edu/~dabo/pubs/abstracts/2dnf.html)
    
*   [reducing the servers’ computation in PIR retrieval, a. beimel et al.](https://www.cs.bgu.ac.il/~beimel/Papers/BIM.pdf)
    
*   [piano, extremely simple, single-Server PIR with sublinear server computation, m. zhou et al.](https://eprint.iacr.org/2023/452.pdf)
    

* * *

**◻️ motherofbots.eth**
-----------------------

---

*Originally published on [bt3gl's symposium](https://paragraph.com/@go-outside/on-simple-private-information-retrieval-experiments)*
