Solidity - The Strings Library

In this article, we are going to inspect OpenZeppelin's Strings library. Most articles cover core topics like oracles, launchpads, and tokens. I wanted to write about something that is useful but doesn't get much attention. Let's start!

State Variables

At the beginning of our contract, we have two state variables:

bytes16 private constant _HEX_SYMBOLS = "0123456789abcdef";
uint8 private constant _ADDRESS_LENGTH = 20;

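We will use these variables in our toHexString functions: _HEX_SYMBOLS is the lookup table of hexadecimal characters, and _ADDRESS_LENGTH is the size of an address in bytes.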

toString() function

This function converts a uint256 to its ASCII string decimal representation.

First of all, let’s look at the function signature:

function toString(uint256 value) internal pure returns (string memory)

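It takes a uint256 parameter.
It's an internal function. Because Strings is a library, the code of an internal function is included in the contract that calls it, so any contract can use it as Strings.toString(value) (or value.toString() after using Strings for uint256;). It can't be called externally on its own.
It's a pure function. That means it doesn't read from or write to the state.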
And lastly, it returns a string.

The method used in this function does not work for 0: the digit-counting loop below would never run, and we would end up with an empty string. So, if the input number is zero, we return "0":

if (value == 0) {
  return "0";
}

Next, we copy the input into another variable, temp, and divide it by 10 repeatedly to count how many digits the input has. We use a copy because we don't want to lose the original value; we still need it afterwards.

uint256 temp = value;
uint256 digits;

while (temp != 0) {
  digits++;
  temp /= 10;
}

Let's assume that we sent 100,042 to this function. After the loop, temp is 0 and digits is 6.
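To double-check that trace, here is a minimal Python mirror of the digit-counting loop. It is a sketch, not the library: Python's `//` stands in for Solidity's truncating integer division, and the name `count_digits` is made up for illustration.

```python
def count_digits(value: int) -> int:
    """Mirror of toString()'s first loop: count decimal digits by repeated division by 10."""
    temp = value      # work on a copy so the original value is kept for the second loop
    digits = 0
    while temp != 0:
        digits += 1
        temp //= 10   # integer division, like Solidity's / on uint256
    return digits

print(count_digits(100042))  # 6
print(count_digits(0))       # 0: the loop never runs, hence the special case for zero
```

The second call shows why the zero check exists: with 0 digits the buffer would be empty.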

Now, we create a bytes buffer with one slot per digit and fill it from right to left:

bytes memory buffer = new bytes(digits);
while (value != 0) {
  digits -= 1;
  buffer[digits] = bytes1(uint8(48 + uint256(value % 10)));
  value /= 10;
}
return string(buffer);

At the beginning, buffer is 0x000000000000. After each iteration of the loop it changes like this:

0x000000000032
0x000000003432
0x000000303432
0x000030303432
0x003030303432
0x313030303432

If you read every byte as ASCII, 0x30 is 0, 0x31 is 1, 0x32 is 2, and so on: the bytes 0x30 through 0x39 are the digits 0 through 9.
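If you want to reproduce the buffer table yourself, the following Python sketch mirrors toString() and records the buffer as hex after every pass of the second loop. It assumes Python's `bytearray` as a stand-in for Solidity's `bytes memory`; the function name `to_string_trace` is hypothetical.

```python
from typing import List, Tuple

def to_string_trace(value: int) -> Tuple[str, List[str]]:
    """Mirror of Strings.toString() that also records the buffer (as hex) after each loop pass."""
    if value == 0:
        return "0", []
    temp, digits = value, 0
    while temp != 0:                      # first loop: count the digits
        digits += 1
        temp //= 10
    buffer = bytearray(digits)            # like new bytes(digits): all zero bytes
    steps = []
    while value != 0:                     # second loop: fill from the right
        digits -= 1
        buffer[digits] = 48 + value % 10  # 48 is the ASCII code of "0"
        value //= 10
        steps.append("0x" + buffer.hex())
    return buffer.decode("ascii"), steps

result, steps = to_string_trace(100042)
for step in steps:
    print(step)
print(result)  # 100042
```

The six printed lines are the six buffer states listed above, from 0x000000000032 to 0x313030303432.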

Let’s break down the confusing line:

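value % 10 gets the least significant digit.
48 + uint256(value % 10) adds 48, the ASCII code of the character 0, to that digit, which gives the ASCII code of the digit. The uint8(...) conversion is needed because Solidity only allows converting an integer to bytes1 when both have the same size (8 bits). No data is lost, since the result is always between 48 and 57.
bytes1(uint8(48 + uint256(value % 10))) turns that code into a single byte that we can store in our buffer.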

Let’s run the first loop to understand what is going on in this line:

value % 10 gives us 2. I am not going to explain "what is mod" here. If you don't know how we get 2 from 100,042 % 10, just google "what is mod in programming".
uint8(48 + uint256(value % 10)) gives us 50. You might think: "what the hell is this number?". Fair question. 48 is the ASCII code of the character 0, and the digits are consecutive in the ASCII table, so 48 + 2 = 50 is the code of the character 2. In hexadecimal, 50 is 0x32, which is exactly the byte we saw in the buffer. Wow! What a conversion, huh?
bytes1(uint8(48 + uint256(value % 10))) conversion is required because buffer is a bytes variable, and each of its elements is a bytes1.
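Here is the same arithmetic for the first iteration as a tiny Python sketch, assuming `bytes([n])` as a stand-in for `bytes1(uint8(n))`. The helper name `digit_to_ascii_byte` is hypothetical, not part of the library.

```python
def digit_to_ascii_byte(d: int) -> bytes:
    """Hypothetical helper: map one decimal digit to its ASCII byte, like bytes1(uint8(48 + d))."""
    if not 0 <= d <= 9:
        raise ValueError("expected a single decimal digit")
    return bytes([48 + d])

d = 100042 % 10                # least significant digit
code = 48 + d                  # ASCII code of that digit
print(d, code, hex(code))      # 2 50 0x32
print(digit_to_ascii_byte(d))  # b'2'
```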

Lastly, we convert our bytes to a string. I love this part. If you read those bytes as a uint, you'd get a totally different number. But by converting to string, we are saying "interpret each byte as a character from the ASCII table".
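The difference between the two readings is easy to see in Python. In this sketch, `bytes.decode("ascii")` plays the role of `string(buffer)`, and `int.from_bytes(..., "big")` stands in for reading the same six bytes as an unsigned integer (an assumption for illustration; Solidity would need an explicit fixed-size type for that).

```python
buffer = bytes.fromhex("313030303432")     # final buffer from toString(100042)

as_text = buffer.decode("ascii")           # each byte read as an ASCII character
as_number = int.from_bytes(buffer, "big")  # the same 6 bytes read as one unsigned integer

print(as_text)              # 100042
print(as_number == 100042)  # False
```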

toHexString() function

Before we dive deep into this function, you should know that there are 3 different toHexString functions in this library. If you don't know function overloading, you might be thinking "how can they have 3 functions with the same name?". So, it'd be better to check out function overloading before continuing with this article.

Let’s look at our three functions:

function toHexString(uint256 value, uint256 length) internal pure returns (string memory)

function toHexString(uint256 value) internal pure returns (string memory)

function toHexString(address addr) internal pure returns (string memory)

As you can see, they have the same name, but each one takes different parameter types. The compiler picks the right function from the argument types at compile time, so this is perfectly valid Solidity.

I'll refer to these functions by their order above: the first function takes two uint256 parameters, the second takes a single uint256, and the third takes an address.

Note that the second and third functions both call the first one. So whichever of them you call, your input ends up in the first function at the end of the day.

Enough talk. Let's look at the third function's code:

return toHexString(uint256(uint160(addr)), _ADDRESS_LENGTH);

It has only one line of code. It converts our address to uint160 and then to uint256, and passes the result, together with _ADDRESS_LENGTH (20), to the first function.

Let's convert my address, 0x000000000042bAA586DD7161dC0EB8f0CB4a9fBE, to uint256. We get this huge number: 346477235235092042596196240046333886.
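A rough Python analogue of uint256(uint160(addr)) is shown below, assuming the address arrives as a 0x-prefixed hex string. The name `address_to_uint` is hypothetical; it simply reads the 20 address bytes as one big-endian unsigned integer.

```python
def address_to_uint(addr: str) -> int:
    """Rough analogue of uint256(uint160(addr)): read a 20-byte hex address as an unsigned integer."""
    raw = bytes.fromhex(addr[2:] if addr.startswith("0x") else addr)
    if len(raw) != 20:
        raise ValueError("an address is exactly 20 bytes")
    return int.from_bytes(raw, "big")

addr = "0x000000000042bAA586DD7161dC0EB8f0CB4a9fBE"
n = address_to_uint(addr)
print(n)
print(n < 2**160)  # True: it fits in 160 bits, so widening to uint256 loses nothing
```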

We'll look at the other steps in a minute, when we talk about the first function.

Okay, now it's time for our second function. If the input value is 0, it returns "0x00":

if (value == 0) {
  return "0x00";
}

If it is not zero, we calculate its length in bytes:

uint256 temp = value;
uint256 length = 0;
while (temp != 0) {
  length++;
  temp >>= 8;
}

I think only the temp >>= 8; line is confusing. Let's send our 100,042 number to this function again and see what happens in the loop:

In the first iteration, length becomes 1 and temp becomes 390. What? Ok. Now we have to go to the dark side of coding: the binary world. 100,042 in binary is 11000011011001010, and 390 is 110000110. Did you notice the similarity between the two? The 9 bits of 390 are exactly the leftmost 9 bits of 100,042; the rightmost 8 bits are gone. >> shifts bits to the right. If you used << instead, the result would be 25610752, which is 1100001101100101000000000 in binary: the same bits with 8 zeros appended. So you can think of it like this: >> 8 deletes the last 8 bits, and << 8 appends 8 zeros. Since 8 bits make 1 byte, this loop calculates the length of our value in bytes. We will use that length in our first function. Let's go on to the other steps.
In the second iteration, temp is 110000110. Deleting its least significant 8 bits by hand leaves binary 1, which is decimal 1. length is now 2.
The third iteration is the last one: shifting 1 right by 8 bits gives 0, so temp is 0 and the while loop stops. length is now 3. Finally, the function sends 100,042 and 3 to the first function.
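The three iterations can be replayed with a short Python mirror of the length loop. This is a sketch: Python's `>>` on non-negative ints behaves like Solidity's shift on uint256 here, and `byte_length_trace` is a made-up name.

```python
from typing import List, Tuple

def byte_length_trace(value: int) -> List[Tuple[int, int]]:
    """Mirror of toHexString(uint256)'s length loop; records (length, temp) after each pass."""
    temp, length = value, 0
    passes = []
    while temp != 0:
        length += 1
        temp >>= 8                    # drop the lowest 8 bits (one byte)
        passes.append((length, temp))
    return passes

for length, temp in byte_length_trace(100042):
    print(length, temp, bin(temp))
# 1 390 0b110000110
# 2 1 0b1
# 3 0 0b0
```

For any nonzero value, the number of passes equals `(value.bit_length() + 7) // 8` in Python; the loop is simply a way to get the same byte count with basic EVM operations.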

So far so good! Only one function is left: the first one. It starts by defining a bytes variable named buffer whose length is 2 * length + 2. Why is that? Each byte is written as 2 hexadecimal characters (for example, the byte 0 is written as 00), so length bytes need 2 * length characters. Since we are building a string ourselves, the 0x prefix isn't there automatically, so we have to add it manually. That's why we add 2 to the length.

bytes memory buffer = new bytes(2 * length + 2);
buffer[0] = "0";
buffer[1] = "x";
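The sizing rule can be checked with a few lines of Python, assuming `bytearray` as a stand-in for `bytes memory`. The helper name `hex_buffer_size` is hypothetical.

```python
def hex_buffer_size(length: int) -> int:
    """Size of new bytes(2 * length + 2): two hex characters per byte plus "0x"."""
    return 2 * length + 2

buffer = bytearray(hex_buffer_size(3))  # 8 zero bytes
buffer[0:2] = b"0x"                     # like buffer[0] = "0"; buffer[1] = "x";
print(buffer.hex())                     # 3078000000000000
print(hex_buffer_size(20))              # 42: "0x" plus 40 hex digits for an address
```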

After that, we determine our hexadecimal characters one by one, from right to left.

for (uint256 i = 2 * length + 1; i > 1; --i) {
  buffer[i] = _HEX_SYMBOLS[value & 0xf];
  value >>= 4;
}
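Here is a minimal Python mirror of the first and second toHexString functions, including the `require(value == 0, "Strings: hex length insufficient")` check the library runs after the loop (modeled here as a `ValueError`, an assumption standing in for a revert). The names `to_hex_string_fixed` and `to_hex_string` are hypothetical.

```python
HEX_SYMBOLS = b"0123456789abcdef"  # same table as _HEX_SYMBOLS

def to_hex_string_fixed(value: int, length: int) -> str:
    """Mirror of toHexString(uint256 value, uint256 length)."""
    buffer = bytearray(2 * length + 2)
    buffer[0:2] = b"0x"
    for i in range(2 * length + 1, 1, -1):    # i = 2*length+1 down to 2
        buffer[i] = HEX_SYMBOLS[value & 0xF]  # lowest 4 bits -> one hex character
        value >>= 4                           # move on to the next 4 bits
    if value != 0:                            # the library's require(value == 0, ...)
        raise ValueError("Strings: hex length insufficient")
    return buffer.decode("ascii")

def to_hex_string(value: int) -> str:
    """Mirror of toHexString(uint256 value): compute the byte length, then delegate."""
    if value == 0:
        return "0x00"
    temp, length = value, 0
    while temp != 0:
        length += 1
        temp >>= 8
    return to_hex_string_fixed(value, length)

print(to_hex_string(100042))        # 0x0186ca
print(to_hex_string_fixed(255, 2))  # 0x00ff
try:
    to_hex_string_fixed(100042, 2)  # 2 bytes are not enough for 100042
except ValueError as err:
    print(err)                      # Strings: hex length insufficient
```

Passing a fixed length pads the result with leading zeros; passing one that is too small makes the library revert instead of silently truncating.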

Remember the two inputs we sent to this function? One is (100042, 3) and the other one is (0x000000000042bAA586DD7161dC0EB8f0CB4a9fBE, 20), where the address arrives as its uint256 value. Let's run them step by step, starting with (100042, 3):

In this step, we compute value & 0xf first. It is a bitwise AND operation that keeps only the last 4 bits. We sent 100042, which is 11000011011001010 in binary, so value & 0xf gives us 1010, decimal 10. The character at index 10 of _HEX_SYMBOLS (counting from 0) is a. After that, we delete the least significant 4 bits from value, leaving 1100001101100, which is decimal 6252. buffer is now 0x3078000000000061. The first 2 bytes, 3078, come from the 0x that we added before the loop. You remember that 0x30 is 0 in the ASCII table, right? The same goes for x and a: 0x78 is x and 0x61 is a. If you convert the buffer to a string (ignoring the empty bytes), you'll get 0xa.
Now we compute 1100001101100 & 0xf, which gives us decimal 12, hexadecimal c. buffer is 0x3078000000006361, in string 0xca.
value is now 110000110. Again, we take the last 4 bits with a bitwise AND, which gives 0110 (6), and then delete them. buffer is 0x3078000000366361, in string 0x6ca.
value is now 11000. In this step, we get 8, and after the shift value is 1. buffer is 0x3078000038366361, in string 0x86ca.
value is binary 1 in this step, which gives us hexadecimal 1, and after the shift value is 0. buffer is 0x3078003138366361, in string 0x186ca.
value is 0 now, so we get a 0 in hexadecimal. buffer is 0x3078303138366361, in string 0x0186ca. This was the last step, because i has reached 2 and the loop ends.
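To replay this walkthrough without doing it by hand, the sketch below re-runs the same loop in Python and records the raw buffer after each pass. The name `hex_trace` is hypothetical, and stripping the zero bytes to show the "in string" form is only a display convenience.

```python
HEX_SYMBOLS = b"0123456789abcdef"  # same table as _HEX_SYMBOLS

def hex_trace(value: int, length: int) -> list:
    """Re-run the toHexString(value, length) loop, recording the raw buffer as hex after each pass."""
    buffer = bytearray(2 * length + 2)
    buffer[0:2] = b"0x"
    steps = []
    for i in range(2 * length + 1, 1, -1):
        buffer[i] = HEX_SYMBOLS[value & 0xF]  # lowest 4 bits -> hex character
        value >>= 4                           # drop those 4 bits
        steps.append("0x" + buffer.hex())
    return steps

for step in hex_trace(100042, 3):
    # raw buffer, then the readable characters with the empty (zero) bytes removed
    print(step, bytes.fromhex(step[2:]).replace(b"\x00", b"").decode("ascii"))
```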

We gave (100042, 3) to this function and it returned 0x0186ca. You can check it with an online decimal-to-hex converter: 100,042 is 0x186ca, padded with a leading 0 to fill 3 whole bytes.

Conclusion

I said I would also inspect 346477235235092042596196240046333886 and 20. But I am not a computer, and I got bored doing the same thing over and over, so you can do that one yourself. Here is the full code of the library version I inspected (OpenZeppelin Contracts v4.4.1). I am sharing it because it may change in the future:

// SPDX-License-Identifier: MIT
// OpenZeppelin Contracts v4.4.1 (utils/Strings.sol)

pragma solidity ^0.8.0;

/**
 * @dev String operations.
 */
library Strings {
    bytes16 private constant _HEX_SYMBOLS = "0123456789abcdef";
    uint8 private constant _ADDRESS_LENGTH = 20;

    /**
     * @dev Converts a `uint256` to its ASCII `string` decimal representation.
     */
    function toString(uint256 value) internal pure returns (string memory) {
        // Inspired by OraclizeAPI's implementation - MIT licence
        // https://github.com/oraclize/ethereum-api/blob/b42146b063c7d6ee1358846c198246239e9360e8/oraclizeAPI_0.4.25.sol

        if (value == 0) {
            return "0";
        }
        uint256 temp = value;
        uint256 digits;
        while (temp != 0) {
            digits++;
            temp /= 10;
        }
        bytes memory buffer = new bytes(digits);
        while (value != 0) {
            digits -= 1;
            buffer[digits] = bytes1(uint8(48 + uint256(value % 10)));
            value /= 10;
        }
        return string(buffer);
    }

    /**
     * @dev Converts a `uint256` to its ASCII `string` hexadecimal representation.
     */
    function toHexString(uint256 value) internal pure returns (string memory) {
        if (value == 0) {
            return "0x00";
        }
        uint256 temp = value;
        uint256 length = 0;
        while (temp != 0) {
            length++;
            temp >>= 8;
        }
        return toHexString(value, length);
    }

    /**
     * @dev Converts a `uint256` to its ASCII `string` hexadecimal representation with fixed length.
     */
    function toHexString(uint256 value, uint256 length) internal pure returns (string memory) {
        bytes memory buffer = new bytes(2 * length + 2);
        buffer[0] = "0";
        buffer[1] = "x";
        for (uint256 i = 2 * length + 1; i > 1; --i) {
            buffer[i] = _HEX_SYMBOLS[value & 0xf];
            value >>= 4;
        }
        require(value == 0, "Strings: hex length insufficient");
        return string(buffer);
    }

    /**
     * @dev Converts an `address` with fixed length of 20 bytes to its not checksummed ASCII `string` hexadecimal representation.
     */
    function toHexString(address addr) internal pure returns (string memory) {
        return toHexString(uint256(uint160(addr)), _ADDRESS_LENGTH);
    }
}
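If you'd rather not trace the address case by hand, here is a Python cross-check that uses zero-padded `format` instead of the byte-by-byte loop. It is a sketch of what toHexString(address) and toHexString(uint256) should return, not a port of the library code; the names `expected_address_hex` and `expected_auto_hex` are hypothetical.

```python
ADDRESS_LENGTH = 20  # mirrors _ADDRESS_LENGTH: an address is 20 bytes

def expected_address_hex(addr: str) -> str:
    """Expected toHexString(address) result: "0x" plus 2 * 20 lowercase hex digits."""
    value = int(addr, 16)                             # like uint256(uint160(addr))
    if value >= 1 << (8 * ADDRESS_LENGTH):
        raise ValueError("does not fit in 20 bytes")  # the library's require would revert
    return "0x" + format(value, "0{}x".format(2 * ADDRESS_LENGTH))

def expected_auto_hex(value: int) -> str:
    """Expected toHexString(uint256) result: just enough whole bytes, "0x00" for zero."""
    if value == 0:
        return "0x00"
    digits = format(value, "x")
    if len(digits) % 2:          # pad to a whole number of bytes
        digits = "0" + digits
    return "0x" + digits

addr = "0x000000000042bAA586DD7161dC0EB8f0CB4a9fBE"
print(expected_address_hex(addr))        # 0x000000000042baa586dd7161dc0eb8f0cb4a9fbe
print(expected_auto_hex(int(addr, 16)))  # 0x42baa586dd7161dc0eb8f0cb4a9fbe
```

The comparison shows why the address overload passes a fixed length of 20: the automatic length would count only 15 bytes here and drop the leading zero bytes. Note also that the output is lowercase, matching the library's "not checksummed" comment.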