Hashing algorithm in Blockchain and its properties
You might have come across blockchain hashes and hashing. Before looking at blockchain hashing and its algorithm, let us see the mathematical definition.
In mathematics, a hash function is a function that maps an input of any size to an output of a fixed size. For each input value, the function generates an output value. The output is always the same length, however long or short the input is.
It is a mathematical operation that converts one numerical input value into another.
In other words, hashing is the process of passing some data through a formula to produce a result, which is called the hash. A hash is not encrypted data: encryption transforms data into an unreadable form that can be turned back into the original with a key, whereas a hash is one-way and is not meant to be reversed. The hash is usually written as a string of characters, and the formula always produces hashes of the same length regardless of the amount of data you input.
This is how hashes and hashing are generally understood.
Having understood that, let us now move on to blockchain hashing.
A blockchain hash function is no different from this general definition. Because a blockchain is built on cryptographic techniques, it uses a cryptographic hash function: the hash algorithm takes transactions as inputs and produces an output of a predetermined size.
Simply put, the input here is not an arbitrary string of characters but transaction data.
In many blockchains this hashing is done with the SHA-256 hash function, which outputs a value that is 256 bits long.
Secure Hash Algorithm 256-bit, or SHA-256, is a cryptographic hash algorithm. Hashes generated by cryptographic hash algorithms are practically irreversible, and although two different inputs can in principle share a hash, finding such a pair is computationally infeasible. The probability that two values will produce the same hash decreases as the number of potential hashes increases.
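As a minimal sketch of what this looks like in practice, the snippet below hashes a transaction with Python's standard hashlib module. The transaction text is a made-up example; real blockchains serialize transactions in their own binary formats, so this only illustrates the idea of "transaction in, 256-bit hash out".

```python
import hashlib

# Hypothetical transaction, written as plain text purely for illustration.
transaction = "Alice pays Bob 5 coins"

# SHA-256 operates on bytes, so the text is encoded first.
digest = hashlib.sha256(transaction.encode("utf-8")).hexdigest()

print(digest)           # the hash, shown as hexadecimal characters
print(len(digest) * 4)  # each hex character encodes 4 bits, so this prints 256
```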
Now the question is what makes hashes cryptographically secure. To understand that, let us look at the properties of a hashing algorithm that make it useful in blockchain technology.
Fixed-length Mapping
The hash function in the SHA-256 algorithm always produces an output of the same length, whatever the length of the input. That does not mean that an input of 50 characters returns an output of 50 characters. The output length is set by the algorithm, not by the input: a SHA-256 digest is always 256 bits, usually written as 64 hexadecimal characters, whether the input is 5 characters or 5 million.
This characteristic enables us to hash any file, including text files, images, and even video files, and obtain an output with the same length.
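The fixed-length property can be checked with a short sketch like the one below. The inputs are arbitrary byte strings chosen for illustration (an empty input, a single byte, 50 bytes, and one million bytes), and sha256_hex is just a helper name used here.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

# Inputs of very different sizes.
inputs = [b"", b"a", b"x" * 50, b"y" * 1_000_000]

# Length of each digest, in hexadecimal characters.
lengths = [len(sha256_hex(data)) for data in inputs]

for data, n in zip(inputs, lengths):
    print(f"input of {len(data)} bytes -> digest of {n} hex characters")
```

Every line reports a digest of 64 hexadecimal characters, which is 256 bits. The same would hold for the bytes of an image or video file read from disk.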
Collision resistant
It should be computationally infeasible to find two different inputs that produce the same output.
Mathematically, for two different inputs X1 and X2, Hash(X1) should not be equal to Hash(X2). Because the number of possible inputs is unlimited while the number of outputs is fixed, collisions must exist in theory; collision resistance means that nobody can find one in practice. This is what lets a hash serve as a reliable fingerprint of its input.
Avalanche Effect
This states that even a slight change in the input results in a huge change in the output.
For instance, inserting a single comma into a large input string changes the output completely. This behavior supports collision resistance, because similar inputs give no hint about how to produce a matching hash.
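One way to see the avalanche effect is to count how many of the 256 output bits change when a comma is added. The sketch below does that with hashlib; the sentences and helper names are illustrative choices, not part of any blockchain.

```python
import hashlib

def sha256_int(text: str) -> int:
    """Return the SHA-256 digest of `text` as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")

def differing_bits(a: str, b: str) -> int:
    """Count the bit positions where the two digests differ."""
    return bin(sha256_int(a) ^ sha256_int(b)).count("1")

original = "The quick brown fox jumps over the lazy dog"
modified = "The quick brown fox, jumps over the lazy dog"  # one comma added

changed = differing_bits(original, modified)
print(f"{changed} of 256 output bits differ")
```

For a well-behaved hash function, roughly half of the bits are expected to flip, so the count should land somewhere near 128.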
Large output space
The most general way to find a hash collision is a brute force search, which means hashing a very large number of inputs and comparing the results. What does the term “brute force approach” mean?
A brute force technique is a method for solving a problem by exploring every option that could possibly exist. The search continues until a workable solution is found. A bit has two possible values: 0 and 1. The number of possible hashes is the number of possible values raised to the number of bits, so for SHA-256 there are 2^256 possible outputs, roughly 1.16 × 10^77. Guaranteeing a collision would require hashing more than 2^256 inputs, and even the faster birthday-style search needs on the order of 2^128 hashes. Both numbers are high enough to render a brute force search impractical.
In the simplest terms, a large output space means that the hash of a given input transaction could be any one of an enormous number of values. This makes guessing or searching for a matching hash impractical, which is what the security of the hash rests on.
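To see why the size of the output space matters, the sketch below deliberately shrinks it. It keeps only the first 16 bits of each SHA-256 digest (an artificial weakening, assumed here purely for demonstration) and brute-forces a collision by hashing the strings "0", "1", "2", and so on.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, bits: int) -> int:
    """Keep only the first `bits` bits of the SHA-256 digest."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int):
    """Hash "0", "1", "2", ... until two inputs share a truncated hash."""
    seen = {}  # truncated hash -> first input that produced it
    for i in count():
        data = str(i).encode()
        h = truncated_hash(data, bits)
        if h in seen:
            return seen[h], data, i + 1
        seen[h] = data

first, second, attempts = find_collision(bits=16)
print(f"{first!r} and {second!r} collide after {attempts} hashes")
print(f"full SHA-256 output space: {2**256:.3e} values")
```

With only 2^16 possible outputs a collision typically turns up within a few hundred attempts. The same search against all 256 bits would need on the order of 2^128 hashes, which is far beyond what any computer can do.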
Deterministic
A hash function must be deterministic: it always produces the same output for a given input. For example, if I hash an input and record the result, and then hash the exact same input again some days later, without even the smallest change, I will get the exact same output.
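Determinism is easy to check directly. In the sketch below, the message is a hypothetical input and sha256_hex is a helper name used only for this example.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

message = "block #1: Alice pays Bob 5 coins"  # hypothetical input

first_run = sha256_hex(message)
second_run = sha256_hex(message)  # could equally be run days later

print(first_run == second_run)                 # True: same input, same hash
print(first_run == sha256_hex(message + " "))  # False: a trailing space is a different input
```

This is what allows any participant to recompute a hash independently and compare it with a recorded one.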
Puzzle friendliness
This means that even after studying many pairs of inputs and outputs, it is not possible to find a pattern that predicts the output for a new input. For instance, knowing the first 50 bytes of an input does not help you choose the remaining bytes so that the hash comes out to a desired value; the only practical strategy is to try candidates one by one. This is why shortcuts for producing a hash with a chosen property are considered infeasible.
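Puzzle friendliness is the property that mining-style puzzles rely on. The sketch below is a simplified illustration under stated assumptions, not the scheme of any particular blockchain: the block data, the mine function, and the rule "the hex digest must start with a given number of zeros" are all hypothetical. Because the known part of the input gives no hint about which nonce will work, the code can only try nonces one after another.

```python
import hashlib
from itertools import count
from typing import Tuple

def mine(block_data: str, difficulty: int) -> Tuple[int, str]:
    """Find a nonce so that SHA-256(block_data + nonce) starts with
    `difficulty` zero hex characters, by trying nonces in order."""
    target_prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest

# Hypothetical block contents and a small difficulty so the search stays quick.
nonce, digest = mine("hypothetical block data", difficulty=4)
print(f"nonce {nonce} gives digest {digest}")
```

Each extra zero in the prefix makes the search about 16 times longer on average, while checking a proposed nonce always takes a single hash.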
Preimage resistance
It says that given a hash value h, it is computationally infeasible to find a message m such that h = hash(m). Here m stands for the message or transaction input.
In other words, you cannot deduce the input from the hash function’s output (the hash digest). As a result, even if someone obtains the hash digest of a message that has been hashed and sent to another party, they will not be able to recover the original message from it.
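Preimage resistance is a statement about the hash function, and it assumes the input is hard to guess. The sketch below shows the limit of that assumption with a hypothetical example: when the secret is known to be a 4-digit PIN, an attacker does not invert the hash at all and simply hashes all 10,000 candidates. The PIN value and function names are invented for illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def brute_force_pin(target_digest: str):
    """Hash every 4-digit PIN and return the one matching the digest.
    Feasible only because there are just 10,000 candidates."""
    for candidate in range(10_000):
        pin = f"{candidate:04d}"
        if sha256_hex(pin) == target_digest:
            return pin
    return None

leaked_digest = sha256_hex("4821")  # hypothetical secret PIN
recovered = brute_force_pin(leaked_digest)
print(recovered)  # 4821
```

For an input drawn from a large, unpredictable space, the same guessing strategy would need an astronomically large number of attempts, which is what makes recovering the input infeasible in practice.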
To sum up, a cryptographic hash function needs to be computationally efficient so that it produces the hash quickly. It must be preimage resistant, which means the output should not let anyone work out the input, and it must be deterministic, meaning it always creates the same output for a given input.