John Ferrolino
3 Mar 2021
•
4 min read
There are quite a few literatures that talk about this data structure by people who, the same as I, came from their early introduction to the blockchain technology and bitcoin.
This article is not trying to replicate the effort as some posts were in-depth in their walkthrough on the usage and importance of Merkle Tree in the blockchain space and other use cases. This is the reference I used that inspired me to write this article.
In this post, we will write an implementation of this data structure using the Scala language. The approach I will take is first to first a)define the data types then (b)implement the algorithm for Merkle tree. The simplest representative of this data structure is a binary tree where the input data are used to generate our leaf nodes that contain the cryptographic hash of a dataset which will continue iterating to create branches until it reaches the last node that we will call the root.
We will define the Node as a trait where other types will refer from. It will contain a type parameter that represents the data type of the hash value which is typically be represented as an Array[Byte]
.
trait Node[T] {
val hash: T
}
The structure of the tree is composed of:
case class Leaf[T](datum: T) extends Node[T]{
override val hash: T = ???
}
case class Branch[T](left: Node[T], right: Option[Node[T]]) extends Node[T] {
override val hash: T = ???
}
As seen above, we are missing the computation for producing the hash value. We needed to provide the mechanism to generate this hash value through defining the hash function. We will then add another parameter in the case class definition written above with this:
(implicit hashFn: (T, Option[T]) => T)
This will be used both on the Leaf and Branch nodes whilst the second value parameter in the function literal is left optional.
To complete the definition, let us rewrite them and add the logic for the computation as follows:
case class Leaf[T](datum: T)(implicit hashFn: (T, Option[T]) => T extends Node[T] {
override val hash: T = hashFn(datum, None)
}
case class Branch[T](left: T, right: Option[Node[T]])(implicit hashFn: (T, Option[T]) => T) extends Node[T]{
override val hash: T = hashFn(left.hash, right.flatMap(r => Some(r.hash))
}
We now have completed the abstract data types for this data structure. The other remaining thing to do is to work on writing the algorithm for our Merkle Tree.
The merkle tree will be defined as a class with a companion object to hold the construction of the tree.
class MerkleTree[T](val root: Node[T])
The gist of the algorithm will be written in its companion object. It will be composed of two parts:
- the constructor that accepts the input data of some type T and generates the leaf nodes that will store the hash value of the input data.
object MerkleTree {
def apply[T](data: Seq[T])(implicit hashFn: (T,Option[T]) => T): MerkleTree[T] = ???
...
}
- the recursive method that will keep on creating branches until a single node is left. This node will represent the root of the Merkle tree.
object MerkleTree {
...
def build[T](nodes: Seq[Node[T]])(implicit hashFn: (T, Option[T]) => T): MerkleTree[T] = ???
}
We also need an outlet to plug in our function for generating the hash for our inputs which will then be used in the data types.
apply
When creating an instance of the Merkle tree, the apply method will be called. It requires data as input which is typically an array of bytes.
The apply method will use those input data to generate our leaf nodes which will store the hash content. It will look something like below:
def apply[T](data: Seq[T])(implicit hashFn: (T, Option[T]) => T): MerkleTree = {
val withLeaves = data.map(Leaf(_))
build(withLeaves)
}
After generating the leaves, it will then pass this to build
which will complete the construction of the Merkle tree.
build
methodThe build method completes the creation of our Merkle Tree where it will recursively call itself until a single node is left, the root.
Whilst in the process, the sequence of nodes are paired from leaves and internal nodes (i.e. branch) to instantiate our Branch
case class. The pair serves as an input to our branch to represent either the left or an optional right node.
def build[T](nodes: Seq[Node[T]])(implicit hashFn: (T, Optional[T]) => T): MerkleTree = {
if(nodes.length == 1) new MerkleTree[T](nodes.head)
else {
val withBranches = nodes.grouped(2).map {
case Seq(l,r) => Branch(l, Some(r))
case Seq(a) => Branch(a, None)
}.toSeq
build(withBranches)
}
}
The first thing to do is to define our cryptographic hash function implementation. For this, we will use the built-in MessageDigest utility available in the jdk with SHA-256
encoding.
This function literal will be used and injected to the classes and methods that require it. One way of writing it will look like as follows:
implicit val hashFn: (Array[Byte], Option[Array[Byte]])=> Array[Byte] = (byteArr, opt) => {
val cc = opt match {
case None => byteArr
case _ => byteArr ++ opt.get
}
java.security.MessageDigest.getInstance("SHA-256").digest(cc)
}
To prepare, we can define a sequence of string as inputs. Since our hashFn expects an input of Array[Byte], we need to convert each items to its bytes equivalent.
It will be written as shown below.
val inputs = Seq("trans01", "trans02", "trans03").map(_.getBytes)
val tree = MerkleTree(inputs)
For more details on how to use this, visit the test spec.
You can also find the source code for this article at this github link
So far, we only built a data structure in the most simplest way we could. This will help us understand the components being used as well as the algorithm to complete it. This is a good introduction to understanding the Merkle tree but far enough for practical use.
A follow up article(s) will ensue on topics for applying Merkle proofs and further improvements.
John Ferrolino
See other articles by John
Ground Floor, Verse Building, 18 Brunswick Place, London, N1 6DZ
108 E 16th Street, New York, NY 10003
Join over 111,000 others and get access to exclusive content, job opportunities and more!