How to go full cloud native with CockroachDB

At Thought Machine we like to offer banks a tip. If they want to build a bank for the future it makes sense to adopt cloud native technologies across their core systems, which includes the database. And that means adopting CockroachDB.

Yes, other databases are available. Our core banking engine Vault Core works with existing PostgreSQL databases. But Thought Machine is vocal about the merits of Cockroach Labs. It's a remarkable company, with clients such as SpaceX, eBay, and DoorDash.

In many ways our companies are twins. We were founded a year apart. Our founding teams are ex-Googlers. We are both cloud native, with products written from the ground up to harness the unique properties of the cloud. Our products are cloud agnostic, able to run on any cloud (still pretty rare).

Our histories are tangled together. Thought Machine was the first customer of Cockroach Labs back in 2015. When Cockroach raised early stage capital, Thought Machine co-founder Will Montgomery spoke to their investors. In fact, at Google, Will worked on the ads database, whilst Spencer, the future CEO of Cockroach Labs, launched Colossus, the new Google internal distributed file system.

We share an ethos. And our products work seamlessly together.

So this blog is an explanation of why we are advocates of Cockroach Labs, and why you should consider CockroachDB as part of your own transition to cloud native banking.

Maybe it's best to begin with the Cockroach origin story.

Cockroach Labs was co-founded by Spencer Kimball, Peter Mattis, and Ben Darnell in 2015. The three are well known in the software industry. Spencer and Peter were previously best known for creating the open source graphics editor GIMP, and the graphical user interface toolkit GTK, whilst roommates at Berkeley. Not a bad start to their careers together.

Cockroach Labs was founded to overcome a long-standing problem in databases – how to scale? When databases reach a certain size they become hard to manage. The traditional solution is “sharding”. This involves breaking the database into chunks. Sharding is painful. The word “nightmare” gets used a lot. What if another method of scaling was possible? Whilst at Google, Spencer was exposed to distributed technologies including the Spanner database, which solved the sharding problem with an entirely novel approach. However, he noted Spanner was unavailable outside of Google.

So, Spencer and his co-founders set up Cockroach Labs to build a distributed database based on the Spanner architecture to be free of sharding, and accessible for companies of all sizes. The result was CockroachDB. It scales with no theoretical limit.

It is a database built for the cloud native era.

How CockroachDB operates

To understand the appeal of CockroachDB it pays to get technical for a moment.

Cockroach solves the issue of scale by breaking the database into chunks called Ranges. A database starts with a single empty Range, and grows as users add data. Ranges split when they get too large and merge together with neighbours when too small. Thus, the database can grow and contract elastically.
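For readers who like to see an idea in code, here is a minimal Python sketch of the split-and-merge behaviour described above. It is a toy model, not CockroachDB's implementation: the `Range` class, the `insert` and `delete` helpers, and the `MAX_KEYS` and `MIN_KEYS` thresholds are all hypothetical names and values chosen for illustration (the real system works on data size, not key counts).

```python
from dataclasses import dataclass, field

# Hypothetical thresholds for illustration only; not CockroachDB's real settings.
MAX_KEYS = 4  # split a range once it holds more than this many keys
MIN_KEYS = 2  # try to merge a range once it holds fewer than this many keys


@dataclass
class Range:
    keys: list = field(default_factory=list)  # sorted keys held by this range


def insert(ranges, key):
    """Add a key to the range that covers it, splitting that range if it grows too large."""
    idx = 0
    for i, r in enumerate(ranges):
        if r.keys and r.keys[0] <= key:
            idx = i  # last range whose first key is <= the new key
    target = ranges[idx]
    target.keys.append(key)
    target.keys.sort()
    if len(target.keys) > MAX_KEYS:
        mid = len(target.keys) // 2
        # Replace the oversized range with two halves.
        ranges[idx:idx + 1] = [Range(target.keys[:mid]), Range(target.keys[mid:])]


def delete(ranges, key):
    """Remove a key, merging a too-small range with a neighbour when the result fits."""
    for i, r in enumerate(ranges):
        if key in r.keys:
            r.keys.remove(key)
            if len(r.keys) < MIN_KEYS and len(ranges) > 1:
                j = i - 1 if i > 0 else i + 1  # pick an adjacent range
                lo, hi = min(i, j), max(i, j)
                combined = ranges[lo].keys + ranges[hi].keys
                if len(combined) <= MAX_KEYS:
                    ranges[lo:hi + 1] = [Range(combined)]
            return
```

Starting from a single empty `Range` and inserting keys one by one, the list of ranges grows through splits. Deleting keys shrinks it again through merges, which is the elastic behaviour the paragraph describes.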

To promote resilience, each Range is cloned as three replicas. Each replica is physically on a node within a CockroachDB cluster and can be anywhere, possibly in another country or continent. Cockroach uses the well-known Raft consensus algorithm to keep the replicas of each Range in agreement. The most important aspect is the labelling of the three replicas as one “leader” and two “followers”. When a write comes in, the leader forwards it to the followers. Once a majority of the replicas (the leader plus at least one follower) has acknowledged the write, the leader commits it, and any replica that missed it catches up afterwards.

This method ensures data is held in triplicate in the cloud. In the event of a disaster, such as hardware failure or network issues, a replica may go missing. The data can still be read – only one of the replicas is required. Furthermore, when a write comes in, losing a single replica is no obstacle. If 2 out of 3 replicas in the cluster remain available, the write is authorised, as a majority is present. This means transactions continue despite a single malfunction. This approach offers tremendous resilience.

If required, a cluster can be composed of 5 replicas (or even 7 or more, so long as it is an odd number), rather than 3. In this case, the cluster can endure the loss of 2 followers and still authorise writes: 3 out of 5 is a majority, thus sufficient to proceed.
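The majority arithmetic behind these numbers is simple enough to write down. The Python sketch below is purely illustrative, and the function names are our own rather than anything from CockroachDB. It computes how many replicas form a majority and how many can be lost before writes stop.

```python
def quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority still remains."""
    return replicas - quorum(replicas)


def can_commit_write(replicas: int, available: int) -> bool:
    """A write is authorised only while a majority of replicas is available."""
    return available >= quorum(replicas)


for n in (3, 4, 5, 7):
    print(f"{n} replicas: majority = {quorum(n)}, can lose {tolerated_failures(n)}")
```

The loop also shows why odd numbers are preferred: four replicas tolerate only one failure, exactly the same as three, so the extra replica adds cost without adding resilience.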

The result is a database free of the problems of sharding, and able to withstand severe hardware and network disruption. A disk, machine, rack, and even datacenter failure can be endured with no loss of operational performance.

“The number one advantage of CockroachDB is resilience,” says Will Montgomery, who today serves as CTO of Thought Machine. “You get multi-cloud, multi-region, multi-availability zones, with full active failover. Achieving that with a traditional setup would be incredibly complicated and expensive. The performance is a mere detail compared to that benefit.”

A long list of other advantages

In addition to resilience and operational simplicity, CockroachDB offers many other benefits. The geographic location of data can be controlled both manually and automatically at the row level for each table.

“We built something called Geo-Partitioning,” explains Jim Walker, VP of product marketing at Cockroach. “This allows us to tie data to a particular location. It's amazing. Banks may need to locate customer data in a jurisdiction. It is possible to manually specify where you want data to be stored. And CockroachDB automatically brings the data closer to you. The leader of a cluster can be assigned to be the one nearest to you, to reduce latency. So if you are in the UK you aren't trying to communicate to a node in Colorado. We'll bring one closer to you.”

CockroachDB is wire-compatible with PostgreSQL. This is a huge relief for banks seeking to upgrade with the minimum of retraining. “Let's not underestimate the challenge for banks in training staff,” says Jim. “Our PostgreSQL compatibility means their staff will be familiar with CockroachDB from day one. And hiring staff will be easier too, as they will also be familiar with the dialect.”
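In practice, wire compatibility means an application can talk to CockroachDB through an ordinary PostgreSQL driver. The Python sketch below shows the idea under some assumptions: the host, database, and user names are placeholders, the helper functions are hypothetical, and `fetch_one` needs the third-party psycopg2 driver installed and a reachable cluster. The port, 26257, is CockroachDB's default SQL port.

```python
from urllib.parse import quote, urlencode


def cockroach_dsn(host, database, user, port=26257, sslmode="verify-full"):
    """Build a PostgreSQL-style connection URL pointing at a CockroachDB node.

    host, database and user are placeholders supplied by the caller;
    26257 is CockroachDB's default SQL port.
    """
    query = urlencode({"sslmode": sslmode})
    return f"postgresql://{quote(user)}@{host}:{port}/{quote(database)}?{query}"


def fetch_one(dsn, sql):
    """Run one query through psycopg2, a standard PostgreSQL driver."""
    import psycopg2  # imported lazily so the URL helper works without the driver

    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchone()


# Placeholder values for illustration only.
dsn = cockroach_dsn("db.example.internal", "bank", "app_user")
print(dsn)
```

Nothing in this code is specific to CockroachDB apart from the port number, which is the point: the tools and habits that staff already have for PostgreSQL carry over.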

CockroachDB is cloud agnostic. It runs across any and all clouds seamlessly. “When I first saw it I was astonished,” says Jim. “I couldn't believe what I was seeing. You can deploy a single logical database across multiple different cloud providers. It means you can use whatever great deals you've got in each country, say AWS in France and Google Cloud in the UK. You avoid vendor lock-in, totally.”

He stresses: “Is it easy to build? Absolutely not! I don't know of anyone else doing it.”

Load balancing is done automatically by Cockroach. It's normal for certain parts of a database to experience disproportionate activity. This can overwhelm a node. Cockroach runs automatic load balancing – a big benefit for IT teams.

Dig deeper into Cockroach and it gets more impressive. The technology underlying the read and write protocols is too specialist for this blog, but again, Cockroach is exceptional. Put simply, Cockroach has found a way to guarantee ACID-compliant transactions at the strictest isolation level, known as serializable isolation. This is critical for high-value system-of-record workloads like transaction processing within a bank.

Essentially, the goal is to resolve simultaneous transactions. Handling two simultaneous activities means there is a danger the data will be mis-read or mis-written. Thus, transactions must behave as if they were executed one at a time in some order (serializable) and be isolated from each other, hence serializable isolation. It is a standard issue for any database, but particularly difficult for distributed systems.
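A toy example makes the danger concrete. The Python sketch below is our own illustration with hypothetical function names, and it says nothing about how CockroachDB enforces isolation internally. It compares two withdrawals that both read the same stale balance with the same two withdrawals run strictly one after the other.

```python
def withdraw_interleaved(balance, amounts):
    """No isolation: every transaction reads the starting balance before any write lands."""
    snapshots = [balance for _ in amounts]  # all reads happen first
    approved = []
    for seen, amount in zip(snapshots, amounts):
        if seen >= amount:           # funds check against a stale read
            approved.append(amount)
            balance = seen - amount  # later write silently overwrites the earlier one
    return balance, approved


def withdraw_serial(balance, amounts):
    """Serializable outcome: each transaction sees the result of the one before it."""
    approved = []
    for amount in amounts:
        if balance >= amount:
            approved.append(amount)
            balance -= amount
    return balance, approved


print(withdraw_interleaved(100, [70, 70]))  # both withdrawals approved
print(withdraw_serial(100, [70, 70]))       # second withdrawal refused
```

In the interleaved case an account holding 100 pays out 140 and still shows a balance of 30. In the serial case the second withdrawal is refused. A serializable database guarantees that only outcomes of the second kind can occur.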

Google solved the problem in Spanner with specialised timekeeping hardware, including atomic clocks, in its data centres. Relying on that kind of hardware is not great for cloud agnostic projects. Cockroach found a different, more elegant solution, explained in its blog post “Living Without Atomic Clocks”. Thus, CockroachDB can run on standard cloud infrastructure such as AWS, Azure, GCP, or IBM Cloud.

The result is rock solid integrity, achieved for a fraction of the fuss. A slogan of Cockroach is “Make data easy”, and it delivers on that promise.

Peak performance

In summary, CockroachDB offers elastic scale without sharding. The concept of Range replicas means the database can function in the event of localised hardware or network failure. Resilience is baked in. Downtime is slashed.

The overall performance is strong. On the industry standard benchmark TPC-C the latest CockroachDB release can process 1.68M tpmC with 140,000 warehouses, resulting in an efficiency score of 95 per cent, a 40 per cent improvement on the previous release. It delivers notable scores in latency and throughput.

This is why cutting-edge companies are building on CockroachDB. JPMorgan Chase recently voted Cockroach Labs into its Hall of Innovation, making it one of just 25 companies to be included in the past decade.

To offer transparency, CockroachDB is entirely open source. It means anyone can inspect the code. As is common in open source software, users and the wider community regularly comment on the code and offer their own ideas. Engineers at Thought Machine are active contributors. It's yet another reason why Cockroach is progressing so rapidly.

The bank of the future will be built on cloud native systems. Running a next-generation database built for the cloud makes sense. CockroachDB works in harmony with Vault Core.

As we say, it isn't compulsory. But banks wanting to enjoy the full firepower of cloud native technologies must give CockroachDB serious consideration.