Tech Corner

Everything you need to know about Distributed SQL

Anwesh Roy
.
Apr 28
.
5 mins

Table of contents

Automate your business at $5/day with Engati

REQUEST A DEMO
Distributed SQL

Introduction

SQL has been the mainstay language for building software applications for over five decades. SQL itself was based on a paper written by E.F. Codd of IBM Research in 1970. One of the main reasons for SQL to survive so long is that it has evolved with time, changed to handle newer use cases, provided a common interface (relational) which is easy to understand and model real world entities.

However in the new world data has grown in leaps and bounds and just plain old SQL had a hard time processing data across terabytes of data spread across geographical data centers and having highly concurrent access to data. This has led to the evolution of SQL to Distributed SQL which had its underpinnings in Distributed Databases.

As per the definition in Wikipedia:

A distributed SQL database is a single relational database which replicates data across multiple servers. Distributed SQL databases are strongly consistent and most support consistency across racks, data centers, and wide area networks including cloud availability zones and cloud geographic zones.

NoSQL databases came up to address the scalability challenges legacy SQL databases were facing but could not provide ACID (atomicity, consistency, isolation, durability) guarantee.

Distributed SQL databases are solving that problem by providing distributed ACID transaction capability.

Origin

Most modern distributed SQL databases have their origin and inspiration from Google’s Spanner Database that was published in a paper in 2013.

Spanner is a multiversion, globally distributed and synchronously replicated database that can span millions of machines across hundreds of data centers and handle trillions of database records. It is the first database system to distribute data on a global scale and support externally consistent distributed transactions.

Google’s Advertising backend transitioned out of MySQL to Spanner and is one of the largest internal customers for Spanner.

The paper on Spanner provides a good walk through of the design and architecture of Spanner. 

Reference Database

Yugabyte is one of the new kids on the block and is challenging established players. The best part of this database is that it is an open source distributed SQL database.

From their github repository we find their definition:

YugabyteDB is a high-performance, cloud-native distributed SQL database that aims to support all PostgreSQL features. It is best to fit for cloud-native OLTP (i.e. real-time, business-critical) applications that need absolute data correctness and require at least one of the following: scalability, high tolerance to failures, or globally-distributed deployments.

Yugabyte’s transaction design is based on Google's Spanner. It supports strong consistency of writes by using cluster wide distributed ACID transactions.

It allows distributed SQL query execution by executing across multiple database nodes simultaneously. Incoming queries are accepted by any node in the cluster which then requests other nodes to process their portion of the query in a way that latency is minimized by reducing intra-node data transfer. The node that accepted the query aggregates the results from all nodes that executed the distributed SQL query and sends back the results to the client.

It automatically distributes data and indexes across multiple nodes of the cluster so that no single node becomes a bottleneck. It commits write operations synchronously across multiple nodes in the cluster and thus provides consistent replication of data.

One of the promising features is support for distributed ACID transactions, a single transaction is executed across multiple nodes by using 2 Phase Commit protocol.  

Competition

Yugabyte has put up a feature comparison of the various commercial distributed SQL databases which gives a good idea of how the competition is lined up.

Comparing distributed SQL Databases

Conclusion

As the requirement to store, access and make data available in real time grows, Distributed SQL databases provide the hope to scale up and provide data consistency across multiple concurrent transactions. Research in this area is hot and many new players are coming up with good solutions to address current challenges. 

Another key challenge is to make these databases cloud native so that commodity hardware can be used to store and process massive amounts of data.

In the next article we will look at the state of art technology for cloud based distributed SQL databases along with serverless options.

Anwesh Roy

Anwesh is the Senior Vice President of Engati. Driven by a passion to deliver value through AI-driven solutions, Anwesh is on a mission to mainstream Natural Language Processing (NLP), Natural Language Understanding (NLU), Natural Language Generation (NLG) and Data Analytics applications.

Close Icon
Request a Demo!

Get started on Engati with the help of a personalised demo.

Thanks for the information.
We will be shortly getting in touch with you.
Please enter a valid email address.
For any other query reach out to us on contact@engati.com
Close Icon
Congratulations! Your demo is recorded.

Select an option on how Engati can help you.

I am looking for a conversational AI engagement solution for the web and other channels.

I would like for a conversational AI engagement solution for WhatsApp as the primary channel

I am an e-commerce store with Shopify. I am looking for a conversational AI engagement solution for my business

I am looking to partner with Engati to build conversational AI solutions for other businesses

continue
Finish
Close Icon
You're a step away from building your Al chatbot

How many customers do you expect to engage in a month?

Less Than 2000

2000-5000

More than 5000

Finish
Close Icon
Thanks for the information.

We will be shortly getting in touch with you.

Close Icon

Contact Us

Please fill in your details and we will contact you shortly.

Thanks for the information.
We will be shortly getting in touch with you.
Oops! Looks like there is a problem.
Never mind, drop us a mail at contact@engati.com

<script type="application/ld+json">
{
 "@context": "https://schema.org",
 "@type": "FAQPage",
 "mainEntity": {
   "@type": "Question",
   "name": "what is Reference Database?",
   "acceptedAnswer": {
     "@type": "Answer",
     "text": "YugabyteDB is a high-performance, cloud-native distributed SQL database that aims to support all PostgreSQL features. It is best to fit for cloud-native OLTP (i.e. real-time, business-critical) applications that need absolute data correctness and require at least one of the following: scalability, high tolerance to failures, or globally-distributed deployments."
   }
 }
}
</script>