PlanetScale and Vitess: A Technical Deepdive

The effortless solution for revamping your database management process and scaling data in the cloud.

If you’re unsure what PlantScale is, I suggest you read the previous article, where I go over the basics of what PlantScale is intended for. This article intends to explain the technical side of PlanetScale and Vitess.

How do PlanetScale and Vitess differ?

Vitess is a MySQL solution designed for deploying, scaling, and managing big data.

PlanetScale offers Vitess in the cloud, providing a GUI, CLI, and, most recently (Jan ‘23), an API. Traditionally, configuring custom, scalable databases can be a difficult task, often requiring two teams: infrastructure and database administrators.

PlanetScale completely abstracts this by offering a fully managed Vitess cluster, along with comprehensive interfaces and additional features, reducing effort and concerns about infrastructure.

How does PlanetScale use Vitess?

When a database is spun up on PlanetScale, behind the scenes, an entirely new Vitess cluster is created. This includes everything from the databases to the Vitess components, such as VTTablets and VTGate, which I explain in more detail later in this post. The database can then be connected via exposed connection strings, as described in the previous PlanetScale article I wrote.

Below is an architecture diagram of a Vitess cluster for a PlanetScale database based on a PlanetScale blog post and the Vitess documentation.

This cluster contains a VTGate, a Topology Database, a vtctld instance, and two shards, each consisting of a VTTablet and two databases. The VTGate is a lightweight proxy that forwards MySQL queries to the relevant VTTablets, bridging the gap between PlanetScale and the shards.

A benefit of sharding is that it enables clusters to have more replicas (primary and secondary), which can enable better geographic spreading with the ability to configure instances based on workload and hardware capabilities. For example, writes could occur on primary instances while reads on other instances. PlanetScale offers this level of configurability at a high level, simply enabled at the click of a button.

In the infrastructure diagram above, you can also see a Vtcld instance and a topology server within the Vitess cluster diagram. These components are responsible for managing the data: recording metadata of keyspaces, shards, and cells.

Connection Management

A PlanetScale database can theoretically serve an infinite number of connections, which is achieved by the Vitess infrastructure through connection pooling, reducing the number of resource-intensive MySQL connections. In addition, the Vitess components are written in Go and communicate with each other using gRPC, leveraging concurrency to handle thousands of connections simultaneously.

In the diagram above, you can see that only the VTTablet has direct communication with the MySQL database. You can consider it to be a daemon — or, in the Kubernetes realm, as a sidecar. It’s the VTTablet that has direct access to the mysqld process, and therefore, typically, they run on the same host. Vitess also implements a VTGate, which is a proxy responsible for routing traffic to the correct VTTablet, combining and returning the result to the client. The client can connect to the VTGate as if it’s a single MySQL server, increasing the spread across multiple instances and databases.

Because PlanetScale implements Vitess behind the scenes, connection pool management is supported straight out of the box, enabling virtually unlimited simultaneous connections — ideal for serverless infrastructure.

Online Schema Migrations

Vitess enables online schema migrations through tools such as gh-ost. Schema Migrations, also known as Data Definition Language (DDL) (a subset of SQL responsible for table descriptions and relationships), can be a complex and inefficient task — especially when relational data is already present in the database.

Traditionally, schema migrations were done with MySQL at the database level. This introduced issues such as resource blocking, uninterruptible queries, and resource-intensive tasks. Online DDL moves the migration logic to the application layer, introducing abilities such as monitoring, cancelling, and retrying migrations. This is achieved through ghost tables, where data is copied from the existing table into a new ghost table with the applied migrations.

During the migration, queries are still applied to the original table. After the migration completes, the ghost table replaces it. An obvious downside to cloning data for the ghost table is multiplied disk space usage during the migration. However, after migration, Vitess will automate the clean-up process by “garbage collecting” old tables and artefacts.

With relational data, foreign keys are used to describe these relationships, which can be enforced by the storage engine through foreign key constraints. Deleting data that would result in broken relationships, such as orphaned data, would result in an error. This problem arises when a ghost table is used — what happens to the old data once the migration finishes? Data becomes orphaned, and InnoDB would throw an error.

To overcome this, relationships can be enforced at the application layer: through Online DDL. This can speed up transactions involving deleting relationships — it is no longer a requirement that all dependent rows are dealt with immediately during a DELETE. Online DDL means they can be dealt with at some point in the future, removing the need for long, uninterruptible queries. Instead, dependency deletion can be batched into smaller jobs (such as limiting to 100 rows at a time), and scheduled for periods of lower traffic. You can read more about how PlanetScale leverages Vitess for concepts such as relationship integrity and batch processing here.

Demo

Vitess provides a demo that can be downloaded here. There is also some in-depth documentation available about the demo here.

The demo also provides a web interface to help visualise where data is being stored. If you want to learn more about the demo’s functionality, follow the links to run the demo locally and read about it in the docs.

As explained in the previous article I wrote, sharding is where rows in a table are split across multiple resources to enable various concepts such as horizontal scaling. In the demo, the customer database has been sharded into customer:-80 and customer:80-. The VTGate proxy works to combine these shards to appear as a single database, ‘customer’. The VTGate can be connected to as if it were a normal MySQL database, consolidating the sharded data into a single database.

In the demo, the customer table has two values: customer_id and uname. customer_id has been configured to be the Primary VIndex, which means it determines which shard the row should be placed into.

In the demo, the customer table has two values: customer_id and uname. customer_id has been configured to be the Primary VIndex, which means it determines which shard the row should be placed into. The demo uses a hash VIndex, which means each row is assigned a value (up to 64 bits) based on the value of the customer_id.

The data can then be split in half: storing rows where the VIndex is less than a specified value in shard -80, and otherwise in shard 80-. The VTGate can be used to calculate what the VIndex would be for IDs 1,2,3,4 and 5, and therefore which shard they would be placed in.

We can test this by running the SQL statement below to populate the customer table with 5 rows since we know the customer id will be automatically incremented:

We can then confirm which values were put into which shard via VTGate, which allows us to isolate a database’s shard:

VTGate will combine these shards into a single table when accessing just the customer database, consolidating the data to appear as if only one table was used:

Within the same demo application, ‘corder’ is a table that has a dependency on customer: A customer can have zero or more orders. To reduce the number of queries, Vitess can be configured to place orders within the same shard as the associated customer. You can read more on how to configure Vitess to work like this here. Running the SQL statement below will assign:

2 orders to customer 1 (shard -80)
1 order to customer 2 (shard -80)
1 order to customer 3 (shard -80)
1 order to customer 4 (shard 80-)‍‍

Similar to before, we can view the items in the corder table for each shard: the orders associated with a customer in shard -80 will also live in that shard, and vice versa.‍

And, a consolidated corder table can also be viewed:

Do I really have to worry about any of this?

No! PlanetScale isolates and abstracts all of this functionality, so you can simply use it as a standard database. With many features provided straight out of the box, customers can perform data migration, scaling and gathering performance insights without complex configuration. In contrast, whilst alternatives such as AWS RDS support connection pooling require manual configuration, whereas PlanetScale harnesses Vitess’ ability to do this automatically.

How can YLD help?

If you’re looking to get into the latest technology or want to join a collaborative team of great engineers and designers that work on impressive client projects, feel free to contact us at YLD at talent@yld.io for open jobs or hello@yld.io to discuss the prospect of a potential project.

PlanetScale and Vitess: A Technical Deepdive

How do PlanetScale and Vitess differ?

How does PlanetScale use Vitess?

Connection Management

Online Schema Migrations

Demo

Do I really have to worry about any of this?

How can YLD help?

View more blogs

Combatting sophisticated cybersecurity threats with AI

Why Evals are the missing link to your AI strategy

Get in touch