The Finalizer Pattern

Frank Greco Jr
Feb 18, 2022


Introduction

At Confluent, we recently introduced networking as a first-class citizen, allowing users, among other things, to create their own isolated networks that they can schedule Kafka clusters inside of. Because of the amount of tech debt that needed to be paid back along the way, this feature was the culmination of a multi-year effort to decouple the networking resources from the original Confluent Cloud control plane into a new one. While embarking on this journey, it became apparent that we needed a way to represent dependencies across resources belonging to different control planes. This post describes the pattern we created to solve that problem.

Finalizers

Given that a control plane shouldn’t have domain-specific knowledge about the resources built on top of the ones it manages, how do we gain just enough context to answer basic questions such as, “Do I have any dependent resources that would prevent me from fulfilling a deletion request?” To answer questions like these, we developed a decentralized framework called finalizers.

No, decentralized doesn’t mean we run it on a private blockchain! It simply means that this isn’t a centralized service that manages finalizers for every resource; that would cause atomicity issues, as we’ll see later. Rather, each control plane implements this interface for the resources it manages so that it can make atomic decisions about those resources when it needs to.

syntax = "proto3";

// Each control plane implements this interface for the resources it owns;
// in the examples below it is exposed as networking.v1.FinalizerService.
service FinalizerService {
  rpc Create(RegisterRequest) returns (RegisterReply) {}
  rpc Delete(DeregisterRequest) returns (DeregisterReply) {}
  // A List RPC, used for inspection in the examples below, is omitted here for brevity.
}

message RegisterRequest {
  enum Type {
    UNKNOWN = 0;
    SOFT = 1;
    HARD = 2;
  }
  Type type = 1;
  string from_resource_id = 2;
  string to_resource_id = 3;
}

// Registration returns an id that is later used to deregister the finalizer.
message RegisterReply {
  string id = 1;
}

message DeregisterRequest {
  string id = 1;
}

message DeregisterReply {}
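
As a rough illustration of how a dependent control plane might call this interface, here is a minimal Go sketch. It assumes generated stubs in a hypothetical finalizerpb package and a made-up address for the control plane that owns the parent resource; neither name comes from the actual Confluent codebase, and the two finalizer types it references are explained next.

package main

import (
    "context"
    "log"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"

    // Hypothetical generated stubs for the FinalizerService definition above.
    finalizerpb "example.com/finalizer/v1/finalizerpb"
)

func main() {
    // The parent control plane's address is a placeholder.
    conn, err := grpc.Dial("networking-control-plane:9090",
        grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        log.Fatalf("dial: %v", err)
    }
    defer conn.Close()

    client := finalizerpb.NewFinalizerServiceClient(conn)
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    // The dependent resource (c-1) registers a finalizer on the resource it
    // depends on (n-1) as soon as the dependency between them is created.
    _, err = client.Create(ctx, &finalizerpb.RegisterRequest{
        Type:           finalizerpb.RegisterRequest_SOFT,
        FromResourceId: "c-1",
        ToResourceId:   "n-1",
    })
    if err != nil {
        log.Fatalf("register finalizer: %v", err)
    }
}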

Finalizers can be one of two types — hard or soft. In retrospect, we probably should have broken these out into separate resources as their semantic meaning is just different enough to cause confusion. Let’s define the difference.

The presence of a soft finalizer on a resource indicates to that control plane that a dependency exists, without hard-coding parent-to-child domain knowledge. This can be used to atomically prevent the deletion of a parent resource. It is the child resource’s responsibility to create this finalizer when the dependency is created and to remove it when the dependency no longer exists.
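
To make the “atomically prevent the deletion of a parent resource” part concrete, here is a minimal sketch of what the parent control plane’s soft-delete path could look like, assuming finalizers and resources live in the same relational database. The finalizers and resources tables, the deleted_at column, and the serializable transaction are illustrative assumptions, not Confluent’s actual schema.

package finalizer

import (
    "context"
    "database/sql"
    "fmt"
)

// SoftDelete marks a resource as deleted only if no soft finalizers point at it.
// Running the check and the update in a single serializable transaction is one
// way to keep the decision atomic with respect to concurrent registrations.
func SoftDelete(ctx context.Context, db *sql.DB, resourceID string) error {
    tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
    if err != nil {
        return err
    }
    defer tx.Rollback() // no-op once the transaction has been committed

    var n int
    err = tx.QueryRowContext(ctx,
        `SELECT COUNT(*) FROM finalizers
         WHERE to_resource_id = $1 AND type = 'SOFT'`, resourceID).Scan(&n)
    if err != nil {
        return err
    }
    if n > 0 {
        return fmt.Errorf("resource %s still has %d dependent resource(s)", resourceID, n)
    }

    _, err = tx.ExecContext(ctx,
        `UPDATE resources SET deleted_at = now()
         WHERE id = $1 AND deleted_at IS NULL`, resourceID)
    if err != nil {
        return err
    }
    return tx.Commit()
}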

Before I can define hard finalizers, I need to explain what happens when a user deletes a resource in Confluent Cloud. We implement an extra safeguard for all of our resources by differentiating the deletion event from the actual destruction of the resource. We refer to this as soft versus hard deletion, and it is usually implemented with an additional database column holding the deletion timestamp. When a resource is soft-deleted by an end-user, all soft finalizers it created on the resources it depends on are removed.

When a resource is successfully soft-deleted, whether it is hard-deleted depends on the presence of any hard finalizers attached to it. If no such finalizers are atomically determined to be present, the resource’s destruction commences. If hard finalizers do exist, the resource is skipped until the next reconciliation loop, where this logic is executed again. Like soft finalizers, it is the responsibility of the child resource to remove any hard finalizers it created once it has been hard deleted.
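
The hard-delete side could then be a periodic reconciliation loop along these lines, again under the same illustrative schema as the previous sketch; the destroy callback stands in for whatever actually tears the resource down.

package finalizer

import (
    "context"
    "database/sql"
)

// ReconcileHardDeletes finds soft-deleted resources with no remaining hard
// finalizers and destroys them. Resources that still carry hard finalizers are
// skipped and revisited on the next loop iteration.
func ReconcileHardDeletes(ctx context.Context, db *sql.DB,
    destroy func(ctx context.Context, id string) error) error {

    rows, err := db.QueryContext(ctx, `
        SELECT r.id FROM resources r
        WHERE r.deleted_at IS NOT NULL
          AND NOT EXISTS (
              SELECT 1 FROM finalizers f
              WHERE f.to_resource_id = r.id AND f.type = 'HARD')`)
    if err != nil {
        return err
    }
    defer rows.Close()

    var ready []string
    for rows.Next() {
        var id string
        if err := rows.Scan(&id); err != nil {
            return err
        }
        ready = append(ready, id)
    }
    if err := rows.Err(); err != nil {
        return err
    }

    for _, id := range ready {
        if err := destroy(ctx, id); err != nil {
            return err // picked up again on the next reconciliation loop
        }
    }
    return nil
}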

To help illustrate the finalizer concept, let’s look at a concrete example that uses external APIs to demonstrate end-user behavior and internal APIs to track the state of finalizers. Let’s examine what the state of the finalizers would be after an end-user creates a Confluent Cloud Network and then a Confluent Cloud Cluster inside of it.

POST api.confluent.cloud/networking/v1/networks
{
  "id": "n-1"
}
POST api.confluent.cloud/cmk/v2/clusters
{
  "network": "n-1"
} => {
  "id": "c-1"
}
grpcurl networking.v1.FinalizerService.List
[
  {
    "type": "SOFT",
    "from_resource_id": "c-1",
    "to_resource_id": "n-1"
  },
  {
    "type": "HARD",
    "from_resource_id": "c-1",
    "to_resource_id": "n-1"
  }
]

If the end-user were to attempt to delete the network at this moment, the presence of any soft finalizers on it could be checked atomically, in the same database transaction as the soft deletion itself, and the request rejected. Let’s examine the state of the finalizers after the cluster is deleted.

DELETE api.confluent.cloud/cmk/v2/clusters
{
  "id": "c-1"
}
grpcurl networking.v1.FinalizerService.List
[
  {
    "type": "HARD",
    "from_resource_id": "c-1",
    "to_resource_id": "n-1"
  }
]

With this state, the network can now be successfully deleted by the end-user because no soft finalizers exist on it. To complete the example, once a certain retention period has elapsed, the cluster service would hard delete the cluster and remove the remaining hard finalizer on the network. This would allow the network to be hard deleted during its next reconciliation loop.

On the surface, this pattern appears to be an elegant solution to the problem. It keeps resources agnostic to their children and only requires that they know about their parents. It enables atomic evaluation of dependencies without a remote lock service like Chubby.

Considerations

It’s important that child resources correctly manage the lifecycle of the finalizers they create on parent resources. For example, if the aforementioned cluster service forgets to register a hard finalizer on a network, that network would attempt to hard delete itself as soon as the criteria for soft deletion have been met, even though a cluster still exists inside of it. It’s also important that the finalizer be created before the resource itself is created. Finally, it’s important that services use a pattern like Saga to ensure there are no dangling finalizers and no resources missing the appropriate finalizers.
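
As a rough sketch of that ordering, the child control plane can wrap creation in a small saga: register the finalizers first, create the resource, and deregister the finalizers as the compensating step if creation fails. The finalizerpb client is the same hypothetical generated stub used earlier, and ClusterStore is a stand-in for the cluster control plane’s own persistence layer.

package cluster

import (
    "context"

    // Hypothetical generated stubs for the FinalizerService proto shown earlier.
    finalizerpb "example.com/finalizer/v1/finalizerpb"
)

// ClusterStore is a placeholder for the cluster control plane's persistence layer.
type ClusterStore interface {
    Create(ctx context.Context, clusterID, networkID string) error
}

// CreateClusterWithFinalizers registers the soft and hard finalizers on the
// parent network before creating the cluster, and removes them again if the
// creation fails, so no dangling finalizers are left behind.
func CreateClusterWithFinalizers(ctx context.Context, fin finalizerpb.FinalizerServiceClient,
    store ClusterStore, clusterID, networkID string) error {

    var registered []string
    for _, t := range []finalizerpb.RegisterRequest_Type{
        finalizerpb.RegisterRequest_SOFT,
        finalizerpb.RegisterRequest_HARD,
    } {
        reply, err := fin.Create(ctx, &finalizerpb.RegisterRequest{
            Type:           t,
            FromResourceId: clusterID,
            ToResourceId:   networkID,
        })
        if err != nil {
            compensate(ctx, fin, registered)
            return err
        }
        registered = append(registered, reply.GetId())
    }

    if err := store.Create(ctx, clusterID, networkID); err != nil {
        // Compensating step of the saga: remove the finalizers we just created.
        compensate(ctx, fin, registered)
        return err
    }
    return nil
}

func compensate(ctx context.Context, fin finalizerpb.FinalizerServiceClient, ids []string) {
    for _, id := range ids {
        // Best effort here; a production saga would retry or record the step durably.
        _, _ = fin.Delete(ctx, &finalizerpb.DeregisterRequest{Id: id})
    }
}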
