A Quick Guide to Kubernetes Persistent Storage (Gilad David Maayan)
What Is Persistent Storage?
Persistent storage, or nonvolatile storage, involves using data storage devices that can retain data after the device is powered off. The Kubernetes architecture initially offered only ephemeral storage volumes that were bound to the lifecycle of containers.
This type of storage works for stateless applications but does not let you retain data beyond the lifetime of the container. Later versions of Kubernetes introduced persistent storage volumes that enable you to build stateful applications.
Persistent Storage in Kubernetes
Containers are ephemeral by nature, meaning that they are started for a specific purpose and shut down when complete. On their own, containers don’t hold state data, and a new container instance has no “memory” of previous instances. A container does have storage, but it is “ephemeral storage,” which is wiped out as soon as the container shuts down.
As developers adopt containers for additional use cases, there is a need to manage persistent storage as part of containerized applications. For example, a developer might want to run a database in a container, and have the data stored in a volume that persists even after the container shuts down.
Kubernetes, the popular container orchestrator, provides numerous management capabilities for containers running on groups of machines, called clusters. One of those capabilities is the ability to manage persistent storage. Kubernetes persistent storage allows administrators to maintain several types of persistent and non-persistent data in a Kubernetes cluster. Storage resources can then be used dynamically by multiple applications running on the cluster.
Kubernetes provides two primary mechanisms to help manage persistent storage:
Persistent Volume (PV)—PV is a storage element, either defined manually or created dynamically based on a storage class. It has an independent lifecycle, which is not determined by the lifecycle of Kubernetes pods. A pod can mount a PV, but when the pod shuts down, the PV remains and its data can still be accessed. Each PV can have custom properties such as type of disk, storage tier, or performance.
Persistent Volume Claim (PVC)—PVC is a storage request by a Kubernetes user. Any application running on a container can request storage, specifying the size and other characteristics of the storage it needs, based on the custom parameters (for example, SSD storage). The Kubernetes cluster can then provision a PV based on available storage resources.
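To make the two objects concrete, here is a minimal sketch that builds a PV and a PVC as plain Python dictionaries and prints them as a JSON list, a format that kubectl accepts alongside YAML. The names demo-pv and demo-pvc, the storage class name manual, the hostPath location, and the sizes are assumptions chosen for illustration.

```python
import json

# A manually defined PersistentVolume. hostPath is only appropriate for
# single-node test clusters; the path and names are illustrative assumptions.
persistent_volume = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": "demo-pv"},
    "spec": {
        "capacity": {"storage": "10Gi"},
        "accessModes": ["ReadWriteOnce"],
        "persistentVolumeReclaimPolicy": "Retain",
        "storageClassName": "manual",
        "hostPath": {"path": "/mnt/demo-data"},
    },
}

# A PersistentVolumeClaim: a request for 5Gi with the same class and access mode.
persistent_volume_claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "demo-pvc"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "manual",
        "resources": {"requests": {"storage": "5Gi"}},
    },
}

# Wrap both objects in a v1 List so they can be saved to a single file.
manifest_list = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [persistent_volume, persistent_volume_claim],
}

print(json.dumps(manifest_list, indent=2))
```

The claim asks for less than the volume offers, so under these assumptions the 10Gi volume would be a valid match for the 5Gi request.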
Lifecycle of PV and PVC
Kubernetes offers the following provisioning options for persistent storage:
Static provisioning: cluster administrators can create PVs that offer storage resources. These static PVs remain in the Kubernetes API, available for future use. Once a matching PVC is made, the PV is bound to it and provides storage to the pod that uses the claim.
Dynamic provisioning: if no static PV matches a newly made PVC, the cluster tries to automatically provision a PV according to the storage class the claim requests. Dynamic provisioning works only if the storage classes are specified and configured in advance. Additionally, administrators need to enable the DefaultStorageClass admission controller in the API Server.
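The sketch below shows what dynamic provisioning needs on the configuration side: a StorageClass defined in advance, and claims that either name it or rely on the cluster default. The manifests are built as Python dictionaries for illustration. The provisioner string, the tier parameter, and all object names are hypothetical placeholders; real parameters depend on the provisioner you install. The is-default-class annotation is the one Kubernetes uses to mark a default class.

```python
import json

DEFAULT_CLASS_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def storage_class(name, provisioner, parameters=None, default=False):
    """Build a StorageClass manifest as a plain dict."""
    metadata = {"name": name}
    if default:
        # Marks this class as the one applied to claims that name no class.
        metadata["annotations"] = {DEFAULT_CLASS_ANNOTATION: "true"}
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": metadata,
        "provisioner": provisioner,
        "parameters": parameters or {},
        "reclaimPolicy": "Delete",
    }


def claim(name, size, storage_class_name=None):
    """Build a PVC manifest; omit storageClassName to rely on the default class."""
    spec = {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": size}},
    }
    if storage_class_name is not None:
        spec["storageClassName"] = storage_class_name
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": spec,
    }


# Hypothetical provisioner and parameter, used only to keep the example neutral.
fast = storage_class(
    "fast", "example.com/hypothetical-provisioner", {"tier": "ssd"}, default=True
)
explicit_claim = claim("db-data", "20Gi", "fast")  # names the class directly
default_claim = claim("cache-data", "5Gi")         # leaves the class unset

for manifest in (fast, explicit_claim, default_claim):
    print(json.dumps(manifest, indent=2))
```

The second claim carries no storageClassName at all, which is the case the DefaultStorageClass admission controller is meant to handle.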
Here is how binding works in Kubernetes:
A user creates a PVC: the claim defines the requested storage size and access mode.
The master matches resources: using a control loop, the master monitors new PVCs and looks for matching PVs. If a matching PV exists, the master binds the PV and PVC. Alternatively, it can bind a dynamically provisioned PV to the new PVC.
This process ensures that each fulfilled request receives at least the storage it asked for. As a result, users may receive a volume that exceeds their request.
Here are the two possible outcomes of this process:
The matching PV and PVC are bound and remain together indefinitely.
A PVC with no matching PV remains unbound indefinitely. It may be bound only when a matching PV is made available.
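The binding rules above can be illustrated with a toy model. This is a simplified sketch, not the actual Kubernetes controller: it compares only capacity, access mode, and storage class, and it assumes the smallest adequate volume is preferred. The Volume and Claim classes and all names are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Volume:
    name: str
    capacity_gi: int
    access_modes: List[str]
    storage_class: str = ""
    claim: Optional[str] = None  # name of the bound claim, if any


@dataclass
class Claim:
    name: str
    request_gi: int
    access_mode: str
    storage_class: str = ""
    volume: Optional[str] = None  # name of the bound volume, if any


def bind(claim: Claim, volumes: List[Volume]) -> bool:
    """Bind the claim to the smallest unbound volume that satisfies it."""
    candidates = [
        v for v in volumes
        if v.claim is None
        and v.storage_class == claim.storage_class
        and claim.access_mode in v.access_modes
        and v.capacity_gi >= claim.request_gi
    ]
    if not candidates:
        return False  # the claim stays unbound until a matching volume appears
    chosen = min(candidates, key=lambda v: v.capacity_gi)
    chosen.claim = claim.name  # the binding is exclusive and one-to-one
    claim.volume = chosen.name
    return True


volumes = [
    Volume("pv-small", 5, ["ReadWriteOnce"]),
    Volume("pv-large", 20, ["ReadWriteOnce", "ReadOnlyMany"]),
]
first = Claim("claim-a", 8, "ReadWriteOnce")
second = Claim("claim-b", 50, "ReadWriteOnce")

bind(first, volumes)
bind(second, volumes)

print(first.volume)   # pv-large: 20Gi serves an 8Gi request
print(second.volume)  # None: no volume is large enough, so the claim waits
```

The first claim shows how a user can end up with more storage than requested, and the second shows a claim that remains unbound because nothing matches.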
Here is what you should know about using PVCs:
When a pod attempts to use a PVC as a volume, the cluster locates the bound PV and mounts it to the pod.
A PVC user can specify the desired access mode. If the volume supports multiple access modes, the user can specify the desired mode when using a PVC as a volume.
Once a user takes a bound PVC, the bound PV belongs to this user only. The user can then access the PV via the PVC in the pod’s storage volume.
A user who no longer needs a volume can delete the PVC object from the API, which allows the storage resource to be reclaimed. A PV has a reclaim policy that defines what the cluster can do with the volume after the claim is released. Currently, volumes can either be deleted or retained.
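Here is a minimal sketch of how a pod refers to a claim. It builds a Pod manifest as a Python dictionary in which a volume points at a PVC by name and the container mounts that volume. The pod name, the claim name demo-pvc, the nginx image tag, and the mount path are assumptions for illustration, and the claim is assumed to already exist in the same namespace.

```python
import json


def pod_using_claim(pod_name, claim_name, mount_path, read_only=False):
    """Build a Pod manifest that mounts an existing PVC at mount_path."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": "nginx:1.25",  # illustrative image choice
                    "volumeMounts": [
                        {
                            "name": "data",  # must match the volume name below
                            "mountPath": mount_path,
                            "readOnly": read_only,
                        }
                    ],
                }
            ],
            "volumes": [
                {
                    "name": "data",
                    # The pod names the claim; the cluster resolves the bound PV.
                    "persistentVolumeClaim": {
                        "claimName": claim_name,
                        "readOnly": read_only,
                    },
                }
            ],
        },
    }


pod = pod_using_claim("demo-pod", "demo-pvc", "/usr/share/nginx/html")
print(json.dumps(pod, indent=2))
```

The pod never mentions the PV itself. It only names the claim, which is why the same pod definition can work against different underlying storage.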
Kubernetes Persistent Storage in the Cloud
Amazon Web Services (AWS) offers a cloud-based Elastic Block Store (EBS) service designed for persistent storage. The service offers high durability by automatically replicating EBS volumes inside their allocated availability zone (AZ).
You can use EBS volumes as persistent storage for your Kubernetes clusters. The service lets you provision EBS volumes in ReadWriteOnce access mode, mounting them on one node at a time. AWS offers the following types of volumes:
HDD volumes—available as throughput optimized (st1) and cold (sc1).
SSD volumes—available as provisioned IOPS (io1) and general purpose (gp2).
Each storage class offers different configurations and pricing structures.
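As an illustration, the sketch below builds one StorageClass per EBS volume type listed above. It assumes the AWS EBS CSI driver is installed in the cluster, with provisioner name ebs.csi.aws.com and a type parameter. The class names and the choice of WaitForFirstConsumer, which delays provisioning until a pod is scheduled so the volume lands in that pod's AZ, are my own assumptions rather than requirements.

```python
import json

# Volume types named in this article, with a short description of each.
EBS_VOLUME_TYPES = {
    "gp2": "SSD, general purpose",
    "io1": "SSD, provisioned IOPS",
    "st1": "HDD, throughput optimized",
    "sc1": "HDD, cold",
}


def ebs_storage_class(name, volume_type):
    """Build a StorageClass for the given EBS volume type."""
    if volume_type not in EBS_VOLUME_TYPES:
        raise ValueError(f"unsupported EBS volume type: {volume_type}")
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",  # assumes the EBS CSI driver
        "parameters": {"type": volume_type},
        # Provision only once a pod is scheduled, so the volume is created
        # in the same availability zone as the node that will mount it.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


classes = [ebs_storage_class(f"ebs-{t}", t) for t in EBS_VOLUME_TYPES]
for sc in classes:
    print(json.dumps(sc, indent=2))
```

A claim would then select one of these classes by name through its storageClassName field and request the ReadWriteOnce access mode.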
Kubernetes supports two Azure cloud storage services:
Azure Files: a fully managed file share service. It supports all three access modes: ReadWriteOnce, ReadWriteMany, and ReadOnlyMany.
Azure Managed Disks: offers durable, cloud-based block storage. The service supports only one access mode, ReadWriteOnce.
You can use Azure Managed Disks and Azure Files as persistent storage for your Kubernetes clusters. Additionally, you can use Azure Disks to create a Kubernetes DataDisk resource.
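The access modes above largely decide which Azure service fits a workload. The sketch below encodes that choice as a small lookup and builds a claim for shared read-write storage. The provisioner names assume the Azure Files and Azure Disk CSI drivers. The backend keys, the class name shared-files, and the claim name are hypothetical.

```python
import json

# Access modes per Azure service, as described in this article.
AZURE_BACKENDS = {
    "azure-files": {
        "provisioner": "file.csi.azure.com",  # assumes the Azure Files CSI driver
        "access_modes": {"ReadWriteOnce", "ReadWriteMany", "ReadOnlyMany"},
    },
    "azure-disk": {
        "provisioner": "disk.csi.azure.com",  # assumes the Azure Disk CSI driver
        "access_modes": {"ReadWriteOnce"},
    },
}


def backends_for(access_mode):
    """Return the names of the backends that support the access mode, sorted."""
    return sorted(
        name
        for name, info in AZURE_BACKENDS.items()
        if access_mode in info["access_modes"]
    )


def shared_claim(name, size, storage_class_name):
    """Build a PVC that several pods on different nodes can write to."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class_name,
            "resources": {"requests": {"storage": size}},
        },
    }


print(backends_for("ReadWriteMany"))  # ['azure-files']
print(backends_for("ReadWriteOnce"))  # ['azure-disk', 'azure-files']
print(json.dumps(shared_claim("team-share", "100Gi", "shared-files"), indent=2))
```

In practice, this means a claim with ReadWriteMany needs a storage class backed by Azure Files, while a single-writer database volume can use either service.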
Google Cloud offers persistent disks that you can use as persistent volumes for your Kubernetes clusters. The service supports two access modes:
ReadOnlyMany—these volumes can be attached to multiple nodes.
ReadWriteOnce—these volumes can be attached to a single node.
Google Cloud offers the following types of persistent disks:
Standard PD—offers HDD and standard throughput. It is generally considered the most cost-effective option. You can use Standard PD for scale-out analytics with Kafka and Hadoop and cost-sensitive applications.
Balanced PD—offers SSD at the best rate per GB. It is ideal for regular workloads like web serving, line of business apps, and boot disks.
Performance PD—offers SSD at the best rate per input/output operations per second (IOPS). It is ideal for performance-sensitive applications like databases, scale-out analytics, and caches.
Extreme PD—offers SSD optimized for applications with high-performance requirements, such as SAP HANA and Oracle.
Local SSD—offers very low latency. It is ideal for hot caches offering the best performance for analytics or media rendering.
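To show how these disk types surface in a cluster, the sketch below maps the names used in this article to the type values I would expect to pass to the Compute Engine persistent disk CSI driver (provisioner pd.csi.storage.gke.io). Reading Performance PD as pd-ssd is my assumption. Local SSD is left out because it is not a persistent disk and is not provisioned this way. The class names are hypothetical.

```python
import json

# Article name -> assumed value of the driver's "type" parameter.
PD_TYPES = {
    "Standard PD": "pd-standard",
    "Balanced PD": "pd-balanced",
    "Performance PD": "pd-ssd",  # assumption: performance tier means SSD PD
    "Extreme PD": "pd-extreme",
}


def pd_storage_class(name, disk_kind):
    """Build a StorageClass for one of the persistent disk types above."""
    if disk_kind not in PD_TYPES:
        raise ValueError(f"not a persistent disk type: {disk_kind}")
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "pd.csi.storage.gke.io",  # assumes the PD CSI driver
        "parameters": {"type": PD_TYPES[disk_kind]},
        "volumeBindingMode": "WaitForFirstConsumer",
    }


web = pd_storage_class("web-balanced", "Balanced PD")
database = pd_storage_class("db-performance", "Performance PD")

print(json.dumps(web, indent=2))
print(json.dumps(database, indent=2))
```

Keeping the mapping in one place makes it easy to give each workload class, such as web serving or databases, its own storage class.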
In this article I explained the basics of Kubernetes persistent storage and covered key concepts including:
Persistent Volumes (PV)
Persistent Volume Claims (PVC) and the binding process
Kubernetes persistent storage in popular public clouds: AWS, Azure, and Google Cloud
I hope this will be useful as you take your first steps in Kubernetes storage management.