January 15th 2023

NVMe emerged to solve a current problem.

Every technology emerges to solve a problem of its time, and NVMe is no exception. The problem is the mismatch between the ever-increasing performance of storage media and the limited performance of the transmission path: SSDs perform very well, but the SAS and SATA interfaces have seen no fundamental improvement.

At present, SAS (based on the SCSI protocol) and SATA (based on the ATA command set) each support only a single queue, and the queue depth is relatively low: 254 and 32 respectively. The NVMe protocol took this into account from the beginning of its design. It supports up to 64K queues (65,535 command queues plus 1 admin queue), and each queue can be up to 64K entries deep. Compared with the older interfaces, it is like the difference between a rural path and a two-way, eight-lane highway.

The NVMe highway
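To put the numbers above side by side, here is a small sketch that multiplies queue count by queue depth for each interface. The figures are the upper bounds quoted in this article; the dictionary and function names are only illustrative, and real devices usually expose far fewer queues than the protocol maximum.

```python
# Upper bounds quoted in the article: (number of queues, depth of each queue).
INTERFACES = {
    "SATA": (1, 32),
    "SAS": (1, 254),
    "NVMe": (65535, 64 * 1024),  # command queues only, admin queue excluded
}


def max_outstanding(queues, depth):
    """Theoretical number of commands that can be queued at once."""
    return queues * depth


for name, (queues, depth) in INTERFACES.items():
    print(f"{name}: {max_outstanding(queues, depth):,} commands")
```

The gap between tens or hundreds of commands and billions of commands is what the highway comparison is getting at.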

The basic principle of NVMe

To understand the relationship between the host and the NVMe device, we simplify the internal structure of the NVMe device. Figure 2 is the diagram from the NVMe white paper, in which the host side is called the host and the NVMe device is called the controller. The host and the controller interact through queues in shared memory.

Internal structure of NVMe devices

NVMe queues are divided into two types. One is used for management and is called the admin queue; there is only one. The other is the command queue, of which there can be at most 65,535. The number and mode of the command queues are configured through the admin queue. Each of these queues is actually a queue pair made up of a submission queue and a completion queue. The host uses the submission queue to send NVMe commands to the NVMe device, and the device uses the completion queue to report the result of command execution back to the host. NVMe also has another mode in which multiple submission queues share the same completion queue; we will not cover it here.
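The queue pair idea can be sketched in a few lines. This is a simplified model, not driver code: the class and function names are hypothetical, commands are plain dictionaries, and the controller is simulated by an ordinary function. The queue ID convention (0 for the admin queue, 1 and up for command queues) and status 0 meaning success follow the NVMe specification.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueuePair:
    """One submission queue plus its completion queue (simplified model)."""
    qid: int  # 0 = admin queue, 1..65535 = command (I/O) queues
    sq: deque = field(default_factory=deque)  # host -> controller
    cq: deque = field(default_factory=deque)  # controller -> host


def host_submit(qp, cid, opcode):
    """Host side: place a command on the submission queue."""
    qp.sq.append({"cid": cid, "opcode": opcode})


def controller_process(qp):
    """Controller side: consume commands and post one completion per command."""
    while qp.sq:
        cmd = qp.sq.popleft()
        qp.cq.append({"cid": cmd["cid"], "status": 0})  # 0 = success


def host_reap(qp):
    """Host side: collect all posted completions."""
    done = list(qp.cq)
    qp.cq.clear()
    return done


io_queue = QueuePair(qid=1)
host_submit(io_queue, cid=7, opcode="read")
controller_process(io_queue)
completions = host_reap(io_queue)
```

The completion carries the same `cid` as the command, which is how the host matches a result to the request that produced it.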

NVMe queue and command processing

As described above, NVMe transmits both admin commands and I/O commands through queues. What exactly is a queue here? The submission queue and the completion queue are each just a region of memory. In data structure terms, the queue is a ring buffer, as shown in Figure 3.

NVMe transmits control commands

ring buffer
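A ring buffer of this kind can be sketched as a fixed-size array with a head index (where the consumer reads) and a tail index (where the producer writes). The class below is a minimal illustration with hypothetical names; it leaves out the doorbell registers and phase bits that a real NVMe queue relies on. One slot is always kept empty so that a full queue can be told apart from an empty one.

```python
class RingQueue:
    """Fixed-size ring buffer with head (consumer) and tail (producer) indexes."""

    def __init__(self, size):
        if size < 2:
            raise ValueError("size must be at least 2")
        self.slots = [None] * size
        self.size = size
        self.head = 0  # next slot to read
        self.tail = 0  # next slot to write

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        # One slot stays unused so that full and empty look different.
        return (self.tail + 1) % self.size == self.head

    def push(self, entry):
        """Producer: write at tail, then advance tail with wrap-around."""
        if self.is_full():
            return False
        self.slots[self.tail] = entry
        self.tail = (self.tail + 1) % self.size
        return True

    def pop(self):
        """Consumer: read at head, then advance head with wrap-around."""
        if self.is_empty():
            return None
        entry = self.slots[self.head]
        self.head = (self.head + 1) % self.size
        return entry
```

For a submission queue the host is the producer and the controller is the consumer; for a completion queue the roles are reversed.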

The command format of NVMe

We have covered how commands are sent and processed; now let's see what an NVMe command looks like. Figure 5 shows the format of an NVMe command. If you understand the TCP/IP protocol or the SCSI protocol, this figure will be quite easy to follow. In the figure, each row is 8 bytes, and the total command size is 64 bytes.

NVMe command format

In this command format, several fields are relatively complex and difficult to understand. We do not intend to cover every detail; instead, we briefly introduce a few key fields. The command identifier identifies a specific command. The namespace identifier indicates the namespace to which the command is sent. Data pointer 1 and data pointer 2 identify the location of the data.

Two points should be mentioned.

  • NVMe can have multiple namespaces under one controller, each identified by a namespace ID.
  • Command and data are separated: the data does not follow directly behind the command as it does in TCP.
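The 64-byte layout can be checked with a short sketch using Python's `struct` module. The field order below follows the common command format in the NVMe base specification (4-byte command dword 0, 4-byte namespace ID, 8 command-specific bytes, 8-byte metadata pointer, two 8-byte data pointers, six 4-byte command-specific dwords); the function name and the sample values are illustrative assumptions, and the metadata pointer is simply left at zero.

```python
import struct

# Little-endian, no padding:
# cdw0, nsid, cdw2-3, metadata pointer, data pointer 1, data pointer 2, cdw10..cdw15
SQE_FORMAT = "<IIQQQQ6I"


def build_command(cdw0, nsid, data_ptr1, data_ptr2=0, cdw10_15=(0,) * 6):
    """Pack one 64-byte submission queue entry (simplified sketch)."""
    return struct.pack(
        SQE_FORMAT,
        cdw0,        # first 4 bytes: opcode, flags, command identifier
        nsid,        # namespace the command is sent to
        0,           # command-specific bytes, unused here
        0,           # metadata pointer, unused here
        data_ptr1,   # memory address of the data, NOT the data itself
        data_ptr2,
        *cdw10_15,
    )


# Hypothetical values: namespace 1, data buffer at address 0x1000.
cmd = build_command(cdw0=0x00010002, nsid=1, data_ptr1=0x1000)
```

Note that the command only carries addresses of the data buffers, which is the separation of command and data described in the second bullet.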

Let's focus on the first 4 bytes of the command, which contain the command identifier. Although there are only four bytes, they are divided into several fields, as shown in Figure 6.

Command identification format

Let's introduce the meaning of each field, from the low-order bits to the high-order bits.

  • OPC: the full name is Opcode, the operation code of the command to execute. It specifies what you want the controller to do, such as read data, write data, or flush.

  • FUSE: the full name is Fused Operation. It identifies whether the command is a normal command or part of a fused (compound) operation. Figure 8 shows the description of this field in the white paper.


The definition of FUSE

  • PSDT: the full name is PRP or SGL for Data Transfer. It describes how the memory that holds the data is organized: as Physical Region Pages (PRP) or as a Scatter Gather List (SGL).
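The three fields above share the first 4 bytes of the command with a 16-bit command identifier. The sketch below packs and unpacks that 32-bit value using the bit positions given in the NVMe base specification (OPC in bits 0-7, FUSE in bits 8-9, bits 10-13 reserved, PSDT in bits 14-15, command identifier in bits 16-31). The function names are hypothetical; the three opcode constants are the NVM command set values for flush, write and read.

```python
# NVM command set opcodes for the operations mentioned above.
OPC_FLUSH, OPC_WRITE, OPC_READ = 0x00, 0x01, 0x02


def pack_cdw0(opc, fuse, psdt, cid):
    """Combine the four fields into one 32-bit value."""
    if not (0 <= opc <= 0xFF and 0 <= fuse <= 0x3
            and 0 <= psdt <= 0x3 and 0 <= cid <= 0xFFFF):
        raise ValueError("field out of range")
    # Bits 10-13 are reserved and left as zero.
    return opc | (fuse << 8) | (psdt << 14) | (cid << 16)


def unpack_cdw0(cdw0):
    """Split a 32-bit value back into its fields."""
    return {
        "opc": cdw0 & 0xFF,
        "fuse": (cdw0 >> 8) & 0x3,
        "psdt": (cdw0 >> 14) & 0x3,
        "cid": (cdw0 >> 16) & 0xFFFF,
    }


# A normal (non-fused) read using PRP, with command identifier 0x1234.
read_cdw0 = pack_cdw0(OPC_READ, fuse=0, psdt=0, cid=0x1234)
```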

The performance of NVMe

Finally, let's look at the performance comparison between NVMe, SAS and SATA storage devices. To avoid any suspicion of advertising, the manufacturer and model of the devices are obscured in the figure.

Performance comparison


From the figure above, we can clearly see the performance difference between SAS and SATA devices and NVMe devices. For read operations in particular, NVMe has a clear performance advantage.
