NVMe over Fabrics (NVMe-oF) replaces the PCIe transport with a fabric technology such as RDMA or Fibre Channel. As shown in the figure, besides the RDMA-based transports (RoCE, InfiniBand and iWARP), it is also possible to use a transport based on native TCP rather than RDMA. As of July 2018, the TCP transport was still under development.
NVMe structure over RDMA and FC fabrics
The NVM subsystem in the figure exposes one or more physical fabric interfaces (ports), and each controller is usually attached to a single port. Multiple controllers can share a port. Although the ports of an NVM subsystem are allowed to support different NVMe transports, in practice a single port usually supports only one transport type.
Note that an NVM subsystem includes one or more controllers, one or more namespaces, one or more PCI Express ports, a non-volatile storage medium, and the interface between the controllers and that storage medium.
The following figure shows an example storage array.
Example array: an NVM subsystem connected to three hosts through an FC fabric
In general, an NVM subsystem is a collection of one or more NVMe controllers, up to a maximum of 64K. Hosts use these controllers to access the associated namespaces through one or more NVM subsystem ports. In practice, the number of controllers or ports in a subsystem is usually very small.
NVMe over Fabrics is built on the base NVMe architecture, including its command sets and queuing interface. In addition to Admin and I/O commands, it also supports Fabrics commands. NVMe-oF differs from the base NVMe specification in some respects. For example, NVMe-oF does not use interrupts: interrupts are limited to the NVMe over PCIe architecture, so there are no interrupts in the NVMe over Fabrics architecture.
Note that the NVMe over Fabrics 1.0 specification contains the complete list of differences between NVMe over Fabrics and the base NVMe specification.
A controller can be associated with only one host at a time, while a port can be shared. NVMe allows a host to connect to several controllers in an NVM subsystem through the same port or through different ports.
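The relationship between hosts, controllers and ports can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, the `connect` helper and the way controller IDs are assigned are hypothetical and are not taken from the NVMe specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

MAX_CONTROLLERS = 64 * 1024  # upper bound quoted in the text above


@dataclass
class Controller:
    controller_id: int
    port_id: int    # port through which this controller is reached
    host: str       # the single host this controller is associated with


@dataclass
class NvmSubsystem:
    port_ids: Set[int]
    controllers: Dict[int, Controller] = field(default_factory=dict)

    def connect(self, host: str, port_id: int) -> Controller:
        """Create a controller for `host` on `port_id` (hypothetical helper)."""
        if port_id not in self.port_ids:
            raise ValueError(f"unknown port {port_id}")
        if len(self.controllers) >= MAX_CONTROLLERS:
            raise RuntimeError("no controller IDs left")
        # Sequential IDs are an assumption made for this sketch.
        ctrl = Controller(len(self.controllers) + 1, port_id, host)
        self.controllers[ctrl.controller_id] = ctrl
        return ctrl

    def controllers_for(self, host: str) -> List[Controller]:
        """Return every controller associated with the given host."""
        return [c for c in self.controllers.values() if c.host == host]


subsystem = NvmSubsystem(port_ids={1, 2})
a1 = subsystem.connect("host-a", port_id=1)
b1 = subsystem.connect("host-b", port_id=1)   # the same port is shared by two controllers
a2 = subsystem.connect("host-a", port_id=2)   # same host, second controller, different port
print([c.controller_id for c in subsystem.controllers_for("host-a")])  # prints [1, 3]
```

Each controller object carries exactly one host, while several controllers may carry the same port, which mirrors the rule stated above.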
NVMe-oF supports a discovery service. Using the discovery mechanism, a host can obtain a list of NVM subsystems that have namespaces accessible to that host, including multiple paths to an NVM subsystem. The NVMe Identify Admin command is used to identify the namespaces of a controller.
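As a rough sketch of how a host might use the discovery results, the snippet below groups discovery entries by subsystem so that multiple paths become visible. The record layout is a simplified assumption (only four fields, named after the usual transport type, address, service ID and subsystem NQN), and the NQN and addresses are made-up sample values.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class DiscoveryEntry:
    subnqn: str    # name of the NVM subsystem the entry points to
    trtype: str    # transport type, e.g. "rdma", "fc" or "tcp"
    traddr: str    # transport address
    trsvcid: str   # transport service identifier


def paths_by_subsystem(entries: Iterable[DiscoveryEntry]) -> Dict[str, List[DiscoveryEntry]]:
    """Group discovery entries by subsystem name."""
    paths = defaultdict(list)
    for entry in entries:
        paths[entry.subnqn].append(entry)
    return dict(paths)


def multipath_subsystems(entries: Iterable[DiscoveryEntry]) -> List[str]:
    """Return the subsystems that are reachable over more than one path."""
    return sorted(nqn for nqn, p in paths_by_subsystem(entries).items() if len(p) > 1)


# Hypothetical sample data, not output from a real discovery controller.
entries = [
    DiscoveryEntry("nqn.2018-07.example:array1", "rdma", "192.0.2.10", "4420"),
    DiscoveryEntry("nqn.2018-07.example:array1", "rdma", "192.0.2.11", "4420"),
    DiscoveryEntry("nqn.2018-07.example:array2", "rdma", "192.0.2.20", "4420"),
]
print(multipath_subsystems(entries))
```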
As discussed earlier, the NVMe specification supports multipath I/O and namespace sharing. Although multipath I/O, namespace sharing, multi-host connectivity and reservations are distinct concepts, they are described together for convenience. They are closely related when it comes to multi-host namespace access, especially when NVMe reservations are used. A brief description of these concepts is provided below.
Namespace sharing is the ability of two or more hosts to access a common namespace through different NVMe controllers. Namespace sharing requires that the NVM subsystem have two or more controllers.
The following figure shows an example in which two NVMe controllers are attached through two NVM subsystem ports. In the example, namespace B is shared by the two controllers. NVMe operations can be used to coordinate access to the shared namespace. The controllers associated with the shared namespace can operate on the namespace at the same time. A globally unique identifier, or the namespace ID associated with the namespace itself, can be used to determine when multiple paths to the same shared namespace exist.
An NVM subsystem does not need to attach the same namespaces to all controllers. In the figure, only namespace B is shared and attached to both controllers.
Note that the current NVMe specification does not specify namespace sharing across NVM subsystems; this is addressed in the NVMe 1.4 draft specification.
Example with private port access to shared namespace
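The idea of recognizing a shared namespace by its globally unique identifier can be sketched as follows. The `NamespaceView` tuple and the identifier strings are hypothetical; the sample data mirrors the figure, where controller 1 sees namespaces A and B and controller 2 sees namespaces B and C.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class NamespaceView(NamedTuple):
    controller_id: int
    nsid: int        # namespace ID as reported by the controller
    unique_id: str   # globally unique identifier of the namespace (sample values)


def controllers_per_namespace(views: Iterable[NamespaceView]) -> Dict[str, List[int]]:
    """Map each globally unique identifier to the controllers that expose it."""
    by_id = defaultdict(list)
    for view in views:
        by_id[view.unique_id].append(view.controller_id)
    return {uid: sorted(ctrls) for uid, ctrls in by_id.items()}


def shared_namespaces(views: Iterable[NamespaceView]) -> Dict[str, List[int]]:
    """Keep only the namespaces that are reachable through more than one controller."""
    return {uid: c for uid, c in controllers_per_namespace(views).items() if len(c) > 1}


views = [
    NamespaceView(controller_id=1, nsid=1, unique_id="ns-A"),
    NamespaceView(controller_id=1, nsid=2, unique_id="ns-B"),
    NamespaceView(controller_id=2, nsid=2, unique_id="ns-B"),
    NamespaceView(controller_id=2, nsid=3, unique_id="ns-C"),
]
print(shared_namespaces(views))  # prints {'ns-B': [1, 2]}
```

Namespaces A and C appear under a single controller each, so they are treated as private.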
NVMe multipath I/O refers to two or more completely independent paths between a single host and a namespace. Each path uses its own controller, although multiple controllers can share subsystem ports. Both namespace sharing and multipath I/O require the NVM subsystem to have at least two controllers.
In the example in the following figure, host A has two paths, through controller 1 and controller 2. At the time of writing, the NVMe standards technical committee was working on a draft specification for multipath I/O.
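A host-side view of multipath I/O can be sketched as a small path selector. The round-robin policy and the failover behaviour shown here are assumptions chosen for illustration; the text does not prescribe any particular path selection policy, and the class and method names are hypothetical.

```python
class MultipathSelector:
    """Pick a controller (path) for each I/O, skipping paths marked as failed."""

    def __init__(self, controller_ids):
        if len(controller_ids) < 2:
            raise ValueError("multipath I/O needs at least two controllers")
        self._paths = list(controller_ids)
        self._failed = set()
        self._next = 0

    def mark_failed(self, controller_id):
        self._failed.add(controller_id)

    def mark_restored(self, controller_id):
        self._failed.discard(controller_id)

    def pick(self):
        """Return the next usable controller in round-robin order."""
        for _ in range(len(self._paths)):
            controller_id = self._paths[self._next]
            self._next = (self._next + 1) % len(self._paths)
            if controller_id not in self._failed:
                return controller_id
        raise RuntimeError("no usable path to the namespace")


# Host A reaches the namespace through controller 1 and controller 2.
selector = MultipathSelector([1, 2])
print([selector.pick() for _ in range(3)])  # prints [1, 2, 1]
selector.mark_failed(1)
print([selector.pick() for _ in range(3)])  # prints [2, 2, 2]
```

Because each path uses its own controller, losing controller 1 leaves the namespace reachable through controller 2.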
Multi-host connection and reservations
NVMe reservations, similar to SCSI-3 persistent reservations, allow two or more hosts to coordinate access to a shared namespace. An NVMe reservation on a namespace restricts host access to that namespace.
For example, with driver support, VMware ESXi can use NVMe reservations to support Microsoft Windows Server Failover Clustering in VMs.
An NVMe reservation requires an association between a host and a namespace. Each controller in a multipath I/O or namespace sharing environment is associated with exactly one host. As shown in the following figure, a host can be associated with multiple controllers by registering the same host ID with each controller it is associated with.
Note that a controller supports one of two formats for the host identifier, which uniquely identifies a host:
1) a 64-bit host identifier
2) an extended 128-bit host identifier; NVMe over Fabrics requires the extended 128-bit format.
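The two host identifier formats can be told apart by length, as the sketch below shows. The function names are hypothetical, and generating the 128-bit value from a random UUID is an assumption made for illustration rather than a requirement stated in the text.

```python
import uuid


def host_id_format(host_id: bytes) -> str:
    """Classify a host identifier by its length in bytes."""
    if len(host_id) == 8:
        return "64-bit"
    if len(host_id) == 16:
        return "extended 128-bit"
    raise ValueError(f"unsupported host identifier length: {len(host_id)} bytes")


def valid_for_fabrics(host_id: bytes) -> bool:
    """NVMe over Fabrics needs the extended 128-bit format."""
    return host_id_format(host_id) == "extended 128-bit"


def new_fabrics_host_id() -> bytes:
    """Create a 16-byte identifier (assumption: a random UUID is acceptable)."""
    return uuid.uuid4().bytes


print(valid_for_fabrics(new_fabrics_host_id()))  # prints True
print(valid_for_fabrics(bytes(8)))               # prints False
```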
As shown in the following example, host A is associated with two controllers, while host B is associated with a single controller. A host identifier such as host ID A lets the NVM subsystem recognize the controllers that are associated with the same host (host A) and preserve reservation properties across those controllers.
Multi host access to shared namespace
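The effect of registering the same host ID with several controllers can be sketched with a toy reservation model. This is a deliberately simplified assumption: it models a single exclusive-write style reservation and omits the reservation types and registration keys of real NVMe reservations. All class and method names are hypothetical.

```python
class SharedNamespace:
    """Toy model: one reservation holder, identified by host ID."""

    def __init__(self):
        self._host_of_controller = {}  # controller ID -> host ID registered on it
        self._holder = None            # host ID that currently holds the reservation

    def register(self, controller_id, host_id):
        """Record the host ID that a host registered with this controller."""
        self._host_of_controller[controller_id] = host_id

    def acquire(self, controller_id):
        """Try to take the reservation on behalf of the controller's host."""
        host_id = self._host_of_controller[controller_id]
        if self._holder is None or self._holder == host_id:
            self._holder = host_id
            return True
        return False

    def release(self, controller_id):
        """Release the reservation if the controller belongs to the holder."""
        if self._host_of_controller[controller_id] == self._holder:
            self._holder = None

    def may_write(self, controller_id):
        """Writes are allowed when unreserved or when issued by the holder's host."""
        return self._holder is None or self._host_of_controller[controller_id] == self._holder


ns = SharedNamespace()
ns.register(1, "host-id-A")  # host A, first controller
ns.register(2, "host-id-A")  # host A, second controller, same host ID
ns.register(3, "host-id-B")  # host B, single controller

ns.acquire(1)
print(ns.may_write(2))  # prints True: controller 2 carries the same host ID
print(ns.may_write(3))  # prints False: host B is locked out
```

Because the reservation is tracked per host ID rather than per controller, host A keeps its access on both of its paths.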
NVMe-oF is a de facto standard for extending the NVMe architecture over mainstream interconnects in a scalable way. The purpose of the standard is to let non-volatile storage transfer data quickly between a host computer and a target SSD device or system over a network, using message-based commands. Key benefits include improved performance and reduced network latency and bottlenecks.
One of the more interesting developments is the new transport binding for NVMe over TCP. For developers, the benefit is that NVMe technology can move into the space occupied by the Internet Small Computer System Interface (iSCSI). NVMe-oF over TCP is a good choice for enterprises that want to take advantage of their existing Ethernet infrastructure and avoid the complexity of the remote direct memory access (RDMA) protocol.
The transport independence of NVMe-oF means that it can, in principle, run over any transport. At present there are several mainstream transports: RoCEv2, iWARP, InfiniBand and FCoE. Some of these transports are bound using the RDMA protocol included in the specification, and the NVMe-related organizations are currently adding TCP to meet market demand.
The industry is optimistic about the NVMe-oF/TCP standard, which is supported by many industry leaders, including Facebook, Google, Dell EMC and Intel.
The external storage market has already adopted NVMe-oF technology, and we expect enterprise customers to deploy it for high-performance applications. Top suppliers, including Broadcom, Cisco, Intel and IBM, have already announced NVMe-oF solutions.
The future of NVMe-oF in the enterprise is bright, and emerging computing markets need NVMe-oF technology.
Artificial intelligence, machine learning and real-time analytics all require the lower latency and higher throughput that NVMe-oF provides. NVMe-oF technology has many advantages and can meet these new application requirements. On the server side, NVMe-oF shortens the operating system's storage stack, enabling more efficient connections. In the storage array, the path through the target stack is shorter, which improves the performance of the array.
However, one of the most important benefits is that NVMe-oF builds on existing storage array technology, which speeds time to market for solutions that move from SAS/SATA drives to NVMe SSDs.
That concludes this article. For more technical detail, see the e-book "Deep Analysis of NVMe Technical Standards and Principles". Its contents are listed below.
Overview of NVMe technology and applications
Interpretation of NVMe standard terms
Analysis of NVMe over Fabrics commands
Discovery processing
Connection processing
Data transfer process
NVMe / NVMe over Fabrics flow control
NVMe security and authentication mechanism
Streams (data streams)
Accelerated background operation
Implementation of NVMe transport bindings
Principles of the Sanitize mechanism
Analysis of the reservation mechanism
Keep Alive mechanism
NVMe virtualization mechanism