NVMe (Non-Volatile Memory Express) is an interface specification for SSDs attached over the PCIe bus.

post @ December 18, 2019 12:07 Ella Maclin

NVMe over Fabrics (NVMe-oF) replaces the PCIe transport with a fabric technology such as RDMA or Fibre Channel. As shown in the figure, in addition to the RDMA-based transports (RoCE, InfiniBand and iWARP), it is also possible to use a transport based on native TCP rather than RDMA. As of July 2018, the TCP transport was still under development.

Figure: NVMe structure over RDMA and FC fabrics

An NVM subsystem port, as shown in the figure, is a collection of one or more physical fabric interfaces. Each controller is usually attached to a single port, and multiple controllers can share a port. Although the ports of an NVM subsystem are allowed to support different NVMe transports, in practice a single port usually supports only one transport type.

Note that an NVM subsystem includes one or more controllers, one or more namespaces, one or more PCI Express ports, a non-volatile storage medium, and the interface between the controllers and that medium.

The following figure shows an example storage array that consists of an NVM subsystem attached to three hosts through an FC fabric.

Figure: Example array consisting of an NVM subsystem attached to three hosts through an FC fabric

In general, an NVM subsystem presents a collection of one or more NVMe controllers, up to a theoretical maximum of about 64K. These controllers are used to access the namespaces associated with one or more hosts through one or more NVM subsystem ports. In practice, the number of controllers or ports in a subsystem is usually very small.
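The relationship between subsystem, ports and controllers can be sketched as a small data model. This is only an illustration: the class and field names (`NvmSubsystem`, `Controller`, `port_id` and so on) are hypothetical and do not come from the specification, and the 64K ceiling is simply the figure quoted above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

MAX_CONTROLLERS = 64 * 1024  # theoretical ceiling ("64K") quoted in the text


@dataclass
class Controller:
    controller_id: int
    port_id: int                     # subsystem port the controller is reached through
    host: Optional[str] = None       # hypothetical field: the host using this controller
    namespace_ids: Set[int] = field(default_factory=set)


@dataclass
class NvmSubsystem:
    port_ids: Set[int]
    controllers: Dict[int, Controller] = field(default_factory=dict)

    def add_controller(self, controller_id: int, port_id: int) -> Controller:
        """Create a controller behind an existing subsystem port."""
        if port_id not in self.port_ids:
            raise ValueError(f"unknown subsystem port {port_id}")
        if controller_id in self.controllers:
            raise ValueError(f"controller {controller_id} already exists")
        if len(self.controllers) >= MAX_CONTROLLERS:
            raise ValueError("controller limit reached")
        controller = Controller(controller_id, port_id)
        self.controllers[controller_id] = controller
        return controller

    def controllers_on_port(self, port_id: int) -> List[int]:
        """Return the IDs of all controllers that share the given port."""
        return sorted(c.controller_id for c in self.controllers.values()
                      if c.port_id == port_id)


subsystem = NvmSubsystem(port_ids={1, 2})
subsystem.add_controller(controller_id=1, port_id=1)
subsystem.add_controller(controller_id=2, port_id=1)  # shares port 1 with controller 1
subsystem.add_controller(controller_id=3, port_id=2)
print(subsystem.controllers_on_port(1))  # [1, 2]
```

Keeping the port as an attribute of the controller, rather than the other way round, mirrors the statement that several controllers may sit behind one port.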

NVMe over Fabrics builds on the base NVMe architecture, including its command sets and queuing interface. In addition to Admin and I/O commands, it also supports Fabrics commands. NVMe-oF differs from the base NVMe specification in some respects. For example, NVMe-oF does not support interrupts: interrupts are limited to the NVMe over PCIe architecture, so there are no interrupts in the NVMe over Fabrics architecture.
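To make the three command categories concrete, here is a minimal sketch that classifies a command. It assumes, as I understand the NVMe over Fabrics specification, that all Fabrics commands share opcode 0x7F and are distinguished by a Fabrics command type field; the type values listed are a subset and should be checked against the specification before being relied on. The function and enum names are hypothetical.

```python
from enum import Enum
from typing import Optional

FABRICS_OPCODE = 0x7F  # assumption: opcode shared by all Fabrics commands

# Assumption: a subset of Fabrics command types, keyed by the type field.
FABRICS_COMMAND_TYPES = {
    0x00: "Property Set",
    0x01: "Connect",
    0x04: "Property Get",
}


class QueueKind(Enum):
    ADMIN = "admin"
    IO = "io"


def classify_command(opcode: int, queue: QueueKind,
                     fctype: Optional[int] = None) -> str:
    """Return a label saying whether a command is Fabrics, Admin or I/O."""
    if opcode == FABRICS_OPCODE:
        name = FABRICS_COMMAND_TYPES.get(fctype, "unknown type")
        return f"Fabrics command ({name})"
    # Non-Fabrics opcodes are interpreted according to the queue they arrive on.
    return "Admin command" if queue is QueueKind.ADMIN else "I/O command"


print(classify_command(0x7F, QueueKind.ADMIN, fctype=0x01))  # Fabrics command (Connect)
print(classify_command(0x06, QueueKind.ADMIN))               # Admin command
print(classify_command(0x02, QueueKind.IO))                  # I/O command
```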

For a complete list of the differences between NVMe over Fabrics and the base NVMe specification, see the NVMe over Fabrics 1.0 specification.

A controller can be associated with only one host, while a port can be shared. NVMe allows a host to connect to several controllers in an NVM subsystem through the same port or through different ports.

NVMe-oF supports a discovery service. Using the discovery mechanism, a host can obtain a list of NVM subsystems that have namespaces accessible to it, and can also discover multiple paths to an NVM subsystem. The NVMe Identify Admin command is used to identify the namespaces of a controller.
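As a rough illustration of how a host might use discovery results to find multiple paths, the sketch below groups discovery log entries by subsystem NQN. The entry fields are modeled on the discovery log page (subsystem NQN, transport type, transport address, transport service ID), but the `DiscoveryEntry` type, the helper functions and the sample NQNs and addresses are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class DiscoveryEntry(NamedTuple):
    subnqn: str    # NVM subsystem NQN
    trtype: str    # transport type, e.g. "rdma", "fc", "tcp"
    traddr: str    # transport address
    trsvcid: str   # transport service ID (a port number for IP transports)


Path = Tuple[str, str, str]


def paths_by_subsystem(entries: Iterable[DiscoveryEntry]) -> Dict[str, List[Path]]:
    """Group the distinct transport paths reported for each subsystem."""
    paths: Dict[str, List[Path]] = defaultdict(list)
    for entry in entries:
        path = (entry.trtype, entry.traddr, entry.trsvcid)
        if path not in paths[entry.subnqn]:
            paths[entry.subnqn].append(path)
    return dict(paths)


def multipath_subsystems(entries: Iterable[DiscoveryEntry]) -> List[str]:
    """Return the subsystems that are reachable over more than one path."""
    grouped = paths_by_subsystem(entries)
    return sorted(nqn for nqn, paths in grouped.items() if len(paths) > 1)


# Made-up sample data; 192.0.2.0/24 is a documentation address range.
log = [
    DiscoveryEntry("nqn.2019-12.com.example:array1", "rdma", "192.0.2.10", "4420"),
    DiscoveryEntry("nqn.2019-12.com.example:array1", "rdma", "192.0.2.11", "4420"),
    DiscoveryEntry("nqn.2019-12.com.example:array2", "tcp", "192.0.2.20", "4420"),
]
print(multipath_subsystems(log))  # ['nqn.2019-12.com.example:array1']
```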

As mentioned earlier, the NVMe specification supports multipath I/O and namespace sharing. Although multipath I/O, namespace sharing, multi-host connectivity and reservations are distinct concepts, they are described together here for convenience. They are closely related when multiple hosts access a namespace, especially when NVMe reservations are used. A brief description of each concept follows.

Namespace sharing

Namespace sharing is the ability of two or more hosts to access a common namespace through different NVMe controllers. Namespace sharing requires that the NVM subsystem have two or more controllers.

The following figure shows an example in which two NVMe controllers are attached through two NVM subsystem ports. In this example, namespace B is shared by both controllers. NVMe operations can be used to coordinate access to the shared namespace. The controllers associated with the shared namespace can operate on it at the same time. To determine whether multiple paths lead to the same shared namespace, use a globally unique identifier or the namespace ID associated with the namespace itself.

An NVM subsystem does not need to attach the same namespaces to all controllers. In the figure, only namespace B is shared and attached to both controllers.

Note that the current NVMe specification does not define namespace sharing across NVM subsystems; this is addressed in the NVMe 1.4 draft specification.

Figure: Example with private port access to shared namespace
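The idea of recognizing a shared namespace by its unique identifier can be sketched as follows. This is an illustration under assumptions: the tuple layout, the function name and the identifier strings are hypothetical, and a real host would read the identifiers from Identify data rather than from a hand-written list.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (controller ID, namespace ID as seen through that controller, unique identifier)
NamespaceView = Tuple[int, int, str]


def shared_namespaces(views: Iterable[NamespaceView]) -> Dict[str, List[Tuple[int, int]]]:
    """Return the namespaces that are visible through more than one controller.

    Views are grouped by the globally unique identifier, so two views with the
    same identifier are treated as two paths to one shared namespace.
    """
    groups: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for controller_id, nsid, unique_id in views:
        groups[unique_id].append((controller_id, nsid))
    return {uid: members for uid, members in groups.items() if len(members) > 1}


# Mirrors the figure: namespace A is private to controller 1, namespace C is
# private to controller 2, and namespace B is attached to both controllers.
views = [
    (1, 1, "id-namespace-A"),
    (1, 2, "id-namespace-B"),
    (2, 2, "id-namespace-B"),
    (2, 3, "id-namespace-C"),
]
print(shared_namespaces(views))  # {'id-namespace-B': [(1, 2), (2, 2)]}
```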


Multipath I/O

NVMe multipath I/O refers to two or more completely independent paths between a single host and a namespace. Each path uses its own controller, although multiple controllers can share a subsystem port. Both namespace sharing and multipath I/O require that the NVM subsystem have at least two controllers.

In the example in the following figure, host A has two paths: one through controller 1 and one through controller 2. At the time of writing, the NVMe standards technical committee was working on a draft specification for multipath I/O.
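One common use of independent paths is failover. The sketch below is a simplified model, not something defined by the NVMe specification: each path is represented by a hypothetical callable standing in for a controller, and a request is retried on the next path when the current one fails.

```python
from typing import Callable, List, Tuple


class PathError(Exception):
    """Raised when a path (controller) cannot complete a request."""


class MultipathNamespace:
    """A namespace reachable through several controllers, tried in order."""

    def __init__(self, paths: List[Tuple[str, Callable[[str], str]]]):
        if len(paths) < 2:
            raise ValueError("multipath I/O needs at least two controllers")
        self.paths = paths

    def submit(self, request: str) -> Tuple[str, str]:
        """Send the request down the first path that works."""
        errors = []
        for name, send in self.paths:
            try:
                return name, send(request)
            except PathError as exc:
                errors.append(f"{name}: {exc}")
        raise PathError("all paths failed: " + "; ".join(errors))


def controller_1(request: str) -> str:
    raise PathError("link down")      # simulate a failed path


def controller_2(request: str) -> str:
    return f"completed {request}"


namespace = MultipathNamespace([("controller 1", controller_1),
                                ("controller 2", controller_2)])
print(namespace.submit("read LBA 0"))  # ('controller 2', 'completed read LBA 0')
```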

Multi-host connectivity and reservations

NVMe reservations, similar to SCSI-3 persistent reservations, allow two or more hosts to coordinate access to a shared namespace. A reservation on a namespace restricts the access that hosts have to it.
For example, with driver support, VMware ESXi can use NVMe reservations to support Microsoft Windows Server Failover Clustering with VMs.

An NVMe reservation requires an association between a host and a namespace. Each controller in a multipath I/O or namespace sharing environment is associated with exactly one host. As shown in the following figure, a host can be associated with multiple controllers by registering the same host ID with each controller it is associated with.

Note that a controller supports one of two formats for the host ID that uniquely identifies a host:

1) 64-bit host identifier

2) Extended 128-bit host identifier; NVMe over Fabrics requires the extended 128-bit format.
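The two formats differ only in length, which a short sketch can make explicit. Generating the extended identifier from a random UUID is an assumption made for illustration (a UUID happens to be 128 bits wide); the function names are hypothetical.

```python
import uuid

HOST_ID_64_BYTES = 8    # 64-bit host identifier
HOST_ID_128_BYTES = 16  # extended 128-bit host identifier


def new_extended_host_id() -> bytes:
    """Create a 128-bit host identifier (assumption: derived from a random UUID)."""
    return uuid.uuid4().bytes


def host_id_format(host_id: bytes) -> str:
    """Name the format of a host identifier based on its length."""
    if len(host_id) == HOST_ID_64_BYTES:
        return "64-bit"
    if len(host_id) == HOST_ID_128_BYTES:
        return "extended 128-bit"
    raise ValueError(f"unsupported host ID length: {len(host_id)} bytes")


def valid_for_fabrics(host_id: bytes) -> bool:
    """NVMe over Fabrics requires the extended 128-bit format."""
    return len(host_id) == HOST_ID_128_BYTES


host_id = new_extended_host_id()
print(host_id_format(host_id), valid_for_fabrics(host_id))
print(host_id_format(bytes(8)), valid_for_fabrics(bytes(8)))
```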

In the following example, host A is associated with two controllers, while host B is associated with a single controller. A host identifier (such as Host ID A) allows the NVM subsystem to identify the controllers associated with the same host (host A) and to preserve reservation properties across those controllers.

Figure: Multi-host access to shared namespace
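The role of the host ID in reservations can be sketched with a deliberately simplified model. It is not the reservation mechanism defined by the specification (which has reservation keys, several reservation types and dedicated commands); it only shows that a reservation is held per host ID, so it applies to every controller registered with that ID. All class and method names are hypothetical.

```python
from typing import Dict, Optional, Set


class NamespaceReservation:
    """Simplified write-exclusive style reservation keyed by host ID."""

    def __init__(self) -> None:
        self.controller_host: Dict[int, str] = {}  # controller ID -> host ID
        self.registrants: Set[str] = set()         # host IDs that have registered
        self.holder: Optional[str] = None          # host ID holding the reservation

    def attach(self, controller_id: int, host_id: str) -> None:
        """Record which host ID was registered with a controller."""
        self.controller_host[controller_id] = host_id

    def register(self, controller_id: int) -> None:
        self.registrants.add(self.controller_host[controller_id])

    def acquire(self, controller_id: int) -> bool:
        """Try to take the reservation on behalf of the controller's host."""
        host = self.controller_host[controller_id]
        if host not in self.registrants:
            return False
        if self.holder is not None and self.holder != host:
            return False
        self.holder = host
        return True

    def may_write(self, controller_id: int) -> bool:
        """Writes are allowed if nobody holds the reservation or this host does."""
        return self.holder is None or self.controller_host[controller_id] == self.holder


# Mirrors the figure: host A uses controllers 1 and 2, host B uses controller 3.
reservation = NamespaceReservation()
reservation.attach(1, "host-id-A")
reservation.attach(2, "host-id-A")
reservation.attach(3, "host-id-B")
reservation.register(1)
print(reservation.acquire(1))    # True
print(reservation.may_write(2))  # True: same host ID as controller 1
print(reservation.may_write(3))  # False: host B does not hold the reservation
```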

NVMe-oF is a de facto standard for extending the NVMe architecture over mainstream interconnects in a scalable way. Its purpose is to let host computers and target SSD devices or systems transfer data quickly across a network using message-based commands. Key benefits include higher performance, lower network latency and fewer bottlenecks.


One of the more interesting developments is the new transport binding between NVMe and TCP. For developers, the benefit is that it brings NVMe technology to the space served by the Internet Small Computer System Interface (iSCSI). NVMe-oF over TCP is a good choice for enterprises that want to take advantage of their existing Ethernet infrastructure and avoid the complexity of Remote Direct Memory Access (RDMA) protocols.

Because NVMe-oF is transport-independent, it can in principle support any transport. Several mainstream transports are in use today: RoCEv2, iWARP, InfiniBand and FCoE. Some of these use the RDMA transport binding included in the specification, and the NVMe organization is now adding TCP to meet market demand.

The industry is optimistic about the NVMe-oF/TCP standard, which is supported by many industry leaders, including Facebook, Google, Dell EMC and Intel.

The external storage market has already adopted NVMe-oF technology, and enterprise customers are expected to deploy it for high-performance applications. Top suppliers, including Broadcom, Cisco, Intel and IBM, have already announced NVMe-oF solutions.

The future of NVMe-oF in the enterprise is bright, and emerging computing markets need NVMe-oF technology.

Artificial intelligence, machine learning and real-time analytics all require the lower latency and higher throughput that NVMe-oF provides. NVMe-oF has many advantages that help meet these new application requirements. On the server side, NVMe-oF shortens the operating system's storage stack, enabling more efficient connections. In the storage array, the path through the target stack is shorter, which improves array performance.

However, one of the most important benefits is that NVMe-oF builds on existing storage array technology, so vendors can bring solutions to market faster by moving from SAS/SATA drives to NVMe SSDs.

That concludes this overview. For more technical detail, see the e-book Deep Analysis of NVMe Technical Standards and Principles. Its table of contents follows.

Overview of NVMe technology and applications

  • 1.1 The unique advantages of NVMe technology
  • 1.2 Overview of NVMe-oF technology
  • 1.2.1 NVMe over FC
  • 1.2.2 NVMe over Ethernet and InfiniBand
  • 1.2.3 NVMe over TCP
  • 1.3 Analysis of the status of NVMe data center applications
  • 1.3.1 Dell EMC (PowerMax)
  • 1.3.2 E8 Storage (E8 appliances and software)
  • 1.3.3 Excelero Inc (NVMesh)
  • 1.3.4 IBM (FlashSystem 9100)
  • 1.3.5 NetApp (AFF A800 and EF570)
  • 1.3.6 Pure Storage (FlashArray and FlashBlade)
  • 1.3.7 Vexata (VX-100M and VX-100F)

Interpretation of NVMe standard terms

  • 2.1 The unique advantages of NVMe technology
  • 2.2 Introduction to the NVM subsystem
  • 2.2.1 Physical port
  • 2.2.2 NVM subsystem port
  • 2.3 Transport port
  • 2.3.1 NVM controller
  • 2.3.2 Dynamic controller
  • 2.3.3 Persistent controller
  • 2.4 Discovery controller
  • 2.5 Discovery service subsystem
  • 2.6 Discovery log page
  • 2.7 Namespace analysis
  • 2.7.1 Namespace
  • 2.7.2 Creating a namespace
  • 2.7.3 Deleting a namespace
  • 2.7.4 Attaching and detaching a namespace
  • 2.7.5 Namespace identifier
  • 2.7.6 Namespace format
  • 2.8 Association
  • 2.9 Connection mechanism
  • 2.10 NVMe conceptual structure
  • 2.11 Capsule, the unit of data exchange
  • 2.13 Properties
  • 2.14 Types of Fabrics commands
  • 2.15 Host ID and Host NQN
  • 2.13 Host, controller and namespace
  • 2.14 NVM subsystem preset conditions

Analysis of NVMe over Fabrics commands

  • 4.1 Analysis of command fields
  • 4.1.1 Analysis of Fabrics command fields
  • 4.1.2 Admin/IO commands
  • 4.2 Analysis of command response fields
  • 4.2.1 Fabrics response fields
  • 4.2.2 Admin/IO response fields

Discovery processing

  • 5.1 The original Discovery process
  • 5.2 Discovery log page
  • 5.3 Discovery termination mechanism

Connection processing

Data transmission process

  • 7.1 General introduction to data transmission
  • 7.2 Capsule, the transmission unit
  • 7.2.1 Command capsule size
  • 7.2.2 Command capsule structure
  • 7.2.3 Response capsule structure
  • 7.2.4 In-capsule transfer
  • 7.2.5 In-memory transfer
  • 7.2.6 Out-of-order transfer
  • 7.3 Transfer commands and process
  • 7.3.1 NVM Read command
  • 7.3.2 NVM Write command
  • 7.4 SGL hash table

NVMe metadata

  • 8.1 Definition of NVMe metadata
  • 8.2 Transfer of NVMe-oF metadata
  • 8.2.1 In-capsule data transfer when data is aligned
  • 8.2.2 In-capsule data transfer when data is not aligned
  • 8.2.3 In-memory data transfer when the SGL is in memory
  • 8.2.3 In-memory data transfer when the SGL is not in memory

NVMe / NVMe over Fabrics flow control processing

NVMe security authentication mechanism

  • 10.1 Security authentication overview
  • 10.2 Related commands
  • 10.3 Authentication process

Streams

  • 11.1 Stream overview
  • 11.2 Stream commands
  • 11.3 Stream configuration and implementation

Accelerated background operation (ABO)

  • 12.1 ABO overview
  • 12.2 ABO formats
  • 12.3 ABO status query
  • 12.4 Starting and stopping host-triggered ABO
  • 12.5 ABO parameter configuration

Implementation of NVMe transport bindings

Principles of the Sanitize mechanism

  • 14.1 Sanitize overview
  • 14.2 Differences between Sanitize and Format
  • 14.3 Sanitize operation scope
  • 14.4 Sanitize operation modes
  • 14.5 Sanitize status mechanism
  • 14.6 Sanitize command

Analysis of the reservations mechanism

  • 15.1 Reservations overview
  • 15.2 Reservation roles
  • 15.3 Reservation types
  • 15.4 Reservations operational flow chart
  • 15.5 Conditions for reservations support
  • 15.6 Implementation and related commands

Keep Alive mechanism

  • 16.1 Keep Alive background
  • 16.2 Function overview
  • 16.3 Operation scope
  • 16.4 Keep Alive implementation

Interrupt mechanism

  • 17.1 Interrupt implementation
  • 17.2 Interrupt aggregation
  • 17.3 NVMe interrupt mapping modes

NVMe virtualization mechanism

  • 18.1 Virtualization mechanism overview
  • 18.2 Virtualization application scenarios
  • 18.3 Virtualization implementation
  • 18.3.1 Primary controller
  • 18.3.2 Secondary controller
  • 18.3.3 Privileged actions
  • 18.3.4 Virtualization command management
  • 18.3.5 Secondary controller commands
  • 18.3.6 Resource allocation
  • 18.3.7 Virtual queue
  • 18.3.8 Virtual interrupt