NVMe/TCP offers excellent performance and software support.
post @ January 04, 2020 09:16 Marge kooij

Detailed answers to technical questions about NVMe/TCP.

NVM Express, Inc. recently announced the addition of NVMe over TCP (NVMe/TCP) to the family of NVMe transports. NVMe/TCP is a very important development for NVMe.

1. Should the official NVMe/TCP documentation be expected when the NVMe 1.4 specification is finalized?

NVMe/TCP is a transport binding for NVMe-oF, so we should expect the ratified technical proposal TP 8000 to be integrated into the NVMe-oF 1.1 specification document. Although the NVMe board has not yet released a formal timetable, we expect it to be released later this year.

2. What does a host need in order to support NVMe over TCP: hardware, firmware, software, drivers, and so on?

NVMe/TCP software runs without any special hardware or firmware, although different classes of CPUs and network adapters may deliver better performance. Of course, you need to install NVMe/TCP host software and NVM subsystem software to run NVMe/TCP. This software is available in Linux kernel v5.0 and SPDK v19.01, as well as in commercial NVMe/TCP target devices.

3. Is there a limit to the number of namespaces a host can have at runtime? What resources does the host need (CPU cores, memory, ports)?

NVMe/TCP imposes no restrictions on the basic features of the NVMe architecture, because it is only an NVMe-oF transport binding. There is therefore no limit on the number of namespaces NVMe/TCP can support. From a transport perspective, a namespace is a purely logical concept with no host resources allocated to it.

4. Does NVMe/TCP add latency compared with direct-attached NVMe SSDs?

NVMe/TCP by itself does not necessarily add significant latency compared with direct-attached NVMe. Specific controller implementations may avoid added delay through specialized enhancements.

5. Which operating system kernels support NVMe/TCP?

Linux kernel 5.0 and later support NVMe/TCP.
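
As a rough illustration of that version requirement, the following Python sketch compares a kernel release string with 5.0. The function names are hypothetical, and the check is only a heuristic: a distribution kernel may backport the driver or ship without the NVMe/TCP module built, so the version number alone proves nothing about a given system.

```python
import platform
import re
from typing import Tuple

# Assumption taken from the answer above: NVMe/TCP first appears in Linux 5.0.
MIN_KERNEL = (5, 0)


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.4.0-42-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_supports_nvme_tcp(release: str) -> bool:
    """Heuristic only: True if the release is at least MIN_KERNEL."""
    return parse_kernel_release(release) >= MIN_KERNEL


if __name__ == "__main__":
    if platform.system() == "Linux":
        release = platform.release()
        print(release, "->", kernel_supports_nvme_tcp(release))
    else:
        print("Not a Linux host; the check does not apply.")
```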

6. Is there a significant performance difference when running NVMe/TCP on a data-plane network stack such as DPDK?

If the platform running the controller has enough processing capacity, running NVMe/TCP on top of the general-purpose Linux network stack is not expected to be fundamentally different. However, if the controller does not have enough CPU to dedicate to the Linux network stack, for example because other operations also need CPU time, a DPDK-based solution may achieve better performance thanks to its greater efficiency.

7. Is Data Center TCP recommended for running NVMe/TCP workloads?

Generally speaking, Data Center TCP may have a better congestion control algorithm than other TCP variants, and it can benefit network traffic control in general, regardless of NVMe/TCP. The real questions are whether congestion actually occurs in modern data center networks, and whether modern TCP/IP stacks have other mechanisms to deal with it effectively.

8. Does NVMe/TCP support multiple R2Ts? How do they compare with buffer credits in FCP?

In theory, the controller can send multiple lightweight R2T PDUs to the host for a specific command. However, the host limits the maximum number of outstanding R2T PDUs. R2T PDUs provide a credit mechanism similar to buffer-to-buffer credits in FC, but it operates at the NVMe command level rather than at the FC port level.

9. Is network flow control managed using only R2T and the standard TCP congestion window?

Yes, end-to-end flow control is handled by TCP/IP, and NVMe transport-level flow control is managed by the R2T credit mechanism.
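
The per-command R2T credit idea can be sketched as a toy model in Python. This is an illustration under stated assumptions, not the protocol as specified: the function `simulate_write`, its parameters and the fixed chunk size are all hypothetical, and TCP-level flow control is ignored entirely. The controller grants R2Ts for one write command until the host's limit on outstanding R2Ts is reached, and the host answers each R2T with a data PDU (labeled H2CData here) for the requested range.

```python
from collections import deque
from typing import List, Tuple

Event = Tuple[str, int, int]  # (PDU name, byte offset, byte length)


def simulate_write(total_bytes: int, chunk_bytes: int,
                   max_outstanding_r2t: int) -> List[Event]:
    """Toy model of R2T credits for a single write command."""
    if total_bytes < 0 or chunk_bytes < 1 or max_outstanding_r2t < 1:
        raise ValueError("sizes must be non-negative and limits at least 1")

    events: List[Event] = []
    outstanding = deque()  # R2Ts granted but not yet answered with data
    next_offset = 0        # next byte the controller will ask for
    transferred = 0        # bytes the host has sent so far

    while transferred < total_bytes:
        # Controller side: grant R2Ts while the host's limit allows it.
        while next_offset < total_bytes and len(outstanding) < max_outstanding_r2t:
            length = min(chunk_bytes, total_bytes - next_offset)
            outstanding.append((next_offset, length))
            events.append(("R2T", next_offset, length))
            next_offset += length

        # Host side: answer the oldest outstanding R2T, freeing one credit.
        offset, length = outstanding.popleft()
        events.append(("H2CData", offset, length))
        transferred += length

    return events


if __name__ == "__main__":
    for event in simulate_write(total_bytes=10, chunk_bytes=4,
                                max_outstanding_r2t=2):
        print(event)
```

Because the credit is tracked per command, a second command would get its own independent set of outstanding R2Ts in this model, which is the contrast with a port-level credit scheme drawn in the answers above.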

10. Within a submission queue (SQ), are multiple outstanding requests constrained by PDU ordering?

No. There is no such constraint: no ordering rules are defined between PDUs that belong to different NVMe commands.

11. How are patches and upgrades managed with NVMe/TCP? Are they non-disruptive? How is the rollback process handled in a large environment?

Consult your vendor for specific solutions. The NVMe/TCP protocol itself neither imposes nor prohibits any requirements on such operations.

12. Is an open source NVMe/TCP project available?

Yes, Linux and SPDK both include NVMe/TCP target implementations.

13. Is there an equivalent of NVMe/TCP in iSCSI?

There is no equivalent implementation of NVMe/TCP in iSCSI, but many of the concepts are equivalent. NVMe/TCP and iSCSI are equivalent in the sense that iSCSI is a SCSI transport running over TCP/IP, while NVMe/TCP is an NVMe transport running over TCP/IP.

14. How does NVMe/TCP compare with NVMe/FC in performance (bandwidth, IOPS, latency, and so on)?

I have not tested any NVMe/FC product or open source implementation, nor seen any NVMe/FC performance benchmark. However, compared with direct-attached NVMe, both NVMe/TCP and NVMe/FC are expected to show a relatively small degradation.

15. Is there CPU utilization data for NVMe/RoCE and NVMe/TCP?

There is no official data, but NVMe/TCP software needs more CPU resources than NVMe/RDMA, because NVMe/RDMA offloads part of the transport protocol to hardware. Beyond that, CPU usage depends on the workload and on the stateless offloads implemented by the network adapter.

16. What are the advantages and disadvantages of NVMe/TCP compared with NVMe/RDMA? Is there a performance difference?

NVMe/TCP is a transport binding that offers the advantages of commodity hardware and good scalability, without requiring changes to the network infrastructure to support RDMA. NVMe/RDMA can deliver lower latency and lower CPU utilization, depending on the implementation and on the effect of stateless offloads. When deciding where to invest, weigh the performance difference against cost, scale and other factors.

17. How does NVMe/TCP differ from NVMe/RDMA? Can both kinds of traffic share the same 100 Gb/s Ethernet network?

NVMe/TCP differs from NVMe/RDMA in that it carries NVMe-oF capsules and data over TCP/IP, while NVMe/RDMA carries NVMe-oF capsules and data over RoCE (InfiniBand transport over UDP) or iWARP (TCP with DDP and MPA).

Both NVMe/TCP and NVMe/RDMA run over Ethernet, so they can share the same 100 Gb/s Ethernet network.

18. What is the maximum tolerable latency for NVMe PDUs within a data center or private cloud, and across Ethernet switches between geographically distributed data centers or private clouds?

NVMe/TCP does not specify a maximum latency. In practice, network latency is not a problem: the default NVMe Keep Alive timeout is two minutes.


The NVMe/TCP transport binding specification is publicly available for download. TCP is a new transport added to the existing family of NVMe transports, alongside PCIe, RDMA and FC. NVMe/TCP defines the mapping of NVMe queues, NVMe-oF capsules and data delivery over the IETF Transmission Control Protocol. The NVMe/TCP transport also provides optional enhancements such as inline data integrity and online Transport Layer Security.
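
To make the idea of mapping capsules and data onto a TCP byte stream more concrete, here is a minimal Python sketch of packing and parsing a PDU common header. The 8-byte layout (type, flags, header length, data offset, total length, little-endian) and the type codes reflect my reading of the public transport specification and are assumptions to verify against it. The names `CommonHeader`, `pack_common_header` and `unpack_common_header` are hypothetical.

```python
import struct
from typing import NamedTuple

# Assumed layout: four 1-byte fields followed by a 4-byte little-endian length.
COMMON_HEADER = struct.Struct("<BBBBI")

# Assumed type codes for a few PDUs; check the specification before relying on them.
PDU_TYPE_NAMES = {
    0x04: "CapsuleCmd",
    0x05: "CapsuleResp",
    0x06: "H2CData",
    0x07: "C2HData",
    0x09: "R2T",
}


class CommonHeader(NamedTuple):
    pdu_type: int  # which kind of PDU follows
    flags: int     # PDU-specific flags
    hlen: int      # length of the PDU header in bytes
    pdo: int       # offset of the data within the PDU, 0 if there is none
    plen: int      # total PDU length in bytes


def pack_common_header(header: CommonHeader) -> bytes:
    """Serialize the header fields into 8 bytes."""
    return COMMON_HEADER.pack(*header)


def unpack_common_header(data: bytes) -> CommonHeader:
    """Parse the first 8 bytes of a received PDU."""
    if len(data) < COMMON_HEADER.size:
        raise ValueError("need at least 8 bytes for the common header")
    return CommonHeader(*COMMON_HEADER.unpack_from(data))


if __name__ == "__main__":
    # An R2T PDU is assumed here to be header-only, 24 bytes long.
    raw = pack_common_header(CommonHeader(0x09, 0, 24, 0, 24))
    parsed = unpack_common_header(raw)
    print(PDU_TYPE_NAMES.get(parsed.pdu_type, "unknown"), parsed)
```

A receiver built along these lines would read 8 bytes from the TCP stream, use the total length field to find the end of the PDU, and only then interpret the rest, which is how message boundaries can be recovered from a byte stream.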

What is exciting about NVMe/TCP is that it enables efficient end-to-end NVMe operations between NVMe-oF hosts and NVMe-oF controller devices that are interconnected by any standard IP network, with excellent performance and latency characteristics. This allows large-scale data centers to take advantage of their existing Ethernet infrastructure, multi-layer switch topologies and traditional Ethernet network adapters.

On the software side, NVMe/TCP host and controller device drivers are also available in the Linux kernel and in SPDK. Both NVMe/TCP implementations are designed to plug seamlessly into their existing NVMe and NVMe-oF software stacks.
