Homa, a transport protocol to replace TCP for low-latency RPC in data centers

ggm 2021-08-17 04:10:47 +0000 UTC [ - ]

What's missing is a 2021 perspective on why this is or is not useful at scale. Note that Google does transform edge IP into some mythic internal protocol, which in part explains why Google won't do IPv6 direct to some things like GCP: they couldn't, for a given generation of hardware.

Basically: yes, within a DC where traffic is heading to some kind of traefik or HAproxy or other redirection/sharding layer, this could make sense. So: how does this 2018 approach stack up in 2021?

ithkuil 2021-08-17 09:02:44 +0000 UTC [ - ]

Fwiw IPv6 for Google cloud instances is finally GA (https://cloud.google.com/compute/docs/ip-addresses/configure...)

ggm 2021-08-17 10:58:02 +0000 UTC [ - ]

Forgive me if I am wrong, but I believe this only binds v6 to the outside edge of VMs, not GKE. And the document notes you cannot connect to Google API services over v6.

I don't want to overdo the curmudgeon thing: I'm really glad they've started to deploy dual stack; it's long overdue. And remember that Google has been a strong proponent of v6 in Android and in IETF standards across the board.

ithkuil 2021-08-17 12:37:46 +0000 UTC [ - ]

Oh yeah, the IPv6 saga at Google Cloud is a long-running joke. However, this is indeed a decent step forward, and in particular it's relevant to the comment I replied to: in order to reach a VM, traffic has to traverse the network fabric, which clearly now supports IPv6.

As for GKE, that's a different story, probably more related to configuring Calico and friends and less about limitations in the low-level network fabric.

As for Google API, I read somewhere that they disabled it because billing exemption wasn't ready.

hardwaresofton 2021-08-17 09:29:50 +0000 UTC [ - ]

> traefik or HAproxy

A meta note, but it struck me that seeing "traefik" where NGINX usually appears is pretty fascinating. I rate it highly because of its support for k8s (I've written before about how wonderful I think it is) but am somewhat unaware of how widely it's known. Guess it's pretty well known at this point if people casually mention it (then again, the audience is HN after all)!

ggm 2021-08-17 10:59:31 +0000 UTC [ - ]

It certainly wasn't up-sold to me when I was building things out. Maybe its light was hidden under the nginx bushel because other tools were explicitly written up.

Horusiath 2021-08-17 05:08:25 +0000 UTC [ - ]

It targets the same properties that the Aeron protocol aims for (https://github.com/real-logic/aeron).

MisterTea 2021-08-17 17:33:31 +0000 UTC [ - ]

In a similar vein, Bell Labs developed the IL protocol in the '90s to facilitate better 9P performance on Plan 9. It did not work well over the internet due to latency but was beneficial on local networks. It was most useful for disk servers serving the root fs to CPU servers and terminals/workstations.

http://doc.cat-v.org/plan_9/4th_edition/papers/il/

(edit: forgot to mention IL is still usable on Plan 9)

Ericson2314 2021-08-17 06:48:44 +0000 UTC [ - ]

I have a feeling the vast majority of traffic should be pub sub / love query, not request response, and yet it is currently the latter, and this might mean this is shooting for the wrong goalposts.

cprecioso 2021-08-17 09:10:41 +0000 UTC [ - ]

What is love query?

jcelerier 2021-08-17 12:16:37 +0000 UTC [ - ]

I'd guess it's a typo for "live query"

Ericson2314 2021-08-17 15:41:16 +0000 UTC [ - ]

Yes it is, sorry!

cowvin 2021-08-17 20:18:46 +0000 UTC [ - ]

"do you love me?"

hkt 2021-08-17 13:14:33 +0000 UTC [ - ]

Baby, don't hurt me

mosseater 2021-08-17 03:33:07 +0000 UTC [ - ]

Can you summarize why this is better than just using UDP?

wmf 2021-08-17 03:40:29 +0000 UTC [ - ]

UDP doesn't have reliability, flow control, congestion control, etc. Blasting RPCs over UDP can cause poor performance due to congestion.
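
To make that concrete, here is a minimal sketch (in Python, with a made-up address, port, and retry policy) of the bare minimum an application has to bolt onto UDP itself: a timeout-and-retransmit loop. Ordering, flow control, and congestion control are likewise left entirely to the application.

    import socket

    def udp_rpc(request: bytes, addr=("10.0.0.2", 9000), retries=5, timeout=0.2):
        """Send one RPC over raw UDP with a naive timeout/retransmit loop (illustrative only)."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        for _ in range(retries):
            sock.sendto(request, addr)           # the datagram may silently vanish in the fabric
            try:
                reply, _ = sock.recvfrom(65535)  # may also be a duplicate of an earlier reply
                return reply
            except socket.timeout:
                timeout *= 2                     # crude backoff; no real congestion control
                sock.settimeout(timeout)
        raise TimeoutError("no reply after retries")

When thousands of clients run a loop like this against the same servers, the retransmissions themselves add to the congestion being described.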

bcrl 2021-08-17 13:33:34 +0000 UTC [ - ]

The data center bridging extensions ensure that packets won't get lost due to congestion, by applying per-priority flow control. They were created in part because Fibre Channel over Ethernet couldn't handle any packet loss in the fabric.

wmf 2021-08-17 21:43:09 +0000 UTC [ - ]

Running lossless with no congestion control leads to high tail latency though.

bcrl 2021-08-18 13:45:24 +0000 UTC [ - ]

Quite true. However, in the case of FCoE, given the performance issues with the Fibre Channel SANs I worked on a few years ago, tail latency in the fabric was the least of the concerns we ran into. Customers would wonder why their persistent messaging rates tanked when the system started pushing messages to disk; meanwhile their SAN couldn't even sustain 100 MB/s of sequential writes.

bullen 2021-08-17 09:06:00 +0000 UTC [ - ]

On a switch you could probably trust UDP, no?

magicalhippo 2021-08-17 09:44:31 +0000 UTC [ - ]

Well, if by "probably trust" you mean you are happy if that prediction fails every now and then, then sure.

If a link is saturated, say via TCP, then UDP data can still get dropped AFAIK.

Searching around, I found this blog post[1], which takes a small stab at it. It would be interesting to try other setups with more data.

Regardless, using UDP one should always be prepared to handle dropped and out-of-order packets.

https://openmymind.net/How-Unreliable-Is-UDP/
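
As a rough sketch of the "be prepared" part (mine, not from the linked post; the port and the 4-byte header are made-up conventions for the example): tag each datagram with a sequence number and let the receiver detect gaps and reordering itself.

    import socket, struct

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 9000))                     # arbitrary port for the example
    expected = 0
    while True:
        datagram, _ = sock.recvfrom(65535)
        seq = struct.unpack("!I", datagram[:4])[0]   # sender prepends a 4-byte sequence number
        payload = datagram[4:]
        if seq == expected:
            expected += 1                            # in-order delivery
        elif seq > expected:
            print(f"packets {expected}..{seq - 1} lost or reordered")
            expected = seq + 1                       # a real protocol would buffer or request retransmission
        # seq < expected: duplicate or late packet, drop it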

bullen 2021-08-17 11:12:10 +0000 UTC [ - ]

That link actually proves my point, thx!

Nothing is perfect, so I'll keep using UDP on a switch and TCP on the internet.

Hikikomori 2021-08-17 12:33:18 +0000 UTC [ - ]

It has nothing to do with using a switch or not.

If you send a 1 Gbit UDP stream over a switch to another machine, there will be no drops if both are connected at 1 Gbit; the same is true if you use TCP, and over a router, assuming it's capable of forwarding that amount of traffic. If you have a third machine sending UDP or TCP traffic towards the one receiving the 1 Gbit UDP stream, you'll have drops on both streams. It doesn't matter what protocol you use: if you have congestion, you'll have drops. You typically use UDP if your application is real-time, or if you want to create your own reliability mechanism and/or avoid issues with devices in the middle that do things with TCP.
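
A back-of-the-envelope version of that, with made-up round numbers:

    link_capacity = 1e9                  # the receiver's 1 Gbit/s port
    offered = 1e9 + 1e9                  # two senders each pushing 1 Gbit/s at it
    excess = offered - link_capacity
    print(f"{excess / offered:.0%} of the offered traffic is dropped once the egress queue fills")
    # -> 50%, whether the packets are TCP or UDP; TCP reacts by backing off,
    #    while raw UDP keeps losing packets until the senders slow down themselves.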

magicalhippo 2021-08-17 13:05:12 +0000 UTC [ - ]

That was my point. If you design your system so that the links stay well below saturation, then you would not expect to see drops.

However they can still occur, maybe due to some unexpected congestion or other issues. So long as your application can deal with that you should be good.

nitrogen 2021-08-17 13:59:47 +0000 UTC [ - ]

> other issues

Like intermittent EMI causing bit flips and checksum failures, which happened to me once in an IoT application where the Ethernet cable to an outbuilding was buried next to the power line, and the network would die whenever the furnace kicked on.

bullen 2021-08-17 14:25:17 +0000 UTC [ - ]

So where can one buy consumer 10Gb/s switches?

magicalhippo 2021-08-17 15:09:22 +0000 UTC [ - ]

MikroTik makes some, like the CRS305-1G-4S+IN[1] which is a 5-port variant at a very reasonable price.

Nice review by ServeTheHome here[2].

[1]: https://mikrotik.com/product/crs305_1g_4s_in

[2]: https://www.servethehome.com/mikrotik-crs305-1g-4sin-review-...

bullen 2021-08-17 17:00:34 +0000 UTC [ - ]

Thx. Say I have 4x 1 Gb/s old-school Ethernet-port machines that want to saturate their capacity at the same time on one of these; should I just buy 4x 10GBase-T SFP+ transceivers for it?

Edit: Apparently the LGS105/LGS108 already has 10 Gb/s of switching capacity, so I'm good!
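
For what it's worth, the arithmetic behind that "switching capacity" figure, assuming the usual vendor convention of counting both directions of every port:

    ports = 5           # e.g. a 5-port gigabit switch like the LGS105
    line_rate_gbps = 1  # per port
    duplex = 2          # each port can send and receive at line rate simultaneously
    print(ports * line_rate_gbps * duplex, "Gbit/s needed to be non-blocking")  # -> 10 Gbit/s

So four 1 Gb/s machines talking flat out (8 Gbit/s counting both directions) fit within 10 Gbit/s of switching capacity.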

vlovich123 2021-08-17 03:54:41 +0000 UTC [ - ]

Doesn't SRPT mean that a continuous stream of small packets will DoS any larger packets? I'm sure I'm missing something about how it works.

tyingq 2021-08-17 04:44:21 +0000 UTC [ - ]

I would guess "SRPT" is just shorthand for something less simple than it sounds. It likely has some "starvation protection" in the same way that a typical priority queue does, like having in-queue age bump up the priority.

FrancoisBosun 2021-08-17 04:37:09 +0000 UTC [ - ]

From my understanding, this is based on total message size, not packet size. If a receiver has two incoming messages, a batch upload and a query for a list, the batch upload may have 10 MiB to transmit while the other message may only have 1 KiB. Thus, to improve latency, the smaller message should be prioritized.
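
A toy sketch of that ordering (my illustration, not Homa's actual grant mechanism; the 64 KiB grant size is made up): keep pending messages keyed by remaining bytes and always serve the smallest first, so the 1 KiB query finishes long before the 10 MiB upload.

    import heapq

    GRANT = 64 * 1024                                  # bytes granted per scheduling decision (made up)
    pending = [(10 * 1024 * 1024, "batch-upload"),     # (remaining_bytes, message)
               (1024, "list-query")]
    heapq.heapify(pending)

    while pending:
        remaining, msg = heapq.heappop(pending)        # smallest remaining message wins
        granted = min(GRANT, remaining)
        print(f"grant {granted} bytes to {msg}")
        if remaining > granted:
            heapq.heappush(pending, (remaining - granted, msg))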

wmf 2021-08-17 05:39:37 +0000 UTC [ - ]

You can avoid starvation by increasing the priority of a job (an RPC in this case) when it hasn't made progress for a while. Or you can do weighted scheduling where the shortest job has a higher probability of being scheduled (e.g. WDRR) instead of absolute priority.
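
A sketch of the first option, priority aging (the aging rate is arbitrary and this is not what Homa actually implements): the effective size of a waiting message shrinks over time, so even a large transfer eventually beats a stream of fresh small ones.

    import time

    AGING_BYTES_PER_SEC = 1_000_000    # arbitrary: each second of waiting "shrinks" a job by 1 MB

    def effective_size(remaining_bytes: int, enqueued_at: float) -> float:
        waited = time.monotonic() - enqueued_at
        return remaining_bytes - AGING_BYTES_PER_SEC * waited

    def pick_next(queue):
        """queue: list of (remaining_bytes, enqueued_at, message_id) tuples."""
        return min(queue, key=lambda job: effective_size(job[0], job[1]))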

ngcc_hk 2021-08-17 09:12:01 +0000 UTC [ - ]

If one uses internal IPs (128., 10., etc.), why can't one use another protocol for internal transfer? That way the internal traffic is cut off not just by IP addressing but also by a different transport protocol.

It's like OSPF vs. BGP: within an organisation, why not …

gardaani 2021-08-17 05:15:37 +0000 UTC [ - ]

QUIC is similar and standardized by the IETF. https://en.wikipedia.org/wiki/QUIC

KaiserPro 2021-08-17 12:53:33 +0000 UTC [ - ]

QUIC is not designed for internal low-latency point-to-point traffic.

QUIC is basically the weird semantics of HTTP/2 transmuted to UDP with some retransmission logic, but everything is hilariously complex.

zozbot234 2021-08-17 05:22:58 +0000 UTC [ - ]

QUIC is not that similar, although SCTP could be given many of the same properties with a slightly customized implementation.

touisteur 2021-08-17 15:25:45 +0000 UTC [ - ]

Can you please say a bit more? I keep reading about QUIC as an SCTP replacement. Apart from the "whole thing encrypted" part, I'm wondering what differences there are.

baybal2 2021-08-17 06:52:39 +0000 UTC [ - ]

Deterministic Ethernet varieties have been around for a very long time.

If you have fabric determinism like InfiniBand, and capacity reservation on the receiving side, you can just dispose of the connection paradigm and flow control, and thus get a great deal of performance while simplifying everything at the same time.

I do not see much use of it though unless you are building something like an airplane.

The uncounted PhD hours spent on getting networks to work well do amount to something.

Dealing with RDMA aware networking is far beyond the ability of typical web developers.

Deterministic Ethernet switches cost a fortune, and are lagging behind the broader Ethernet standard by many years.

Making a working capacity reservation setup takes years to perfect as well.

99.9999...% of web software will most likely *lose* performance if blindly ported to an RDMA-enabled database, message queue server, or cache.

If you don't know how epoll- or io_uring-like mechanisms work, you cannot get any benefit out of RDMA whatsoever.

I once worked for a subcontractor for Alibaba's first RDMA-enabled datacentre.

ergl 2021-08-17 08:03:13 +0000 UTC [ - ]

> Dealing with RDMA aware networking is far beyond the ability of typical web developers.

Homa is designed for traffic inside a DC, same as RDMA or InfiniBand. I don't think anyone is proposing to use it for normal web traffic.

eternalban 2021-08-17 13:20:32 +0000 UTC [ - ]

Which partly (besides the Iranians on the Homa team) explains the name of the protocol. Homa is the Persian ~phoenix, a mythical bird that never touches the ground and is always hovering above:

https://en.wikipedia.org/wiki/Homa_(mythology)

touisteur 2021-08-17 15:28:15 +0000 UTC [ - ]

Even inside DCs I thought everything was Web services upon Web services.