Google's BBR fixes TCP's dirty little secret

Networking geeks: Google made a big announcement about BBR this week. Here's a technical deep-dive: http://queue.acm.org/detail.cfm?id=3022184 (Hint: if you read ACM Queue like I keep telling you to, you'd have known about this before all your friends.)

Someone on Facebook asked me for an "explain it like I'm 5 years old" explanation. Here's my reply:

Short version: Google changed the TCP implementation (their network stack) and now your YouTube videos, Google websites, Google Cloud applications, etc. download a lot faster and smoother. Oh, and it doesn't get in the way of other websites that haven't made the switch. (Subtext: another feature of Google Cloud that doesn't exist at AWS or Azure. Nothing to turn on, no extra charge.)

ELI5 version: TCP tries to balance the need to be fast and fair. Fast... transmitting data quickly. Fair... don't hog the internet, share the pipe. Being fair is important. In fact, it is so important that most TCP implementations use a "back off" algorithm that results in you getting about 1/2 the bandwidth of the pipe... even if you are the only person on it. That's TCP's dirty little secret: it under-utilizes your network connection by as much as 50%.
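If you want to see where that halving comes from, here's a toy sketch (my illustration, nothing official) of the classic rule, "additive increase, multiplicative decrease": grow the window by one packet per round trip until a loss, then cut it in half.

    # Toy model of loss-based congestion control (AIMD). Not a faithful
    # TCP simulator; it just shows the sawtooth: the window climbs to the
    # pipe's capacity, a loss cuts it in half, and the climb starts over.

    PIPE = 100  # bottleneck capacity, in packets per round trip

    def utilization(rounds=10_000):
        cwnd = 1.0   # congestion window, in packets
        sent = 0.0
        for _ in range(rounds):
            sent += min(cwnd, PIPE)  # can't push more than the pipe carries
            if cwnd >= PIPE:
                cwnd /= 2            # loss: multiplicative decrease
            else:
                cwnd += 1            # no loss: additive increase
        return sent / (rounds * PIPE)

    # Even in this idealized model a lone sender averages only about 75%
    # of the pipe; real-world loss recovery and deep buffers widen the gap.
    print(f"utilization: {utilization():.0%}")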

Backoff schemes that use more than 1/2 the pipe tend to crowd out other people, and are thus unfair. So, in summary, current TCP implementations prioritize fairness over good utilization. We're wasting bandwidth.

Could we do better? Yes. There are better backoff algorithms, but they take so much computation that they are impractical. For years researchers have tried to design schemes that are both better and cheap to compute. (As far back as the 1980s, researchers built better and better simulations so they could experiment with different backoff schemes.)

Google is proposing a new backoff algorithm called BBR. It has reached the holy grail: it stays fair while using the pipe far better. If a network pipe has only one user, that user gets essentially the whole thing. If many users share a pipe, it is shared fairly. You get more download speed over the same network. Not only that, it doesn't require changes to the internet, just to the sender.
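The article spells out how, but the gist fits in a few lines: instead of treating packet loss as the only signal, BBR keeps running estimates of the bottleneck bandwidth (the highest recent delivery rate) and the round-trip propagation delay (the lowest recent RTT), then tries to keep about one bandwidth-delay product of data in flight. Here's a hand-wavy sketch; the class and method names are mine, not a real kernel API, and real BBR layers gain cycling, startup, and ProbeRTT phases on top of this:

    from collections import deque

    class BbrSketch:
        """Minimal sketch of BBR's two estimators and how they combine."""

        def __init__(self):
            self.bw_samples = deque(maxlen=10)   # recent delivery rates (bytes/sec)
            self.rtt_samples = deque(maxlen=10)  # recent round-trip times (sec)

        def on_ack(self, delivery_rate, rtt):
            # Every ACK yields one bandwidth sample and one RTT sample.
            self.bw_samples.append(delivery_rate)
            self.rtt_samples.append(rtt)

        def pacing_rate(self):
            # Bottleneck bandwidth = the most the path recently delivered.
            return max(self.bw_samples)

        def inflight_target(self):
            # Bandwidth-delay product: just enough data in flight to fill
            # the pipe without building a queue at the bottleneck.
            return max(self.bw_samples) * min(self.rtt_samples)

    # Example: a 100 Mbit/s path (12.5 MB/s) with a 40 ms RTT wants about
    # 500 KB in flight.
    bbr = BbrSketch()
    bbr.on_ack(delivery_rate=12_500_000, rtt=0.040)
    print(bbr.inflight_target())  # -> 500000.0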

And here's the really amazing part: it works if you implement BBR on both the client and the server, but it works pretty darn well if you only change the sender's software (i.e., Google updated their web frontends and you don't have to upgrade your PC). Wait! Even more amazing is that it doesn't ruin the internet if some people use it and some people use the old methods.

They've been talking about it for nearly a year at conferences and stuff. Now they've implemented it at www.google.com, youtube.com, and so on. You get less "buffering.... buffering..." even on mobile connections. BBR is enabled "for free" for all Google Cloud users.

With that explanation, you can probably read the ACM article a bit easier. Here's the link again: http://queue.acm.org/detail.cfm?id=3022184

Disclaimer: I don't own stock in Google, Amazon or Microsoft. I don't work for any of them. I'm an ex-employee of Google. I use GCP, AWS and Azure about equally (nearly zero).

Posted by Tom Limoncelli in Google


1 Comment

Also available on Linux 4.9.
I enabled it on my Debian server a few weeks ago. I didn't notice any difference but I have hints that it's a bit better (bounce rate dropped a little bit on my website, especially on mobile. There might be some other reason that explains that variation, though).
BBR increases the initcwnd faster than other algorithms, from what I remember. But I have set my initcwnd to a bigger value than the default one, so maybe people with a default initcwnd will see a bigger change.
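If you want to try what this commenter did: on Linux 4.9 or later, BBR can usually be switched on with two sysctl settings. The fq queueing discipline matters because this generation of BBR relies on it for packet pacing. A minimal sketch, assuming your kernel ships the tcp_bbr module:

    # /etc/sysctl.d/99-bbr.conf -- enable BBR (Linux 4.9+, tcp_bbr module)
    net.core.default_qdisc=fq
    net.ipv4.tcp_congestion_control=bbr

Apply it with "sysctl --system" and confirm with "sysctl net.ipv4.tcp_congestion_control".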
