Off-Call

It's The Network (part 3)

Episode Summary

Paige and Leon cover need-to-know networking concepts for developers and answer the very important questions “How much has networking really changed in the last 5-10 years?” AND “Has Kubernetes made any of this stuff easier?!”

Episode Notes

Overview

00:45 | Network Observability FTW!
07:09 | How has networking changed in the last 5-10 years?
10:28 | Has Kubernetes made network engineers' lives easier?


Leon’s Links

Recommended Resources

Episode Transcription

Paige Cruz: Hello there, and welcome back to Off-Call, the podcast where we meet the people behind the pagers. You're listening to the final installment of my delightful conversation with networking whiz, Leon Adato of Kentik. In part two, we left off discussing how SLOs can help separate truly pageable issues from those that can wait until morning.

Today, in part three, we cover need-to-know networking concepts for developers. And I ask Leon, how much has networking really changed in the last 5-10 years? And has Kubernetes made any of this stuff easier? Enjoy!

[00:00:45] Network Observability FTW!
Paige Cruz: I've got a couple of questions to close us out, specifically talking networking. You work for Kentik, a lovely network observability and monitoring provider. This question is for the developers out there. Say I'm a developer and I'm unfamiliar with infrastructure and operations, and I have ignored as much Kubernetes as I possibly could.

But now I'm like, hmm, I want to help the Leons and Paiges of the world. I want to help my network operations team. I want to be that person being the hero with a beautiful, detailed bug report showing exactly that it totally is the network. What is the data that I can use and what should I be familiar with?

I'm looking around my monitoring tool. What metrics, what graphs? Talk us through this.

Leon Adato: Okay, so first of all, this is where I'm going to make a pitch for network observability, which a lot of people think is an oxymoron. And yes, the company I work for makes a tool that does network observability, fine. But nevertheless, be that as it may, it's important to have tools that understand it because the whole concept of observability is that there's so much data going on that one human brain can't typically reason about it.

So having something that will surface those events and contextualize them is incredibly important. Any robust, modern monitoring and observability solution should be able to pull the metric ass-ton, that is a technical measurement by the way, metric ass-ton of data and elements together to help you connect those dots.

Understand when you're looking at things, what kind of information is available to you. I'm going to talk about the specific technologies. You need to know the difference between ping and SNMP. You know, what it's telling you, like ping is telling you something is up and SNMP is telling you that an interface is down and how come those two things can be happening at the same time.

You need to have an understanding of what the data is able to give you in terms of status or rate or what have you. That's tool based as far as specific information or things to understand. First of all, a skill I wish every developer had, I will say, for the love of pants, and I credit Chloe Condon with that phrase.

For the love of pants, understand how IP addresses work. I'm not saying you need to know how to subnet in your brain, but know why those numbers are those numbers. Know why you can ping 74.208.236.13, but you can't ping 192.168.1.1. Like, why is that a thing? So it's not a black box. As an example, if you think that creating a subnet in your cloud environment that goes from 192.168.1.1 all the way to .100 is good because 100 is a nice round number and it looks nice and it's easy to think about, you are the problem.

Paige Cruz: Look in the mirror.

Leon Adato: If you don't know why that would be seen as a problem and why you are the thing that makes network engineers drink, that's why you need to understand why IP addresses work.
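[Editor's aside: the two points Leon raises here can be checked with Python's standard `ipaddress` module. This is a minimal sketch and not something from the conversation. The variable names are illustrative, and the addresses are simply the ones mentioned above.]

```python
import ipaddress

# The two addresses mentioned in the conversation.
public_ip = ipaddress.ip_address("74.208.236.13")
private_ip = ipaddress.ip_address("192.168.1.1")

# 192.168.0.0/16 is reserved for private networks (RFC 1918), so it is
# not routable across the public internet. 74.208.236.13 is outside the
# reserved ranges.
print(public_ip.is_private)
print(private_ip.is_private)

# A "nice round" range of .1 through .100 is not a single subnet.
# Subnets are power-of-two sized blocks on aligned boundaries, so this
# range can only be written as several CIDR blocks stitched together.
blocks = list(ipaddress.summarize_address_range(
    ipaddress.ip_address("192.168.1.1"),
    ipaddress.ip_address("192.168.1.100"),
))
for block in blocks:
    print(block)

# One real subnet that covers the whole range is a /25 (128 addresses).
covering = ipaddress.ip_network("192.168.1.0/25")
print(all(block.subnet_of(covering) for block in blocks))
```

Running it prints the list of CIDR blocks needed to describe .1 through .100, which is one way to see why a network engineer would rather be handed a single aligned block.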

Once you've wrapped your head around just IP addresses and how they work, a lot of other things fall into place. Routing and what routing is and what it does will make a lot more sense. Firewalls, and what that means is really just another kind of routing, you know, will make a lot more sense. The similarity between routing and VLANs and VPCs and all the rest, but they're different, but they're similar, will make a lot more sense.

You'll be able to use those things, which you as a developer are often asked to do when you're setting up your cloud environments. You know, just click, click, click, click, click, click, click. Yes, you just click your way into a hellscape that doesn't work particularly well.

Beyond that, you know, the technologies - understand ping and what it does and what it doesn't do. Understand traceroute and what it does and what it doesn't do, understand MTR, which is a different kind of traceroute, and why it's different than traceroute. Once you understand those fundamental foundational things, you will see it reflected in the tools.

You'll say, Oh, they're getting the data from blah, or blah, and you'll know. Along with that, SNMP, Simple Network Management Protocol.

Paige Cruz: Simple is in the name. I like that.

Leon Adato: Right? Understand what it is and what kind of information you get. Understand NetFlow, a thing that a lot of people ignore. It's deeply rich and has a lot of really important information for the developer as well as for the network engineer.

Finally, your telemetries. There's OpenTelemetry, which I think a lot of dev and DevOps folks are familiar with, but there's also streaming telemetry, which is more or less specifically Cisco's take on OpenTelemetry, but it is being adopted in the industry. It is a variation of OpenTelemetry, which means that if you know OTel, you'll have a leg up on all the network engineers that are still, right now, today, trying to wrap their heads around streaming telemetry. It is a very network-centric set of data and information, and again, you should know: what do I get out of this? What don't I get out of this? How does it overlap or not overlap or whatever? Once you have those things, then it becomes a lot easier to look at whichever tool you're using and say, ahh, I understand this is the stuff that matters. This is the stuff that doesn't matter.

To put it into a DevOps context, every time DevOps people get a monitoring tool, the first thing they do is they set up a high CPU alert because it's easy. And I will tell you that CPU alerts, high CPU alerts, are garbage! They mean nothing! High CPU was never a problem. It has never been a problem. Now, there are things that include high CPU that indicate there's a problem, but high CPU by itself says nothing more than you have correctly sized the server.

Paige Cruz: The computer is working. He showed up to work today.

Leon Adato: It's working good. You know, or you're paying for stuff you ain't using. So it's the same thing in the networking space: bandwidth or whatever, like throughput, may or may not tell you something. It's not necessarily simplistic, but as you learn what those technologies show you, it will become clearer where and how you can get to the real problems with your applications using network data.
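[Editor's aside: one way to read Leon's point about high CPU is that a resource metric should page someone only when it coincides with a symptom users can feel. The sketch below is a hypothetical illustration, not a feature of any particular monitoring tool. The `Sample` fields, the thresholds, and the `should_page` function are all assumptions made up for the example.]

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical monitoring sample for a service."""
    cpu_percent: float
    p95_latency_ms: float
    error_rate: float  # fraction of requests that failed, 0.0 to 1.0


# Hypothetical thresholds; real values depend on the service and its SLOs.
CPU_HIGH = 90.0
LATENCY_SLO_MS = 500.0
ERROR_RATE_SLO = 0.01


def should_page(sample: Sample) -> bool:
    """Page only when high CPU coincides with a user-facing symptom."""
    user_impact = (
        sample.p95_latency_ms > LATENCY_SLO_MS
        or sample.error_rate > ERROR_RATE_SLO
    )
    return sample.cpu_percent > CPU_HIGH and user_impact


# Busy but healthy: the server is simply well sized for its load.
busy_but_healthy = Sample(cpu_percent=97.0, p95_latency_ms=120.0, error_rate=0.001)
# Busy and hurting: high CPU alongside slow, failing requests.
busy_and_hurting = Sample(cpu_percent=97.0, p95_latency_ms=900.0, error_rate=0.04)

print(should_page(busy_but_healthy))
print(should_page(busy_and_hurting))
```

The same shape would apply to a network metric such as throughput: on its own it says little, and it becomes meaningful when paired with something like latency or errors.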

Paige Cruz: I love it. Back to basics. We've got some foundational concepts.

[00:07:09] How has networking changed in the last 5-10 years?

Paige Cruz: Really, how has networking changed in the last five or ten years? I will tell you, I worked at a company that ran Kubernetes ourselves across a bunch of cloud instances, and one of my peers in stand-up said, Oh, I've switched us from Flannel to Calico. And I was like, why? Like, what could have radically changed? Doesn't all this stuff still talk to each other the same way? So I'm curious about your take, having seen a lot of different hype cycles, and we're talking about how important the foundations and the basics are. What has really changed? Is networking today the same as it was?

Leon Adato: A lot of the things have stayed the same. I mean, electronic signals form bits, and bits are grouped into bytes, and bytes are formed into frames, which are structured into packets, and they move along copper at some point along the way, even if it's moving through the air for some of it, and that's still the same. The fundamental structure of how data gets from one place to another, the OSI model, is still relevant.

At the same time, I think what's really changed in the last 10 or 15 years is the scope and the scale. If you think about an on-premises model that was massively distributed, like retail, McDonald's or Target or whatever, or just organizations that have a huge technical center of gravity where there are a lot of big data centers that do a lot of work, like maybe IBM, even that on-prem world bears no resemblance to, and is an order of magnitude less complex than, what we deal with in even moderately sized cloud-based applications with microservices and ephemeral systems and containers and Kubernetes orchestration and serverless and all of that stuff. There's just a very different...

Now the network is still happening. The thing that I want the audience to understand is that it's not either or, it's both and. That the thing I just described with the cloud and all that cloud infrastructure exists for many companies along with the on-prem that has always been there. Sure, there are a number of companies that are completely cloud native. That the only thing that isn't cloud is their employees set up in their home and the internet connection and there is no office and there is no whatever. That's it.

But the vast majority of companies across the world are both and. They have a cloud presence that they have to manage, and they have an on-prem, and they have colo, and they actually have multi-cloud. They have multiple things in multiple clouds all over the place, and those multiple clouds are talking to multiple different colos and multiple different on-prems. All simultaneously. Which makes it the worst of all possible worlds in terms of network engineering, because things are just moving in every direction and, you know, is it fast? Is it slow? Who knows? And how would I know? Like, it becomes a very difficult thing. So just be aware that, yes, networking has changed in some ways, but mostly it has grown.

Paige Cruz: Ah, it's sprawled out.

Leon Adato: It's that treehouse that started off when the tree was small, and as the tree grew we kept on adding layers to it, and levels. And things have expanded in really odd ways. Another way to say it: it's grown organically, and we all understand that when you want to grow something organically, the first thing you do is pile a bunch of poop on it!

[00:10:28] Has Kubernetes made network engineers' lives easier?

Paige Cruz: Well, I think that also serves as the answer to: has Kubernetes made network engineers' lives easier?

Leon Adato: We were talking about this beforehand, and I said, short answer, no. Long answer, also, no. I'm not a Kubernetes expert in any way, shape, or form. The one thing I want to make sure of, if people listening to this are Kubernetes experts or they work with Kubernetes, is this: I want to point out that every single Kubernetes cluster has a router, a real bits and bytes, packets and frames router in it. It's called iptables. But it is a router, and you are not serving yourself by pretending it doesn't exist.

You're not serving your application or your organization by not monitoring that router inside your Kubernetes cluster as a router. You are leaving a lot of data, which is money, on the table by not including that monitoring piece in there. There are a lot of things you don't know if you don't monitor the actual network layer of your Kubernetes clusters: intra-cluster, inter-cluster, and inter-cloud-cluster, like all this stuff. There's a lot of stuff that you don't know is happening if you refuse to include that in your monitoring schema.

Paige Cruz: Wow. I think we've given folks quite the study list and hopefully sparked some intrigue and curiosity. Unveil the layers of complexity behind Kubernetes. Break down Kubernetes networking into the fundamental pieces that we've talked about earlier. If you've got an IP table, you've got IP addresses in there. What are they doing? How did they get there? These are questions we leave to you, the listener. Go forth, discover, and make friends with your network engineering teams. They are valuable and they can teach you a lot along the way - you're not alone.

This has been a lovely conversation. Are there any pearls of wisdom you would like to leave with folks? You've left plenty already.

Leon Adato: A standard question is what should I be reading? What should I do? The first thing I'm going to call out is actually you. If you're not reading paigerduty.com, p-a-i-gerduty, please. That's important.

I'll also throw out there that I do have a podcast that went for a while and I have to revamp. It's called Technically Religious and there'll be links to the Spotify and YouTube playlists in the show notes or in the comments below. Hit like and subscribe! You know, that kind of thing.

Also, I am a middle-aged white dude, so I have opinions and I like to share them because I can't help myself, and you'll find that on adatosystems.com.

Looking further afield, I think that there's some really good conversations to be had on Corey Quinn's Screaming into the Cloud.

Paige Cruz: Yes.

Leon Adato: His podcast covers a wide range of everything from security to cloud to networking to application development, and you can learn a lot, I learn a lot from that.

Also Rachel Foster's Imperfect Genius series is really great. It hits a lot of things that people don't often think to talk about.

Then I will just mention that there are some shameless Kentik promotional things. There's Telemetry Now, and there's What's New at Kentik, and there's Kentik Close-Up. Those are there if you want to see more about what's really happening in the Kentik space. What we want to be is the Rosetta Stone for developers who want more network information and observability goodness.

Paige Cruz: Absolutely. What are the golden signals - the rate, error, duration? I'm like, okay, but what about the network? I guess the network rolls up into it, but if you don't have networking data in your service overview page, or the one dashboard you go look at, go figure out how to add that and what data you've got.

And that brings us to the end of a lovely discussion.

Thank you so very much to Leon for being the first ever Off-Call guest. You can find his social media handles, website, and recommended resources in the show notes. Thanks to you for listening. I hope you're ready to go forth, learn, and make a friend on your network operations team. And a final big thank you to our sponsor, Chronosphere, the only observability solution designed to give you complete control over cost and complexity. If you want to know how we helped Snap reduce on-call pages by 90%, head over to chronosphere.io to find out. Until next time, cheers!