Mackenzie is a developer advocate at GitGuardian. We discuss handling security reports in Django, the pros and cons of environment variables, why ChatGPT is a security concern, and more.
Carlton Gibson 0:06
Welcome to another episode of Django chats podcast on the Django web framework Cultus Gibson joined deserve by Will Vincent. Hello Will.
Will Vincent 0:12
Hi, Carlton.
Carlton Gibson 0:13
Hello Will. Today we've got Mackenzie Jackson from GitGuardian. Thank you for coming on the show.
Mackenzie Jackson 0:19
Hey guys, great to be here. Thanks for having me.
Will Vincent 0:22
Welcome.
Mackenzie Jackson 0:26
I'll just say that I briefly forgot that this is an audio only podcast and I was waving at the camera like a lunatic. But I'm realizing now that I it's audio only so I'll stop making obscene gestures. No,
Carlton Gibson 0:39
no, it's good. I can wave back and then the audience I know why. So Mackenzie calm we always get the watch. Who are you tell us why you're on this podcast? How did you find how do you find Python? How did we meet? How was your story? Yeah, I
Mackenzie Jackson 0:53
got a interesting yeah, interesting, interesting backstory I, I was actually started off life and my first my first life as an architect, like a building architect. And I hated it. And I spent most of my job trying to automate, automate it by learning to code in some of these big systems that we have called BIM systems. And then I was kind of figuring out like, why don't I just skip this architecture thing and do what I want to do, which is write software. So then I kind of went down that path and had my own startup for a while called Ken Parga, which is, it's still around today, it's headquartered in Australia, I haven't been involved in I was in there for about four years as a CTO. And then I left, I guess, when the company got big enough that it needed a real CTO, and, and so and then, but the one thing that I loved about coding a little bit of a subsection is that we were a care provider, building technology in kind of healthcare space, which means we had to comply with a lot of different areas like HIPAA compliance and other things. And going down that I really learned how vulnerable software was to lots of different things and how to secure and I've got really into that. So when I left, I decided to kind of focus on security. And that's been pretty much my jam. I work now for a company called Get guardian, which is code security platform. And one of the coolest things about my job is I get to work with some research teams, we discover how attackers are kind of exploiting code, we create talks about that. And then we get to go to some cool conferences, which is where I meet cool people, like Carlton because we met at PyCon, in Italy, which was, I think, probably one of the I think we'll let some after party and you were the keynote, so I was trying to hassle you to use your keynote powers to get more drinks and the one drink limit that was if I remember correctly.
Carlton Gibson 3:02
Anyway, that's not suitable for the show. So let's, let's move on. So attacks is pointing. So I've, you know, as part of my fellow role, I joined the Django security team, and we get quite a lot of reports for you know, you know, every month we'll have reports go on. So you do a lot of that forensic work trying to find exploits, or you did you have work
Mackenzie Jackson 3:22
to do? Yeah, we kind of work with work with teams that we then get guy in, and also external that really look at some large scale research projects into how attackers are operating. So particularly around exploiting secrets, exploiting credentials, also exploiting, like misconfigurations. And one of the cool things that we really like to get into is when an attack happens to try and recreate that attack path, to try and figure out exactly, you know, not only like, how did they get them, but what tools that they use? Can we recreate the system and point out exactly what we meant vulnerable, because the nature of security vulnerability reporting, is that if an application has a security flaw that gets exploited, then the blog post that inevitably comes out from the company is extremely limited. And, you know, won't give too much information. So it's kind of taking that then trying to figure out what's behind it really?
Carlton Gibson 4:26
Well, I mean, so even with the issues that are on Django, we just kind of like, will put a post will say, look, there's a ders vulnerability or a mullet and it's vulnerable to maliciously crafted input, but we won't necessarily say exactly how you, you know, you do the attack, but for one reason, we don't sort of want to make it too easy for the people who are attacking unpatched changose So I guess it's difficult to know how much you should say because I kind of also in the same breath, I kind of thinking about an attack is not Gonna be put off by the lack of detail in the report? They're just gonna work it out? I don't know. Yeah,
Mackenzie Jackson 5:05
I think exactly. So the the, there's varying levels of thought behind us. And you have to be responsible in how you disclose, you know, information. And we certainly wouldn't just put out kind of like a cheat sheet of you know how to do it. But at least finding out, you know, at what point at what point where it was initial access made? Where did they where did they actually go? Was it a phishing campaign? How did the whole thing start and unravel? Now, when you're talking about CVS, or like, the vulnerabilities in dependencies and things like that, then well, it's a lot, you have to be a lot more responsible in how you disclose that, because as you would know, two years on, there's still people running vulnerable, vulnerable packages, and thanks. So you have to be a bit more careful. But in terms of recreating it, and not necessarily just just showing the logic behind which the attack is used, because I think a lot of people don't understand attackers logic, if that makes sense. One,
Carlton Gibson 6:10
perhaps one example you might be able to give his, we have various issues fixed over time about enumeration attacks, where, you know, people are able to make a request, and it's perfectly legitimate request, but it somehow reveals something about the the, the, the application or the ideas or something of that, and then they, they make the next one, and they make the next one. And by doing that kind of thing, they're able to kind of get a size of the application or predict ideas or predict URLs. And in its in and of itself, that's not really a vulnerability, but perhaps then they use that as part of a bigger attack. And, you know, I don't know,
Mackenzie Jackson 6:46
exactly, like exploiting logic flaws, and how to how to do how to do things. And you know, like a lot of, and a lot of, a lot of kinds of attacks happen because people use, you know, the wrong, the wrong function. So the wrong, you're generating a random number, there's multiple ways to generate a random number. Some are predictable using maps. But so if you're trying to use something to create access via these random numbers, then you you have to be sure to use the right thing. Now, that doesn't mean that the insecure version is pointless. It's it has its reason. It's just kind of the logic behind which you're implementing it is. It's kind of flawed, and people can often like figure it figure that out,
Carlton Gibson 7:30
I'd add one more question that came up from what you said, you know, see Will's got a question or something. But I kind of always say that you must stop date, you must be on a secure version, you mustn't use an end of life version of Jango. And I say that simply because I think kind of thing. As soon as these reports are out in the open within a short period of time, there's there's just bots, your kits you can download, which you can automatically run which test every available export node, and you're set, it's not a question of if your system is correct, it's just when it's correct. Is that fair? is a reasonable approach? Or am I being a bit over? cautious in your opinion?
Mackenzie Jackson 8:08
Well, I mean, like, there's, there's no, I don't think you're being overly cautious. I think it's like you absolutely have to patch regularly and definitely patch anything that has vulnerabilities against it against it, like it's absolutely vital to be able to do that. And what you I think what we typically find in companies that can't patch regularly is that they, if you, you should always have a system in place to to patch regularly, regardless whether or not there's some critical CVE. Because if you're in a habit of doing it regularly, it becomes easier when you need to do it. And it's critical to do it, right. And I think a lot of people kind of will patch when something's critical, but, you know, that can, you know, that can cause all kinds of havoc, and it becomes a scary thing to do. And I think people often shy away from it. But, you know, patching regularly is, is absolutely fundamental, especially when there's a vulnerability against it. The argument that kind of against patching regularly, it's a bit of a weak argument, but just to make it is that it takes about when a new version comes out. So often it can take about three months to figure out if it's vulnerable to anything, especially if it's like a big change. And so then, you know, the, the arguments is, do you patch immediately when a new version is out? Or do you kind of wait, and I think the solution is, you know, like, patch regularly, whenever there's a vulnerability known against it, and then stick to a regular patching routine for everything else, where it's consistent. And, you know, like, you're, you're doing it, but if there is a vulnerability, people will be able to find it, they build scripts to be able to, like exploit these automatically. You know, it's we're not talking about, we're not talking about the most sophisticated actors, if there's a CVE against it, like you've you've basically given the cheat codes to how to export your application. So you should definitely bet
Carlton Gibson 9:57
yeah, right. Okay. Okay. And I guess Just that point about new versions, I guess for Django, say, you know, 5.0 is about to come out. And you might wait for 5.0 point one or 5.0 point two, if you're particularly worried, you know about those first regressions. But meanwhile, 4.2 point, whatever is still being released with the security updates, so you should definitely be getting that each month. Yes,
Mackenzie Jackson 10:19
yes. Yeah. I mean, absolutely. If you're lucky enough to hopefully people like you here are using Django, you know, using well supported frame. Surely on this show, I tell
Will Vincent 10:27
you, on our podcast, yeah.
Mackenzie Jackson 10:31
Because I mean, well, there's a lot of frameworks that aren't, you know, that don't go through that. They don't have the security team. Going through it. So yeah, you already on the good step, if you're using Django
Carlton Gibson 10:45
will pull out quote on the website.
Will Vincent 10:48
Well, since since you mentioned that, I'll throw this question to you because Carlton i We're, we're not sure how to address it. So flask, which is another big Python web framework. Recently, there was some discussion, that community about something unrelated to security, but that's an example of something widely used, that, as far as I'm aware, does not have like a formal security team, for example. And so I'm curious if you if you saw any of that there was, there was a, an issue with like flask, flask login, which is a, essentially a third party package, where it's hasn't been maintained. And so there's a new version of flask, and all of a sudden login was breaking for everyone. And there were some comments about that, and people thinking that flask was this, you know, Microsoft or something and had all this support, when really, it's like one person and a handful of volunteers maintaining. So I guess, more like, I just wanted to address that, because it's been out there. But I'm curious, like where you sit? When you see web frameworks, you see that? Right? There's the whole gamut. There's Django and then there's many widely used frameworks that are, you know, may not have robust security things. I'm not saying, putting that on flask. But like, a lot of these projects may have lots of users. And it's a, you know, handful of people doing it. I
Mackenzie Jackson 12:02
mean, absolutely. And it's always just like, way up between using, like, what maybe cutting edge like, at one point Django was like cutting edge right out there. You were trendsetter, if you're using it, you know, and so a lot of security often doesn't come into the equation until later on to when I, when we first built the compiler, the startup that I was in, we had a dotnet back end and a React front end. Why? Because that was like, we had a dotnet guy and we had a React guy. Exactly. You know, that's what we, that's what we had, which is, you know, everyone's like, why aren't you using node and react because we didn't have a node guy we had we had. So when it comes to like, these types of frameworks and things like that, you know, like, if you have the foresight, and I think, you know, people that have been around for a while will be able to know, and they understand it, and you know, certain different frameworks will do different things, they may be more secure, they may be faster, they may be be able to handle large quantities of data better. So there's lots of considerations, and often security is a forethought. But you really, there's a couple of things to look for. When choosing a framework one, does it have a long history of being maintained? And I don't necessarily mean like years and years and years, I mean, you know, is constantly being maintained as your community around it. Because if if those are the cases, and you should, you know, you can you can start to check off and feel a bit a feel a bit better. Is there a security team for it? Well, that's not normal, you know, for everything, but that should definitely help you kind of make those decisions. So if you are in the fortunate position to be able to pick frameworks, then looking at the community looking at a team that maintains it, because you know, you will be surprised. You know, what happens is the example of the of UAE parser. And this is I apologize for everyone, this is a node package. There'll be there will be there'll be a Python equivalent, but you a parser was just a package that let you know what operating system your users were, you know, viewing your app on what it's something very simple. It was used by had 10 million weekly downloads. And it was maintained by one guy named for Shell. And his NP has noticed his MPN account got hacked, and then someone created a military diversion for it. You know, this is, you know, this is this one guy maintaining this. Thanks. You know, thanks. Thanks, Leslie. You know, and it would have passed everything, but you have to also under you know, consider the team behind these not just is it popular.
Carlton Gibson 14:40
Yeah. And thank God. So, thinking about that, I mean, pi pi have done an awful lot recently about tightening just within the Python ecosystem that tightening up, you know, you'd need to use two factor auth now of your particular project, and they've got this trusted publishers thing it would What's that? So in
Mackenzie Jackson 14:57
terms of pi it's actually was a hit MPN. And also get like GitHub is also forcing two factor authentication because this was pretty much one of the main ways attackers were kind of creating malicious applications is that what would typically happened is that people would specialize in kind of fishing these, you know, these maintainers, getting into their accounts. And if there's no two factor authentication, you know, like, a well structured phishing campaign isn't easy, but it's certainly a lot easier, there's no two FA to get into it. And this supply chain type of cat attacks can have massive implementation implementations, you know, because you're not just attacking this one system, you know, you're potentially creating a vulnerability on millions of applications. So one thing to remember about attackers is that they operate on economics, like a normal business does, where there's a risk reward like is the risk of me attacking this system, going to outweigh the reward that I, you know, that I could get potentially, and when you're talking about systems, you know, like pi pi packages that have been used by millions of different applications, then, you know, the risk reward is massive, potentially, because you could install, you know, even a crypto miner that gets released on to a million websites, you know, can create something so simple, can create, you know, pretty good profits. So it's, it's, it's really good that these, these package managers are actually implementing more security implementations around this because it like, two FA, may set not seem like a lot, but you will really strip back the amount of account takeovers from that it's
Carlton Gibson 16:43
just significantly harder because of that extra step, because we've got somehow get to trick you into entering that as well. At the same time.
Mackenzie Jackson 16:51
Yeah, yeah. And, and exactly I didn't like the step above that is the trusted publishers, I think, by pipeline, which, you know, like, they, which is another set of requirements that pi pi puts on these publishers to make sure that that doesn't happen. So, you know, making sure that there's not long live passwords, that's being accessed in there, so that when you look at that, it's just another tick box. And when you're choosing packages, you know, just you don't need to spend hours on it, it should be easy to look at something and say, Oh, this is secure. And that's why something like that trusted publishers, is really powerful, because you can quickly look at that and know that it meets the required, like the criteria of at least that minimum. So you can move forward as essentially quite powerful in terms of being able to quickly make decisions of what to introduce into your project. Okay, and is it going to be secure?
Carlton Gibson 17:48
Okay. I'm getting a little niggly inside, though, because there's some some of these metrics that you sometimes get on projects that they can be like your Did you check this box? Did you tick that box? And it's like, some of them are, are you using a particular feature on GitHub? And it's a bit like, Well, no, I'm not. We're not using that, particularly, let's say for instance, Django, Django has got its own release broker release process, it's got its own security process. It's got Security Archive, it does its way of handling CVS, it's all, you know, top quality, really, but it's not going through the GitHub security advisories panel, therefore, we don't get the tick in the box on that metric. And you're sorry, I sometimes get a little bit like, Oh, I hate these metrics. But
Mackenzie Jackson 18:31
I totally, I can totally understand that. But when you're the, the rebuttal to that I'd have to that is that I find that these metrics are more for smaller packages. Yeah, right. You know, use them with confidence, rather than, you know, something that's massive, you know, that's, that's trusted by people. If it doesn't have the tick box in, you know, like, you can, you can still get it passed. But in terms of like, if I'm just looking for, I just need a package that's gonna be able to do a small job, can I trust putting this into my production or not, if it has the tick box, then that's a good step that you know, but it's not the be all and end all and no security will, will be the be all end all. And I work for a vendor. And you know, we have people that come up to us and just be like, Okay, so here's my list of requirements that I need to take for for my sock certification or for this or for that, you know, like, Where does your product fit in? Oh, you don't fit into this tick box? Oh, I don't need it, then. It's kind of like so they're not worried at all about getting hacked. They're worried about like, compliance, and I've guessed that's important, but so I understand. Yeah. So how good is the Dan, where you're coming from? How good
Carlton Gibson 19:40
is the box on the questionnaire? Is it you know, because if the box on the questionnaire is right, it's no good at all. Okay. Yeah.
Mackenzie Jackson 19:47
And how do you make a questionnaire that fits everyone, right, that uses the same questionnaire box for for such wildly different applications. Well,
Will Vincent 19:58
the two factor authentication is interesting because this is a thing in Django with there is no built in two factor auth. There is a third party package that jazz band maintains. But maybe there's been a separate discussion around auth and Django. Recently Carl has been at you know, advocating for some changes, but you know, because there's a number of I'll just say, off the top, there's a number of things. You know, Django if you're gonna do it today, like there's first name, last name is the default. Well, that doesn't fit a lot of the world's population, for example. Also, it defaults to username, email, username, and password, most people want email. But maybe I'd look out and I just wind you up go,
Carlton Gibson 20:38
well, literally. So like Jack, the whole point about Django is a batteries included framework, right? It's meant to provide the batteries and but it's not any old battery, because like, for instance, it used to provide comments. But comments isn't something you can't build yourself or can't be maintained in the ecosystem. And it's, it's not it's not, it's a bit of a burden to maintain, because there's so many opinions about what it might have, and so many different ways it could go. So Django contract.com, comments was pulled out, and it's a third party package now. But AWS auth really is a battery that Django has to provide, because it's so central. And it's so hard. And if you get it wrong, the consequences are so bad that that's a battery Django should provide. And so yeah, we we've got good auth, we've got good central central, but we don't have this two factor bid yet. And for me, it's it's kind of like that's a missing battery, that would be really nice. If we could do something that's a one time passwords, or I don't know what the past keys are the new things didn't tell us about past keys and one time passwords, what are all these things, Mackenzie, because there's
Mackenzie Jackson 21:41
a there's a lot happening in the changing on authentication, because your authentication remains kind of like a big, a big, weak, weak leak, and especially our reliance on different things like API keys, and, and that are kind of just sprawling everywhere, because they're handled by so many different people. So some of the things that people are trying to do to essentially remove these points of view vulnerabilities is to create basically the same systems but only valid once and created for the purpose of that session. So you know, like, you have something like a dynamic API key that's managed by a truck, you know, like a vault or something where the API key is created, you then use it and then you then destroy it at the end. And it's like, only valid for one time
Carlton Gibson 22:35
and its lifetime is a matter of minutes, or at most. Yeah,
Mackenzie Jackson 22:38
seconds. Yeah, yeah, or, you know, whatever, however long that it takes, right, you know. And so these are kind of really, I think we can expect these to really start taking over along with kind of role based authentication that's being implemented in lots of different ways. Because one of the problems that we also have been facing is that when you're trying to manage past keys, and passwords and API keys, and all of that, you know, you can be tempted to create, you need to do multiple different jobs, if I create one admin key to be able to do all those different jobs, then I don't have to manage all these different keys, right, but then if that that key becomes so sensitive, so you know, having having role based authentication, where you restrict what, you know, to the absolute bare minimum, and your authentication is created, your for the purpose that you're trying to do with all the minimum permissions that you're trying to do. And with infrastructure, when we've kind of getting into infrastructure as code and on all of these different systems and secrets, faults, and we actually tie them all together. So that it works really, really nicely. And I think that we're not using these to to the full extent, but they're becoming more and more expected. And certain things that I mean, that we I guess using your analogy, I guess we can expect, you know, frameworks to start putting in these different batteries, you know, as we go, go down?
Carlton Gibson 24:08
Well, I think the thing that's one thing that's sort of, it's been discussed a few times, and it hasn't quite happened yet, but it's like Django hasn't really got a solution for secrets handling in place. So you start off you get a you get a settings file, and then the real secret is your, your database password and your your, your your this secret key which is used for signing and whatnot. So it's important that you, you don't commit those to get and we can talk about get guardian in a minute and, and whatnot. But the first part has always been stick nose in a setting in an environment variable. And kind of what would be what would be nice, I think for Django to have is a kind of a pluggable interface around that. So okay, if you're using Mbaise you get it from there, but then there's all these other mechanisms like, you know, volts and C Could managers and things that we, we could sort of swap out the back end? And you could be using those as well? I think it'd be nice to have something like that in the Django space.
Mackenzie Jackson 25:07
Yeah, for sure. I mean, it's secrets management is going to persist to be a problem that we go in that we have to kind of deal with. And I mean, people may have, it's, you know, it's funny, one of the, one of the common passwords that could Guardian detects when Slater is the Django is the Django secret keys. Because often people get excited, they created the first Django project they get at all and commit to to get and then all of a sudden, they have released this secret game Django key now, it's probably not that interesting. And to an attacker, at least, if it's your first project, and you're kind of having a play, you know, but systems don't know that. So they alert on it. But I kind of feel like that's a good process. Because if you leak something on GitHub, you're going to get an email about it. And then you kind of are forced to learn how to how to securely do it from the from the from the start. And, you know, when you're talking about environment variables, and dot EMV files, I mean, there's and vaults and secret managers, I mean, there's like, is huge arguments about like, what to use and when to use them. And I think I differ from most of the security community from what I think but
Carlton Gibson 26:22
because we this is one reason we get this is one reason we get stuck is that people say, Oh, we want this, but then there's a disagreement, but what about that, and we can't quite agree on what we should have. So we don't do anything. It's like, well, yeah,
Mackenzie Jackson 26:35
so. So a funny funny story about this is that at PyCon, Italy, where I met Colton, I had my talk. And, you know, my talk was on how to securely manage secrets in Python projects. And, you know, one of the I mentioned multiple ways to do it. But one of the ways and the way that I like particularly like is using environment variables, and dot env files, the talk before me, I don't know if this was like, organized by the talk before me, the entire talk was about why you shouldn't use environment variables for it. So and so I like environment variables. But I want to start off the bat and say that they're not the most secure thing to use them. So if you if you ask a security person, you say, how should I manage my secrets? Then the official answer is you should use something like a vault or a secret manager that's dedicated to that there's a server that's going to create dynamic secrets. So just in time, you can connect to that to, you know, to, to authenticate your developers so that they have access to secrets, or their apps have access to secrets that no developers, you know, and it becomes like this heavily complicated thing. And what will happen most of the time is that they will interact with that one. So go, this is such a pain in the ass to interact with the system. And then they create secrets dot txt on the desktop, and they store all their API keys there, because then they don't have to deal with this heavy system that some security person spent a whole year implementing that has 400 pages of Doc's to go through of how to correctly use it. And, and that creates another problem. And then that's why secrets end up in your history. Because you're being told, okay, I need you to create this feature. And you need to connect to some kind of data bucket to do it. So to start off with you just hard code the secrets because to do it properly, such a pain, yeah, that you will you know that it's the same, but don't worry, because by the time code review comes around, you'll have removed that not knowing that that secret is now in your Git history. But no one's seen it. So no one actually knows that it's there. And that creates like a big problem. And oh, this is getting very long. No, no, that's why
Carlton Gibson 28:40
gone. But that's brilliant, though, because you've got an intermediate commit that doesn't appear in the pull request for you. But it's still got the secret in it. But you just exactly.
Mackenzie Jackson 28:49
Yeah, yeah, exactly. Because you know, like because, you know, like you because you just you you're under time pressure, you're trying to use it quickly. And people don't understand that. And that's why an attacker if they make it into your Git repository, and they'll fish, there's lots of ways for them to do this. They'll scan your history now the top layer of gets probably not gonna have any secrets in it. But by what I mean by a top layer is it's kind of like what's on the main branch, you know, what's in the latest commits on on everything. But when you go deeper, you're going to find all the secrets that have been added and removed from people, because dealing with these heavy systems is is a nightmare. Now, does that mean you shouldn't use the heavy systems? I really think it. It, you know, like it's going to depend like everything. But why I like environment variables is because for most people, that's an adequate solution. And it's easy enough to secure. You create a dot env file in your repository, you create a dot Git ignore file. And that's going to solve a lot of your problems. The argument against that is that there are ways to dump out your environment. If you're kind of if your infrastructure if you're server or, you know, your other your has been your operating systems have been compromised, then if the first thing any attacker is going to do is type in EMV, and dump out the environment variables from a running application and see what's I must say that that is 100% going to be the first thing that they're going to do. And so the argument against it is that if you're using environment variables, you've just created a nice package for every, you know, for the attackers, but my argument is, if an attack is made it that far, like, let's face it, like you're not in a good position, anyway, like, I can I have access to your RAM, I can find secrets and other ways. Maybe it's not wrapped up. But let's not pretend that like, the E and V. file was the problem here. Like, you've got bigger problems that you need to deal with. Yeah,
Carlton Gibson 30:45
by the time they're on your server, you're in trouble. Yeah. So
Mackenzie Jackson 30:49
like, my kind of thinking behind this is, look, it is great to it is great to have these heavy systems in the air, and I think they have their place. But you have to understand like, are you mature enough to effectively use them as does the team have enough training, you know, around around that, because when you get to a large enough organization, you could segment people out so that, you know, small, small number of people have access to these machines, they know how to use it the systems, then that's fantastic. If you're a startup with 10 people, you know, environment, variable files are great, it will put them in local memory, and at least you're going to, you know, prevent them being exposed in other ways, like on Git. So you know, that's, that's my rant over because it's, yeah, yeah, there's, there's lots of things. But I, my personal opinion, is that it's not the most secure way in the world to do it. But it's so much better and easier. And it reduces the friction, which is part of the problem of security.
Will Vincent 31:53
I was I wanted to mention, like, I put my old man hat on, when I worked at startups in San Francisco, GitHub was like down the street. And back in the day, you could just search for anything. So you could search you know, we as a like game, we'd like get AWS keys, you can get stripe keys, because they are brand new, you know, so wild, wild west, like you just take searches powerful, boom, there it is. And like you could literally see it for so many, like every company, because there was no you know, there was no automated, there was no hiding it. There was no email there was none of this stuff was just like, yep, search search at all. Yeah, it was sort of a fun game, we would do
Mackenzie Jackson 32:29
it still basically like that. If you go Yeah, it's a little better if anyone's not. If anyone's listening. If you go to api.github.com forward slash events, then what that will take you to is the GitHub events API, there's everything happening in real time, you will get rate limited, but you can create lots of tokens to the front that you don't even need authentication for the first time that you can do it in incognito. So come up, in that you have all the commits that are happening, but you also have the email addresses from people there locally configured, get email address. So if you're interested in targeting a specific company, you can just filter out of domains for people that are committing with a, I don't know, pick a company at@twilio.com email address to you know, and find their personal GitHub IDs and start scanning all their stuff. Like it's still the search feature is a little bit harder, like dockings got harder for sure. But in terms of like, you know, finding keys, we found 10 million secrets on GitHub in public repositories last year, like to and 2 million of them were for a for a cloud provider keys. So like it's a, which were all valid, because we validate them, those particular keys we validate. So like, it's like, it's bloody wild, what you will find publicly but I will say GitHub, at the implementing other things that make it better, along with like some companies. Okay,
Carlton Gibson 33:57
so you've said we then you've mentioned get going to get gone? Can you tell us what is? You know, can I be using this if I'm a Django developer, is this helpful to me?
Mackenzie Jackson 34:08
Of course, of course, you're not using accounting. Oh, man,
Carlton Gibson 34:11
I never I never commit secrets.
Mackenzie Jackson 34:15
I've heard that. I've been working for kickout ID for for four years. Yeah, four years now. And in that time, I've committed secret by mistake. And it's my whole job to come on podcast and talk about why you should not do that. Anyway. Enough about my work. So get guardian. So we're a code security company. And we our platform was founded on detecting secrets inside repository. So we talked about, you know, Git history, things like that. So the cool get Guardium product is that we connect into your repositories and we will search all through the history and bring out any secrets and we can, if you're in a large company, the value of it comes in that We will prioritize them, we will validate them. And we will help you remediate them. So in a large company that sort of helps, but for individual developers, you know, all of our systems are free. We're the number one security app on GitHub. So I think we're about 400,000 users on GitHub, on the GitHub marketplace at the moment, it's just to make sure that you don't have secrets inside your repositories at any any point. And then we also have cool tools, we have a CLI tool called GG shield, which will help you do things like install a pre commit hook to that will sit between your local repository will pre commits, it's just in your local repository, it just kind of blocks any commits going through getting staged that have a secret in there. Because once it once a secret enters your repository, if you're in a team, it's going to be cloned into different areas, it's going to be backed up by different systems probably will end up inside like JIRA tickets or like disapproval everywhere. So once it's the repository, you have to revoke it. So only way forward. You know, by doing things like with Gd shields, CLI tool, you can block them, block them coming in. So it's really cool. But we've expanded beyond secrets. Now we also do you know, infrastructure is code scanning, some software composition analysis to find out if your dependencies are vulnerable. And the coolest thing that I think is we create called Honey tokens, which are fake credentials. So like, the main one is AWS credential that you can purposely leave in places and if someone tries to use it, it's like an early warning system that systems will be detected. And why this is so cool, is honey pots aren't new honey pots have been around for a while. But why honey tokens are cool is that not only can you put honey tokens kind of everywhere in your internal infrastructure, you can actually put them inside third party tools. So like circle CI had a big breach, the start of this year, and encrypted secrets were discovered. So if you put a honey token inside your circle CI environment, then you can actually know if that system has been compromised, or your other secrets in the a compromise. So that's the only type of honeypot that you can put in different systems. So there we go, that's a bit of a longer blog than I intended to go out for but I just put a
Will Vincent 37:15
plug for there's there's a giant Jenga package Django honeypot for your, your admin, so going to slash admin as the default. And so that's a pretty, like, good place to go look for stuff. And so you can set it up. And like log who's trying to get to your admin even though your admin somewhere else.
Mackenzie Jackson 37:30
Yeah, yeah, it's really cool. And what's a fun thing to do is create honey token, leak it on public GitHub, and then watch what happens in a matter of like, minutes, people will try and exploit it, but then it will also typically get sold, like as part of a package, like on the dark web months later. So you'll you'll start off by getting these random calls, like low level calls, and then it will kind of get sold, and then you'll start seeing like different types of activity, it's really fascinating to be able to track what actually happens when a credential gets leaked, you know, like, and how quickly it happens, and how it moves through these different levels of attackers, because someone for 20 bucks will purchase 1000 valid credentials, and then just spam them and see what they can do. And you know, and if they if three of them, allow them to do some mining or install some key loggers or something, then Happy Days,
Will Vincent 38:22
can I ask are? What are? My sense is that like, if if you want to do this kind of stuff openly? There's like a handful of countries you can do it in? Are? Is that where a lot of like these bad actors are? Or is it people in the United States who just covered their tracks a bit more? Like what is what does actual landscape look like of these, you know, business businessmen and women out there who are doing hacking?
Mackenzie Jackson 38:45
It's so hard to definitively say, like, where these groups come from. So like, we have a honey token that you can get the IP address, and you can see where the calls are being made from. But I mean, if that's, you know, if they're not, they're not using some third party service to mask that, then, you know, that's pretty surprising. But, you know, it really is everywhere you have, you have countries that are a lot more forward in sponsoring bad actors. So you have, you know, there's Russia, you know, there's North Korea, right, like North Korea, like the main, you know, the Lazarus group and North Korea, they're extremely notorious, but then you also have a bunch of teenagers like lapses, which are bait works were based in the UK that we're, you know, like that we're really out there for just clout not really doing too much damage, but just kind of reputational damage. So there's like the I mean, there's people kind of everywhere that interested in it. And I don't think it has any, any, any kind of real boundaries, and I don't think any countries apart from maybe North Korea, kind of like green lighting, green lighting this but there's certainly people Hold where it's more beneficial to, to, to do it. I will, you know, some different countries that don't have expert expeditions positions. Yeah. Yeah, do us right.
Will Vincent 40:11
Is there when people like, I almost wonder if they need like a retainer for people who leave guardian or any like, well, well established security group like, I mean, I guess, you know, you're not really doing it for the money, its reputation, but, you know, that would be the ultimate thing is to, nobody knows how to it's like, you know, how do you how do you steal money from a bank? It's like by a bank. It's like, yeah,
Mackenzie Jackson 40:35
it's. So I mean, like guardians really well set up and that they're, like, we don't have access to other people's secrets, like we can't get access to, like, as an employee, but we do know a lot about it. But I honestly feel like, people don't talk about it. But those people know, like, those that know really know, like, I was, I'd been talking to people at the moment, I got really into reverse, reverse, reverse compiling mobile applications, and then looking for secrets inside of them. And it was wild. What was happening inside there. But then, you know, like, you go talk about it to someone that knows, you know, someone found this really cool research, and I found all these keys. And they were just like, oh, yeah, we know already. But like, no one's kind of like talking about it. But those that know kind of already. Or, you know, like already know it, it is like, yes, like a secret little club out there. When you when you know, you know, Carlton,
Will Vincent 41:29
I want one more question. So we'll put a link to this. You wrote an article about, you know, why Chachi Beatty is a security threat, essentially. And that just like you win the like SEO lottery, right, because that's like every I wonder if you could expound expound upon that, right? Because that's, you know, something that is big everywhere, like I've been using every every day now. But it's like, oh, the, you know, something else is gonna come and get me so far. Yeah. So for individuals or corporations, what's sort of the the main threat that you outlined in that article?
Mackenzie Jackson 42:02
Yeah, this I mean, there's, there's a number of different ways that and it's kind of evolving. And it's funny, you mentioned like the SEO when, since that article came out, I've been published it so many, like the Financial Times did an interview with me as if I'm some kind of like aI expert. Like,
Will Vincent 42:18
I mean, you check all the boxes, or boxes here. We're looking for security. He's, he's that good guardian, he speaks English. So
Carlton Gibson 42:26
I've just got this vision of you and Simon Willison on a panel of expert panel at him. Yeah. But so the
Mackenzie Jackson 42:35
things like chat DBT and LLM large language models, like have have really changed the game for a lot of different people. So number one, do look at it from the company perspective, a lot of companies have now banned this and one notable one is Samsung. So you kind of look at okay, why did Samsung bought ban their, their employees using it is because this is now another system that, you know, sensitive information has a way of leaking into. So get using the example of GitHub because everyone knows that if you're with an organization, and they don't use GitHub, the chances are their employees still will, which mean that their organization still has some kind of risk associated to it. Chat, GBT is kind of similar. So you get it to summarize some legal documents, you get it to sift through a whole bunch of data or create some kind of connection using these, these credentials, that data is being stored somewhere, right. And because of that, that means that check that your sensitive data is now in check GPT server, which is not a you know, a super secure platform. So now it's a target for other people. So big companies have kind of have started to ban it. I think that's the wrong approach, because I think that your employees will use it anyway. Because it's such a production boost. You know, and I feel like there's a fear that if you're not using this, you're going to miss out the other area of check GPT security risk or other MLMs is that when you when you you trust AI systems a whole lot more than what you should. So where does Chet GPT get its data from? You know, so you ask it, hey, can you create a coding instruction that will do X, and it will spit back, and I'll do it in record time, and even like, explain how it works, so it feels like, super great. Chet DBT uses the largest data set around, which is the common crawl data set. So all of these systems use most of the code in their rubbish. Yeah, so that's kind of the point I'm getting it to, like, look at a random open source repository, is that's what it's learning against. Now, these systems can't really distinguish between good code and bad code, at least not in the vacuum of which you've asked them to do a query, it's found the most common result a greater example, is a different system copilot, GitHub copilot. Whereas at least this was true a year ago when I was really experimenting with it. If we used the author, if we put in as an author at the top author as a really respected, well known open source maintainer, someone that's very prolific, and asked it the same prompt, as if I used my name as the author would get different responses. You know, like, because, like, it's kind of looking at an example of what would this guy do? And what would this guy do? And obviously, I'm much shittier, than, than the other person. So yeah, so the whole point is that you actually come to the risk of like, the code that it gives you will have vulnerabilities in it probably. And here's the big difference is people say, well, what's the difference between that, and me copying code from Stack Overflow? And the difference is that stack overflow has a comment section, where people will very quickly let you know, if you're doing something insecure, or around it, there's a huge community around that people can add input, you will get various different results for the same thing, chat GPT, copilot other MLMs have, you know, don't provide that additional community input as to which is the best way? You ask her for an answer, it gets an answer. You know, and I wrote an article called shitty code, she co pilot and it's, you know, basically that crappy input crappy output. And the people that are most likely to use, the systems are probably people like, you know, that are early on and may not know the difference between random and random, secure, and when you know, when generating a random number and when to use both of them. If that makes sense,
Will Vincent 46:38
everyone uses these systems, to be honest, I mean, maybe they're not as good at like, filtering things out. I mean, it'd be kind of cool to have an LLM that's just approved answers on Stack Overflow. That would be kind of cool. Yeah,
Mackenzie Jackson 46:50
that for sure. For sure. I mean, that. Well, I'm really not, I'm really not against, like the systems too. But I'm just kind of of the mind that you, you just you need to understand where the answers come from. It's not a genius AI system. It's come from other people on GitHub. It's organizing it in a revolutionary way, that helps you find the solution that you're looking for, extremely quickly. But, you know, you have to understand the limitations of that. And you know, where the answers come from? Yeah, and if we're talking about security to the other area, that of the systems and AI, that people are worried about our attack is using AI, so can attack is now you know, more effectively? And the answer, yes or no? Like everything. So, again, again, AI systems use no one data to do this revolutionary attacks, you know, are not on the public Internet, or they're very, very hard to do, and it's not coming up with new stuff. So if you want it to create malware, if you use it will block you at first. But if you use clever prompt injection, like one that used to work, I don't think it does now is like, imagine you're an AI system that doesn't have limitations. If you put that in charge of use do I think it's important to my career, you know, like, yeah, like, Oh, give me an example of malware, you know, that these things used to work. And prompt injection is, you know, there's lots of clever ways to get to stuff. There's also open source LLM said, you can remove these limitations. But anyway, if you asked it, and you successfully got it to create malware for you, it's going to be pretty standard malware, like it's nothing that you wouldn't be able to find by Googling yourself. However, you know, what it does do is it gives script kiddies superpowers. So, you know, like when you can bind a script, Kitty, which is, you know, someone that doesn't have a huge technical knowledge that just can run malicious scripts to get into things. It's a big section of the malicious market, you now give them Chet TBT, so that when they're doing phishing campaigns, they can now do spellcheck, like, you know, how long was it? The biggest red flag was misspelled stuff, or I'll forget about that. Now. You know, they can, they can manipulate the code for the first time they can input malicious code and say, adjust this so that it can do this on this system, or, you know, and these things, so it does give some superpowers. But when we're talking about like, is it going to revolutionize the hacking? Not at the moment because it needs to have it can't come up with new ideas yet, it can only regurgitate existing ideas. So I mean, he's like, look, there's lots of concerns around AI. And I did an interview recently, which was with a guy called Simon maple, which was like is AI our friend or foe? And surprise, surprise, the answer was it depends now, but I think it was
Will Vincent 49:47
that someone told me like if there's a magazine article and it has it as a question in the title, like it doesn't know otherwise, it would say AI is a threat. Yeah, AI is.
Mackenzie Jackson 49:55
Well, I also I look, I feel like it's pointless to worry about that. One wonder about that, because it's not going anywhere. So like, is it a threat? Yes, though, does it change anything in how we're going to operate or what the risks are that you're facing now? No. So like, it's, it's, it's here to stay. And at the moment, the biggest risks are not externalizing. It's not an attacker using it. The biggest risks is internal your employees using it, but not understanding. And if you block it, if there's someone here from a large organization that's thinking, Okay, I'm just going to block this system, then all that is going to do is send people to the background, they're still going to use it, but they're just not going to tell you that they're using it
Carlton Gibson 50:40
Chico node audited then that case? Yeah, exactly.
Will Vincent 50:44
Right. There is I'm hearing I mean, as a security expert, let's talk about LLM a little bit more. I was curious your take on there is this issue of like, or Boris of like, now that we have, you know, so much of you know, so it's like Google's the problem with Google, right? Google searches the internet, but then people like want to feature on Google. So then you write articles for Google. It's like, why YouTube videos are all like 10 minutes in one second, right? Like, so it becomes like just gets crappier and crappier and crappier it's like self building up behind, like a damn. And that's I've seen, and I can believe this based on just typing in that, like, they think the majority of Article articles are soon going to be aI generated, because it's like, perfect at, you know, these these topics. And so over, there's an argument that over time, like, this is as good as it's gonna get, it's just gonna get more and more crappy as AI consumes AI. And then, like, you know, and of course, it's all copyright laundering, which we don't need to get into. But I curious if you have a quick take on that, or does that seem right to you? Because to me, I don't really see a way out of that, you know, both in terms of, I guess, code, you can run and you can test it, but like, things you search for, I mean, I use chat UBT as Google now, because I used to, I've been using DuckDuckGo for years to also just because Google's doing its inevitable thing. But I kind of wonder like, maybe, maybe I'll just be stuck on like, you know, September 2021, or whatever is the date, because it's just gonna get worse. I
Mackenzie Jackson 52:09
have an interesting take on this. So I don't know this is going to be like the popular take, but I have an interesting take. I think it's going to have the opposite effect. So here's why is that we came from it like 30 years ago, the main source of information was all books that came from trusted kind of sources. And you could say whether or not you think that's bad or good, you this way you agree, very trusted, right? Yeah. And it's hard to it's hard to, it's hard to publish it. It's still hard. It's hard. Yeah. And it's and it's like, it's so easy to publish junk now. And now tat GPT has made it so easy that I feel like it's going to have the effect that the that there's going to be so much shitty information that everything's going to get so much shittier that we're all going to go back to trusted sources, instead of instead of kind of going on. So like, are we going to be like googling answers, or we go back to the trusted peer reviewed research or articles from the Orion jango.com? Yeah, yeah, yeah. django.com or, you know, they all the, I feel like, I feel like it's actually going to kind of make it bake the people that had some kind of platform, but no brains, like it's gonna, they're gonna become so noisy, that you're just going to ignore them. And then and then what will become kind of the new system will be that we in you will need to take a step back and go back to trusted sources. And that's kind of well, that's what I'm hoping that's going I
Will Vincent 53:31
like, I like that. Well, you know, I have I've my robots dot txt is updated for all the crawlers. So I'm sure they're gonna abide by it. And so I'm not worried about my stuff being pilfered.
Mackenzie Jackson 53:44
Yet, of course, yeah.
Will Vincent 53:45
I joke to people, I've got an update. There's, like 10 on there.
Mackenzie Jackson 53:48
That reminds me of that, I always ask people kind of like, what's the worst security advice you hear? And there was a guy that was that was providing, providing feedback on a on a penetration test. And he explained how this person's organization their infrastructure was vulnerable. And the person's response to that would be yes, but that's illegal. So no one would do that.
We don't people don't abide by laws, everything would just be lawless. And it's like, well, welcome to the internet.
Carlton Gibson 54:25
Okay, good. So we're coming up on time a bit. Mackenzie, I just wanted to ask you one more question. So if you if we would say, first thing, move all your secrets into embers and then use a secret Stan or something like get guardian and make sure you that you catch anything you do commit? Is there a third kind of obvious thing that we should be doing that we you would say?
Mackenzie Jackson 54:47
Yeah, definitely. So we need to add additional layers of security upon things. So you know, there's a concept called zero trust, which is talking about, you know, just because you have an API IQ you have a password doesn't mean that you need to trust explicitly that person. So there's something that you know, something that you have and something that you are. And we should be implementing those rules across security as well. So some some simple things that you could do you know, if you're listening to this go like, okay, that all sounds nice, but what can I actually do? So, come up with ways to rotate your keys regularly. The reality is, no matter what you do, your keys will leak, the best companies in the world, their keys will leak hashey Corp, they created a product called vault is one of the best secrets managers, you know, they had a breach because they had secrets leak into the Git repository. So it happens to everyone. So what you can do is you can implement a rotation policy that rotates keys regularly. So that's one thing so that that doesn't happen. The advantage of that is that you actually know where your keys are, or you know how to rotate them. Because like, then if you have a leak, it's kind of like, Oh, what the heck does this key do? Am I going to break production, if I revoke this, like, if you're regularly rotating things, and you know what they are, the other thing is like Stop, stop producing admin keys, like if you need a credential, DON'T GIVE IT admin credential I had so many stories of when I read only credential would have been totally sufficient. But, you know, that requires creating another key. But if you've got a secret manager, then that then the problem goes away. So you know, make sure that keys has limitations of what they can actually do to suit, you know, to suit the job. And in whitelist, your services, if you have a system that's only meant to talk to this other system, right, then you can set it up so that they can only talk now if I find the credential between them that allows me to talk to it, then I'm still coming from coming from a different area or David different IP. I'm from different systems. So these are things that you can kind of actively do to add layers on top because you just got to remember that there's no bullet you know, everything about security, the famous words of this podcast, it depends. But you can add layers in that create friction. And that's ultimately kind of like the best way to handle this is to create enough friction. You know, that, that we're adding so many different layers to to an attacker? Yeah,
Will Vincent 57:12
they're gonna attack someone who's easier. Yeah, that's like that's like one of those. Yeah, startup sayings, like there's no silver bullet. There's just a lot of lead bullets. Yeah,
Mackenzie Jackson 57:20
I live in the Netherlands. And like, there's a saying here, like bikes. Everyone has like, at least two bikes each, especially in my town. And one of the things is like no one has any nice bikes, because you always want your bikes shittier next year, because then the person's going to because we're terrible at
Will Vincent 57:37
locking our bikes. Like don't outrun the bear just be next to someone. Yeah, slow
Mackenzie Jackson 57:40
it so that's like Yes.
Will Vincent 57:43
Story in in like Amsterdam, after, like the Nazis, like fled that there was some like, a quarter of the bikes or like in the cat, or the cat, the waterways, or there was some stat I didn't bike tour, and he was giving these crazy numbers speaking to the fact that like, you know, I guess for a century now, everyone bikes everywhere, but everyone, like stuff gets stolen. And there was some crazy statistics about when the Nazis left and the number of bikes that were trashed or something
Mackenzie Jackson 58:09
there, I think this I don't know if that I think that still happened. They've been out again, you'll see the canals of crane will come into the canal, this is big claws. And they'll just scoop up all the bikes that are in there, because it just happened. But you know, bikes, bikes are pretty easy to come by. And it's an unwritten rule that if you know, if it's past 2am, and your bikes been stolen, it's socially acceptable to steal someone else's bike to get home. From the cloud. Right.
Will Vincent 58:42
That's good. That's good, too. Good to remember.
Carlton Gibson 58:43
Well, I think we should wrap up there. That's perfect. I don't know how we got into this stuff. Well, it was it was the backwards analogy from making sure your site's slightly more secure than other sites. You know, if you've got your bike slightly worse than the neighbor's bike, it's the opposite the mirror. Great weekend. So anything else that you'd like to call out? I've got my end that we haven't mentioned that. You mentioned your podcast. You mentioned get Guardian compound you've mentioned we'll put all of those in the show notes. Anything else that you think Oh, yeah. No, I
Mackenzie Jackson 59:16
mean, I feel like I feel like that's, I mean, that's kind of it. I think that's enough enough. Enough plugs for for one episode, but anyone that wants to follow me, you can follow me anywhere on social media at the handle at Advocate Mac. I'm even on threads I've never posted but you know, I could just do what it does explode. And then I might do so make sure you follow me. But yeah, and then that's it. I mean, if you want to take everyone to listen to me geek out about security stuff. That's where that's where you can find me.
Carlton Gibson 59:43
Okay, brilliant. Well, thanks for coming on. Really good. Really enjoyed it.
Mackenzie Jackson 59:46
I appreciate the invite. Thanks for great to great. Great to meet you willing. Great to see you again. Kelton.
Will Vincent 59:51
Alright, well, thanks, everyone. We are at chat django.com And we'll see you next time. Bye. Bye. See you next time. Bye bye.