Django Chat

URLs: PKs, Slugs, & UUIDs

Episode Summary

Designing proper URLs is a core architectural decision in any website. In this episode we discuss the pros and cons of common approaches, from PKs to Slugs, UUIDs, and beyond.

Episode Notes


Episode Transcription

Will Vincent  0:06  

Hello, and welcome to another episode of Django Chat. This week we're going to talk about URLs, IDs, slugs, and UUIDs, and explain how it all works. I'm Will Vincent, and I'm joined as always by Carlton Gibson. Hi, Carlton. (Carlton: Hello, Will.) Alright, so let's get into URLs. I'm asked a lot of questions around slugs in particular, which is sort of a Django-specific newspaper term, but we'll get into why URLs are confusing. Basically, when you're building a web application, and especially if you're new to Django, you will have two major types of views. You're going to have a list view. So let's take a blog application. The ListView will just show all the blog posts, or some subset of the blog posts. And then, if you use a generic class-based view in Django, you will have a DetailView. That's an individual page for just that post. And this is where the question comes up of what you put in that URL. I would say the first place to start...

 

Carlton Gibson  1:00  

Thinking about... I was just listening to you and thinking it through, and thinking, yeah, the URL is the interface, right? You've got it in your browser's address bar. And some browsers now don't show the URL fully. I'm old school; I like to see it. But a nicely designed URL scheme means that you've got /posts, then a forward slash, then an ID of some kind. GitHub, for example, has the organization or the user, then the repo, then the issues, and then the issue number, and you can get to a different issue just by changing one little bit of the URL. Each of these path components is meaningful, and

 

Will Vincent  1:40  

you can kind of drive it. Yeah, and you can share it, and it helps with SEO versus a fully JavaScript setup, which, as you were describing before, Google is only starting to be able to figure out how to index, but it's...

 

Carlton Gibson  1:52  

you know, URLs are one of the major components of what the web is. And it's glorious. I love them. I'm just geeking out momentarily.

 

Will Vincent  2:07  

Yeah, well, I think, for me anyway, until I started building websites I never thought about URLs. I think your average consumer doesn't think twice about them, right? I mean, now they don't have to remember them; they just type the name of the site into whatever search engine. But they do matter. They matter for developers who know what they're doing and want to navigate through the architecture, it matters for SEO, and it just matters for cleanliness, the structural thing. And one of these huge questions is how you handle it, especially when you have a detail page. Take the GitHub example. You'd have github.com/ and then, for me, it's wsvincent. That's a list of all my repos. And then what does it look like for an individual repo, like DjangoX, which is my starter project? You could have it just be djangox, and that's what GitHub does. That's basically a slug, which is putting the title into the URL, and if it's multiple words, you would often have a dash, not an underscore. That's a point of confusion: why not an underscore, why is it a dash? You can do both, but I usually see dashes as the preferred method. Do you have a preference?

 

Carlton Gibson  3:17  

Yeah, I mean, a dash is probably a bit more readable, something like that.

 

Will Vincent  3:21  

There's something in the Django docs about it. I know this is an ongoing debate among core people around when to use underscores and when to use dashes, and some examples use one and not the other. I don't know if that's been settled yet.

 

Carlton Gibson  3:31  

In a Python identifier, a dash isn't valid. It has to be an underscore, because otherwise Python would read the dash as a minus sign. So a variable name, a module name, or a package name in Python code has an underscore. That's why in Python...

 

Will Vincent  3:45  

But in a URL, I think the default is a dash. So if you're naming the URL, you'd use an underscore. If you're crafting the URL itself, you'd normally have a dash in there, as in slugs.
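Carlton's point about identifiers is easy to check with the standard library. The sketch below relies on the real `str.isidentifier()` and `keyword.iskeyword()` functions; the `is_valid_python_name` helper itself is hypothetical, just for illustration.

```python
import keyword


def is_valid_python_name(name: str) -> bool:
    """True if `name` could be used as a Python variable, module, or package name.

    Hypothetical helper: isidentifier() applies Python's identifier rules, and
    keywords such as "class" pass that check but still can't be used as names.
    """
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_valid_python_name("post_detail"))  # underscore: valid
print(is_valid_python_name("post-detail"))  # dash: parsed as post - detail
```

Route names passed as `path(..., name="post_detail")` are plain strings, so Django doesn't enforce identifier rules on them; underscores there are a convention. The URL path itself, such as `/posts/my-great-post/`, is where the dashes go.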

 

Carlton Gibson  4:00  

Yeah, if you've got a slug... So what is a slug? If you create a blog post model in Django and you add a slug field, you can link it to the title, and in the admin, for instance, it will automatically populate the slug field if you tell it which field to base it on. So you create a blog post with the title "My Great Post", and it will automatically put my-great-post in the slug field. That slug field is what you later use in the URL.

 

Will Vincent  4:33  

Right. There's a specific SlugField in Django models, and there's also a slugify function that will do that. Yeah,

 

Carlton Gibson  4:43  

it takes a text string and returns a slug-like string, and, you know, yeah.
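Django's real helper is `django.utils.text.slugify`. The sketch below is a stdlib re-implementation modeled on its documented behavior (convert to ASCII, drop anything that isn't a letter, digit, underscore, hyphen, or whitespace, lowercase, collapse whitespace and dashes into single dashes, strip leading and trailing dashes and underscores), so you can see what it does without installing Django. In a project, import Django's version instead. In the admin, the auto-populating behavior Carlton describes is configured with `prepopulated_fields = {"slug": ("title",)}` on the `ModelAdmin`.

```python
import re
import unicodedata


def slugify(value: str) -> str:
    """Stdlib sketch modeled on django.utils.text.slugify (ASCII mode only)."""
    # Decompose accented characters, then drop anything that isn't ASCII.
    value = (
        unicodedata.normalize("NFKD", str(value))
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Keep only word characters, whitespace, and hyphens; lowercase the rest.
    value = re.sub(r"[^\w\s-]", "", value.lower())
    # Collapse runs of whitespace/hyphens into one dash, then trim the ends.
    return re.sub(r"[-\s]+", "-", value).strip("-_")


print(slugify("My Great Post"))
```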

 

Will Vincent  4:51  

yeah, so that's it. So I think with slugs, there are two things about it. One is that you can do it in the admin. So again, Django was built for a newspaper: when someone goes in and makes a post, it will automatically do it for them. Or you can let them do it manually. Actually, if you're doing it automatically outside the admin, then you need to get into the save method, right, and override it to automatically create a slug, I believe. Yeah, I guess if the slug is required, then yeah, it needs to be set. So if you're going to have the creator do it in the admin, you can just have the slug field and they'll type it. But, to your point, what's more common is to auto-populate it based on the title or whatever, and you do that by overriding the save method, which is a little black magic-y, but you get used to that. It's a common thing you'll do in real-world Django applications, overriding the save method, but it's confusing in any case. And then when you get to your URL, so now you're in your urls.py file: after 2.0, you can use path instead of regular expressions in there, and you will use either the primary key or the ID. The Django ORM will automatically add an auto-incrementing ID for you under the hood, so your first blog post has a 1, the second has a 2, and so on and so forth. You can use that as the identifier for a generic class-based DetailView. But confusingly, you can also call it a primary key, pk. I actually had a nice description of this, but I forget it now. The difference between an ID and a PK is that you should probably use pk, the primary key, because a primary key can refer to something other than an ID, like a UUID, which we'll get to, whereas an ID is very specific.
Also, id has a specific meaning in a lot of programming languages (in Python, id() is a built-in function). So when you are referencing the ID, you should probably call it a primary key.
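With `path()`, the part in angle brackets picks a converter, such as `path("posts/<int:pk>/", ...)` or `path("posts/<slug:slug>/", ...)`, and a generic `DetailView` looks the object up from the `pk` or `slug` keyword argument. Here is a stdlib sketch of how such a pattern matches a URL. The regexes mirror Django's built-in `str`, `int`, `slug`, and `uuid` converters, but `compile_route` and `resolve` are hypothetical helpers, not Django's actual resolver.

```python
import re
import uuid

# Converter name -> (regex for the URL segment, function that converts the match).
CONVERTERS = {
    "str": (r"[^/]+", str),
    "int": (r"[0-9]+", int),
    "slug": (r"[-a-zA-Z0-9_]+", str),
    "uuid": (r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", uuid.UUID),
}

# Matches "<int:pk>" or "<slug>" (no converter means "str").
TOKEN = re.compile(r"<(?:(?P<conv>\w+):)?(?P<name>\w+)>")


def compile_route(route: str):
    """Translate 'posts/<int:pk>/' into a regex plus a converter per name."""
    pattern, to_python, pos = "", {}, 0
    for m in TOKEN.finditer(route):
        regex, convert = CONVERTERS[m.group("conv") or "str"]
        name = m.group("name")
        pattern += re.escape(route[pos:m.start()]) + f"(?P<{name}>{regex})"
        to_python[name] = convert
        pos = m.end()
    pattern += re.escape(route[pos:])
    return re.compile(pattern), to_python


def resolve(route: str, path: str):
    """Return the converted keyword arguments, or None if the path doesn't match."""
    regex, to_python = compile_route(route)
    m = regex.fullmatch(path)
    if m is None:
        return None
    return {name: to_python[name](value) for name, value in m.groupdict().items()}


print(resolve("posts/<int:pk>/", "posts/42/"))
print(resolve("posts/<slug:slug>/", "posts/my-great-post/"))
```

Note that the `int` converter hands the view a real integer, and a slug never matches an `<int:pk>` route, which is why the URL pattern and the lookup field have to agree.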

 

Carlton Gibson  6:57  

Yeah, in Django, the pk shortcut will pick out whatever field on the model happens to be the primary key. If you don't specify a primary key, it will be an auto-incrementing integer field, right?

 

Will Vincent  7:11  

Right. And we'll get into changing that primary key. But

 

Carlton Gibson  7:14  

if you've... you might have used a text field, for instance. If you're just writing a small contacts database for your friends and family, where you know you're not going to have conflicts, then there's no harm in just using a text field, well, a CharField, for the name as the primary key, because it's unique. It's just Will and Carlton, you know, Jessica, and
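To make the `pk` alias concrete without a database, here is a toy sketch. `ToyModel`, `BlogPost`, and `Contact` are hypothetical plain-Python stand-ins, not Django's ORM; they only model the idea that `pk` means "whichever field is the primary key", whether that's an auto-incrementing `id` or a name used as a natural key.

```python
import itertools
from dataclasses import dataclass, field

# Stands in for the database's auto-increment sequence: 1, 2, 3, ...
_next_post_id = itertools.count(1)


class ToyModel:
    """Hypothetical stand-in for a Django model; models only the `pk` alias."""

    pk_field = "id"  # default: an auto-incrementing integer `id`

    @property
    def pk(self):
        # Return whichever attribute is configured as the primary key.
        return getattr(self, self.pk_field)


@dataclass
class BlogPost(ToyModel):
    title: str
    id: int = field(default_factory=lambda: next(_next_post_id))


@dataclass
class Contact(ToyModel):
    pk_field = "name"  # a CharField-style natural key, as in the contacts example
    name: str
```

Code written against `obj.pk` keeps working if the primary key later changes from an integer to a name or a UUID, which is the practical reason to prefer `pk` over `id`.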

 

Will Vincent  7:35  

As long as they're unique. That's the challenge with slugs, right? Like, you know, "Hello World": you write two "Hello World" posts. Or, to get back to GitHub, GitHub partially solves this: it goes github.com/your-username/your-repo-name. So even if I have a repo called hello-world and you have one called hello-world, they're at different URLs because our usernames are different, and you can't have two repos with the same name under one account. So it's very rare to have a slug that is just, you know, example.com/slug; usually you want something prepended, not to make it unique, but to make it distinguishable, so you don't have these conflicts. And if you do have these conflicts, you can do things like automatically adding integers or strings onto the end. But that gets really messy. I mean, so you want to add, like
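The "add integers onto the end" approach Will mentions can be sketched in a few lines. `unique_slug` is a hypothetical helper, and the `taken` collection stands in for a database query of existing slugs.

```python
from __future__ import annotations

from collections.abc import Container


def unique_slug(base: str, taken: Container[str]) -> str:
    """Return `base`, or `base-2`, `base-3`, ... until the slug is unused.

    Hypothetical helper: `taken` stands in for a query of slugs already saved.
    """
    if base not in taken:
        return base
    n = 2
    while f"{base}-{n}" in taken:
        n += 1
    return f"{base}-{n}"


print(unique_slug("hello-world", {"hello-world", "hello-world-2"}))
```

In a real app this check-then-save can race when two posts are saved at the same moment, so keep `unique=True` on the slug field as the final guard and handle the resulting integrity error.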

 

Carlton Gibson  8:23  

a unique-for-user, or unique-for-date, or unique-for-some-other-field constraint. Right, and you can add validators on forms or models to help you enforce these. Django ships with unique_for_date and unique_together and, you know, other constraints on the model.
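The GitHub-style rule, "unique per owner, not globally", can be sketched as a check on the `(owner, slug)` pair. `ScopedSlugs` is a hypothetical in-memory stand-in for what a database constraint would enforce.

```python
from __future__ import annotations


class ScopedSlugs:
    """Hypothetical registry: a slug must be unique per owner, not globally."""

    def __init__(self) -> None:
        self._taken: set[tuple[str, str]] = set()

    def add(self, owner: str, slug: str) -> str:
        key = (owner, slug)
        if key in self._taken:
            raise ValueError(f"{owner!r} already has a repo named {slug!r}")
        self._taken.add(key)
        # The owner prefix is what keeps two "hello-world" repos apart.
        return f"/{owner}/{slug}/"


repos = ScopedSlugs()
print(repos.add("wsvincent", "hello-world"))
print(repos.add("carltongibson", "hello-world"))
```

In Django itself you'd express the same rule declaratively, for example with `models.UniqueConstraint(fields=["owner", "slug"], name="unique_slug_per_owner")` in the model's `Meta.constraints`, or with the `unique_for_date` field option Carlton mentions; the field and constraint names here are hypothetical.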

 

Will Vincent  8:41  

Yeah, and it used to be... you know, I don't know the SEO answer on this, but it used to be common to have the date in the URL of, for example, your blog. But that's not great for SEO in some ways, and if you update your blog post, then it gets out of date. So for me,

 

Carlton Gibson  8:55  

Like, I think if you're creating an actual weblog, an actual blog, a historical record of what you've been doing that you put on the internet, then great, have a date in the URL, because on this day I wrote this post. But people came to realize that they wanted evergreen content, specifically for marketing sites, where they put out two or three really high-quality posts that are evergreen and are going to help drive search traffic and all these things. You don't want the date in those posts, because the date is irrelevant; they're evergreen, right? Or if you write Django tutorials, like on my personal site, and you update them to the latest version of Django, you don't want, you know, Django 2.2 on a post from 2016.

 

Will Vincent  9:37  

So it...

 

Carlton Gibson  9:38  

depends. Is it a weblog, or is it evergreen content, where the date isn't relevant?

 

Will Vincent  9:44  

Yeah. So that brings us... So we talked about how the default is probably to use a primary key. When I teach detail views to people, I usually start with a primary key, and then I'll discuss slugs. Slugs are a little bit more complicated; they have to do with the admin or overriding the save method. But there is another option that is probably a better choice, which is a universally unique identifier, a UUID, which Django added some really nice features around recently. So do you want to explain what a UUID is?

 

Carlton Gibson  10:13  

A UUID is this long string; you can think of it as a long string that's universally unique, right? UUID4 is one particular algorithm for constructing these unique identifiers, and it's random; others, like UUID1, are based partly on the MAC address of the machine they were generated on and partly on the time. And so the chances of a conflict between these are microscopically slim. So whereas ID 1 or ID 2 might not be unique, the UUID will be unique. This is super good if you're creating model instances in multiple places. Let's say you've got a Django app with a server, and everybody's using traditional web requests and creating the instances on the server. Integer primary keys are no problem there, because the database will ensure that the next one created gets the next primary key, and you don't have to worry about it. But then let's say you had a mobile client with offline capabilities, where people are able to create instances on the mobile client and then sync them to the server later on. All of a sudden, you've got the danger of the same ID being created both on the mobile client and on the server at the same time by different requests. The way you get around that is to use UUIDs, because no matter where one was created, there's not going to be a conflict. So later on, when you go to upload the instance that was created on the client in an offline context to the server, you know there won't be a conflict of the ID.
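Carlton's offline-sync scenario can be simulated with the standard library. The two counters below stand in for two independent auto-increment sequences (an assumption about how the client would number records), while the real `uuid.uuid4()` draws random identifiers on each side.

```python
import itertools
import uuid


def integer_id_source():
    """An independent auto-increment counter, like a single database's sequence."""
    counter = itertools.count(1)
    return lambda: next(counter)


def shared_ids(a, b):
    """IDs that appear on both sides and would clash when the records are merged."""
    return set(a) & set(b)


# Integer IDs: the offline client and the server each count from 1.
client_next, server_next = integer_id_source(), integer_id_source()
client_int_ids = [client_next() for _ in range(3)]
server_int_ids = [server_next() for _ in range(3)]

# UUID4s: each side draws random identifiers independently.
client_uuids = [uuid.uuid4() for _ in range(3)]
server_uuids = [uuid.uuid4() for _ in range(3)]

print(shared_ids(client_int_ids, server_int_ids))  # every integer ID collides
print(shared_ids(client_uuids, server_uuids))      # no overlap in practice
```

In Django, the usual pattern is `models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)`. Pass the function `uuid.uuid4` itself, not `uuid.uuid4()`, so each new row gets a fresh value.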

 

Will Vincent  11:48  

Yeah, and this is a bit similar to the mobile example from our authentication episode, where we talked about why you'd use tokens rather than session IDs. Again, this syncing issue crops up. Another reason a UUID is a good idea: if you have your ID hard-coded in the URL, let's say, for example, I've got a list of clients, and each one has an ID, and someone creates a new account and sees, oh, I'm client number 500. Now they know exactly how many clients you have. If you're a banking site, all these issues... it's just too much information to display publicly. It really is a security concern, in most cases, to just put the literal database ID in the URL. I mean, for a blog, it doesn't matter. But if you're building anything enterprise, or anything charging money, then just from a security standpoint, a UUID is safer.

 

Carlton Gibson  12:43  

Yeah, I mean, you don't want people to just be able to guess the URLs, and UUIDs aren't really guessable, right? So you can't... you can...

 

Will Vincent  12:53  

yeah, they're

 

Carlton Gibson  12:54  

Sorry. So if I give you a URL that's got ID 500 in it, you might wonder what happens if you try 501. Does that come up? Yeah. Whereas if I gave you a UUID, you could type in any random string, and it's likely not to be an entry in the database.
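A bit of arithmetic shows why. A version-4 UUID has 128 bits, 6 of which are fixed for the version and variant, leaving 122 random bits. The sketch assumes a guesser who picks uniformly among valid UUID4s (random strings would mostly not even parse as UUIDs); the helper names are hypothetical.

```python
from __future__ import annotations

from fractions import Fraction

# 128 bits minus 6 fixed version/variant bits leaves 122 random bits.
UUID4_SPACE = 2 ** 122


def next_sequential_guesses(seen_id: int, count: int = 3) -> list[int]:
    """With sequential integer IDs, the obvious guesses are simply the next numbers."""
    return list(range(seen_id + 1, seen_id + 1 + count))


def random_uuid4_hit_chance(existing_records: int) -> Fraction:
    """Chance that one uniformly random UUID4 guess matches any existing record."""
    return Fraction(existing_records, UUID4_SPACE)


print(next_sequential_guesses(500))
print(float(random_uuid4_hit_chance(10**6)))  # roughly 1.9e-31 for a million rows
```

Unguessable URLs reduce enumeration and information leakage, but they are not access control: views should still check that the requesting user is allowed to see the object.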

 

Will Vincent  13:11  

Yeah. And then there's a further level, which you actually shared with me, Carlton: hashids. There's a hashids.org site, and there's a django-hashid-field third-party package. But what is a hashid? That's where I've seen it.

 

Carlton Gibson  13:26  

They enable you to have integer primary keys in the database, but then they create a nice short slug, which is, you know, half a dozen characters long, and which isn't guessable. You can't go from one to two to three by just incrementing it, because the algorithm that generates the hashid isn't guessable. And they're much shorter and nicer than UUIDs: UUIDs are long and ugly and horrible, while hashids are six or seven characters. Brilliant. That's nice. So hashids are lovely. I like them. They kind of solve the... They're not guessable in the way that primary keys are. They use a salt, so that they're not predictable. But they still enable you to use integer primary keys under the hood.
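To make the idea concrete, here is a stdlib sketch of a reversible, salted, short code for an integer primary key. It is not the Hashids algorithm (hashids.org and django-hashid-field implement their own); it only has the same shape: integer in, short non-sequential string out, and back again. The constants and helper names are hypothetical. Treat it as obfuscation, not encryption: anyone who learns the salt and the scheme can decode the codes.

```python
import random
import string

BASE_ALPHABET = string.ascii_letters + string.digits  # 62 symbols
CODE_LENGTH = 6
CODE_SPACE = len(BASE_ALPHABET) ** CODE_LENGTH  # 62**6, about 5.7e10 codes
# Hypothetical constant; it must share no factor with CODE_SPACE (2**6 * 31**6),
# i.e. be odd and not a multiple of 31, so the scramble below is reversible.
MULTIPLIER = 1_580_030_173


def salted_alphabet(salt: str) -> str:
    """Shuffle the alphabet deterministically from the salt.

    The shuffle is stable for a given salt on a given Python version.
    """
    chars = list(BASE_ALPHABET)
    random.Random(salt).shuffle(chars)
    return "".join(chars)


def encode(pk: int, salt: str) -> str:
    """Map an integer primary key to a fixed-length, non-sequential-looking code."""
    if not 0 <= pk < CODE_SPACE:
        raise ValueError("pk out of range for this sketch")
    alphabet = salted_alphabet(salt)
    scrambled = (pk * MULTIPLIER) % CODE_SPACE  # a bijection on range(CODE_SPACE)
    chars = []
    for _ in range(CODE_LENGTH):
        scrambled, digit = divmod(scrambled, len(alphabet))
        chars.append(alphabet[digit])
    return "".join(reversed(chars))


def decode(code: str, salt: str) -> int:
    """Invert encode(): read the base-62 digits, then undo the multiplication."""
    alphabet = salted_alphabet(salt)
    scrambled = 0
    for ch in code:
        scrambled = scrambled * len(alphabet) + alphabet.index(ch)
    inverse = pow(MULTIPLIER, -1, CODE_SPACE)  # modular inverse, Python 3.8+
    return (scrambled * inverse) % CODE_SPACE


for pk in (1, 2, 3):
    print(pk, encode(pk, "my-secret-salt"))
```

The multiplication step is what stops consecutive keys from producing consecutive-looking codes; without it, the last character would simply step through the shuffled alphabet.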

 

Will Vincent  14:13  

Yeah, they're nice. And so how do I change? How do I step through this process? This is actually a chapter in my book, Django for Professionals, which should be out by the time this podcast is released, because I think it really is tricky to go from ID to slug to UUID and on to hashid. And once you've done it all, you can sort of make these trade-offs and think about what to do. But I think a takeaway is that you can do these things. If you're in doubt, use a UUID or a hashid. It's not really that much more work if you do it from the start. Yeah, since switching over is a little bit of a pain,

 

Carlton Gibson  14:53  

so if you've got a model which uses integer primary keys and you want to use a hashid inside an API, well, the Django hashids package has a REST framework serializer field, which will serialize the integer primary key to a hashid and vice versa. So that will handle exposing it in your API. That's great. You can do something similar to put them into the template context if you need to. If you need to migrate to a UUID, well, the first thing, when you're coming up with your model, is to ask yourself: am I going to need to sync this? If you are going to need to sync it from a mobile client, then use a UUID to begin with. If you're not, then stick with an ID; it's easier, it's simpler. If you do need to migrate, probably add the UUID field, check everything's working once you've adjusted, and then switch over the primary key in the field definition and create a migration, which will remove the auto-created id field for you. Django's migrations will do that. But do it slowly: add the UUID field first, adjust your API, make sure everything's working, and then switch over, because
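One detail of "add the UUID field first" trips people up. Django's documentation on migrations that add unique fields explains that a plain migration adding a unique, non-nullable field to a table with existing rows fails, because the default is generated only once and copied to every row. The documented route is to add the field as nullable, populate it in a data migration, then make it unique and non-null. The sketch below simulates that with rows as plain dicts (an assumption; no database involved), and the helper names are hypothetical.

```python
import uuid


def add_column_with_one_default(rows: list) -> None:
    """Pitfall: a default evaluated once is copied into every existing row."""
    value = uuid.uuid4()
    for row in rows:
        row["uuid"] = value


def backfill_unique_uuids(rows: list) -> None:
    """Data-migration step: give each existing row its own fresh UUID."""
    for row in rows:
        row["uuid"] = uuid.uuid4()


def column_is_unique(rows: list, column: str) -> bool:
    """True if no two rows share a value in `column`."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


rows = [{"id": 1}, {"id": 2}, {"id": 3}]  # existing rows keep their integer ids
add_column_with_one_default(rows)
print(column_is_unique(rows, "uuid"))  # False: a unique constraint would fail here
backfill_unique_uuids(rows)
print(column_is_unique(rows, "uuid"))  # True: safe to add the unique constraint
```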

 

Will Vincent  16:06  

yeah, it's not a small undertaking, not to mention your existing API endpoints or existing pages. Try to think about it in advance if you can. It's slightly more complex, but I would say when in doubt, just default to a UUID or a hashid, and you'll future-proof it a little bit, like using a custom user model. For most people you could also do profiles, or change it later downstream, but

 

Carlton Gibson  16:30  

yeah, and so I guess the general advice is: when you're designing your application, think about your URL structure. Think about what you want your URLs to look like, and spend a bit of time on it, because cool URLs don't change. They stay the same forever, they're reliable, they're addressable, and you can bookmark them and go back to them. So think about your URL structure and try to design your application nicely around your URLs.

 

Will Vincent  16:54  

Yeah, and I would say that, with a bit of experience, after the models, after the schema, the URLs are the second most important thing I think about in terms of architecting a project. Because, yeah, the pages themselves can change, but hopefully the URLs don't change as much. Even the views, what's displayed in the page itself, are more likely to change than your underlying URL structure. And

 

Carlton Gibson  17:20  

I think of URLs as well as a power tool for power users of your site. It's like the command-line interface on your computer. (That's true.) If you fire up the terminal, you can drive your computer from the command line and do things very quickly and very powerfully that might be more long-winded through the GUI. Now, the GUI is obviously easier, and that's great. It's more accessible, and we love GUIs. But sometimes that power tool is exactly what you need. And if you've got a really nice URL structure, it enables people who are really into your site and your application to use it more efficiently. Agreed.

 

Will Vincent  17:53  

So this wasn't the longest episode, but I think it covers an important point, something that trips people up, and hashids in particular: if you're already familiar with UUIDs, hashids are really cool and worth a look. It's a nice little library, a nice little tool, a middle ground between the two. Yeah, exactly. So as always, you can reach us at the djangochat.com website. We're on Twitter at @ChatDjango. If you like this podcast, please also leave a review on whatever service you use. We've received some really nice reviews, and reviews help people find our work and keep us motivated to keep doing these. That's it. Anything else? Questions? Well, you know,

 

Carlton Gibson  18:29  

send in some things. We should have an episode on user questions.

 

Will Vincent  18:32  

Listener questions. Yes, actually, we should. We've been getting a number, and we've done a couple of full-length episodes, like the admin one, because we got asked a couple of questions on that. But especially if it's a small question, it'd be fun to do a grab bag of user questions, so send those in. All right, we'll see everyone next time. Thanks for listening. Take care. Bye-bye. Bye.