Django Chat

Search

Episode Summary

Despite being a "batteries included" framework, Django provides no built-in search feature. And yet almost every website needs one! We discuss how to add search to any Django site with filters and Q objects, then move on to more advanced options, including full-text search with PostgreSQL, Elasticsearch, and more.

Episode Notes

Episode Transcription

Carlton Gibson  0:05  

Hi, and welcome to another episode of Django Chat. I'm Carlton Gibson, and I'm joined as ever by Will Vincent. Hi, Will. Hi, Carlton. This week we're going to talk about search, which is something you might want in your Django application.

 

Will Vincent  0:16  

Yes, it is something you might want in your Django application, and it is not a built-in feature. So that is a conundrum. There are two stages, I would say, to the journey of search in Django. If you're like me: on my first major Django project I stumbled through Django, and at a certain point it was a school rating search. So I had 120,000 K-12 schools in the US, and at some point I wanted a search bar. And I was like, okay, well, Django, help me out here. How do we implement basic search? It's not that it's hard; it's that it requires a pretty deep understanding of Django, which we'll get into. You have to understand forms and querysets and passing things around. And then once you get past that hurdle, the second phase is that search is actually really, really, really hard. And everyone expects it to be really good, because they use Google or DuckDuckGo all day long. Django does now have built-in support for full-text search if you use Postgres, which we'll get into. So we'll get into all that, starting with the very basics and then the advanced options, but basically search sounds easy until you've tried it, and then you go,

 

Carlton Gibson  1:19  

Oh, the fun. Right. So what's the first step? Let's say, you know, you've created a form in your HTML, so you've got a search box and a button that says Search next to it, and maybe a magnifying glass. Well, yes.

 

Will Vincent  1:31  

So I would say the first thing is you need some data. You need to start from first principles: spin up your project, make an app, and then get some data in, through the admin or a script that loads it. So get your models. And then the way I like to teach this is: if you think about what it is, it's a form and a view of some kind. Basically, it's a kind of list view. So you start by saying, okay, I'm just going to do a list view. Let's say there are 10 rows in my database; list all 10. Okay, now I want to filter them, because basically search is a filter. You're saying, of all the things in my database, how do I filter them based on the user's query? We'll get to the user query, but first filters. So the basic way is to use a ListView, and then you can use the filter method with contains or icontains, which is case-insensitive, and also get and exclude. And you can do some filtering on your list. And actually, you maintain the django-filter package currently, right? Yeah, so that applies.

 

Carlton Gibson  2:30  

Yeah. Okay. So django-filter is great for this exact use case. You've got a form, which then submits something, probably via a GET request, so as query parameters in the URL (we'll get to that), right? And first of all you want to validate that those are safe. And then you want to take a queryset and filter it by whichever fields you're allowed to filter on. So let's say you've got an address book with some names, just one single name field: Adam Johnson, Will Vincent, Carlton Gibson, Tom Christie. You know, these are the names you've got in there, and you just want something that matches. So you have a CharFilter, a character filter, which filters on strings, and you set it to filter on that name field, and anything that matches will be returned. And that's a nice way of filtering down from your address book of 300 to four matches, say.

 

Will Vincent  3:22  

Yeah, and I'd add that I'm taking a slightly different, more abstract approach, because I was thinking in terms of showing code to people, but on a podcast I can't show code. So the first step is you set up your basic form, right. And then the key thing that I think is really hard to understand is that you set the name attribute on the input, and that is the query itself. You can call it whatever you want. Generally you would call it q, like when you go to Google and type in a search, but you have the flexibility to call it whatever you want; it just happens to often be called q. And then, if you're using a ListView, you need to modify the queryset. You can do that just with the queryset attribute, or you can override get_queryset and have a few more options. And you pass in the q as the query that you then filter on. I have a post on this, which we'll link to. So your description was great and correct, but in terms of the logistics of, well, how do I get the query? That's an aha moment for people: seeing how it's passed from the form to the view to be used.

 

Carlton Gibson  4:21  

Yeah. So I mean, I would use an actual Django form here. django-filter does this under the hood: it creates a Django form, and then you validate the form so that, you know, it isn't some crazy value that might do harm to your database; it's at least going to look like a character field. And then you get the data out. So it comes in on request.GET, you put it into a Django form, and you validate it, like you should do with all user input, ever. You get it out of the form, and then you pass that value to the filter method of your queryset. Right. For another instance, if you've got an integer field and you just pass the raw value from the query string into it, it's not the right type, because you need an integer. Whereas if you pass it through an IntegerField form field, it'll give you back an integer. Or a DateField will give you back a date from a string. So if you need that kind of casting, if you're not dealing just with plain strings, the form can do that transformation for you and clean that stuff up. That's why I always, always, always pass all input from all users, at all times, through either a Django form or a REST framework serializer or something to sanitize it.

 

Will Vincent  5:33  

And I'd note that Mozilla has two very in-depth guides to this: one on sending form data and then, separately, one on validating it. So I'll link to those. But the good news is, if you use Django forms, you get most of that out of the box. So

 

Carlton Gibson  5:48  

And almost the form rendered in HTML, too.

 

Will Vincent  5:50  

Yeah. Nice. Okay, so you've got the data. Now what? You can do these basic filters, as I mentioned, and specify, say, the name field and the title field, whatever you want. And you can also chain them together, so you can AND them. But sometimes, definitely, you will want to do an OR, so maybe search by the name of the author or the title. And for that you need a Q object. Do you want to take a stab at Q objects? Because that's a pretty cool, I would say intermediate-to-advanced, feature that you use all the time once you know it's there.

 

Carlton Gibson  6:25  

Yeah, the Q object. Exactly, exactly. There's a level to it. So you're already using a Q, in effect, when you call filter and pass in, I don't know, name equals Adam. Right. And that's shorthand for name__exact equals Adam, because there's an implicit lookup there. The lookup is exact, or contains, or icontains, any of these lookups, and the default one is exact. So if you don't specify one, the ORM puts that in for you. And you pass those in as keyword arguments into the filter method or the exclude method of your queryset. Q objects just let you take that pair, that lookup, and create it as a Q object, which you can then pass into the filter method or the exclude method in place of those keyword arguments. So filter takes Qs, and if you pass in just a keyword argument, it turns it into a Q under the hood and processes it after that. The great thing about Qs is that you can apply Boolean logic to them, so they can be ANDed, ORed, or combined in whichever way. So they're really handy, and it's worth checking out the docs. We should link to those in the show notes.

 

Will Vincent  7:30  

Yeah, we will. And then the form itself: you talked briefly about this, but GET versus POST. These are the two ways you can send data. With POST, the browser bundles the form data, encodes it, and sends it in the body of the request to the server. GET bundles it too, but puts it into a query string in the destination URL. So again, if you do a Google search, go look at the URL and you will see it's google.com/search?q= followed by a string that matches the query you made. So in general, the basic rule of thumb is: if something's going to touch the database and update it, you should use a POST, or if it's something sensitive like credit card information, you should use a POST. But if it's just a search query, a lot of the time you can get away with just using a GET.
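
A small standard-library sketch of what "GET puts the form data in the URL" means in practice, including how several filters end up in one bookmarkable URL. `https://example.com/search/` is a placeholder, and `search_url` is a hypothetical helper; in a Django view you never build this yourself, since the browser sends it and Django exposes the parsed values as `request.GET`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(base, **params):
    """Build the URL a browser requests when a GET search form is submitted."""
    # urlencode uses the same form encoding browsers use: spaces become "+".
    return f"{base}?{urlencode(params)}"


# "https://example.com/search/" is a placeholder URL.
simple = search_url("https://example.com/search/", q="django full text search")
# Several filters end up in one shareable, bookmarkable URL.
filtered = search_url("https://example.com/search/", q="orm", component="Database layer")

# On the server, Django parses this into request.GET; parse_qs shows the same idea.
parsed = parse_qs(urlsplit(filtered).query)
```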

 

Carlton Gibson  8:17  

Not only get away with it; it's much better, because then you can bookmark that URL. Yeah. So say you do a search and you've spent, you know... The Django issue tracker is a classic example. There are, you know, 1,200 open accepted tickets, and there are 48,000 ways of filtering those tickets, and you learn to filter by component, and then, you know, "needs docs". And you get it down to these four tickets, and you think, yeah, I can take on those tickets, they're all related. And then you can just take that URL and bookmark it, and you can go back to that search whenever you want to; it's there. Whereas otherwise you'd have to go back and rebuild your search from scratch.

 

Will Vincent  8:55  

Wow. Yeah, that might be particular to your situation. But no, I understand. I've never done that in my life, but I understand that I could.

 

Carlton Gibson  9:04  

Say, you... Have you ever taken a quick Google URL and sent that to your partner, or to, you know...?

 

Will Vincent  9:11  

Yeah. Well,

 

Carlton Gibson  9:14  

Yeah, but you're doing exactly the same thing. You've taken the

 

Will Vincent  9:17  

Yeah, that's true.

 

Carlton Gibson  9:18  

Okay. With the parameters already in it, and you're sharing it with somebody else. Now, in my case, I'm sharing it with my future self, but I could equally be sharing it with you.

 

Will Vincent  9:26  

I have seen developers I respect implement search with POST, and I've had this argument with them. To me, it seems like you'd always use a GET. But next time I debate them, maybe.

 

Carlton Gibson  9:41  

If anyone wants to come on the show and tell us why you should use POST for your search URLs, huh? Yeah,

 

Will Vincent  9:44  

I won't. I won't name them publicly, but I'll have that debate. Okay, so what's the next stage? It'd be nice to go beyond a simple query to searching, maybe, an entire document, a book, or an email, and you can do that with what's called full-text search. This is the kind of search we're used to; it's what Google does, where what you search is, in Postgres terms, a document, and that can just refer to any kind of body of knowledge. So what are the options for this? There are a couple, and we'll get into how it works. But the standard solutions: you've probably heard of Elasticsearch, you could use Solr, and there are hosted solutions like Algolia and Swiftype, which is now owned by Elastic. So if you don't want to spin up your own server, there are some hosted solutions. Or you could also use a third-party app, django-haystack, which I actually used at one of my companies back in the day. It's a really nice interface, and it connects via drivers to Solr, Elasticsearch, or even Whoosh, which is written in Python, so you can do it on SQLite. So you can try out full-text search without having to get into PostgreSQL and installing that. So yeah, full-text search. What is it, actually? It's been in Postgres since 2008, and in Django since version 1.10, so 2016. Marc Tamlyn led that charge.
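
For reference, pointing django-haystack at the pure-Python Whoosh backend is mostly a settings change. This is a settings fragment modeled on haystack's documented Whoosh setup; the index location is an assumption, and you would still need to write `SearchIndex` classes for your models, which are not shown.

```python
from pathlib import Path

# In a generated settings.py, BASE_DIR is usually derived from __file__;
# Path.cwd() stands in for it so this fragment also runs on its own.
BASE_DIR = Path.cwd()

INSTALLED_APPS = [
    # ... your existing apps ...
    "haystack",
]

HAYSTACK_CONNECTIONS = {
    "default": {
        # Whoosh is a pure-Python search library, so no separate server is needed.
        "ENGINE": "haystack.backends.whoosh_backend.WhooshEngine",
        # Directory where Whoosh writes its index files (location is an assumption).
        "PATH": str(BASE_DIR / "whoosh_index"),
    },
}
```

haystack's `rebuild_index` management command then builds the index. Moving to Solr or Elasticsearch later is largely a matter of changing `ENGINE` and pointing `URL` at the search server.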

 

Carlton Gibson  11:11  

Were you around? Were you... were you on Django in 2016? What do you know, that was... So in those days I'd been a Django user for quite a while, and I followed that project with great interest, as I was on the contrib.postgres stuff as soon as it

 

Will Vincent  11:24  

came out. I backed the Kickstarter to fund it.

 

Carlton Gibson  11:27  

Yeah, yeah, that was super. Those were good times, though, because Tom Christie had the DRF Kickstarter, the REST framework Kickstarter, that really pushed that forward. And then there was the contrib.postgres Kickstarter at about the same time. You know, Django was

 

Will Vincent  11:46  

Async, because Andrew was talking about some... well, I guess it ties into the DSF.

 

Carlton Gibson  11:50  

Well, this is exactly where Andrew is at the moment: thinking about fundraising for async stuff. So, realistically, I think we'll get async views in. And then the question is, where does the money come from for the ORM, right? That's a big one. There are async database backends, and there is the appetite for it, but it's a big job, and that's going to need support somehow. Whether it's a Kickstarter, a Mozilla grant, or one of the big corporations coming in, you know, I don't know.

 

Will Vincent  12:25  

Yeah, but it's worked before. So in this case, there is now a dedicated module that wraps up all these Postgres full-text features for you. And full-text search itself, without getting into all of how it works, lets you do things like ranking; you can do indexing, which is important for performance; you can do phrase search, so just more intelligent queries; you can do stop words, so ignore words like "the" that are very common. This is all language-specific. You can do stemming, which is also language-specific, so, for example, if someone typed in "running", you can say, well, that matches "run", whereas if you're just doing a straight query, there's no intelligence there. You can handle accents, different languages, JSON support. So all the things you would expect from a decent search, which really comes down to relevance. Full-text search gets you there. It's a really deep and interesting field, and the docs on the Postgres site, Chapter 12, which we'll link to, are fantastic. And I'm giving a talk at DjangoCon, so I'll get into this a little bit there. But I was wondering, have you seen anything on the performance of Postgres full-text search versus, say, something that's Lucene-backed, like Elasticsearch or Solr? Because

 

Carlton Gibson  13:38  

Yeah, you know, if you've already got Postgres up and running, it's quite good. It gets you quite a long way, and what I'm not sure about is at what point it's like, okay, the quality is better over there, and then it's worth the extra ops.

 

Will Vincent  13:52  

Yeah, because Elasticsearch, based on what I've read and what people have told me, is very, very performant. So I would believe that there are cases where Elastic and these other ones are better, and it might depend on the structure of the data. But the Postgres option is very performant too, and a lot of folks I know are setting up search using this module: it's all built in, they don't have to rely on an external service, and they're very happy, and these are large sites. And that would be the good side of it, if you could avoid spinning

 

Carlton Gibson  14:21  

up one extra service. I mean, Elasticsearch isn't an easy service to run. So if you can avoid spinning up that extra dependency, you've just got so much more capacity. Exactly.

 

Will Vincent  14:32  

It's easy, it's built in, it's free. Yeah, I think we'd like to have someone on: a number of prominent Django people, I think, work for Elastic, so we could have them on and ask them that question. Yeah, it's a good question. I would definitely start with Postgres search and then, you know, see where you go from there. So the package in Django has a number of classes that make some things easier; we'll link to this, it's in the docs. It has SearchVector, so you can query against more than one field in your database. Besides SearchVector there's SearchQuery, so you can add stemming and stop words, and SearchRank, so you can get into ranking. And I guess the last big one is SearchVectorField, which you can add for performance, but you need to keep it up to date yourself, with a trigger, for example. So with all these things, it really depends: is the data static, or is it dynamic? How much is it changing? With indexing, Postgres has GiST, which is faster for dynamic data, versus GIN for static data. And it's the same thing with search in general. The question is, how fast is your data changing? Because you don't ever want to actually run the query against the raw data; you want a separate process that pre-processes everything and creates the indexes. But how often do you need to do that? Every hour, every day, every minute? Trade-offs, trade-offs. That's kind of fun to do, but that's why it's hard to give a blanket statement on the best way to do it. It depends on the data and it depends on your needs.

 

Carlton Gibson  16:04  

Yeah, I mean, that's always true. What's interesting is, you get, like, it's a massively complicated,

 

Will Vincent  16:10  

It depends, right? We got to our tagline: what's the answer? Yeah,

 

Carlton Gibson  16:13  

depends. But, you know, my experience is more with Elasticsearch, and you will spend so long fiddling with indexes, and you will reindex, and that will be the most painful operation, because you realized you needed to query your data in a slightly different way, or you weren't getting the results you wanted. And so you reindex the whole thing, and that takes hours, and then it's like: search is tough. It just really is. So

 

Will Vincent  16:42  

Yeah, if you're in, like, an e-commerce situation, you know, there are huge teams that focus just on this, because it is a tough thing. But it's speed of results as well, right? Because, I mean, this is the other factor: at some point, just doing a row scan on your database to see what...

 

Carlton Gibson  17:03  

See the records which contain the text you're looking for? It's not going to be good enough, and it's not going to be fast enough either. You need it to be pre-indexed so that you get a result quickly. You know, Google found 2 million results in 0.004 seconds or whatever.

 

Will Vincent  17:17  

Yeah, yeah. So in addition to my talk, which we'll link to (I'm not sure if it'll be out yet; I'm giving a talk at DjangoCon on this topic), there are some fantastic existing talks that have criminally low view counts on YouTube, so we'll link them in the notes. Paolo Melchiorre (sorry if I said that wrong) has a full-text search in Django talk he's given, I think at EuroPython. Markus Holtermann, last year at DjangoCon Europe, gave the "On the look-out for your data" talk. I know there are some talks on Django and Elastic in particular that are worth looking at, and we'll link to those. So it's a deep topic. You know, with this podcast and my tutorial and the talk, I really want to get people up to speed to where they can start doing full-text search, instead of just saying, we'll airdrop you in with a fully configured Django project, and of course you know how forms and views work. I want to help people ramp up and see how you can progress

 

Carlton Gibson  18:13  

this. Well, it's a good learning curve, because you don't need... you know, we finished off talking about full-text search and Postgres and Elastic and these kinds of things, which are much bigger, but you can get quite a long way with just, you know, filtering on the URL. Right up into the thousands of records, like many thousands of records, icontains will, nine times out of ten, get you everything you need.

 

Will Vincent  18:36  

Yeah, that's all I wanted. Yeah. And unless you're an e-commerce site, is it worth the time? You know, maybe not. It is fun to play with. So those are the highlights, and definitely, for forms, use the built-in Django forms. You know, forms are maybe one of the coolest and, at least from the beginner perspective, least appreciated aspects of Django, because you don't think about the fact that forms are really hard. And there's so much security and thought put into what we already have in Django. So

 

Carlton Gibson  19:07  

Yeah, and who likes writing the HTML by hand? It's just going to kill you. So

 

Will Vincent  19:13  

I thought about giving an example. But yeah, it's better to use it. As with everything in Django, if you can rely on the work of millions of smart people, you should. So, all right, those are the highlights. If you have any feedback, you can find us at @chatdjango on Twitter. We're at djangochat.com, and we have a newsletter, which you're welcome to sign up for if you want regular updates. So that's all for now. See you next time. See you next time. Bye. Bye-bye.