Django Chat

Auphonic - Georg Holzmann

Episode Summary

Georg Holzmann is the founder of Auphonic, a leading provider of automated audio post-production powered by Python and Django.

Episode Notes

SHAMELESS PLUGS

Episode Transcription

Will Vincent  0:05  

Hello and welcome to another episode of Django Chat. This week we're joined by Georg Holzmann, the creator of Auphonic, which handles audio processing and uses Django. We'll dive into all those things, including how to scale it, and it should be a great discussion. So, Carlton, how are you doing? I'm very well, thank you very much.

 

Georg Holzmann  0:21  

Well, how are you? Good? Yeah, thanks, I'm fine. And thanks for the invitation. I'm curious to speak here.

 

Will Vincent  0:28  

Oh, I think there are too many things to speak about. There's so much we can say about the audio part, but also about the Django part. So maybe, what's your background? How did you get into programming?

 

Georg Holzmann  0:41  

Yeah, so I got into programming quite early. I think I was in primary school, and my dad always had an old computer, and I always managed to destroy the computer and play around with it. And at some point I started programming. I don't know why, actually; I think some friend was showing it to me. The first programming language was Turbo Pascal. And then I learned C++ and C and all these more low-level things. And I came to Python actually quite late. It was at the university.

 

Carlton Gibson  1:22  

Okay. Did you study computer science, then?

 

Georg Holzmann  1:25  

Not really. I actually studied audio engineering, okay. But this was a lot. It was not really only audio, it was more like electrical engineering and computer science, and a lot of signal processing, machine learning and all these things. And so basically, it was computer science. Yes.

 

Carlton Gibson  1:46  

Okay. So you know, like, high and low filters, and all those fancy things there.

 

Georg Holzmann  1:52  

Yeah, exactly. And this is also why I got into Python at some point, because we were using NumPy and SciPy and all these things for every project, when they were at an early stage. So I came into Python and I loved the language a lot. And so I got into programming.

 

Carlton Gibson  2:13  

Okay, and this led straight on to Auphonic then, which we should say is... well, tell us about Auphonic. What is Auphonic?

 

Georg Holzmann  2:19  

Yeah, Auphonic was built a little bit later. And yes, this is basically also about audio processing and some other things. It started as a web service. Now there are some desktop apps, but you can basically upload an audio file, and then this file is analyzed on our servers, and then, according to the analysis, it's processed and encoded into different formats and distributed to other services and things like that.

 

Carlton Gibson  2:51  

Right. And by the sounds of it, this is using exactly the techniques that you studied at school, right? The signal processing, the cleaning up.

 

Georg Holzmann  3:00  

Yes. Exactly.

 

Carlton Gibson  3:02  

Yeah. Okay. And so I guess the question is, is most of that done in Python? Or is that done in, you know, other languages?

 

Georg Holzmann  3:09  

Yeah, the signal processing and machine learning things are mainly built in Python. So we use a lot of NumPy and different machine learning libraries. And also some parts are written in C, which are optimized a little bit. For that we mostly use Cython, if you know this. I guess everyone on your podcast knows this, but it is basically a compiler from Python to C. And then we also had to build a web system around it. I think this was 2009 or 2010, and back then basically everyone was using Ruby on Rails. This was very popular, and I was thinking, I don't want to get into a whole other software stack, so I was looking around in the Python world as well and found Django. And since then I'm a Django user. Okay, fantastic. So

 

Carlton Gibson  4:14  

So you've been through the long haul. You've seen the introduction of, I don't know, let's say migrations and the big changes: 1.5, and then 1.7.

 

Georg Holzmann  4:24  

I started with South. This was the framework before the migrations were introduced. Right. But my main focus was never on the web tools. It was more on the signal processing and machine learning part, and the web tools I just discovered alongside the other things. It was never my main focus. Yeah, okay. I just had to learn them because I needed them.

 

Carlton Gibson  4:56  

So... go ahead, Carlton. Well, I was going to say: so the main use for Django here is to build the API around the data processing pipelines that you have?

 

Georg Holzmann  5:08  

Yes. So there are various things in our system. Of course, the first thing is the web GUI: how you see the website, or how you can create these productions, these audio productions. This is the client side, the front end. Then there is the server side. After you upload an audio file, you have to process and analyze it, which takes quite some processing power. So we need to have a distributed system where we can process all the audio files, with a task queue. For that we use django-celery and RabbitMQ.

 

Carlton Gibson  5:50  

Okay, and how have you found working with Celery and scaling RabbitMQ and things like that over the years? Because, I mean, you've obviously been at this for quite a while now, and so you'll have seen it evolve. Tell us, you must have some war stories there.

 

Will Vincent  6:05  

Oh yeah, I'd just like to add too: with Auphonic, I've never had an issue. I mean, just last night I was having an issue with our podcast host, which seems far simpler than what you're doing. So however you're doing it, it's very speedy, and I've never had an issue, which is unusual for me when dealing with audio online. Okay, that's great.

 

Georg Holzmann  6:26  

Well, there were of course issues in the past.

 

But not for you. Nope.

 

Yeah. So for us, RabbitMQ and Celery worked quite well. I mean, there were some issues at some point. Also, with some integrations there were some problems, but no major things. So I cannot tell anything bad about it.

 

Carlton Gibson  6:55  

No, it's not so much bad. It's more like, as you scale out these things...

 

Will Vincent  7:01  

Well, Carlton has feelings on Celery.

 

Carlton Gibson  7:03  

Yeah, no, and on RabbitMQ, you know, I've had scars from trying to make that work. I've wanted to, and for quite a while now I've been firmly in the Redis camp, using that and working with that well, and finding that it's very scalable. But, you know, RabbitMQ is still there, and still has lots of adherents. So the reason I ask is to try and draw out your experience, which sounds very positive.

 

Georg Holzmann  7:34  

Yeah, so our setup is maybe a bit special, because we have only very long-running tasks, right? If you process one audio file, it depends on the file, but it usually takes quite long. It's not thousands of small tasks, just a few long-running tasks. So I think for our use case it doesn't matter much whether you use RabbitMQ or something else. It's basically the same, I guess.
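
As a rough illustration of the pattern described here (a few long-running jobs handed to workers through a queue), the following sketch uses only Python's standard library. It is not Auphonic's code: `queue.Queue` and threads stand in for RabbitMQ and Celery workers, and `process_audio`, `run_jobs`, and the file names are hypothetical.

```python
import queue
import threading


def process_audio(filename: str) -> str:
    """Stand-in for a long-running analysis and encoding job (hypothetical)."""
    return f"{filename} -> processed"


def worker(jobs: queue.Queue, results: list, lock: threading.Lock) -> None:
    """Take one job at a time from the queue until a None sentinel arrives."""
    while True:
        filename = jobs.get()
        if filename is None:  # sentinel: no more work for this worker
            break
        outcome = process_audio(filename)
        with lock:  # the results list is shared between workers
            results.append(outcome)


def run_jobs(filenames, num_workers: int = 2) -> list:
    """Queue every file, let the workers drain the queue, return sorted results."""
    jobs: queue.Queue = queue.Queue()
    results: list = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, results, lock))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for name in filenames:
        jobs.put(name)
    for _ in threads:  # one sentinel per worker
        jobs.put(None)
    for thread in threads:
        thread.join()
    return sorted(results)


if __name__ == "__main__":
    print(run_jobs(["episode1.wav", "episode2.wav", "episode3.wav"]))
```

With only a handful of long jobs in flight, each worker simply picks up the next file when it is free, which is why the choice of broker matters less here than it would for thousands of tiny tasks.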

 

Will Vincent  8:03  

Okay, interesting. With your setup, it seems like if there was ever a case for potentially going serverless, it would be what you have, right? Because I imagine you have spikes of activity and then inactivity. I wonder, what does your traffic look like? Do you have those spikes, or is it more level? I know you're global.

 

Georg Holzmann  8:20  

Yes, we have these spikes, but they are not that big. It averages out, so the standard deviation is not so big, I would say, because, you know, in Europe they're processing now, maybe, and then in the US it's a little bit later. So it's not such a problem. But of course, the infrastructure... maybe let's speak a bit about the infrastructure. Yeah, yeah. So we do not use Amazon AWS or things like that. We have root servers, which are rented, because they are much cheaper for our use case, since we need a lot of memory and a lot of processing power. Therefore, of course, many root servers don't do much most of the time. But then when we have spikes in processing, we need all of them. So of course these spikes are important and determine how many servers we have to use. But in the end, I would say it's still much cheaper compared to AWS or other services. And
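
The sizing argument above (peak load decides the server count, so servers sit partly idle on average) can be written down as simple arithmetic. This is a minimal sketch; the function names and all numbers are made up for illustration and do not come from the conversation.

```python
import math


def servers_needed(peak_concurrent_jobs: int, jobs_per_server: int) -> int:
    """Servers required so that the peak load still fits (round up)."""
    return math.ceil(peak_concurrent_jobs / jobs_per_server)


def average_utilization(
    average_concurrent_jobs: float, servers: int, jobs_per_server: int
) -> float:
    """Fraction of total capacity that is busy on average."""
    return average_concurrent_jobs / (servers * jobs_per_server)


if __name__ == "__main__":
    # Hypothetical figures: 45 jobs at peak, 12 on average, 4 jobs per server.
    count = servers_needed(peak_concurrent_jobs=45, jobs_per_server=4)
    print(count, average_utilization(12, count, 4))
```

Whether renting for the peak is cheaper than paying per use depends on how far the average sits below the peak and on the two price lists, which is the comparison being made in the conversation.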

 

Carlton Gibson  9:29  

So are you managing the actual hardware yourself there, or is that hardware managed? Because this is one of the selling points that Amazon tell us, and it's not just Amazon, it's all these providers. They say, look, you're not having to go to the data center to change the hard disk, you know, the kind of activities you had to do when you were renting space in a colo.

 

Georg Holzmann  9:50  

Yeah, no, we don't manage the hardware. They have to manage it, but we have to manage the whole software, from the operating system up to everything we install. But the hardware is managed. Maybe you know them also: it's Hetzner, from Germany. Yeah, yeah, they are a very big host in Germany and Belgium for the

 

Will Vincent  10:12  

root servers, right? Okay. Can you give us a sense of how big the Auphonic team is, and the sort of traffic you're dealing with now versus, I guess, 10 years ago, or what, eight or nine years ago, when you started? Because, you know, to me, you're just a service that I've seen recommended and that works. So I'd love to hear more about the company, whether there are other people involved, and as much as you feel like you can say about that.

 

Georg Holzmann  10:36  

Yeah, so we're about five people. Okay. And, I mean, you mean the amount of audio we process?

 

Will Vincent  10:47  

Sure, just how has traffic... how has that changed over time? I mean, has it just been linear? Have you been spiking?

 

Georg Holzmann  10:55  

Oh, that's interesting.

 

Yeah, I mean, it is a little bit linear as well, of course. But then there are always spikes. If some other user group finds out about this service and they bring in a lot more people, then there is a spike. But then it just evolves linearly again. So it's spikes and linear.

 

Will Vincent  11:20  

Yeah. Sorry, I was going to ask, because when we started this podcast, we started using a service called Zencastr, which I found out uses you. So I'm curious, what's the mix of services that use Auphonic versus people like us who do it directly? Do you have a sense of what that mix is, of companies versus individuals using it directly?

 

Georg Holzmann  11:44  

Okay, yeah, this is maybe half-half. For the other listeners: we have an API where other services can integrate our services and algorithms, so that the users of the other service don't have to know that there is Auphonic inside, like you did with Zencastr, for example. Right. So there are some companies which are integrating our services into their systems. And then there is our website and direct interface, like you're using now, I guess. And so yeah, the share is half-half, I would say.

 

Will Vincent  12:20  

What's your sense of how people find you? Because I think the only reason I found you is that I saw Tim Ferriss mention that he used you directly, and I thought, oh, that sounds like what we're using in Zencastr. And then Carlton and I had a number of issues with doing it web-based. So it was really kind of circumstantial that I even found Auphonic itself. Is there a standard path for how clients find you?

 

Georg Holzmann  12:43  

No, I don't know. Basically, we don't do any marketing or things like that. So usually it's word of mouth, especially in podcasting. It's kind of advertising. If you build it, they'll come.

 

Carlton Gibson  12:56  

You built it and they came.

 

Georg Holzmann  12:57  

Yeah. So at the beginning, it all started with a podcaster from Germany, because I was listening to his podcast. And he always said that everything is so complicated, all the audio things and generating all the file formats. And then I thought, well, that's actually quite easy, so let's build something. It started with some small script. And this is really one of the famous podcasters here in Germany, so after we released it, a lot of people used us already.

 

Carlton Gibson  13:30  

Okay, so you had a kind of critical mass to begin with.

 

Georg Holzmann  13:33  

Yeah, and we got a lot of good feedback, of course, and lots of training data and test data. And then we got a grant from the government here in Austria, so we got some money, basically, and could build a prototype, or continue to build the system. With this money I could hire someone who helped to build the system, and then actually another one as well. Then, after this grant, some other podcasters already got to know our system. I don't know why, but somehow it spread over the ocean to the US, and there were other people who were playing around with it. And after this grant, we had to introduce a pricing model. In the beginning, everything was free for everyone. Right. And then we had to introduce some pricing, because otherwise we could not live from it anymore. Yeah, because initially you were funded by the government grant. Yes, exactly. Okay, fine. And that, but it seems

 

Will Vincent  14:37  

like that maybe let you take a... I mean, you seem to have a long-term approach to this. Whereas, I mean, I used to be out in San Francisco, and if you take venture capital money, pretty much within 18 months, and really 12 months, you have some very high goals you need to hit. You're not allowed to take your time. So it's fantastic that you had that ability instead of being pressured to introduce things faster, before they were ready.

 

Georg Holzmann  15:03  

Yes. Well, the nice thing in our situation is that we did not have to do any fancy marketing or other kinds of strategies to get more users. I mean, that's good to do if you're interested in these things, but we were all engineers, basically, and we did not know anyone who could do these other things. So we tried to avoid them.

 

Will Vincent  15:33  

Well, this is marketing. I've seen you've been on a couple of podcasts, so that's marketing. Yeah, that's amazing. So the question I had is: when you created this, were there any competing services that did that, and what do you see in terms of competition? Because I'm not aware of any, but it seems like such a... Now that we have it as part of our post-production flow, I can't imagine doing our podcast without it.

 

Georg Holzmann  16:02  

There are no direct competitors which build the same thing, because we are, of course, very specialized in this podcasting use case, or also other spoken-word recordings, I would say, like conference recordings and lectures and things like that. I mean, there are of course other audio software companies, like iZotope, which build their tools for editing. But basically there is no such automated way, like we do it. That's why we tried to focus on this automation and workflow aspect, and not try to build an editor, which would of course also be very useful. But it's a lot of work if you want to build it. Right. And then you get into direct competition with Audition, iZotope, or other companies.

 

Carlton Gibson  16:59  

And those tools are super, super powerful, but super complicated. And, you know, I consider myself an audio beginner, and there's just no way I can apply the right filters in anything like a time-efficient way. So for me to be able to upload the file, and it comes back with a noticeable difference in sound quality, you think, yeah, that's amazing. That's just pretty.

 

Georg Holzmann  17:22  

Yeah. So there are also disadvantages. This was exactly the initial goal when we started the system: that users who don't have a lot of audio knowledge, or also users who have really lots of audio which cannot be handled manually, can just use this tool and get out audio which is okay. But of course, if you have very specialized use cases and you want to get out every detail, then these editor tools are of course much more powerful, because you can really work on every detail. But of course it needs more time and knowledge. Yeah,

 

Carlton Gibson  18:06  

but I guess one thing that we talk about in product development is to focus on your particular niche and not worry about trying to serve the other needs. So, you know, that you don't have these features is a selling point, right?

 

Will Vincent  18:18  

Yes. Yeah, it's very clear what you do well, so maybe we can talk about some of those features, because there are a lot that are awesome. What would you say are the main features that people come to you for? I mean, for us, when I'm doing the audio files, I love that you have the presets in the web interface. That's great, because we do the same thing for every podcast, so I can just load our presets. And then we usually go for almost everything: the compression, leveling, normalization, noise, hum. I guess that's a broad question, but as you think of the features that you have now on the site, what do you think are the core features? And then we can get into some of the more advanced ones, because I know you have some new features you've just launched.

 

Georg Holzmann  18:58  

Yes. So I would say the most important feature, where everything started, is our leveler. What does it do? It levels out the audio, which means that if you have multiple speakers, like you're having a conversation now, then all these speakers can have different levels and loudness values, and somehow you have to balance them. Otherwise you would always have to use your volume control and adjust the levels. But this is actually a very complicated task. It is easy if you just have speakers. But in audio you also have parts where there is, for example, just background noise, and then of course you should not amplify this background noise like you would do with the speakers. Or there are some music parts, which should be handled quite differently, because in music you want to have more inner dynamics. If you have speakers, you want them to sound equally loud, but music should have some differences, of course. Therefore you have to analyze the audio first and see where there are different speakers, where there are music parts, where there is just background noise, and things like that. So basically like an audio engineer would do it. And then you have to balance these different parts and use compressors and limiters and things like that.

So you said you're using machine learning for a lot of this. When you're talking about the different parts, I'm imagining machine learning and thinking, okay, are you tagging particular parts as, this is speech, this is music, this is some other category, and then you apply a different filter or set of filters depending on what gets tagged by the machine learning algorithm?

Yes, exactly. So we analyze the audio and classify various things, like different speakers, music parts and different noises, or whether the background noise changes, for example. This is another algorithm we have; it's called noise reduction. Basically, this algorithm first analyzes the audio and looks for different background noise scenarios. For example, if you record in a room and then you go outside, then there is another background noise scenario outside. So we have to segment the audio first into these parts and then do noise reduction on the first part and then on the second one.
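
To make the "classify first, then treat each part differently" idea concrete, here is a toy sketch. It is not Auphonic's leveler: the labels, the target level, and the three gain rules are assumptions chosen only to show how a per-segment decision might depend on the class a segment was given.

```python
from dataclasses import dataclass

TARGET_DB = -16.0  # hypothetical target level, for illustration only


@dataclass
class Segment:
    label: str       # "speech", "music" or "noise" (assumed classifier output)
    level_db: float  # measured level of this segment in dB


def gain_for(segment: Segment) -> float:
    """Return the gain in dB to apply to one classified segment."""
    difference = TARGET_DB - segment.level_db
    if segment.label == "speech":
        # Speakers should end up equally loud: close the whole gap.
        return difference
    if segment.label == "music":
        # Toy rule: move only halfway, so music is adjusted more gently.
        return 0.5 * difference
    # Background noise is never amplified, only attenuated if too loud.
    return min(difference, 0.0)


if __name__ == "__main__":
    recording = [
        Segment("speech", -26.0),
        Segment("music", -26.0),
        Segment("noise", -40.0),
    ]
    for part in recording:
        print(part.label, gain_for(part))
```

A real leveler would work on the signal itself with compressors and limiters; the sketch only shows why the classification step has to come before any gain is chosen.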

 

Carlton Gibson  21:31  

Right. And there's no tooling to do that in something like Audacity. You'd have to do it by hand; you'd have to manually identify the segments.

 

Georg Holzmann  21:40  

Yes, in most of the editors, you do this by hand. And basically, we always try to automate the steps you have to do by hand. We try to automate these things with machine learning, and then just apply the algorithms like you would do in an audio editor, for example. Okay. Okay,

 

Will Vincent  22:00  

fantastic. And then you also have, I mean, there are chapter marks, which I believe is something you had before Apple Podcasts and so on had it as a feature. Is that right?

 

Georg Holzmann  22:13  

Well, I think with Apple Podcasts... no, that's not really true. This is a very complicated topic, because there is so much confusion about it. So what are chapter marks? A chapter mark is basically just a timestamp and a title. You say, at this time this chapter starts, like video has done for quite a long time already. The problem with audio is that there are many file formats, like MP3, MP4, Opus, or whatever, and usually chapter marks were only defined for video. So in the MP4 container, everything was defined quite well. It is still very complicated, but at least it was defined. At the beginning, Apple mostly used MP4 audio, or AAC audio in an MP4 container, and the Apple Podcasts app has supported chapter marks in MP4 files for a very long time. But most podcasters use MP3 files, and it did not recognize chapters in MP3 files. There was a very old specification for ID3, which is the metadata standard for MP3, on how to put chapter marks in MP3 files as well, but nobody used this specification. And then we just implemented the specification, basically. And then more and more podcatchers added support for it. And I think now, since last year, the Apple Podcasts app supports MP3 chapters as well.
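
Since a chapter mark is described here as just a timestamp plus a title, the data itself is easy to model. The sketch below is illustrative only: the `Chapter` class, the function names, and the plain-text `HH:MM:SS.mmm Title` listing are assumptions for the example, and it does not write the binary ID3 or MP4 chapter structures discussed above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Chapter:
    start_seconds: float  # where the chapter begins in the recording
    title: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def chapters_to_text(chapters: Iterable[Chapter]) -> str:
    """One line per chapter, ordered by start time."""
    ordered = sorted(chapters, key=lambda chapter: chapter.start_seconds)
    return "\n".join(
        f"{format_timestamp(chapter.start_seconds)} {chapter.title}"
        for chapter in ordered
    )


if __name__ == "__main__":
    marks = [Chapter(83.5, "Task queues"), Chapter(0, "Intro")]
    print(chapters_to_text(marks))
```

The hard part described in the conversation is not this data model but getting each container format and each podcast player to agree on where the same two fields are stored.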

 

Carlton Gibson  23:49  

Right. Good, because if enough files are using it, then they'll have to support it.

 

Georg Holzmann  23:54  

Yes, it seems so. Yeah.

 

Will Vincent  23:56  

Yeah. Well, speaking of mobile apps, I'm so impressed, I keep finding new features that you have. You have mobile apps as well, right, for Auphonic, for Android and iOS? Yes. And so how recently was that, and how did that come about? That's another thing to build and maintain on top of everything else.

 

Georg Holzmann  24:14  

Yes. Well, basically, we started the mobile app because a friend of the first developer at Auphonic wanted to do a project with us. He's a web and mobile developer, and now he works at Facebook, so he's quite good at these things. He said, yes, he wants to build the iOS app for us. And the problem on iOS was, for a long time, that you could not upload audio files on a website. I think they changed it now, but for a long time you could not select audio files and upload them to Auphonic, for example. So the idea was to build just a simple recorder and then use our API in the mobile app, so that you can basically export files from your phone to Auphonic. And this was the start, basically. And then we also did an Android version. On Android, the situation was even more complicated, because there was no usable audio editor on Android. And then we thought, well, we could also build a little audio editor. Right, brilliant. And

 

Carlton Gibson  25:30  

yes, how does that go with device compatibility? Because that's the great challenge on Android, right? It works on the Samsung that you've got in the office, but not on, you know, some device out there in the street.

 

Georg Holzmann  25:42  

Yeah. But actually, there are not so many problems. But of course, that's that's more difficult.

 

Carlton Gibson  25:49  

Okay, interesting. And so you talk about your API there, and I guess that's built with Django and Django REST framework.

 

Georg Holzmann  25:55  

Yes. Okay.

 

Will Vincent  25:57  

Yeah. So if you have problems, talk to Carlton

 

Georg Holzmann  25:59  

You are developing this?

 

Will Vincent  26:01  

Yeah, he's the maintainer.

 

Georg Holzmann  26:03  

Very nice. So thanks a lot.

 

Carlton Gibson  26:08  

I don't do too much. But so you've been using that from... I mean, REST framework wasn't quite around in 2010, but, like, 2012, 2013?

 

Georg Holzmann  26:20  

I think, yeah, I would have to look for an

 

Will Vincent  26:24  

exact date. Right. When you think of all these technologies that you're juggling, I assume that the web piece just sort of follows the audio part. Or how much time do you spend on just scaling up the web part, which is sort of the front door for everything? I'm just curious where you spend your time now, given that you're at scale. And if you're interested, we can talk about the new leveling algorithm I know you just introduced.

 

Georg Holzmann  26:51  

Yeah, I mean, this is always different. Sometimes we work more on the web part or other parts, and sometimes more on the algorithms. But, I mean, scaling was not so much of an issue, I would say. Basically, we always have to fix some things if we see that something gets very hot. But actually, you know, if the traffic goes up, we just rent some more servers. So basically it's not so complicated, right? Because in our case, the website itself does not need much scaling, since we don't have hundreds of thousands of users every minute, of course.

 

Carlton Gibson  27:36  

Yeah, so I'm imagining you could run the website on, you know, a medium-sized server and it would chug away quite happily with a lot of spare capacity. And then the big need to scale is the back-end processing.

 

Georg Holzmann  27:47  

Exactly. So the database and all the web front end and things are just on one big root server, and then we have various other servers for the audio processing and for the long-running tasks.

 

Will Vincent  28:05  

For processed audio files, I believe it's 30 days that you store them, and then they go away. Is that correct? Yes, 21 days. 21. How did that come about? Was it because you looked around and said, oh my God, we have all this data, or did you have that from the beginning, a limit on how long you would host processed audio?

 

Georg Holzmann  28:22  

I think we had this quite early, because otherwise it would need a lot of storage. Yeah. And of course also for data protection reasons.

 

Carlton Gibson  28:38  

And can I ask: is that stored on disks, physical disks, on your rented servers? Or are you using cloud storage for

 

Georg Holzmann  28:45  

No, we store on our root servers, on the disks. Right. Okay. Okay.

 

Carlton Gibson  28:49  

So it really is all all hardware in the data center.

 

Georg Holzmann  28:54  

Yes, exactly. Fantastic.

 

Will Vincent  28:56  

I always like to ask this question: if you could just wave a magic wand and add a new feature to Auphonic, what would it be? Oh, hmm. I mean, obviously, I don't have a sense of what is truly challenging or what you think your customers are demanding. But I'm curious where you see that need, or, you know, if you had all this time, where you would spend it.

 

Georg Holzmann  29:18  

Well. Oh, that's that's difficult to say our future feature list is very long.

 

Will Vincent  29:24  

Okay. Yeah, sure. Or maybe there's something that you don't even know how to tackle, but it's sort of an unresolved problem in processing audio. Because again, Carlton and I don't know that space at all.

 

Georg Holzmann  29:35  

Yeah, okay, in this direction: what would be really cool is if you have, let's say, very bad audio, like an MP3 with 32 kilobits, so everything is compressed completely already and you can't hear anything in it anymore, and then one could build an algorithm which makes the audio like the original again.

 

Okay, this would be Yeah.

 

It really is a magic wand. But yeah, at some point you lose information, and then it's difficult to restore the information again.

 

Carlton Gibson  30:18  

But you do see these kinds of AI or ML applications where they kind of guess what the missing data is and interpolate that, and, you know, they sometimes come up with good results.

 

Georg Holzmann  30:31  

Yes. So, there are of course, also people which are trying exactly that with badly encoded audio. But yeah, this is of course, not possible for every situation. Unfortunately,

 

Will Vincent  30:44  

one of the features among the existing features you have that actually we're not using which people have asked for is you you link in with speech recognition, right, where someone can link up a third party transcription but this this

 

Georg Holzmann  30:56  

is so perfect. Speech recognition system.

 

Will Vincent  31:04  

Yeah. Was that something that you had, again, from the beginning, or something that users asked for? That integration looks really nice, because that is another step in the tool chain of producing a podcast, where you do the leveling and then the transcription. I mean, it's something we should add.

 

Georg Holzmann  31:23  

Yes, this was not from the beginning. But this was always very interesting for us, because especially for podcasting, you know, podcasts are not searchable. If you had a transcript, then you could search within the podcasts. So that's, of course, perfect. But some years ago, there were no services which produced reasonable output for a reasonable price. It was actually, I think, two or three years ago when... no, it was not the first one, but one of the first ones was the Google Cloud Speech API, which had an acceptable quality and also a reasonable price. And now there are various other APIs as well which can do that. And back to the beginning: actually, we wanted to build our own speech recognition system. But then we thought, we cannot do everything, because it's really very time-consuming, especially if you want to build it for multiple languages. You have to have specialists for every language and a lot of data for every language. That's why we decided to integrate various third-party services for speech recognition. So we still do our own preprocessing and split the audio into small parts, or cut out the music parts and take only the speech parts, and then send these slices to the external speech recognition services, and then combine it again, so that you have time codes in the transcript and so that you know who is speaking what at which time. We should be using this. Well, why don't we use
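
The split, transcribe, and recombine flow described here can be sketched in a few lines. This is an assumption-laden illustration, not Auphonic's pipeline: `SpeechSlice`, `transcribe_recording`, and `fake_transcribe` are hypothetical, and the fake transcriber stands in for whichever external speech recognition API is called in practice.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SpeechSlice:
    start: float   # offset of the slice within the full recording, in seconds
    speaker: str   # speaker label from the preprocessing step
    audio: bytes   # audio data for just this slice

# A transcriber returns (time within the slice, word) pairs for one slice.
Transcriber = Callable[[bytes], List[Tuple[float, str]]]


def transcribe_recording(
    slices: List[SpeechSlice], transcribe: Transcriber
) -> List[Tuple[float, str, str]]:
    """Send each slice out separately, then merge into one timed transcript."""
    transcript = []
    for piece in sorted(slices, key=lambda item: item.start):
        for local_time, word in transcribe(piece.audio):
            # Shift slice-relative times back onto the full recording.
            transcript.append((piece.start + local_time, piece.speaker, word))
    return transcript


def fake_transcribe(audio: bytes) -> List[Tuple[float, str]]:
    """Hypothetical stand-in for an external speech recognition service."""
    return [(0.0, audio.decode()), (0.5, "ok")]


if __name__ == "__main__":
    pieces = [
        SpeechSlice(10.0, "B", b"hi"),
        SpeechSlice(2.0, "A", b"hello"),
    ]
    for entry in transcribe_recording(pieces, fake_transcribe):
        print(entry)
```

Keeping the slice offsets and speaker labels on the caller's side is what allows the combined transcript to say who spoke which word at which time, even though the external service only ever sees isolated slices.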

 

Will Vincent  33:17  

It sounds amazing. You use the royal "we", Carlton; you're welcome to. Yeah, I think it's so brilliant how you've stayed kind of within your lane yet are still expanding features, because, I mean, podcasts in general are going through quite a bit of consolidation right now. But it seems like what you're doing is so much better, and so hard to do well, that it's maybe not really as appealing for Spotify or someone to say, "Oh, we'll just do Auphonic." Or is that an existential threat? Do you think down the line one of these places will say, "We'll just scoop that part in"? Because again, unless you really know what you're doing, it seems very difficult to have a high-quality podcast without using a tool like Auphonic.

 

Georg Holzmann  34:02  

rather, that's not true. If you're an audio engineer, you can just do it yourself of course.

 

Will Vincent  34:06  

Sure, sure, sure, but most but that, you know, that's what 1% if that, you know, everyone else who's a, you know, cousin. Yeah,

 

Georg Holzmann  34:15  

Yeah, so, to your question.

 

Yeah, I think our service is just very specialized, and there is a lot of time already put into it, into how to really optimize it for this use case. I mean, of course, someone else can build similar things. But it's, of course, a lot of work, and you would need, I think, quite a bigger team than we are. If you don't feel it for yourself. And sure. And yeah, I don't think that there is so much money in it that you would throw a 20-person team at it,

 

Carlton Gibson  34:54  

but I don't know, right? So it's a nice niche, in that as long as you stay the right size and don't scale up too big

 

Georg Holzmann  35:01  

Yeah, maybe you can

 

Will Vincent  35:02  

you can survive in that space. Can you give us a sense of what metric you use for size? Is it the size of files processed? Is it, you know, hours? When you internally look at your metrics, how do you manage growth? Because I think that there's a number of different ways to potentially measure that.

 

Georg Holzmann  35:21  

Usually we take... I mean, also for our users, the important thing is the length of the audio file. So in our system, two hours of audio is free for everyone. And if you process more audio, then you can buy additional credits, and the credits are in hours of audio. So if you have one hour of credits, you can process one hour of audio. Therefore, of course, the most important measure, I would say, is the hours of audio processed, maybe.
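
As a rough illustration of billing by audio length, the arithmetic might look like the sketch below. The function name `credits_needed` and the constant are hypothetical, and the monthly reset of the free allowance is taken from the "two hours per month for free" remark later in the episode. How Auphonic actually rounds or accounts for usage is not stated in the conversation.

```python
FREE_HOURS_PER_MONTH = 2.0  # free allowance mentioned in the episode


def credits_needed(durations_seconds, free_hours=FREE_HOURS_PER_MONTH):
    """Return the hours of paid credits needed for one month of productions.

    Only the length of each audio file matters here; file size and
    format play no role in the calculation.
    """
    total_hours = sum(durations_seconds) / 3600.0
    return max(0.0, total_hours - free_hours)


if __name__ == "__main__":
    # One 60-minute and one 90-minute episode in the same month.
    print(credits_needed([3600, 5400]))
```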

 

Will Vincent  35:57  

Well, because I guess, yeah, audio files of one hour can be, I mean, very different in size. Was that a

 

Georg Holzmann  36:05  

Yes. So did

 

Will Vincent  36:06  

that seems smart from a marketing perspective, but, you know, for someone like you who really knows audio, I mean, the cost could be wildly different for one hour of, you know, one file versus another.

 

Georg Holzmann  36:17  

So yes, the size of the audio file itself does not say much, because if you have an MP3 file compared to a WAV file with a very big bit size, then the MP3 file is very small, of course. But then if you process the file, you have to decode the file again to get the raw data. And then it doesn't matter if it was an MP3 or WAV file, because you need the same amount of processing power for it. Oh, interesting.
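
A back-of-the-envelope calculation shows why the file size on disk says little about processing cost. The sketch below assumes, purely for illustration, that audio is decoded to 32-bit float stereo at 44.1 kHz and that the MP3 has a constant bitrate of 128 kbit/s; none of these figures come from the conversation, and the function names are hypothetical.

```python
def compressed_bytes(duration_s, bitrate_kbps):
    """Approximate size of a constant-bitrate compressed file (e.g. MP3)."""
    return int(duration_s * bitrate_kbps * 1000 / 8)


def pcm_bytes(duration_s, sample_rate=44100, channels=2, bytes_per_sample=4):
    """Size of raw PCM audio; the default assumes 32-bit float samples."""
    return int(duration_s * sample_rate) * channels * bytes_per_sample


if __name__ == "__main__":
    one_hour = 3600
    mp3_on_disk = compressed_bytes(one_hour, 128)            # about 58 MB
    wav_on_disk = pcm_bytes(one_hour, bytes_per_sample=2)    # 16-bit WAV, about 635 MB
    decoded = pcm_bytes(one_hour)                            # same for both inputs
    print(mp3_on_disk, wav_on_disk, decoded)
```

Both inputs end up as the same number of decoded samples, so the work per hour of audio is roughly the same regardless of which format was uploaded.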

 

Will Vincent  36:47  

So I'm curious, because you went to university for audio engineering: what did you think you were going to be doing at this point in your career, and what are your friends from university doing? Because I'm sure they're not all doing startups.

 

Georg Holzmann  36:58  

No, it's different. So, well, actually, after university... I did my master's studies here in Austria, and then I started a PhD in Germany, also about machine learning and audio. This field is called music information retrieval, where you try to extract information out of audio with machine learning techniques. But yeah, I did not finish this PhD, because basically I was the only one there who did these things. And then I started a job at a web company and got to know all these web things, also Django, so they also used Django there. And yeah, then I thought, okay, I can also build something for myself. And I also got to know this podcaster, who always had this problem with audio processing and encoding, and I just tried to build these things myself,

 

Carlton Gibson  37:57  

you know, I mean, it sounds like the perfect pipeline. You know, you've done the signal processing, you've done the machine learning, you're into the audio stuff, you've done the web programming side, you've got all the parts together, and you've just created this awesome business out of it. That's exactly.

 

Georg Holzmann  38:11  

I still had no idea about doing business. But But yeah,

 

it's okay. Nonetheless.

 

Will Vincent  38:20  

So did you go to a web place that was doing audio stuff? Or did you just separately go to a web place and then later say, oh, I can combine these two?

 

Georg Holzmann  38:29  

No, it was not about audio, it was something completely different. But at the time, I was interested in web things, because I had never done any big web projects. So I just wanted to get into this topic. And yeah, for that it was quite good,

 

Will Vincent  38:47  

yeah. Well, it's still the case in certainly in the US that you can't, it's very difficult to learn web in school. There's sure some, some efforts on it, but basically, it has to be hands on. And even something like Django is largely not taught at all. And if it is taught it's by an adjunct. And it's sort of an elective. So that's, hopefully that will change. But it's it's hard to get a backbone in web technologies, even though the majority of undergraduate computer science graduates probably go work on something web related. So there's definitely a Yeah, educational mismatch there right now.

 

Georg Holzmann  39:24  

And you just have to do it because all the resources are out in the web, of course, and you just have to do it and learn it by yourself.

 

Will Vincent  39:33  

Yeah, spoken like an engineer. So what are you working on right now?

 

Georg Holzmann  39:37  

Yeah, so as I said in the beginning, Auphonic started as a web service, with this Django backend, etc. But we also have a desktop application with our algorithms. It doesn't include all the features of the web service, but the advantage here is, of course, that you don't have to do the processing in the cloud and you don't have to upload files. So the processing is just done on your computer, which needs of course, both is normal, but you don't have to upload and download. And yeah, we also have a different pricing model for this desktop app. It's just a one-time purchase, whereas in the web service you have to pay for credits based on how much you process. But what I wanted to tell is that we are working on a new version of the desktop software at the moment, because it was very difficult to get all these Python and machine learning tools onto desktop computers, and especially difficult was the GUI part. In the desktop apps we used the wxWidgets framework with its Python bindings, and it was so frustrating and so difficult. And now we did rewrite everything with web-based tools. So we use Electron and, okay, it's Electron combined with Python. But yeah, Electron is still a very big framework and not so nice to handle. But at least now everything is HTML based, the whole user interface, and the back end is all done in Python. And so, as a long-term goal, we could maybe use the same interface for the web and for the desktop version, and just reuse all the components and use Python as the processing engine,

 

Carlton Gibson  41:27  

and you're bundling Python in the application? Yes? So using, like, the PyBee project there, or something similar?

 

Georg Holzmann  41:34  

No, we are using... well, there are various ways how you can do that. First we're compiling everything to C with Cython and all this. And we are using this tool called PyInstaller, which creates such a bundle with Python and all the dependencies you have, and then this is bundled into one binary and you can afterwards distribute this
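
For readers who have not used PyInstaller, a one-file build can be driven from Python roughly like this. It is only a sketch: the script name `engine.py`, the app name, and the helper functions are hypothetical, the Cython compilation step Georg mentions is not shown, and the actual build options Auphonic uses are not described in the conversation. `PyInstaller.__main__.run` accepts the same arguments as the `pyinstaller` command line.

```python
def build_pyinstaller_args(entry_script, app_name):
    """Assemble command-line style arguments for a single-binary build."""
    return [
        entry_script,
        "--onefile",        # bundle interpreter and dependencies into one binary
        "--name", app_name,
        "--noconfirm",      # overwrite a previous build without asking
    ]


def build(entry_script, app_name):
    """Run the build; requires PyInstaller to be installed."""
    import PyInstaller.__main__  # imported lazily so this module loads without it

    PyInstaller.__main__.run(build_pyinstaller_args(entry_script, app_name))


if __name__ == "__main__":
    # Hypothetical entry point of the Python processing engine.
    build("engine.py", "engine")
```

The resulting binary is what a desktop shell such as Electron could then ship alongside its own files.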

 

Carlton Gibson  42:00  

Okay, fantastic. And then, because that's Electron based, that will be on Windows and Mac and Linux as well.

 

Georg Holzmann  42:07  

Yes, exactly. Okay. And then the Electron app just also takes this Python binary and includes it in its framework. Wow.

 

Carlton Gibson  42:17  

Yeah, that's really cool.

 

Georg Holzmann  42:19  

Too many tools. Yeah.

 

Carlton Gibson  42:23  

Yeah, I mean, Python doesn't really have a good story for I mean, it's good on the command line. It's good for building applications. It doesn't have this story about desktop integration or even front end web integration. So

 

Georg Holzmann  42:34  

yeah, well, it has it is good for desktop, I think but the missing part is the GUI part.

 

Will Vincent  42:41  

Yeah, yeah. Awesome. Well, thank you so much for coming on and sharing this story. We love using Auphonic, and what I love in a way, too, is that the only reason I thought that maybe you use Django is that I saw your signup page, and you're using slash accounts and username, email, password, and I thought, oh, that looks like Django. So I just sent you an email, and it's like, oh yeah, it is Django. So a big thing for us in this podcast is to try to highlight all the different ways Django is being used in all these different realms, which are largely kind of hidden, because, you know, for you it's sort of a secondary thing, but still a key part of your pipeline, of your process. Yeah.

 

Georg Holzmann  43:19  

And what I find quite important is that Django is now already quite old, and it still evolves. And it's still a good choice if you would start today, I would say. That is the perfect parent. And this is, of course, very important if you are a small company like we are, so you cannot redo everything from scratch after five years.

 

Carlton Gibson  43:39  

Yeah, no, but also you need to know that Django is going to continue working. You know, the next release isn't going to totally change the API or, you know, exactly, such things. And those stability guarantees and the deprecation policy are important.

 

Will Vincent  43:53  

Yeah, well, actually, maybe one last one: if there's something you could change about Django, what would that be? Or do you have thoughts on how the future async work would impact Auphonic at all? Or is that

 

Georg Holzmann  44:09  

Not so much. I don't know, I did not do so much with it until now, because, I mean, the thing is our processing is done in a different stage. It's done offline, basically in the queue. So here we would not need this async thing. But yeah, I think for user interface improvements, it's of course very interesting. But I have to play around with it a little bit more, I think.
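
The "done offline, basically in the queue" pattern can be illustrated with the standard library alone. This is a simplified sketch, not Auphonic's architecture: the names `submit`, `worker`, and `process_audio` are hypothetical, and a production system would more likely use a dedicated task queue with separate worker machines, which the conversation does not specify. The point is only that the web request returns as soon as the job is enqueued, so the request handler itself has no long-running work that async views would help with.

```python
import queue
import threading

jobs = queue.Queue()   # pending production ids
results = {}           # finished productions, keyed by id


def process_audio(production_id):
    """Placeholder for the slow audio processing step (assumption)."""
    return f"processed:{production_id}"


def worker():
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        production_id = jobs.get()
        if production_id is None:
            jobs.task_done()
            break
        results[production_id] = process_audio(production_id)
        jobs.task_done()


def submit(production_id):
    """What a web view would do: enqueue the job and return immediately."""
    jobs.put(production_id)
    return {"id": production_id, "status": "queued"}


if __name__ == "__main__":
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    print(submit("episode-1"))
    jobs.join()            # wait here only for the demo
    print(results)
    jobs.put(None)         # stop the worker
    thread.join()
```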

 

Will Vincent  44:39  

So if listeners want to use Auphonic, should they just go to the main website, or where should they be directed?

 

Georg Holzmann  44:44  

Yes, sure. So our website is auphonic.com. And yeah, just try our system. Two hours per month are free, and if you have any questions or feedback, we are always very happy to get feedback and also reports and whatever.

 

Will Vincent  45:03  

Great. Okay, well thank you so much for coming on and giving us

 

Carlton Gibson  45:06  

for your time. Really interesting, Georg. Thank you. Bye-bye. Thank you, ciao.