After spending the summer more or less not touching computers at all (except for loading models into the 3D-printer that I got), work life is back with full force. With that comes a lot more time hands on keyboard, and I think that I’m enjoying it more than when I left for summer vacation.
Still, it’s kind of a strange time to be in software, with the economy not being great and AI hype bringing predictions that anyone who writes code for a living will be replaced by a machine any day now. I guess the jury is still out on that one, and if it’s a guilty verdict I’m going to have to come up with something else to do with my time.
Back at work, I’ve had reason to dive deep into the operations side of large language models. It’s been fun reading up on what goes into running these things efficiently and the resources required to do so. Fun, because it’s something completely new that I get to understand the workings of, and also because of the tinkering involved. Now I just wish I had some hardware on which to run local inference.
I have been running inference on an ASUS PN64 mini PC, however. It has slightly faster RAM than my work laptop and thus performs a bit better. Qwen3-coder 30b will generate a working, single-file implementation of Tetris in about ten minutes, at roughly 7 tokens per second. I installed Debian 13 on it, and I really like it. But ever since I put a desktop OS on it, the machine had been crashing randomly. It turned out to be a power management issue, and the kernel does indeed spew out warnings about that at boot. Just disabling anything power-management related seems to have fixed it. Now I just wish that it would take all of the 64 GB of RAM that I got for it, but it appears to be capped at 32, which is annoying when you’re tinkering with local language models.
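For anyone hitting similar random crashes, the usual approach is to pass kernel boot parameters that rein in the power management features. The specific flags below are an assumption on my part (which ones actually matter depends on what the boot warnings complain about on your hardware), but they are the ones commonly reached for:

```shell
# Sketch: disabling common power-management features via kernel boot
# parameters. Edit /etc/default/grub and add the flags to the default
# command line. Which flags are needed is hardware-dependent -- these
# are illustrative, not a known fix for the PN64 specifically.

# In /etc/default/grub:
#   intel_idle.max_cstate=1  -- keep the CPU out of deep idle states
#   processor.max_cstate=1   -- same, for the ACPI idle driver
#   pcie_aspm=off            -- disable PCIe Active State Power Management
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_idle.max_cstate=1 processor.max_cstate=1 pcie_aspm=off"

# Then regenerate the GRUB config and reboot:
sudo update-grub
sudo reboot

# After rebooting, confirm the parameters took effect:
cat /proc/cmdline
```

The trade-off is higher idle power draw, but for a box that mostly sits there grinding out tokens, stability wins.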