You're doing AI wrong.
Visiting r/LocalLLaMA shows people cutting down model fidelity to get faster and faster inference (tokens per second of output).
You don't need the LLM to output data faster than you can read, because following the model's "thought" process is how you stop it from making time-consuming mistakes (which negate that high output speed). It also keeps you in the know about what's happening to your system and code, and lets you learn (!) things from the LLM.
A few weeks ago I started using a local LLM to aid not only in software development but also in system maintenance. While the LLM works on a task, I sit and "hand-hold" it like I would a junior. I interrupt if I see the thought process going in the wrong direction, and I learn from the investigations the LLM makes in areas I'm not familiar with myself.
"One-shotting" (as it's known) work with LLMs seems counterproductive. If it generates something that works, you don't know how. If it generates something that doesn't work, you've wasted time and resources.
The LLM is your very eager, and very junior, aide. Outsource repetitive and basic tasks, and work together with it on the things that are important.
I have no boss telling me I must use AI. I decide what I do and how I do it (for these projects, at least; I have others at customers where I don't), and I wouldn't use tools that don't make a net-positive contribution to my productivity.
Local LLMs do, at least for any task with a clear, binary works/doesn't-work completion check.
Just don't treat them as knowledge databases. That they aren't. They're workers, like you, who need to produce and read guidelines and documentation, and follow well-written plans.
@troed What system maintenance tasks do you use them successfully for?