
I have a server at home sitting IDLE for the last 2 years with 2 TB of RAM and 4 CPUs.

I am gonna push it this week and run some LLMs to see how they perform!

How efficient are they to run locally, in terms of the electric bill?




Depends on the server. Probably not going to be cost effective. I get barely ~0.5 tokens/sec.

I have Dual E5-2699A v4 w/1.5 TB DDR4-2933 spread across 2 sockets.

The full Deepseek-R1 671B (~1.4 TB) with llama.cpp seems to hit a bottleneck in that local engines that run the LLMs don't do NUMA-aware allocation, so cores often have to pull the weights in from the other socket's memory controllers through the inter-socket links (QPI/UPI/HyperTransport) and saturate them.

For my platform that's 2 QPI links at ~39.2 GB/s per link, and they get saturated.
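
A rough back-of-the-envelope sketch of why that link bandwidth caps throughput. The ~37B active params per token for Deepseek-R1, the ~2 bytes/param implied by the 1.4 TB figure, and the 50/50 remote split are assumptions for illustration, not measurements:

    # Bandwidth-bound upper limit on token rate, assuming weights sitting on the
    # remote socket must cross the QPI links for every generated token.
    qpi_links = 2
    qpi_gb_per_s_per_link = 39.2                                # figure quoted above
    cross_socket_gb_per_s = qpi_links * qpi_gb_per_s_per_link   # ~78 GB/s

    # Deepseek-R1 is MoE: ~37B params active per token (assumption), and at
    # ~2 bytes/param (1.4 TB / 671B params) that's roughly 77 GB read per token.
    active_params = 37e9
    bytes_per_param = 1.4e12 / 671e9
    gb_read_per_token = active_params * bytes_per_param / 1e9

    remote_fraction = 0.5                        # naive even split across 2 sockets
    s_per_token = gb_read_per_token * remote_fraction / cross_socket_gb_per_s
    print(f"QPI-only ceiling: ~{1 / s_per_token:.1f} tokens/s")

Even that optimistic ceiling is only a couple of tokens/s; add the local DRAM bandwidth limit and non-ideal access patterns and the ~0.5 tokens/s above isn't surprising.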

I give it a prompt, go to work and check back on it at lunch and sometimes it's still going.

If you want interactive use I'd aim for 7-10 tokens/s, so realistically that means running one of the 8B models on a GPU (~30 tokens/s) or maybe a 70B model on an M4 Max (~8 tokens/s).
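
To put those rates in perspective, a quick sketch (the 500-token answer length is just an illustrative assumption):

    # Wall-clock time for a hypothetical 500-token answer at each rate.
    answer_tokens = 500
    for label, tok_per_s in [("dual-socket Xeon above", 0.5),
                             ("70B on M4 Max", 8),
                             ("8B on a GPU", 30)]:
        minutes = answer_tokens / tok_per_s / 60
        print(f"{label}: ~{minutes:.1f} min")

That works out to roughly 17 minutes per answer at 0.5 tokens/s versus about a minute or less at the interactive rates.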


Unless it's actively processing something it's also sitting idle, so pretty efficient, aside from vacuuming up all your system memory.



