I'm serving AI models on Lambda Labs, and after some trial and error I found that a single vLLM server behind Caddy, with Cloudflare DNS in front, works really well and is really easy to set up.
It's best to avoid running web servers as root. It's easy enough to forward port 80 with iptables, flip the kernel knob that lets unprivileged users bind ports 80 and above, or set the network-bind capability on the binary, as sketched below.
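Any of the three works. A minimal sketch, assuming for the iptables route that the proxy is configured to listen on 8080/8443 instead, and that the caddy binary is on $PATH (the sysctl.d file name is arbitrary):

# Option 1: redirect the privileged ports to high ports the proxy actually listens on
sudo iptables -t nat -A PREROUTING -p tcp --dport 80  -j REDIRECT --to-port 8080
sudo iptables -t nat -A PREROUTING -p tcp --dport 443 -j REDIRECT --to-port 8443

# Option 2: lower the start of the unprivileged port range so any user can bind 80+
sudo sysctl -w net.ipv4.ip_unprivileged_port_start=80
echo 'net.ipv4.ip_unprivileged_port_start=80' | sudo tee /etc/sysctl.d/50-unprivileged-ports.conf

# Option 3: grant just the caddy binary the capability to bind low ports
sudo setcap cap_net_bind_service=+ep "$(command -v caddy)"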
The whole stack is then just two processes:

# serve the model with vLLM's OpenAI-compatible server (defaults to localhost:8000), reusing the HF token as the API key
vllm serve ${MODEL_REPO} --dtype auto --api-key ${HF_TOKEN} --guided-decoding-backend outlines --disable-fastapi-docs &

# Caddy provisions a TLS certificate for the subdomain and proxies requests through to vLLM
sudo caddy reverse-proxy --from ${SUBDOMAIN}.sugaku.net --to localhost:8000 &
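Once the Cloudflare DNS record for ${SUBDOMAIN}.sugaku.net points at the instance's IP, a quick smoke test against the OpenAI-compatible API (the Bearer token has to match whatever was passed to --api-key):

# list the served models through the proxy; a 200 response means DNS, TLS, and the proxy chain are all working
curl -H "Authorization: Bearer ${HF_TOKEN}" https://${SUBDOMAIN}.sugaku.net/v1/models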