Hacker News
K-Quants (github.com/ggerganov)
2 points by tosh on Dec 29, 2023 | past
CUDA: Faster Mixtral Prompt Processing (github.com/ggerganov)
3 points by tosh on Dec 21, 2023 | past
Performance of llama.cpp on Apple Silicon A-series (github.com/ggerganov)
100 points by mobilio on Dec 19, 2023 | past | 41 comments
Llama.cpp: Support for Phi-2 (github.com/ggerganov)
3 points by tosh on Dec 19, 2023 | past
Wchess (github.com/ggerganov)
4 points by tosh on Dec 14, 2023 | past
QMoE Support for Mixtral (github.com/ggerganov)
3 points by tosh on Dec 14, 2023 | past
Llama: Add Mixtral Support (github.com/ggerganov)
2 points by tosh on Dec 11, 2023 | past
Performance of Llama.cpp on Apple Silicon (github.com/ggerganov)
2 points by tosh on Nov 29, 2023 | past
Adjust VRAM/RAM Split on Apple Silicon (github.com/ggerganov)
1 point by tosh on Nov 29, 2023 | past | 1 comment
Running Llama.cpp on AWS Instances (github.com/ggerganov)
96 points by schappim on Nov 27, 2023 | past | 10 comments
Apple Silicon Performance · ggerganov/llama.cpp · Discussion #4167 (github.com/ggerganov)
2 points by gavi on Nov 26, 2023 | past
Whisper.wasm (github.com/ggerganov)
4 points by tosh on Nov 13, 2023 | past
Llama on Mac M2 Ultra (Literally) (github.com/ggerganov)
1 point by behnamoh on Nov 10, 2023 | past
Talk-Llama (github.com/ggerganov)
474 points by plurby on Nov 2, 2023 | past | 140 comments
LLM quantization severely damages model quality and perplexity (github.com/ggerganov)
2 points by behnamoh on Oct 20, 2023 | past | 3 comments
gg: "M2 Ultra is the absolute best personal LLM inference node you can buy." (github.com/ggerganov)
8 points by behnamoh on Oct 12, 2023 | past
M2 Ultra can run 128 streams of Llama 2 7B in parallel (github.com/ggerganov)
268 points by behnamoh on Oct 11, 2023 | past | 173 comments
Llama.cpp Was Hacked in an Evening (github.com/ggerganov)
2 points by behnamoh on Oct 11, 2023 | past
I got llama.cpp and StarCoder – 1B to run on my P4 Retro PC (github.com/ggerganov)
1 point by vkaku on Oct 1, 2023 | past | 1 comment
llama.cpp now supports StarCoder model series (github.com/ggerganov)
6 points by wsxiaoys on Sept 18, 2023 | past | 1 comment
Llama.cpp speculative sampling: 2x faster inference for large models (github.com/ggerganov)
4 points by bobivl on Sept 5, 2023 | past | 1 comment
Speculative: PoC for speeding-up inference via speculative sampling by ggerganov (github.com/ggerganov)
16 points by kristianp on Sept 2, 2023 | past | 1 comment
Llama.cpp Supports Falcon Now (github.com/ggerganov)
2 points by gslin on Aug 25, 2023 | past
AMD ROCm Support Added to Llama.cpp (github.com/ggerganov)
4 points by irusensei on Aug 25, 2023 | past
New llama.cpp format GGUF now merged (github.com/ggerganov)
2 points by mchiang on Aug 21, 2023 | past
GPU Support to Ggml (github.com/ggerganov)
2 points by melenaboija on Aug 19, 2023 | past
Llama: Add grammar-based sampling (github.com/ggerganov)
417 points by davepeck on July 21, 2023 | past | 105 comments
Llama 2: poc for running 70B on CPU (github.com/ggerganov)
3 points by tosh on July 19, 2023 | past
Llama.cpp now has a web interface (github.com/ggerganov)
328 points by xal on July 5, 2023 | past | 49 comments
Llama.cpp now has a simple web UI for chat (github.com/ggerganov)
1 point by wsgeorge on July 4, 2023 | past
