K-Quants (github.com/ggerganov)
2 points by tosh on Dec 29, 2023 | past
CUDA: Faster Mixtral Prompt Processing (github.com/ggerganov)
3 points by tosh on Dec 21, 2023 | past
Performance of llama.cpp on Apple Silicon A-series (github.com/ggerganov)
100 points by mobilio on Dec 19, 2023 | past | 41 comments
Llama.cpp: Support for Phi-2 (github.com/ggerganov)
3 points by tosh on Dec 19, 2023 | past
Wchess (github.com/ggerganov)
4 points by tosh on Dec 14, 2023 | past
QMoE Support for Mixtral (github.com/ggerganov)
3 points by tosh on Dec 14, 2023 | past
Llama: Add Mixtral Support (github.com/ggerganov)
2 points by tosh on Dec 11, 2023 | past
Performance of Llama.cpp on Apple Silicon (github.com/ggerganov)
2 points by tosh on Nov 29, 2023 | past
Adjust VRAM/RAM Split on Apple Silicon (github.com/ggerganov)
1 point by tosh on Nov 29, 2023 | past | 1 comment
Running Llama.cpp on AWS Instances (github.com/ggerganov)
96 points by schappim on Nov 27, 2023 | past | 10 comments
(2) Apple Silicon Performance · ggerganov/llama.cpp · Discussion #4167 (github.com/ggerganov)
2 points by gavi on Nov 26, 2023 | past
Whisper.wasm (github.com/ggerganov)
4 points by tosh on Nov 13, 2023 | past
Llama on Mac M2 Ultra (Literally) (github.com/ggerganov)
1 point by behnamoh on Nov 10, 2023 | past
Talk-Llama (github.com/ggerganov)
474 points by plurby on Nov 2, 2023 | past | 140 comments
LLM quantization severely damages model quality and perplexity (github.com/ggerganov)
2 points by behnamoh on Oct 20, 2023 | past | 3 comments
gg: "M2 Ultra is the absolute best personal LLM inference node you can buy." (github.com/ggerganov)
8 points by behnamoh on Oct 12, 2023 | past
M2 Ultra can run 128 streams of Llama 2 7B in parallel (github.com/ggerganov)
268 points by behnamoh on Oct 11, 2023 | past | 173 comments
Llama.cpp Was Hacked in an Evening (github.com/ggerganov)
2 points by behnamoh on Oct 11, 2023 | past
I got llama.cpp and StarCoder – 1B to run on my P4 Retro PC (github.com/ggerganov)
1 point by vkaku on Oct 1, 2023 | past | 1 comment
llama.cpp now supports StarCoder model series (github.com/ggerganov)
6 points by wsxiaoys on Sept 18, 2023 | past | 1 comment
Llama.cpp speculative sampling: 2x faster inference for large models (github.com/ggerganov)
4 points by bobivl on Sept 5, 2023 | past | 1 comment
Speculative: PoC for speeding-up inference via speculative sampling by ggerganov (github.com/ggerganov)
16 points by kristianp on Sept 2, 2023 | past | 1 comment
Llama.cpp Supports Falcon Now (github.com/ggerganov)
2 points by gslin on Aug 25, 2023 | past
AMD ROCm Support Added to Llama.cpp (github.com/ggerganov)
4 points by irusensei on Aug 25, 2023 | past
New llama.cpp format GGUF now merged (github.com/ggerganov)
2 points by mchiang on Aug 21, 2023 | past
GPU Support to Ggml (github.com/ggerganov)
2 points by melenaboija on Aug 19, 2023 | past
Llama: Add grammar-based sampling (github.com/ggerganov)
417 points by davepeck on July 21, 2023 | past | 105 comments
Llama 2: PoC for running 70B on CPU (github.com/ggerganov)
3 points by tosh on July 19, 2023 | past
Llama.cpp now has a web interface (github.com/ggerganov)
328 points by xal on July 5, 2023 | past | 49 comments
Llama.cpp now has a simple web UI for chat (github.com/ggerganov)
1 point by wsgeorge on July 4, 2023 | past