
What are some other companies like that?

The ones that have an MBA-driven anti-user mindset IMO:

    Apple, Google, Microsoft, Meta, Oracle, Adobe, Nvidia, AMD

I think the pydantic library has something similar that involves validating streaming JSON from large language models.

I agree. I lost hope in Apple's AI "efforts" when I found out that their MLX team (responsible for making a PyTorch/CUDA alternative for Apple Silicon) DOES NOT have access to the source code of the ANE (Apple Neural Engine)!! Only the team behind Apple Intelligence uses the ANE, not the MLX team.

Talk about fragmentation and lack of trust/hope in your product.


Apple Intelligence had so much promise and so little delivery.

Let's see how they dig themselves out of this hole at WWDC25.

They fired Scott Forstall over less; I expect a number of their managers to be axed, including Tim himself.

Is my understanding correct that Lisp's powerful macro system stems from the ability to write the eval function in Lisp itself? From what I gather, Lisp starts with a small set of primitives and special forms—seven in the original Lisp, including lambda. I recall Paul Graham demonstrating in one of his essays that you can build an eval function using just these primitives. Those primitives are typically implemented in a host language like C, but once you have an eval function in Lisp, you can extend it with new rules. The underlying C interpreter only sees the primitives, but as a programmer, you can introduce new syntax rules via eval. This seems like a way to understand macros, where you effectively add new language rules. I know Lisp macros are typically defined using specific keywords like defmacro, but is the core idea similar—extending the language by building on the eval function with new rules?

No, macros and eval are quite different. You can see this for example in Python or JavaScript, which have eval but not macros.

You can make macros in Python: https://github.com/lihaoyi/macropy (note that the project was started for a class taught by Sussman)

There's also a PEP to make them first-class: https://peps.python.org/pep-0638/


That's a different meaning of first-class from Strachey's definition of a first-class citizen[1] - i.e., one that can be passed as an argument, returned from a function, or assigned to a variable.

Syntactic macros are still second-class, like Lisp macros, but an improvement over text-replacement style macros.

For something macro-like which is first-class, there are fexprs[2] and operatives (from Kernel[3]) - these receive their operands verbatim, like macros, so they don't require quotation if we want to suppress evaluation. Fexprs/operatives can be passed around like any other value at runtime.

[1]:https://en.wikipedia.org/wiki/First-class_citizen

[2]:https://en.wikipedia.org/wiki/Fexpr

[3]:https://web.cs.wpi.edu/~jshutt/kernel.html


Strachey defined "first-class objects". This was by analogy with "first-class citizens" in a legal/political sense, since they are treated just as well as any other object and have no additional limitations. If we extend the analogy to syntax then I think it's clear enough that it means a piece of syntax which is treated the same as any other and does not require special treatment or impose additional restrictions.

Thank you for the clarification and the additional information, I think having macros as first-class objects is a cool (but separate) idea.


They aren't that different. Fexprs are essentially additional eval cases.
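
As a rough Common Lisp sketch (the helpers LOOKUP, FEXPR-P, APPLY-FEXPR and MY-APPLY are hypothetical placeholders, not a real API): in a toy metacircular evaluator, a fexpr call is just one more branch of eval, one that hands the operand forms over unevaluated along with the current environment.

    ;; Toy metacircular eval sketch: the fexpr branch passes raw operands.
    (defun my-eval (form env)
      (cond ((symbolp form) (lookup form env))          ; variable reference
            ((atom form) form)                          ; self-evaluating literal
            ((fexpr-p (car form) env)                   ; fexpr call:
             (apply-fexpr (lookup (car form) env)
                          (cdr form) env))              ; operands passed unevaluated
            (t (my-apply (my-eval (car form) env)       ; ordinary application:
                         (mapcar (lambda (x) (my-eval x env))
                                 (cdr form))))))        ; operands evaluated first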

> Is my understanding correct that Lisp's powerful macro system stems from the ability to write the eval function in Lisp itself?

I wouldn't say this is the case. Nearly any language could implement an `eval` for itself, but obviously, it is much simpler in Lisps because there's little syntax and few rules.

What makes Lisp macros different from, say, C preprocessor macros is that the body of the macro is just Lisp code - so the "preprocessor" in this case has full access to the host language's facilities, which may include `eval`. The macros don't take textual input; they take structured input in the form of S-expressions.

Implementing macros is obviously simpler when you have eval, because we need to run the evaluator on the macro body, but it's not a strict requirement: macro functionality could be provided by the implementation, which could encapsulate its own evaluator.

Lisp macros are also simple due to the fact that Lisp code is just lists of data - you don't have to navigate a complex AST to walk through the code and emit particular syntax. You walk through the input with `car` and `cdr`, and you emit new syntax with `cons` (or `list`/`list*` which are derived from it). Macros can take code as their argument, and produce new code which is evaluated in place.
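
For example, a tiny (made-up) infix macro in Common Lisp shows both halves of this: the argument arrives as a plain list that we take apart with car/cadr/caddr, and the replacement code is built with list:

    (defmacro infix (form)
      ;; FORM is the literal list, e.g. (1 + 2), not an evaluated value.
      (list (cadr form)     ; operator:      +
            (car form)      ; left operand:  1
            (caddr form)))  ; right operand: 2

    (macroexpand-1 '(infix (1 + 2)))   ; => (+ 1 2)
    (infix (1 + 2))                    ; => 3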

Macros still have hygiene issues though: because they're based on expanding code before evaluating it, variables introduced by a macro can accidentally shadow variables in the scope of the macro's caller. There are workarounds (gensym) to navigate these hygiene problems.
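
A classic illustration (with a hypothetical swap macro): the naive version introduces a binding literally named tmp, which misbehaves whenever the caller also uses a variable named tmp; generating the name with gensym avoids the collision:

    ;; Naive version: captures any caller variable named TMP.
    (defmacro swap (a b)
      `(let ((tmp ,a))
         (setf ,a ,b)
         (setf ,b tmp)))
    ;; (swap tmp x) expands into a LET that shadows the caller's TMP,
    ;; so the swap silently does the wrong thing.

    ;; Fixed version: GENSYM returns a fresh, uninterned symbol each time.
    (defmacro swap (a b)
      (let ((tmp (gensym)))
        `(let ((,tmp ,a))
           (setf ,a ,b)
           (setf ,b ,tmp))))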

> From what I gather, Lisp starts with a small set of primitives and special forms—seven in the original Lisp, including lambda. I recall Paul Graham demonstrating in one of his essays that you can build an eval function using just these primitives.

This is largely a theoretical demonstration, not real-world usage. In practice, Lisps have dozens or hundreds of "primitives". Common Lisp in particular is a big language and not trivial to implement. Scheme is a bit smaller, though R6RS also grew quite large; this was revisited in R7RS (current), which aims for a small core, with additional functionality being provided through SRFIs (Scheme Requests for Implementation).

> Those primitives are typically implemented in a host language like C, but once you have an eval function in Lisp, you can extend it with new rules.

Using Scheme as an example, some SRFIs can be implemented purely in Scheme, as libraries, but others require the implementation to provide support, which often requires writing C code to provide them.

> This seems like a way to understand macros, where you effectively add new language rules. I know Lisp macros are typically defined using specific keywords like defmacro

As you note, it's `defmacro`, or `macro`, or `syntax-rules`, `syntax-case`, etc, which introduce new syntax - not eval in particular. Some macros will use `eval` in their bodies, which permits control of evaluation other than the regular applicative form of lambdas.

Macros are more than just `eval`. They're a multi-stage evaluation model: we first need to do a `macroexpand` (which will internally use `eval`), and afterwards the resulting expression from the macro call is evaluated.
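
With the infix sketch from above, the two stages can be spelled out at the REPL:

    (let ((expansion (macroexpand-1 '(infix (1 + 2)))))  ; stage 1: expand => (+ 1 2)
      (eval expansion))                                  ; stage 2: evaluate => 3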

> but is the core idea similar—extending the language by building on the eval function with new rules?

There are some Lisps which still attempt this kind of minimalism.

One example is Ian Piumarta's Maru[1], which supports extending `eval` (and `apply`) with new functionality at runtime based on the type being evaluated. Maru basically has global maps of type->evaluator and type->applicator, to which we can add new pairs at runtime to augment the behavior of `eval`/`apply`.
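
Very roughly, and with made-up names rather than Maru's actual API, the idea is a mutable type->handler table that eval consults, so pairs added at runtime change how evaluation works:

    ;; Sketch only: a global type->evaluator table consulted by MY-EVAL.
    (defvar *evaluators* (make-hash-table))

    (defun my-eval (form env)
      (let ((handler (gethash (if (consp form) 'pair (type-of form))
                              *evaluators*)))
        (if handler
            (funcall handler form env)
            form)))                                 ; default: self-evaluating

    ;; Extending EVAL at runtime: register a new type->evaluator pair.
    ;; (MY-LOOKUP is a hypothetical environment accessor.)
    (setf (gethash 'symbol *evaluators*)
          (lambda (form env) (my-lookup form env)))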

Kernel[2] also aims for the minimalist approach and does away with macros, quote and special forms entirely, replacing them with a more general feature called an operative. The Kernel evaluator does not need to implement special rules for things like `lambda`, `cond`, `car`, `cdr` (as in Graham's "The Roots of Lisp" essay) - it just discriminates between two kinds of form: operative or applicative. Obviously, some kinds of operative are "primitive", but there's no difference from the PoV of the programmer. Which set of symbols you decide to implement as primitives is up to the implementation. The Kernel report suggests a small set of primitives and demonstrates that the remaining standard features can be implemented using only the functionality provided so far.

[1]:https://piumarta.com/software/maru/

[2]:https://web.cs.wpi.edu/~jshutt/kernel.html


The relationship is this: the ease of writing a meta-circular eval in Lisp and the ease of writing macros are related by common causes.

Right, all benchmarks collapse once you go beyond 32K tokens. I've rarely seen benchmarks focusing on long-range tasks, which is where most programming needs are.

> No sign of what source material it was trained on though right?

Out of curiosity, does anyone do anything "useful" with that knowledge? It's not like people can just randomly train models...


When you're truly open source, you can make things like this:

Today we introduce OLMoTrace, a one-of-a-kind feature in the Ai2 Playground that lets you trace the outputs of language models back to their full, multi-trillion-token training data in real time. OLMoTrace is a manifestation of Ai2’s commitment to an open ecosystem – open models, open data, and beyond.

https://allenai.org/blog/olmotrace


You can do the same, except you would need to be a pirate website. It would even be better, except illegal. But it would be better.

That is why the others can't provide stuff like this: RAG/hallucination checks. I just wish the Allen AI models had bigger context; 4K is too small nowadays.

It would be useful for answering "is this novel or was it in the training data", but that's not typically the point of open source.

If labs provided the corpus and source code for training their tokenizers, it would be a lot easier to produce results about tokenizers. As it is, they provide neither, so it is impossible to compare different algorithms running on the same data if you also want to include the vocabs that are commonly used.

Many are speculating it was trained by o1/o3 for some of the initial reasoning.

Are there any widely used models that publish this? If not, then no I guess.

Depending on how you use "randomly", they absolutely can..?

it's got more 'source' than whatever OpenAI provides for their models.

Less-alcoholic beverages are fully alcoholic beverages.

0.5% or 0.03% satisfy my "nonalcoholic" criteria.

> Studies have found ethanol levels in commercial apple juice ranging from 0.06 to 0.66 grams per liter, with an average around 0.26 grams per liter[1]

Even apple juice is an alcoholic drink if you push your criteria to absurdity.

[1] https://pmc.ncbi.nlm.nih.gov/articles/PMC5421578/
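
For scale (back-of-the-envelope, assuming an ethanol density of roughly 0.789 g/mL), that average works out to:

    0.26 g/L ÷ 0.789 g/mL ≈ 0.33 mL ethanol per litre ≈ 0.03% ABV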


but they're not bleach, and no amount of adding or removing alcohol can transmute the alcohol into something else.

No it doesn't; it has exactly the same amount of source: zero. It has more downloadable binary.

That’s the ‘source’ for what the model spits out though, if not the source for what spits out the model.

It is just freeware, not open source.

The "source" for something is all the stuff that makes you able to build and change that something. The source for a model is all the stuff that makes you able to train and change the model.

Just because the model produces stuff doesn't mean that's the model's source, just like the binary for a compiler isn't the compiler's source.


they responded to my tweet last year and said they didn't quantize the models.

It's very hard to find right now, but I'm sure they said they don't quantize the KV cache; their weights, however, are in fp8.

> 1.58bit quantization

Of course we can run any model if we quantize it enough, but I think the OP was talking about the unquantized version.


Oh, you can still run them unquantized! See https://docs.unsloth.ai/basics/llama-4-how-to-run-and-fine-t... where we show you can offload all MoE layers to system RAM and leave non-MoE layers on the GPU - the speed is still pretty good!

You can do it via `-ot ".ffn_.*_exps.=CPU"`


Thanks, I'll try it! I guess "mixing" GPU+CPU would hurt the perf tho.
