jxmorris12's comments | Hacker News

Keep typing.

Only real words!!!

At the top it says how many boards are solved. I'd copy that info at the bottom so it's easier to find.


95/1005 :)

It should always be solvable in 26×5=130.

Feature requests:

Important! Save the state in a cookie, in case the user refreshes the page. (Don't ask.)

Perhaps make the header always visible.

Perhaps make the new word always visible.

Perhaps show a star moving across the screen for every word that is solved.


It's great that people are starting to take continual learning seriously, and it seems like Jessy has been thinking about LLMs and continual learning longer than almost anyone.

I especially like this taxonomy:

> I think of continual learning as two subproblems:

> Generalization: given a piece of data (user feedback, a piece of experience, etc.), what update should we do to learn the “important bits” from that data?

> Forgetting/Integration: given a piece of data, how do we integrate it with what we already know?

My personal feeling is that generalization is a data issue: given a datapoint x, what are all the examples in the distribution of things that can be inferred from x? Maybe we can solve this with synthetic datagen. And forgetting might be solvable architecturally, e.g. with Cartridges (https://arxiv.org/abs/2506.06266) or something of that nature.
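To make the synthetic-datagen idea concrete, here is a minimal sketch of what "expanding" a datapoint into the things that can be inferred from it might look like. The prompt wording and the `complete` helper are hypothetical placeholders, not a real API or anyone's actual method.

```python
# Hypothetical sketch: expand one datapoint x into statements that can be
# inferred from it, then train on the expansions rather than x alone.
# `complete` stands in for any LLM call; it is not a real library function.

def complete(prompt: str) -> str:
    raise NotImplementedError("plug in your favorite LLM client here")

def expand_datapoint(x: str, n: int = 8) -> list[str]:
    """Ask a model for n distinct things a learner should take away from x."""
    prompt = (
        f"Here is a piece of feedback or experience:\n{x}\n\n"
        f"List {n} distinct facts or behaviors a model should learn from it, one per line."
    )
    return [line.strip() for line in complete(prompt).splitlines() if line.strip()]

# The expansions would then be added to the fine-tuning set alongside x,
# so the update covers the "important bits" instead of the literal string.
```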


Matryoshka embeddings are not sparse. And SPLADE can scale to tens or hundreds of thousands of dimensions.


Yeah, the standard SPLADE model trained from BERT typically already has a vocabulary/vector size of 30,522. If the SPLADE model is based on a multilingual version of BERT, such as mBERT or XLM-R, the vocabulary grows to roughly 120,000 or 250,000 respectively, and the vector size grows with it.
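For concreteness, here is a rough sketch of how a SPLADE-style encoder produces a vocabulary-sized sparse vector from an MLM head; the checkpoint name is just illustrative, and padding masking is omitted since it encodes a single text.

```python
# Sketch of SPLADE-style sparse encoding: the output dimensionality equals the
# tokenizer's vocabulary size (~30k for BERT, far larger for multilingual models).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "naver/splade-cocondenser-ensembledistil"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

def splade_encode(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    logits = model(**inputs).logits            # (1, seq_len, vocab_size)
    weights = torch.log1p(torch.relu(logits))  # log-saturated term weights
    vec, _ = weights.max(dim=1)                # max-pool over the sequence
    return vec.squeeze(0)                      # one |V|-dim vector, mostly zeros

vec = splade_encode("sparse retrieval with learned term weights")
print(vec.shape, int((vec > 0).sum()))         # vocab-sized vector, few nonzeros
```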


If you consider the full higher-dimensional representation to be the actual latent space, and you take only the first principal components, the remaining components are zero. Pretty sparse. No, it's not a linked-list sparse matrix. Don't be a pedant.


When you truncate Matryoshka embeddings, you get the storage benefits of low-dimensional vectors with the limited expressiveness of low-dimensional vectors. Usually, what people look for in sparse vectors is to combine the storage benefits of low-dimensional vectors with the expressiveness of high-dimensional vectors. For that, you need the non-zero dimensions to be different for different vectors.
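A toy illustration of that difference, with made-up numbers: truncation keeps the same leading dimensions for every vector, while a sparse vector keeps whichever dimensions happen to be nonzero for that particular vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Matryoshka-style truncation: every vector keeps the *same* first k dimensions.
dense = rng.normal(size=(4, 1024))
truncated = dense[:, :64]          # cheap to store, but only 64 axes of expressiveness

# Sparse vectors: high-dimensional, but each vector's nonzero dims differ.
vocab_size = 30_000
sparse = np.zeros((4, vocab_size))
for row in sparse:
    idx = rng.choice(vocab_size, size=64, replace=False)  # different dims per vector
    row[idx] = rng.random(64)

# Both store ~64 numbers per vector, but the sparse ones can place them
# anywhere in a 30,000-dimensional space.
print(truncated.shape, [int((row != 0).sum()) for row in sparse])
```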


No one means Matryoshka embeddings when they talk about sparse embeddings. This is not pedantic.


No one means wolves when they talk about dogs, obviously wolves and dogs are TOTALLY different things.


Why?


It seems to be an error with the classifier. Sorry everyone. I probably shouldn't have posted that graph; I knew it was buggy, I just thought that the Perl part might be interesting to people.

Here's a link to the model if you want to dive deeper: https://huggingface.co/philomath-1209/programming-language-i...


Hi again. I had already written about this later in my blog post (which is unrelated to this thread), but the point was that RLHF hadn't been applied to language models at scale until InstructGPT. I edited the post just now to clarify this. Thanks for the feedback!


Whoops. I hope you can overlook this minor logical error.


Oh yeah it's absolutely an interesting article!


I don't think architecture matters. It seems to be more a function of the data somehow.

I once saw a LessWrong post claiming that the Platonic Representation Hypothesis doesn't hold when you only embed random noise, as opposed to natural images: http://lesswrong.com/posts/Su2pg7iwBM55yjQdt/exploring-the-p...


It's a bit like saying algorithms don't matter for solving computational problems. Two different algorithms might produce equivalent results, but if you have to wait years for an answer when seconds matter, the slow algorithm isn't helpful.

I believe the current approach of using a mostly feed-forward pass at inference time, with well-filtered training data and backpropagation for discrete "training cycles," has limitations. I know this has been tried in the past, but something modeling how animal brains actually function, with continuous feedback and no explicit "training" (we're always being trained), might be the key.

Unfortunately, our knowledge of "what's really going on" in the brain is still limited; investigative methods are crude, as the brain is difficult to image at the resolution we need, and in real time. Last I checked, no one has quite figured out how memory works, for example: whether it's "stored in the network" somehow through feedback (like an SR latch or flip-flop in electronics), or whether there's some underlying chemical process within the neuron itself (we know that chemicals definitely regulate brain function; we don't know how much it goes the other way and whether they can be used to encode state).


> I don't think architecture matters. It seems to be more a function of the data somehow.

of course it matters

if I supply the ants in my garden with instructions on how to build tanks and stealth bombers they're still not going to be able to conquer my front room


I recently wrote a post about scaling RL that has some similar ideas:

> How to Scale RL to 10^26 FLOPs (blog.jxmo.io/p/how-to-scale-rl-to-1026-flops)

The basic premise behind both essays is that for AI to make another big jump in capabilities, we need to find new data to train on.

My proposal was reusing text from the Internet and doing RL on next-token prediction. The linked post here instead suggests doing 'replication training', which they define as "tasking AIs with duplicating existing software products, or specific features within them".


My bad. Do they make this for Chrome?


Only until it affects Google's bottom line.


I believe Brave has this functionality, and an ad blocker.


Yes, you can definitely convert the outputs from one model to the space of another, and then use them.
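As a hedged sketch of one simple way to do this (not the specific method from any particular paper): fit a linear map between the two spaces using paired embeddings of the same texts, then apply it to new vectors. The arrays below are random stand-ins for real model outputs.

```python
import numpy as np

# Toy sketch: learn a linear map W that sends model A's embeddings into
# model B's space, using embeddings of the same texts from both models.
rng = np.random.default_rng(0)
n, d_a, d_b = 1000, 384, 768
emb_a = rng.normal(size=(n, d_a))   # stand-ins for model A's outputs
emb_b = rng.normal(size=(n, d_b))   # stand-ins for model B's outputs on the same texts

# Least-squares fit: emb_a @ W ≈ emb_b
W, *_ = np.linalg.lstsq(emb_a, emb_b, rcond=None)

# New vectors from model A can now be projected into model B's space.
converted = emb_a[:5] @ W           # shape (5, d_b)
print(W.shape, converted.shape)
```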

