
Just did some preliminary testing on decrypting some ROT cyphertext which would have been viable for a human on paper. The output was pretty disappointing: lots of "workish" steps creating letter counts, identifying common words, etc, but many steps were incorrect or not followed up on. In the end, it claimed to check its work and deliver an incorrect solution that did not satisfy the previous steps.

I'm not one to judge AI on pratfalls, and cyphers are a somewhat adversarial task. However, there was no aspect of the reasoning that seemed more advanced or consistent than previous chain-of-thought demos I've seen. So the main proof point we have is the paper, and I'm not sure how I'd go from there to being able to trust this on the kind of task it is intended for. Do others have patterns by which they get utility from chain of thought engines?

Separately, chain of thought outputs really make me long for tool use, because the LLM is often forced to simulate algorithmic outputs. It feels like a commercial chain-of-thought solution like this should have a standard library of functions it can use for 100% reliability on things like letter counts.
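To make that concrete, here is a minimal sketch of the kind of helper I mean. The name `letter_counts` is made up for illustration and is not part of any real model toolkit. It only counts unaccented A-Z letters.

```python
from collections import Counter


def letter_counts(text: str) -> Counter:
    """Count A-Z letters case-insensitively, ignoring everything else."""
    # Hypothetical helper: accented letters, digits, and punctuation are skipped.
    return Counter(ch for ch in text.upper() if "A" <= ch <= "Z")


counts = letter_counts("YRP CFTLIR VE UVDRJZRUF JREZURU")
print(counts.most_common(3))
```

A deterministic function like this removes the need for the model to simulate the count token by token.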




Hmm, are you sure it was using the o1 model and not GPT-4o? I've been using the o1 model and it does consistently well at solving rotation ciphers.


Does it do better than Claude? Claude (3.5 Sonnet) handled ROTs perfectly and was also able to respond in ROT.


Just tried, no joy from Claude either:

Can you decrypt the following? I don't know the cypher, but the plaintext is Spanish.

YRP CFTLIR VE UVDRJZRUF JREZURU, P CF DRJ CFTLIR UV KFUF VJ HLV MVI TFJRJ TFDF JFE VE MVQ UV TFDF UVSVE JVI


I just tried it with the o1 model and it said it couldn't decipher it. It told me what to try, but said it doesn't have the time to do so. Kind of an unusual response.


The chain of thought does seem to take quite a long time, so maybe there is a new mechanism for reducing the amount of load on the servers by estimating the amount of reasoning effort needed to solve a problem and weighing that against the current pressure on the servers.


I got this response from o1-mini with the exact same prompt:

Claro, he descifrado el texto utilizando un cifrado César con un desplazamiento de 9 posiciones. Aquí está el texto original y su correspondiente traducción:

*Texto Cifrado:* ``` YRP CFTLIR VE UVDRJZRUF JREZURU, P CF DRJ CFTLIR UV KFUF VJ HLV MVI TFJRJ TFDF JFE VE MVQ UV TFDF UVSVE JVI ```

*Texto Descifrado:* ``` HAY LOCURA EN DEMASADO SANIDAD, Y LO MÁS LOCURA DE TODO ES QUE VER COSAS COMO SON EN VEZ DE COMO DEBEN SER ```

*Traducción al Inglés:* ``` THERE IS MADNESS IN OVERLY HEALTH, AND THE MOST MADNESS OF ALL IS TO SEE THINGS AS THEY ARE INSTEAD OF AS THEY SHOULD BE ```

Este descifrado asume que se utilizó un cifrado César con un desplazamiento de +9. Si necesitas más ayuda o una explicación detallada del proceso de descifrado, no dudes en decírmelo.

Interestingly it makes a spelling mistake, but other than that it did manage to solve it.


Interesting, it was able to guess that it's ROT17, but it decoded it wrong, although "HAY" and some other words were correct.

I've tried only in English so far though.

It told me it's 17, and "HAY GENTE MU DIFERENTE LECTURA, A LO MUY GENTE DE TODO ES QUE VER COSAS COMO SON EN VEZ DE COMO DEBEN SER"

although it really should be "HAY LOCURA EN DEMASIADO SANIDAD, Y LO MAS LOCURA DE TODO ES QUE VER COSAS COMO SON EN VEZ DE COMO DEBEN SER"
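For reference, the shift of 9 that o1-mini reported and ROT17 are the same key seen from opposite directions: undoing a forward shift of 17 is the same as shifting forward by 26 - 17 = 9. A minimal sketch to check this (the function name `caesar_shift` is mine, and it only handles unaccented A-Z letters):

```python
def caesar_shift(text: str, shift: int) -> str:
    """Shift each A-Z / a-z letter by `shift` positions, leaving other characters alone."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


ciphertext = "YRP CFTLIR VE UVDRJZRUF JREZURU"
print(caesar_shift(ciphertext, -17))  # HAY LOCURA EN DEMASIADO SANIDAD
print(caesar_shift(ciphertext, 9))    # HAY LOCURA EN DEMASIADO SANIDAD
```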


Claude made similar mistakes of generating decryption that was similar to plaintext but with stuff mixed in. I suspect my version of the quote (Miguel de Cervantes) is an apocryphal translation, and there's some utility well on both models to pull it towards the real one. With that said, I did not see o1-preview get as close as you did.


For testing, I think it's better to use uncommon sentences and to start with English. If it can solve that, then try other languages.


HAY LOCURA EN DEMASIADO SANIDAD, Y LO MAS LOCURA DE TODO ES QUE VER COSAS COMO SON EN VEZ DE COMO DEBEN SER

Is that correct? I don't know anything but basic Spanish. All I did was:

"The plaintext is in Spanish but I don't know anything else, solve this and explain your reasoning as you go step by step."


That's correct. I got o1-preview myself finally now. But interestingly getting inconsistent results with this so far, need to keep trying.


o1-preview gave me this...

Final Decrypted Message:

"Por ejemplo te agradeceré, y te doy ejemplo de que lo que lees es mi ejemplo"

English Translation:

"For example, I will thank you, and I give you an example of what you read is my example."

... initially it gave up and asked if I knew what type of cypher had been used. I said I thought it was a simple substitution.


Why did it add the accent on the e (é)? Surely that wasn't part of it and it actually "thought a bit too far"?


On my machine it just works with 4o

https://chatgpt.com/share/66e34020-33dc-800d-8ab8-8596895844...

With no drama. I'm not sure the bot answer is correct, but looks correct.


o1-preview. Were you using common plaintexts by chance (e.g. proverbs) or ROT13 specifically? Mine use all the right steps but just can't string them together.


It just solved this one for me "Can you solve this cipher? droboscxylecsxoccvsuocryglecsxocc"


Could you share your prompt?


Shared one above.

Just tried an easier (English) version and o1-preview LITERALLY rickrolled me, which is maybe impressive in its own way?

UPDATE: Claude crushed it however.

o1-preview Transcript follows.

**

Can you decrypt the following? I don't know the cypher.

mzilj ivdrzej ufidrek ze kyv sfup reu tre ivrtkzmrkv crkvi ze czwv

Certainly! The encrypted text is:

``` mzilj ivdrzej ufidrek ze kyv sfup reu tre ivrtkzmrkv crkvi ze czwv ```

When we decrypt it, we get:

``` never gonna give you up let you down and desert you ```

It appears to be a line from Rick Astley's song "Never Gonna Give You Up."


4o completely botched the Spanish version, but seems to have nailed this one.

> Without knowing the cipher used to encrypt this message, it’s hard to decrypt it directly. However, the structure and letter patterns suggest it might be a simple substitution cipher, possibly Caesar or some variation of it.

> Let me attempt a few common methods to see if I can identify the cipher.

> The most frequent letters in the encrypted message are 'r', 'z', 'v', and 'e'. In English, the most common letters are typically 'E', 'T', 'A', and 'O'. This suggests the cipher could be a simple substitution or a Caesar cipher.

> I'll now attempt to apply a Caesar cipher decryption by shifting the alphabet through all possible rotations to see if we get readable text.

> One of the decrypted messages, with a shift of 17, reads:

> *"virus remains dormant in the body and can reactivate later in life"*

> It seems like the correct decryption, revealing information about a virus that can remain dormant in the body.
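The "try all possible rotations" step it describes is easy to check outside the model. Below is a rough sketch of that brute force, assuming a crude scoring rule of my own (count how many words land in a tiny list of common English words). The names `COMMON_WORDS`, `shift_back`, and `crack` are illustrative only.

```python
# Tiny, hand-picked word list; an assumption for this sketch, not a real dictionary.
COMMON_WORDS = {"the", "and", "of", "to", "in", "is", "it", "can", "you", "that"}


def shift_back(text: str, key: int) -> str:
    """Undo a Caesar shift of `key` on lowercase a-z letters; other characters pass through."""
    return "".join(
        chr((ord(ch) - ord("a") - key) % 26 + ord("a")) if "a" <= ch <= "z" else ch
        for ch in text
    )


def score(text: str) -> int:
    """Number of whitespace-separated words that appear in COMMON_WORDS."""
    return sum(word in COMMON_WORDS for word in text.split())


def crack(ciphertext: str) -> tuple[int, str]:
    """Try all 26 keys and return the (key, plaintext) pair with the highest score."""
    best_key = max(range(26), key=lambda k: score(shift_back(ciphertext, k)))
    return best_key, shift_back(ciphertext, best_key)


key, plaintext = crack("mzilj ivdrzej ufidrek ze kyv sfup reu tre ivrtkzmrkv crkvi ze czwv")
print(key, plaintext)
# 17 virus remains dormant in the body and can reactivate later in life
```

The word-list scoring only works when the ciphertext keeps its spaces; a frequency-based score would be needed for text with the spaces stripped.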


That's weird. Claude was able to decode it to """virus remains dormant in the body and can reactivate later in life""" correctly, but it thought the shift was 10, when it's really ROT17.


If o1 did rickroll you deliberately, then it would indeed be more impressive than solving ciphertexts, and I'd start preparing to bow down to our AGI overlords :)


Definitely. A teammate pointed out Reddit posts used in training as a probable cause :)


It's RL so that means it's going to be great on tasks they created for training but not so much on others.

Impressive but the problem with RL is that it requires knowledge of the future.


Out of curiosity, can you try the same thing with Claude? When I tried Claude with any sort of ROT, it had amazing performance compared to GPT.



