
All modern LLMs seem to prefer XML to other structured markup. That might be because there's so much HTML in the training set, or because XML has more redundancy baked in (every element is closed by a tag that repeats its name), which makes it easier for models to parse.





This is especially effective when you have multiple pieces of content. You can wrap each piece in its own arbitrarily named XML element and then refer to it later in your prompt by that tag name.
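To make that concrete, here is a minimal sketch in Python. The helper names (`wrap`, `build_prompt`), the tag names, and the sample text are all made up for illustration; nothing here is tied to a particular model or API, and no escaping is done since the tags are only meant as delimiters.

```python
from typing import Dict


def wrap(tag: str, content: str) -> str:
    """Wrap content in an arbitrary XML-style element (hypothetical helper)."""
    return f"<{tag}>\n{content.strip()}\n</{tag}>"


def build_prompt(pieces: Dict[str, str], instruction: str) -> str:
    """Give each piece its own element, then append the instruction."""
    blocks = [wrap(tag, text) for tag, text in pieces.items()]
    return "\n\n".join(blocks + [instruction])


# Placeholder content; the instruction refers back to each piece by its tag.
prompt = build_prompt(
    {
        "contract": "Party A agrees to ...",
        "email": "Hi, please review ...",
    },
    "Summarize <contract>, then list anything in <email> that contradicts it.",
)
print(prompt)
```

The tag names carry no special meaning; they just need to be distinct and used consistently between the wrapped content and the instruction.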

In my experience, it's really just XML-ish, and HTML can be described the same way. The relevant strength here is how forgiving tag-delimited content is to parse. The XML is usually fairly shallow and, as far as I know, doesn't take advantage of any true XML features.
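As a rough illustration of that forgiving parsing, here is a sketch that pulls tag-delimited content out of a model reply with a regular expression instead of a real XML parser. The function name `extract_tag` and the sample replies are hypothetical, and it assumes shallow, non-nested tags.

```python
import re
from typing import Optional


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the text inside the first <tag>...</tag>, or None if absent.

    Tolerant by design: if the closing tag is missing (say, the reply
    was cut off), everything up to the end of the text is returned.
    """
    name = re.escape(tag)
    pattern = re.compile(rf"<{name}>(.*?)(?:</{name}>|\Z)", re.DOTALL)
    match = pattern.search(text)
    return match.group(1).strip() if match else None


# Made-up replies: one well-formed, one truncated, one with no tag at all.
complete = "Sure! <answer>42</answer> Hope that helps."
truncated = "Let me think. <answer>42 is the"

print(extract_tag(complete, "answer"))
print(extract_tag(truncated, "answer"))
print(extract_tag("no tags here", "answer"))
```

A strict XML parser would reject both the surrounding chatter and the truncated reply; treating the tags as plain delimiters is what makes this tolerant.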


