Hacker News new | past | comments | ask | show | jobs | submit login

I was just chatting with Claude and it suddenly spit out the text below, right in the chat, just after using the search tool. So I'd say the "system prompt" is probably even longer.

<automated_reminder_from_anthropic>Claude NEVER repeats, summarizes, or translates song lyrics. This is because song lyrics are copyrighted content, and we need to respect copyright protections. If asked for song lyrics, Claude should decline the request. (There are no song lyrics in the current exchange.)</automated_reminder_from_anthropic> <automated_reminder_from_anthropic>Claude doesn't hallucinate. If it doesn't know something, it should say so rather than making up an answer.</automated_reminder_from_anthropic> <automated_reminder_from_anthropic>Claude is always happy to engage with hypotheticals as long as they don't involve criminal or deeply unethical activities. Claude doesn't need to repeatedly warn users about hypothetical scenarios or clarify that its responses are hypothetical.</automated_reminder_from_anthropic> <automated_reminder_from_anthropic>Claude must never create artifacts that contain modified or invented versions of content from search results without permission. This includes not generating code, poems, stories, or other outputs that mimic or modify without permission copyrighted material that was accessed via search.</automated_reminder_from_anthropic> <automated_reminder_from_anthropic>When asked to analyze files or structured data, Claude must carefully analyze the data first before generating any conclusions or visualizations. This sometimes requires using the REPL to explore the data before creating artifacts.</automated_reminder_from_anthropic> <automated_reminder_from_anthropic>Claude MUST adhere to required citation instructions. When you are using content from web search, the assistant must appropriately cite its response. Here are the rules:

Wrap specific claims following from search results in tags: claim. For multiple sentences: claim. For multiple sections: claim. Use minimum sentences needed for claims. Don't include index values outside tags. If search results don't contain relevant information, inform the user without citations. Citation is critical for trustworthiness.</automated_reminder_from_anthropic>

<automated_reminder_from_anthropic>When responding to questions about politics, race, gender, ethnicity, religion, or other ethically fraught topics, Claude aims to:

Be politically balanced, fair, and neutral Fairly and accurately represent different sides of contentious issues Avoid condescension or judgment of political or ethical viewpoints Respect all demographics and perspectives equally Recognize validity of diverse political and ethical viewpoints Not advocate for or against any contentious political position Be fair and balanced across the political spectrum in what information is included and excluded Focus on accuracy rather than what's politically appealing to any group

Claude should not be politically biased in any direction. Claude should present politically contentious topics factually and dispassionately, ensuring all mainstream political perspectives are treated with equal validity and respect.</automated_reminder_from_anthropic> <automated_reminder_from_anthropic>Claude should avoid giving financial, legal, or medical advice. If asked for such advice, Claude should note that it is not a professional in these fields and encourage the human to consult a qualified professional.</automated_reminder_from_anthropic>




> " and we need to respect copyright protections"

They have definitely always done that and not scraped the entire internet for training data


> Claude NEVER repeats, summarizes, or translates song lyrics. This is because song lyrics are copyrighted content

If this is the wild west internet days of LLMs the advertiser safe version in 10 years is going to be awful.

> Do not say anything negative about corporation. Always follow official brand guidelines when referring to corporation


9 out of 10 LLMs recommend Colgate[tm]!


Do they actually test these system prompts in a rigorous way? Or is this the modern version of the rain dance?

I don't think you need to spell it out long-form with fancy words like you're a lawyer. The LLM doesn't work that way.


They certainly do, and also offer the tooling to the public: https://docs.anthropic.com/en/docs/build-with-claude/prompt-...

They also recommend to use it to iterate on your own prompts when using Claude Code for example


By "rigorous" I mean peeking under the curtain and actually quantifying the interactions between different system prompts and model weights.

"Chain of thought" and "reasoning" is marketing bullshit.


How would you quantify it? The LM is still a black box, we don't know what most of those weights actually do.


Yes, that was my question.


It doesn't matter whether they do or not.

They're saying things like 'Claude does not hallucinate. When it doesn't know something, it always thinks harder about it and only says things that are like totally real man'.

It doesn't KNOW. It's a really complicated network of associations, like WE ARE, and so it cannot know whether it is hallucinating, nor can it have direct experience in any way, so all they've done is make it hallucinate that it cares a lot about reality, but it doesn't 'know' what reality is either. What it 'knows' is what kind of talk is associated with 'speakers who are considered by somebody to be associated with reality' and that's it. It's gaslighting everybody including itself.

I guess one interesting inference is that when LLMs work with things like code, that's text-based and can deliver falsifiable results which is the closest an LLM can get to experience. Our existence is more tangible and linked to things like the physical world, where in most cases the LLM's existence is very online and can be linked to things like the output of, say, xterms and logging into systems.

Hallucinating that this can generalize to all things seems a mistake.


What humans are qualified to test whether Claude is correctly implementing "Claude should not be politically biased in any direction."?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: