> One could hire a software developer to write such a program. But, in general, software developers can be untrustworthy and prone to stealing ideas for their own selfish purposes.
Ehhhh? Yes, there are examples of that, as there are for any arbitrary group of humans you could select, but [anecdotally] I've noticed the opposite... it's not uncommon to find a passionate developer who's only interested in the challenge/problem-solving aspect - it's a lot less common for, say, real estate agents.
I don't really get the point you're making beyond "people be greedy sometimes" (which I do agree with, don't get me wrong).
> The "cardinal rule of agent design" should be that an LLM can have access to at most two of these during one session. To avoid security issues, agents should be designed in a way that ensures this.
Then don't give it your API keys? Surely there are better ways to solve this (like an MCP API gateway)?
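To illustrate the gateway idea, here's a minimal sketch of an MCP server that holds the credential itself, so the agent only ever sees a narrow tool rather than the key. This assumes the official MCP Python SDK; the upstream endpoint, tool name, and `EXAMPLE_API_KEY` variable are made up for illustration:

```python
# Hypothetical MCP "gateway" tool: the agent calls the tool, the key stays in this process.
import os

import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("example-gateway")


@mcp.tool()
def lookup_record(record_id: str) -> str:
    """Fetch a record from the upstream API on the agent's behalf."""
    # The key is read from this server's environment; it never enters the
    # LLM's context or any code the LLM generates.
    api_key = os.environ["EXAMPLE_API_KEY"]  # assumed env var, illustrative only
    resp = requests.get(
        f"https://api.example.com/records/{record_id}",  # placeholder endpoint
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.text


if __name__ == "__main__":
    mcp.run()
```

The agent can then be granted the `lookup_record` tool without ever being handed the raw key.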
I'll preface this comment with: I'm a recent startup owner (and the sole dev, which is important) and my entire codebase has been generated via Sonnet (mostly 3.7, now 4.0). If you actually looked at the work I'm (personally) producing, I guess I'm more of a product owner/project manager, as I'm really just overseeing the development.
> I have yet to see an LLM-generated app not collapse under its own weight after enough iterations/prompts.
There are a few crucial steps to making an LLM-generated app maintainable (by the LLM):
- _have a very, very strong SWE background_; ideally as a "strong" Lead Dev, _this is critical_
- your entire workflow NEEDS to be centered around LLM development (or even be model-specific):
  - use MCPs wherever possible and make sure they're specifically configured for your project
  - don't write "human" documentation; use rule + reusable prompt files
  - you MUST do this in a *very* granular but specialized way; keep rules/prompts very small (like you would when creating tickets)
  - make sure rules are conditionally applied (using globs); do not auto-include anything except your "system rules" (see the sketch below)
  - use the LLM to generate said prompts and rules; this forces consistency across prompts, which is very important
  - follow a typical agile workflow (creating epics, tickets, backlogs, etc.)
- TESTS TESTS AND MORE TESTS; add automated tools (like linters) EVERYWHERE you can
- keep your code VERY modular so the LLM can keep a focused context, rules should provide all key context (like the broader architecture); the goal is for your LLM to only need to read or interact with files related to the strict 'current task' scope
- iterating on code is almost always more difficult than writing it from scratch: provided your code is well architected, no single rewrite should be larger than a regular ticket (if the ticket is too large then it needs to be split up)
This is off the top of my head so it's pretty broad/messy but I can expand on my points.
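To make the "conditionally applied rules" point concrete, here's a rough sketch of what one small, glob-scoped rule file can look like. This assumes Cursor-style `.mdc` rule files; the globs, paths, and rule text are invented for illustration:

```markdown
---
description: Conventions for API route handlers
globs: src/api/**/*.ts
alwaysApply: false
---

- Every handler gets a unit test in a sibling `__tests__/` directory.
- Validate request bodies at the boundary; never trust client input deeper in the stack.
- Keep handlers thin: business logic lives in `src/services/`, not in the route file.
```

Because the rule only attaches when matching files are in scope, it stays small and doesn't pollute the context of unrelated tasks.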
LLM-coding requires a complete overhaul of your workflow so it is tailored specifically to an LLM, not a human, but this is also a massive learning curve (that takes a lot of time to figure out and optimize). Would I bother doing this if I were still working on a team? Probably not; I don't think it would've saved me much time in a "regular" codebase. As a single developer at a startup? This is the only way I've been able to get "other startup-y" work done while also progressing the codebase - there's real value in being able to do multiple things at a time: let the LLM run, intermittently review its output, and get on with other work in the meantime.
The biggest tip I can give: LLMs struggle at "coding like a human" and are much better at "bad-practice" workflows (e.g. throwing away large parts of code in favour of a total rewrite) - let the LLM lead the development process, with the rules/prompts as guardrails, and try to stay out of its way while it works (instead of saying "hey, X thing didn't work, go fix that now") - hold its hand but let it experiment before jumping in.
This document outlines the standardized approach to ticket management in the <redacted> project. All team members should follow these guidelines when creating, updating, or completing tickets.
## Ticket Organization
Tickets are organized by status and area in the following structure:
```
TICKETS/
  COMPLETED/      - Finished tickets
    BACKEND/      - Backend-related tickets
    FRONTEND/     - Frontend-related tickets
  IN_PROGRESS/    - Tickets currently being worked on
    BACKEND/
    FRONTEND/
  BACKLOG/        - Tickets planned but not yet started
    BACKEND/
    FRONTEND/
```
## Ticket Status Indicators
All tickets must use consistent status indicators:
- *BACKLOG* - Planned but not yet started
- *IN_PROGRESS* - Currently being implemented
- *COMPLETED* - Implementation is finished
- *ABANDONED* - Work was stopped and will not continue
## Required Ticket Files
Each ticket directory must contain these files:
1. *Main Ticket File* (TICKET_.md):
- Problem statement and background
- Detailed analysis
- Implementation plan
- Acceptance criteria
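As a purely illustrative sketch (this is not the project's actual `.templates/ticket_template.md`; the headings simply mirror the required contents listed above), a main ticket file might be laid out like this:

```markdown
# TICKET: <short title>

*Status: BACKLOG*
*Last Updated: YYYY-MM-DD*

## Problem Statement and Background
## Detailed Analysis
## Implementation Plan
## Acceptance Criteria
```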
### Creating Tickets
1. Create tickets in the appropriate BACKLOG directory
2. Use the standard template from `.templates/ticket_template.md`
3. Set status to *Status: BACKLOG*
4. Update the TICKET_INDEX.md file
### Updating Tickets
1. Move tickets to the appropriate status directory when status changes
2. Update the status indicator in the main ticket file
3. Update the "Last Updated" date when making significant changes
4. Document progress in IMPLEMENTATION_PROGRESS.md
5. Check off completed tasks in IMPLEMENTATION_PLAN.md
### Completing Tickets
1. Ensure all acceptance criteria are met
2. Move the ticket to the COMPLETED directory
3. Set status to *Status: COMPLETED*
4. Update the TICKET_INDEX.md file
5. Create a completion summary in the main ticket file
### Abandoning Tickets
1. Document reasons for abandonment
2. Move to COMPLETED/ABANDONED directory
3. Set status to *Status: ABANDONED*
4. Update the TICKET_INDEX.md file
## Ticket Linking
When referencing other tickets, use relative links with appropriate paths:

```markdown
@TICKET_NAME
```

Ensure all links are updated when tickets change status.
## Ticket Cleanup and Streamlining
### When to Streamline Tickets
Tickets should be streamlined and cleaned up at major transition points to maintain focus on remaining work:
1. *Major Phase Transitions* - When moving between phases (e.g., from implementation to testing)
2. *Milestone Achievements* - After completing significant portions of work (e.g., 80%+ complete)
3. *Infrastructure Readiness* - When moving from setup/building to operational phases
4. *Team Handoffs* - When different team members will be taking over the work
### What to Streamline
*Replace Historical Implementation Details With:*
- Brief completed-tasks checklist (high-level achievements)
- Current status summary
- Forward-focused remaining work
*Remove or Simplify:*
- Detailed session-by-session progress logs
- Extensive implementation decision histories
- Verbose research findings documentation
- Historical status updates and coordination notes
### Why Streamline Tickets
1. *Git History Preservation* - All detailed progress, decisions, and implementation details are preserved in git commits
2. *Clarity for Future Work* - Makes it easier to quickly understand "what needs to be done next"
3. *Team Efficiency* - Anyone picking up the work can immediately see current state and next steps
4. *Maintainability* - Shorter, focused tickets are easier to read, understand, and keep updated
### How to Streamline
1. *Archive Detailed Progress* - Historical implementation details are preserved in git history
2. *Create Completion Summary* - Replace detailed progress with a brief "What's Complete" checklist
3. *Focus on Remaining Work* - Make current and future phases the primary content
4. *Update Status Sections* - Keep status concise and action-oriented
5. *Preserve Essential Context* - Keep architectural decisions, constraints, and key requirements
*Goal*: Transform tickets from "implementation logs" into "actionable work plans" while preserving essential context.
## Maintenance Requirements
1. Keep the TICKET_INDEX.md file up to date
2. Update "Last Updated" dates when making significant changes
3. Ensure all ticket files follow the standardized format
4. Include links between related tickets in both directions
## Complete Documentation
For detailed instructions on working with tickets, refer to:
- @Ticket Workflow Guide
- @Ticket Index
- @Tickets README
I've been using `claude-4-sonnet` for the last few hours - haven't been able to test `opus` yet as it's still overloaded - but I have noticed a massive improvement so far.
I spent most of yesterday working on a tricky refactor (in a large codebase), rotating through `3.7/3.5/gemini/deepseek`, and barely making progress. I want to say I was running into context issues (even with very targeted prompts) but 3.7 loves a good rabbit-hole, so maybe it was that.
I also added a new "ticketing" system (via rules) to help its task-specific memory, which I didn't really get to test with 3.7 (before 4.0 came out), so I'm unsure how much of an impact this has.
Using 4.0, the rest of this refactor (est. ~4 hrs w/ 3.7) took `sonnet-4.0` 45 minutes, including updating all of the documentation and tests (which with 3.7 normally requires multiple additional prompts, despite it being outlined in my rules files).
The biggest differences I've noticed:
- much more accurate/consistent; it actually finishes tasks rather than telling me it's done (and nothing working)
- less likely to get stuck in a rabbit hole
- stopped getting stuck when unable to fix something (and trying the same 3 solutions over-and-over)
- runs for MUCH longer without my intervention
- when using 3.7:
  - had to prompt once every few minutes, 5-10 mins MAX if the task was straightforward enough
  - had to cancel the output in 1 in 4 prompts as it'd get stuck in the same thought-loops
  - needed to restore from a previous checkpoint every few chats/conversations
- with 4.0:
  - I've had 4 hours of basically one-shotting everything
  - prompts run for 10 mins MIN, and the output actually works
  - it's remembering to run tests, fix errors, update docs, etc.
Obviously this is purely anecdotal - and, considering the temperament of LLMs, maybe I've just been lucky and will be back to cursing at it tomorrow - but imo this is the best-feeling model since 3.5 released.
Is Copilot _enforced_ as the only option for an AI coding agent? Or can devs pick and choose whatever tool they prefer?
I'm interested in the [vague] ratio of {internallyDevelopedTool} vs alternatives - essentially the "preference" score for internal tools (accounting for the natural bias towards one's own agent for testing/QA/data purposes). Any data, however vague, would be great.
(and if anybody has similar data for _any_ company developing their own agent, please shout out).
I switched to them a few months ago; I was previously using DuckDuckGo (and Google before that). As most of you have probably noticed, Google search results have seriously dropped in quality over the last few years, but especially in 2023. I'm no longer able to get meaningful results for almost any topic, especially if it's technical - the only results are AI-generated (?) / obvious SEO spam websites. It takes me multiple different search terms and clicking through multiple results to find anything semi-relevant, and even then it's a shallow article maybe summarising what I'm looking for. Unfortunately DDG seems to be going the same way.
Whereas Kagi reminds me of the 'old' Google search. The results are meaningful and relevant, not diluted with pages of generic article results. They also offer a lot of great customisation options, like being able to block or boost certain sites in results. They have some built-in lists for common filler sites. I can't comment on the AI variation but I hear that's progressing well.
I wouldn't call myself a power user of Kagi, but even then I'm getting far better results than other search engines, definitely worth the price per month.
I'm not affiliated with them in any way, just thought I'd share my anecdotal experience.
> I wouldn't call myself a power user of Kagi, but even then I'm getting far better results than other search engines, definitely worth the price per month.
This only works as long as Kagi is a niche. The moment any search engine becomes commonplace I think they will inevitably succumb to SEO. Otherwise, they would have to change their methodologies every once in a while to completely flip the ecosystem.
I think it also has to do with incentives. If your business model is selling ads then you have a balancing act between user and customer satisfaction.
With Kagi as I understand it, the customer is the user since it’s a premium product that isn’t selling ads. There’s really no good reason for them not to just nuke bad actors.
Not necessarily. You'll still be able to nuke the whole domain from your results, permanently. That means you'll see the spam once, and getting a new domain promoted to the top takes time and effectively money.
I also hope that domains which get blocked by lots of people will get reviewed for global downranking, but I don't think that's happening yet?
I managed to filter that, geeksforgeeks.org and towardsdatascience.com out with Kagi. It's quite helpful being able to slightly reduce prioritization on a per site basis so that instead of showing up as top result it'll be buried a bit but still accessible.
uBlacklist can only block sites. But Kagi can raise or lower sites in its ranking, and can pin sites to the top. Boosting sites up in the results is more efficient than blocking spam sites one by one.
> The moment any search engine becomes commonplace I think they will inevitably succumb to SEO
Thankfully when I come across an irrelevant domain in Kagi I can just remove it from any future search results completely. If enough people do that, it may show up on the "most commonly removed" list, inviting others to also ax it.
I rarely ever have an issue with spam on Kagi just by largely using the standard filters, and I'm confident this will remain the case.
And unrelated but I really like that I can redirect all reddit urls in search results to old.reddit.com, twitter to nitter etc. very helpful in searching on mobile.
There is a good chance that it will remain niche due to the paid and forced-login model. This is a good thing. I hope they will manage to position themselves well as an alternative search engine with clean, unmanipulated results; and be careful about unhealthy (greedy) growth.
SEO should be called "GEO" - it's Google optimization. Spam keyword blogsites only work because Google prioritizes that stuff. They're driven by ad revenue, so they're incentivized to show commercial sites over non-commercial ones, etc.
Hopefully what will happen is that no single search engine will be dominant, ensuring that problem can't happen (we'll probably have other problems instead).
They are the biggest search engine; every SEO trick, every spam attack is spearheaded against them. But also, being the biggest and the inevitable, they can afford to blunt their search tool somewhat in order to show more lucrative sort-of-hits and sell more ads. A moral hazard to do such a thing is always present for any market-dominating player.
Kagi, in comparison, is tiny, and almost nobody cares to attack their algorithms. Back in the 1990s, when Macs were a small minority in the PC-dominated world, they were the safest desktop machines, because almost nobody cared to write malware for them. Now that Macs are a sizable segment of the computers in the hands of important people, they are targeted by malware all right.
> every SEO trick, every spam attack is spearheaded against them.
Sure, but also they're ignoring extremely basic issues. "every SEO trick" is one thing, "just copy the SO content and still get ranked on the first page" is them not caring. We can worry about them dealing with the complex issues after they address the low hanging fruit.
I've been curious for a while too and I've been trying to de-google myself a tiny bit each year (more or less dropped Chrome in 2023).
Once I actually grab a full-time job again I wouldn't mind grabbing my own subscription here to try it out. I'm curious if 300 searches/month is truly enough for me, though. And what happens if I go over that rate? Am I simply unable to search more for that month?
Fwiw, I initially burned through the free searches in a few days, so definitely not enough IMO. Add the fact that free searches never got refreshed for my account, and I was pretty much unable to properly test the service for months. But bangs still work after the limit, thus I kept it as default given that I heavily use bangs to search other services.
Still, I ended up subscribing, and after properly testing, I can recommend it. The service is good, and the blacklist feature is essential to me now; it's just that the free tier is shit.
Yeah, I found the opposite for me, as I expected. I did a little under 400 queries in the last 30 days. I could definitely cut down a lot of redundant or simple searches to get under 400, but given how ubiquitous it is for me to simply ask random questions (or search around a lot for documentation via a search engine), I'd rather not have to worry about it.
On top of that, this is during a month without any job (where I'd search even more on the clock). I hear it's 1.5 cents per query over but I can imagine doing 600+ searches once I'm employed again.
> I was surprised to find that I’m consistently nowhere near 300
Per month?
My current Kagi searches from 3rd of January until today sits at 1256 searches. For sure I'd do 300 searches in a week, and on a particularly hairy day I might do it in a day.
> 3rd of January until today sits at 1256 searches
1256 / 23 (days between today and Jan 3rd) = 54.6 searches on average per day.
Some days higher, some lower. Sometimes it can take a couple of tries to get the search right, so you do 5-10 searches in one minute maybe. Doesn't seem farfetched to me.
Does it have an option to exclude commercial websites? That'd be quite useful to me. Pretty much every time I try to find information about a product, all I find are sites trying to sell it to me (but I already have it and want to find information about it, damn it!).
Another Kagi user here, yes, the customization of results is way better than any other search engine I've used. Eg, personalization can be manually set to lower or raise weight of results from specific domains. This has become extremely useful to not only filter out bad sites, but increase relevant results when you regularly get information from sites like GitHub etc.
Stats are released about these as well, so you can easily copy heavily used filters [0].
If they do (or have done) a user survey, it would be interesting to see where all the paying users are coming from. My guess is that a substantial number of users come from hearing about Kagi on HN or in HN comments.
Iirc there is an exclude option (or at the very least, weights), though you'd have to do it by hand. Though I do think there is a social feature to install other people's weights.
Small warning. If you click the "more" button at the bottom of a list of results, it silently does another search and deducts that from your remaining free searches.
I did that a couple of months ago, and just signed up for a paid tier after I tried to go back to DuckDuckGo and started losing my mind. Kagi is better for discovering new content and mediocre places on the internet.
The question about the obvious quality drop for Google is: is this intentional? Perhaps some cost-saving or ROI measures? Or was the motive always just to train their AI, and we just helped with that?
From what I've found through Google (with no real understanding of LLMs), 2^16 is the max tokens per minute for fine-tuning OpenAI's models via their platform. I don't believe this is the same as the training token count.
Then there's the context token limit, which is 16k for 3.5 turbo, but I don't think that's relevant here.
Though somebody please tell me why I'm wrong - I'm still trying to wrap my head around the training side.
You are right to be curious. The encoding used by both GPT-3.5 and GPT-4 is called `cl100k_base`, which immediately and correctly suggests that there are about 100K tokens.
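If you want to sanity-check that number yourself, here's a minimal sketch using OpenAI's `tiktoken` library (assuming `pip install tiktoken`):

```python
# Inspect the cl100k_base encoding used by GPT-3.5 and GPT-4.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
print(enc.n_vocab)                  # vocabulary size: on the order of 100K tokens
print(enc.encode("Hello, world!"))  # a short string becomes a handful of token ids
```

That vocabulary size is a property of the tokenizer itself, and is separate from the per-minute rate limits and the 16k context window mentioned above.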
I'm a completely unknown artist with 4 songs on Spotify, mostly released during 2020. In total I'm at 54388 plays, which has earned $42.41. This is across all platforms, though Spotify is 95% of the plays.
I'm not sure if Spotify has dropped their payout per play since 2020, but I'm likely at the lowest payout rate, and I'd say it's not terrible (although it's not great). You also get paid more for Spotify Premium streams, which afaik made up the majority of my streams.
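For context, those figures work out to about $42.41 / 54,388 ≈ $0.00078 per play, i.e. roughly $0.78 per thousand streams, blended across all platforms.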
Oh I wish that were true for my experiences - much more commonly it's a project manager that doesn't understand the value of tests....