
> Yet we'll still get the undisciplined engineers whining that they have to write a small and isolated unit test.

Oh, I wish that were true in my experience - much more commonly it's a project manager that doesn't understand the value of tests...


> One could hire a software developer to write such a program. But, in general, software developers can be untrustworthy and prone to stealing ideas for their own selfish purposes.

Ehhhh? Yes, there are examples of that, as there are for any arbitrary group of humans you could select, but [anecdotally] I've noticed the opposite... it's not uncommon to find a passionate developer who's only interested in the challenge/problem-solving aspect - it's a lot less common for, say, real estate agents.

I don't really get the point you're making beyond "people be greedy sometimes" (which I do agree with, don't get me wrong).


> The "cardinal rule of agent design" should be that an LLM can have access to at most two of these during one session. To avoid security issues, agents should be designed in a way that ensures this.

Then don't give it your API keys? Surely there are better ways to solve this (like an MCP API gateway)?

[I agree with you]


> For greenfield it’s amazing

I'll preface this comment with: I'm a recent startup owner (so the only dev, which is important) and my entire codebase has been generated via Sonnet (mostly 3.7, now 4.0). If you actually looked at the work I'm (personally) producing, I guess I'm more of a product-owner/project-manager, as I'm really just overseeing the development.

> I have yet to see an LLM-generated app not collapse under its own weight after enough iterations/prompts.

There are a few crucial steps to make an LLM-generated app maintainable (by the LLM):

- _have a very, very strong SWE background_; ideally as a "strong" Lead Dev, _this is critical_

- your entire workflow NEEDS to be centered around LLM-development (or even model-specific):

  - use MCPs wherever possible and make sure they're specifically configured for your project

  - don't write "human" documentation; use rule + reusable prompt files

  - you MUST do this in a *very* granular but specialized way; keep rules/prompts very small (like you would when creating tickets)

  - make sure rules are conditionally applied (using globs); do not auto-include anything except your "system rules" (see the example rule sketch after this list)

  - use the LLM to generate said prompts and rules; this forces consistency across prompts, very important

  - follow a typical agile workflow (creating epics, tickets, backlogs etc)

  - TESTS TESTS AND MORE TESTS; add automated tools (like linters) EVERYWHERE you can

  - keep your code VERY modular so the LLM can keep a focused context, rules should provide all key context (like the broader architecture); the goal is for your LLM to only need to read or interact with files related to the strict 'current task' scope

  - iterating on code is almost always more difficult than writing it from scratch: provided your code is well-architected, no single rewrite should be larger than a regular ticket (if the ticket is too large then it needs to be split up)

This is off the top of my head so it's pretty broad/messy, but I can expand on my points.
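
To make the globs point concrete, here's roughly what one of my conditionally-applied rule files looks like (Cursor-style `.mdc` frontmatter; the keys, globs, and contents below are illustrative rather than copied from my repo - adapt to whatever tool you use):

```
---
description: Conventions for the backend API
globs: ["backend/**/*.py"]
alwaysApply: false
---

# Backend Rules

- All endpoints live in `backend/api/`, one router per domain
- Every new endpoint gets a unit test in `backend/tests/` before the ticket is closed
- Don't touch files outside `backend/` for a backend ticket; flag it in the ticket instead
```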

LLM coding requires a complete overhaul of your workflow so that it's tailored specifically to an LLM, not a human - and that's a massive learning curve (it takes a lot of time to figure out and optimize). Would I bother doing this if I were still working on a team? Probably not; I don't think it would've saved me much time in a "regular" codebase. As a single developer at a startup? It's the only way I've been able to get "other startup-y" work done while also progressing the codebase - there's real value in being able to do multiple things at once: let the LLM run and intermittently review its output while you work on other things.

The biggest tip I can give: LLMs struggle at "coding like a human" and are much better at "bad-practice" workflows (e.g. throwing away large parts of code in favour of a total rewrite) - let the LLM lead the development process, with the rules/prompts as guardrails, and try to stay out of its way while it works (instead of saying "hey, X thing didn't work, go fix that now") - hold its hand, but let it experiment before jumping in.


Do you have an example of a rule file? Or the MCPs you use?

MCPs:

  - `server-sequential-thinking` (MVP)
  - `memory` (2nd MVP, needs custom rules for config)
  - `context7`
  - `filesystem`
  - `fetch`
  - `postgres`
  - `git`
  - `time`
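
For reference, these get wired up through the client's MCP config (e.g. `.cursor/mcp.json` or the Claude Desktop config). A trimmed sketch - package names/args are from memory, so double-check each server's README:

```
{
  "mcpServers": {
    "sequential-thinking": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"]
    },
    "memory": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-memory"]
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"]
    }
  }
}
```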

Example rules file for ticketing system:

```

# Ticket Management Guidelines

This document outlines the standardized approach to ticket management in the <redacted> project. All team members should follow these guidelines when creating, updating, or completing tickets.

## Ticket Organization

Tickets are organized by status and area in the following structure:

TICKETS/
  COMPLETED/      - Finished tickets
    BACKEND/      - Backend-related tickets
    FRONTEND/     - Frontend-related tickets
  IN_PROGRESS/    - Tickets currently being worked on
    BACKEND/
    FRONTEND/
  BACKLOG/        - Tickets planned but not yet started
    BACKEND/
    FRONTEND/

## Ticket Status Indicators

All tickets must use consistent status indicators:

- *BACKLOG* - Planned but not yet started
- *IN_PROGRESS* - Currently being implemented
- *COMPLETED* - Implementation is finished
- *ABANDONED* - Work was stopped and will not continue

## Required Ticket Files

Each ticket directory must contain these files:

1. *Main Ticket File* (TICKET_.md):
   - Problem statement and background
   - Detailed analysis
   - Implementation plan
   - Acceptance criteria

2. *Implementation Plan* (IMPLEMENTATION_PLAN.md):
   - Detailed breakdown of tasks
   - Timeline estimates
   - Success metrics

3. *Implementation Progress* (IMPLEMENTATION_PROGRESS.md):
   - Status updates
   - Issues encountered
   - Decisions made

4. *Design Documentation* (DESIGN_RECOMMENDATIONS.md), when relevant:
   - Architecture recommendations
   - Code patterns and examples
   - Error handling strategies

5. *API Documentation* (API_DOCUMENTATION.md), when applicable:
   - Interface definitions
   - Usage examples
   - Configuration options

## Ticket Workflow Rules

### Creating Tickets

1. Create tickets in the appropriate BACKLOG directory
2. Use standard templates from .templates/ticket_template.md
3. Set status to *Status: BACKLOG*
4. Update the TICKET_INDEX.md file

### Updating Tickets

1. Move tickets to the appropriate status directory when status changes
2. Update the status indicator in the main ticket file
3. Update the "Last Updated" date when making significant changes
4. Document progress in IMPLEMENTATION_PROGRESS.md
5. Check off completed tasks in IMPLEMENTATION_PLAN.md

### Completing Tickets

1. Ensure all acceptance criteria are met
2. Move the ticket to the COMPLETED directory
3. Set status to *Status: COMPLETED*
4. Update the TICKET_INDEX.md file
5. Create a completion summary in the main ticket file

### Abandoning Tickets

1. Document reasons for abandonment
2. Move to COMPLETED/ABANDONED directory
3. Set status to *Status: ABANDONED*
4. Update the TICKET_INDEX.md file

## Ticket Linking

When referencing other tickets, use relative links with appropriate paths:

    @TICKET_NAME

Ensure all links are updated when tickets change status.

## Ticket Cleanup and Streamlining

### When to Streamline Tickets

Tickets should be streamlined and cleaned up at major transition points to maintain focus on remaining work:

1. *Major Phase Transitions* - When moving between phases (e.g., from implementation to testing)
2. *Milestone Achievements* - After completing significant portions of work (e.g., 80%+ complete)
3. *Infrastructure Readiness* - When moving from setup/building to operational phases
4. *Team Handoffs* - When different team members will be taking over the work

### What to Streamline

*Replace Historical Implementation Details With:*
- Brief completed tasks checklist (high-level achievements)
- Current status summary
- Forward-focused remaining work

*Remove or Simplify:*
- Detailed session-by-session progress logs
- Extensive implementation decision histories
- Verbose research findings documentation
- Historical status updates and coordination notes

### Why Streamline Tickets

1. *Git History Preservation* - All detailed progress, decisions, and implementation details are preserved in git commits
2. *Clarity for Future Work* - Makes it easier to quickly understand "what needs to be done next"
3. *Team Efficiency* - Anyone picking up the work can immediately see current state and next steps
4. *Maintainability* - Shorter, focused tickets are easier to read, understand, and keep updated

### How to Streamline

1. *Archive Detailed Progress* - Historical implementation details are preserved in git history
2. *Create Completion Summary* - Replace detailed progress with a brief "What's Complete" checklist
3. *Focus on Remaining Work* - Make current and future phases the primary content
4. *Update Status Sections* - Keep status concise and action-oriented
5. *Preserve Essential Context* - Keep architectural decisions, constraints, and key requirements

*Goal*: Transform tickets from "implementation logs" into "actionable work plans" while preserving essential context.

## Maintenance Requirements

1. Keep the TICKET_INDEX.md file up to date
2. Update "Last Updated" dates when making significant changes
3. Ensure all ticket files follow the standardized format
4. Include links between related tickets in both directions

## Complete Documentation

For detailed instructions on working with tickets, refer to:

- @Ticket Workflow Guide
- @Ticket Index
- @Tickets README

```


I've been using `claude-4-sonnet` for the last few hours - haven't been able to test `opus` yet as it's still overloaded - but I have noticed a massive improvement so far.

I spent most of yesterday working on a tricky refactor (in a large codebase), rotating through `3.7/3.5/gemini/deepseek`, and barely making progress. I want to say I was running into context issues (even with very targeted prompts) but 3.7 loves a good rabbit-hole, so maybe it was that.

I also added a new "ticketing" system (via rules) to help its task-specific memory, which I didn't really get to test with 3.7 (before 4.0 came out), so I'm unsure how much of an impact this has.

Using 4.0, the rest of this refactor (est. ~4 hrs w/ 3.7) took `sonnet-4.0` 45 minutes, including updating all of the documentation and tests (which with 3.7 normally requires multiple additional prompts, despite being outlined in my rules files).

The biggest differences I've noticed:

  - much more accurate/consistent; it actually finishes tasks rather than telling me it's done (when nothing works)

  - less likely to get stuck in a rabbit hole

  - stopped getting stuck when unable to fix something (and trying the same 3 solutions over-and-over)

  - runs for MUCH longer without my intervention

  - when using 3.7:

     - had to prompt once every few minutes, 5 - 10 mins MAX if the task was straightforward enough

     - had to cancel the output in 1/4 prompts as it'd get stuck in the same thought-loops

     - needed to restore from a previous checkpoint every few chats/conversations

  - with 4.0:

    - I've had 4 hours of basically one-shotting everything

    - prompts run for 10 mins MIN, and the output actually works

    - is remembering to run tests, fix errors, update docs etc

Obviously this is purely anecdotal - and, considering the temperament of LLMs, maybe I've just been lucky and will be back to cursing at it tomorrow - but IMO this is the best-feeling model since 3.5 released.

Is Copilot _enforced_ as the only option for an AI coding agent? Or can devs pick and choose whatever tool they prefer?

I'm interested in the [vague] ratio of {internallyDevelopedTool} vs alternatives - essentially the "preference" score for internal tools (accounting for the natural bias towards one's own agent for testing/QA/data purposes). Any data, however vague, would be great.

(and if anybody has similar data for _any_ company developing their own agent, please shout out).


Coding is, and always has been, the easy part of software development


Pretty sure he knows this


I switched to them a few months ago; I was previously using DuckDuckGo (and Google before that). As most of you have probably noticed, Google search results have seriously dropped in quality over the last few years, but especially in 2023. I'm no longer able to get meaningful results for almost any topic, especially if it's technical - the only results are AI-generated (?) / obvious SEO spam websites. It takes me multiple different search terms and clicking through multiple results to find anything semi-relevant, and even then it's a shallow article maybe summarising what I'm looking for. Unfortunately DDG seems to be going the same way.

Whereas Kagi reminds me of the 'old' Google search. The results are meaningful and relevant, not diluted with pages of generic article results. They also offer a lot of great customisation options, like being able to block or boost certain sites in results. They have some built-in lists for common filler sites. I can't comment on the AI variation but I hear that's progressing well.

I wouldn't call myself a power user of Kagi, but even then I'm getting far better results than other search engines, definitely worth the price per month.

I'm not affiliated with them in any way, just thought I'd share my anecdotal experience.


> I wouldn't call myself a power user of Kagi, but even then I'm getting far better results than other search engines, definitely worth the price per month.

This only works as long as Kagi is a niche. The moment any search engine becomes commonplace I think they will inevitably succumb to SEO. Otherwise, they would have to change their methodologies every once in a while to completely flip the ecosystem.


I think it also has to do with incentives. If your business model is selling ads then you have a balancing act between user and customer satisfaction.

With Kagi as I understand it, the customer is the user since it’s a premium product that isn’t selling ads. There’s really no good reason for them not to just nuke bad actors.


Not necessarily. You'll still be able to nuke the whole domain from your results, permanently. That means you'll see the spam once, and getting a new domain promoted to the top takes time and effectively money.

I also hope that domains which get blocked by lots of people will get reviewed for global downranking, but I don't think that's happening yet?


That is true. I really wish I could tell Google to simply filter out learncpp.com and some other websites.


I managed to filter that, geeksforgeeks.org, and towardsdatascience.com out with Kagi. It's quite helpful being able to slightly reduce prioritization on a per-site basis, so that instead of showing up as the top result it'll be buried a bit but still accessible.


You can with browser extensions, fwiw.


uBlacklist can only block sites. But Kagi can raise or lower sites in its rankings, and can pin sites to the top. Boosting sites up in the results is more efficient than blocking spam sites one by one.


Since the browser extension only works on the FE, this just means you are hiding the site and receiving fewer results on a page.


> The moment any search engine becomes commonplace I think they will inevitably succumb to SEO

Thankfully, when I come across an irrelevant domain in Kagi I can just remove it from any future search results completely. If enough people do that, it may show up on the "most commonly removed" list, inviting others to also axe it.

I rarely ever have an issue with spam on Kagi just by largely using the standard filters, and I'm confident this will remain the case.

And unrelated, but I really like that I can redirect all Reddit URLs in search results to old.reddit.com, Twitter to Nitter, etc. - very helpful when searching on mobile.


If that SEO means removing ads and tracking from your page to get a higher rank, I'm cool with it. :)


Two-sided/platform market dynamics are really interesting to this economist.

I wonder if Kagi's going to have to charge for listings some day, instead of users paying in, if they intend to grow substantially.


There is a good chance that it will remain niche due to the paid and forced-login model. This is a good thing. I hope they will manage to position themselves well as an alternative search engine with clean, unmanipulated results; and be careful about unhealthy (greedy) growth.


SEO should really be called "GEO" - it's Google optimization. Spam keyword blogsites only work because Google prioritizes that stuff. They're driven by ad revenue, so they're incentivized to show commercial sites over non-commercial ones, etc., etc.


Except that the problem isn't specific to google search. The others are much the same.


hopefully what will happen is no single search engine will be dominant, ensuring that problem can't happen (we'll probably have other problems instead)


A paid search engine will always be a niche


Google Search is a victim of its own success.

They are the biggest search engine; every SEO trick, every spam attack is spearheaded against them. But also, being the biggest and the inevitable default, they can afford to blunt their search tool somewhat in order to show more lucrative sort-of-hits and sell more ads. A moral hazard to do such a thing is always present for any market-dominating player.

Kagi, in comparison, is tiny, and almost nobody cares to attack their algorithms. Back in the 1990s, when Macs were a small minority in the PC-dominated world, they were the safest desktop machines, because almost nobody cared to write malware for them. Now that Macs are a sizable segment of the computers in the hands of important people, they are targeted by malware all right.


> every SEO trick, every spam attack is spearheaded against them.

Sure, but they're also ignoring extremely basic issues. "Every SEO trick" is one thing; "just copy the SO content and still get ranked on the first page" is them not caring. We can worry about them dealing with the complex issues after they address the low-hanging fruit.


I've been curious for a while too and I've been trying to de-google myself a tiny bit each year (more or less dropped Chrome in 2023).

Once I actually grab a full-time job again I wouldn't mind getting my own subscription here to try it out. I'm curious if 300 searches/month is truly enough for me, though. And what would happen if I go over that rate? Am I simply unable to search more for that month?


Fwiw, I initially burned through the free searches in a few days, so that's definitely not enough IMO. Add the fact that free searches never got refreshed for my account, and I was pretty much unable to properly test the service for months. But bangs still work after the limit, so I kept it as my default, given that I heavily use bangs to search other services.

Still, I ended up subscribing, and after properly testing it, I can recommend it. The service is good, and the blacklist feature is essential to me now; it's just that the free tier is shit.


The free searches aren't supposed to "refresh". They are once per account.


Look at your browser history to find out how much you search. I was surprised to find that I’m consistently nowhere near 300.


Yeah, I found the opposite for me, as I expected. I did a little under 400 queries in the last 30 days. I could definitely cut down a lot of redundant or simple searches to get under 300, but given how ubiquitous it is for me to just search random questions (or search around a lot for documentation via a search engine), I'd rather not have to worry about it.

On top of that, this is during a month without any job (where I'd search even more on the clock). I hear it's 1.5 cents per query over the limit, but I can imagine doing 600+ searches once I'm employed again.


> I was surprised to find that I’m consistently nowhere near 300

Per month?

My current Kagi searches from 3rd of January until today sits at 1256 searches. For sure I'd do 300 searches in a week, and on a particularly hairy day I might do it in a day.


300 per day??? That’s a search per minute for 5 hours. Are you even doing anything else?


Eh?

> 3rd of January until today sits at 1256 searches

1256 / 23 (days between today and Jan 3rd) = 54.6 searches on average per day.

Some days higher, some lower. Sometimes it can take a couple of tries to get the search right, so you do 5-10 searches in one minute maybe. Doesn't seem farfetched to me.


> For sure I'd do 300 searches in a week, and on a particularly hairy day I might do it in a day.


Yeah, that's "on a particularly hairy day", not "per day" for a full month...


I didn’t say it was every day.


Ok, well, thanks for the intellectually stimulating discussion, I hope you have a nice day :)


Personally I'm fine with the 300 searches/month, however that means I don't use Kagi for searches that are extremely simple.


I just prefix "simple" searches with !g or !gi, bangs don't count against the limit


Why waste your mental energy on this? Searches cost like 1.5c once you go over the limit. It's not worth thinking about.

Also you can enable browsing history and use bookmarks to autofill stuff without having to use a search engine.


Yeah, once you hit the free limit, you get served a subscription wall once you try to do a search


Does it have an option to exclude commercial websites? That'd be quite useful to me. Pretty much every time I try to find information about a product, all I find are sites trying to sell it to me (but I already have it and want to find information about it, damn it!).


Another Kagi user here - yes, the customization of results is way better than any other search engine I've used. E.g., personalization can be manually set to lower or raise the weight of results from specific domains. This has become extremely useful not only to filter out bad sites, but to boost relevant results when you regularly get information from sites like GitHub etc.

Stats are released about these as well, so you can easily copy heavily used filters [0].

[0] https://kagi.com/stats?stat=leaderboard


Interesting that HN is pinned way more than stackoverflow.


If they do (or have done) a user survey, it would be interesting to see where all the paying users are coming from. My guess is that a substantial number of users come from hearing about Kagi on HN or in HN comments.


There is a lens available in settings that seems like a good fit, though I haven’t tried it myself yet.

Small Web: results that favor noncommercial domains and topics.


IIRC there is an exclude option (or at the very least, weights), though you'd have to do it by hand. Though I do think there is a social feature to install other people's weights.


I just signed up. You get 100 searches for free to try it out.


Small warning. If you click the "more" button at the bottom of a list of results, it silently does another search and deducts that from your remaining free searches.


I did that a couple months ago, and just signed up for a paid tier after I tried to go back to duckduckgo and started losing my mind. Kagi is better for discovering new content and mediocre places on the internet.


The question to ask about the obvious quality drop for Google is: is this intentional? Perhaps some cost-saving or ROI measures? Or was the motive always just to train their AI, and we simply helped with that?


Just perverse incentives.

Google isn't incentivized to be a good search engine.

They are incentivized to be just good enough that you don't go elsewhere while increasing the number of ads / paid results.


Commenting to follow, curious about the answer.

From what I've found through Google (with no real understanding of LLMs), 2^16 is the max tokens per minute for fine-tuning OpenAI's models via their platform. I don't believe this is the same as the training token count.

Then there's the context token limit, which is 16k for 3.5 turbo, but I don't think that's relevant here.

Though somebody please tell me why I'm wrong, I'm still trying to wrap my head around the training side.


You are right to be curious. The encoding used by both GPT-3.5 and GPT-4 is called `cl100k_base`, which immediately and correctly suggests that there are about 100K tokens.
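
If you want to check that yourself, OpenAI's `tiktoken` library exposes the vocabulary size directly. A quick sketch (the exact count is from memory, so treat it as approximate):

```
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by gpt-3.5-turbo and gpt-4
print(enc.n_vocab)                          # ~100k entries in the vocabulary

# or resolve the encoding from a model name
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
print(len(enc.encode("hello world")))       # number of tokens for a given string
```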


Amazing, thanks for the reply - I'm finding some good resources after a quick search for `cl100k_base`.

If you have any other resources (for anything AI related) please share!


Their tokenizer is open source: https://github.com/openai/tiktoken

Data files that contain vocabulary are listed here: https://github.com/openai/tiktoken/blob/9e79899bc248d5313c7d...


GPT-2 and 3 used p50K, right? Then GPT-4 used cl100K?



I'm a completely unknown artist with 4 songs on Spotify, mostly released during 2020. In total I'm at 54388 plays, which has earned $42.41. This is across all platforms, though Spotify is 95% of the plays.

I'm not sure if Spotify has dropped their payout per play since 2020, but I'm likely at the lowest payout rate and I'd say it's not terrible (although it's not great). You also get paid more for Spotify Premium streams, which afaik were the majority of my streams.
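
(For anyone doing the math: $42.41 over 54,388 plays works out to roughly $0.00078 per stream, or a bit under $0.80 per 1,000 plays.)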

