Hacker News | whistle650's comments

And imagine how much more you’d have to pay for each of those clicks if everyone could stop those fraudulent clicks. In equilibrium it shouldn’t change the total ad spend.


That’s the point, without the fraudulent clicks you would just move on to some other strategy because the pricing would not be worth it.

Fake clicks give the illusion that ads are working and instead you have to optimize your funnel or whatever else.


But if you're the only one doing it because the competition haven't figured it out, then you win until they do. You can outbid them on each ad.


That’s true. But you probably can’t. At least not any more than others. It’s a systemic issue in the ad network ecosystem which you don’t have much control over. If you can figure it out, odds are lots of others can too. People do assess the quality of traffic sources and do check the return on ad spend. It’s that system-wide process that keeps the return on ad spend roughly constant.

The point here, for me, is that a microeconomic perspective on this whole question is more salient than a purely technical one.


The ad spend itself is not the issue.

I am fine with spending $10,000 on ads (or whatever amount).

The issue is knowing that $5,000 of it was spent on clicks that had a 0% chance to convert. Every fraudulent click I can block/prevent is one more click that can be made by a real person who may actually make a purchase.


A properly informed market is a more efficient market.

The online ad market is extremely inefficient because it has no idea how much ad spend is even reaching people.


You should have a good idea. How to know whether an ad reaches people has been studied extensively since long before the internet existed. Newspapers don't have clicks, but you still need to know if your ad works. Even on the internet, a large part of the value of ads is people who see the ad and buy later without clicking on it. We can track this: do it.


This is the key point. Ads and clicks etc are priced in a competitive market. If they don’t deliver the ROI because of bots, then people (including the allegedly hopelessly confused e-commerce retailers) would pay less for the same amount of traffic. It may be annoying (and the cost of dealing with that annoyance would further drive down the price paid for the traffic). But what matters is that an e-commerce site is profitable (enough) after the ad spend, period. If they are not, why do they spend what they spend on the ads?


Google is a competitive ad market?


There are certainly competition concerns about Google's advertising programs.

But within their advertising market, you compete for placement with other advertisers. If everyone is getting lots of fraud traffic, presumably they adjust their bids for it. If you're being outbid consistently, it's reasonable to expect that the other advertisers are either getting a better ROI or have a lower ROI target than you do.

About a million years ago, I was on a team that ran a significant ad program, and it was primarily data-driven: we'd come up with keywords to advertise on, measure the results, and adjust our bids, with a little manual work to filter out inappropriate keywords or to allow a lower ROI on "important" keywords. Of course, our revenue was largely also from advertising, so it was a bit circular.


At Google, I would expect not. However, I don't understand why other "portals" are not running their own ads (some are, of course, but I think more should). If you are a portal, your value is the eyeballs you sell to advertisers, so why are you outsourcing this critical part of your business value? This needs to be a core competency you keep in house.


I thought you could set up an automatic Takeout export to run periodically, with the target set to your Google Drive. Then, via a web app's OAuth, you could pull the data that way. Frequency is limited (it looks like the auto export runs "every 2 months for 1 year"), so it's hardly realtime, but it seems useful and (relatively) easy? Does a method like that not work for your intentions?


Will have to look into that. Sounds like it could be expensive but maybe worth it.


You can schedule the takeout to Drive, then use a tool such as rclone (amazing tool) to pull it down.

It should not add any costs except the storage for the takeout zip on drive.

Look at supported providers in rclone and you might find easy solutions for some hard sync problems: https://rclone.org/#providers
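Something like the following, assuming you've already set up a Drive remote named "gdrive" via `rclone config` (the remote name and local path here are placeholders, and scheduling it via cron is just one option):

```shell
# One-time interactive setup of a Google Drive remote (here named "gdrive")
rclone config

# Pull the Takeout archives from Drive to a local folder
rclone copy "gdrive:Takeout" ./takeout-backup --progress

# Once verified locally, optionally free the Drive storage the zips consume
# (keep --dry-run until you're sure the copy is complete)
rclone delete "gdrive:Takeout" --dry-run
```

Deleting the archives after pulling them down also addresses the storage-cost complaint elsewhere in the thread, since the zips only occupy Drive quota between export and sync.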


> except the storage for the takeout zip on drive.

Yeah, that's the cost I'm talking about. It essentially amounts to paying an extra subscription to be able to download your data [on a regular basis].

I'm a big rclone fan btw :) I'm sure there's some future where we do something like this to automate Takeouts.


Downloading from Google Drive doesn't cost anything, does it? Although, I guess you would have to have enough empty space on your Google Drive to store the takeout zip, which I think is an acceptable cost.


It seems they use 70% of the benchmark query-answer pairs to cluster and determine which models work best for each cluster (by sending all queries to all models and looking at responses vs ground truth answers). Then they route the remaining 30% "test" set queries according to those prior determinations. It doesn't seem surprising that this approach would give you Pareto efficiency on those benchmarks.


It's OK if you can update the router over time; the more data you have, the better.


Looking at the home page of Meanwhile only made me think of how life insurance is such a different thing than, say, a mortgage. With life insurance, counterparty risk matters. You don't care about your mortgage counterparty. I'm not going to buy life insurance from an insurer with Youtube videos of Anthony Pompliano on their home page. Know your enemy.


Have you tried the Gemini Live audio-to-audio in the free Gemini iOS app? I find it feels far more natural than ChatGPT Advanced Voice Mode.


I don't know much about what it's like to do a PhD in physics at Berkeley, but many years ago I did a PhD in physics at Stanford starting out working in experimental quantum optics. I wound up doing something completely different, and felt supported in changing what I worked on. Stanford felt small in a good way, the grad student admin staff was wonderful. Stanford definitely has a different more suburban isolated vibe. Summers felt like you worked at a country club or something.

Who you work with really matters (obviously) and different PIs and labs can have very different cultures which you may or may not feel comfortable with. That alone can make your decision if you are very sure about what you want to do and who you want to work with.

Outside of that, I would say Stanford is a really great place to do graduate work, especially if you're not entirely sure what you want to do.

All of this is with the obvious caveat that my experience is from quite some time ago.


Interesting read with lots of good detail, thank you. A comment: if you are balancing the classes when you do one vs all binary training, and then use the max probability for inference, your probabilities might not be calibrated well, which could be a problem. Do you correct the probabilities before taking the argmax?
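To illustrate what I mean (a sketch, not the article's code: the standard Bayes prior-correction for a classifier trained on rebalanced data, with made-up class names and probabilities):

```python
def prior_correct(p, true_prior, train_prior=0.5):
    """Map a probability from a model trained at class frequency
    `train_prior` back to the deployment prior `true_prior`."""
    num = p * true_prior / train_prior
    den = num + (1 - p) * (1 - true_prior) / (1 - train_prior)
    return num / den

# One-vs-rest probabilities from balanced binary models, plus the true
# (imbalanced) prior of each class -- values are illustrative.
raw = {"cat": 0.80, "dog": 0.75}
priors = {"cat": 0.05, "dog": 0.40}

corrected = {c: prior_correct(raw[c], priors[c]) for c in raw}
# Note the argmax flips: "cat" wins on raw scores, "dog" after correction,
# because the balanced cat model was trained at 10x the true cat frequency.
winner = max(corrected, key=corrected.get)
```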


https://chatgpt.com/share/13f553a8-5cff-42a1-be95-4a9d33cd10...

May also be easy to correct a lot of it:

“For better safekeeping, Russia’s $24,000,000 collection of crown jewels, probably the finest array of gems ever assembled at one time,”


But are you correcting the OCR or miscorrecting the originals?

I want original text, including misspellings, and original regional / historical spellings, including slang (which may look like another word, but is not, and isn't in a dictionary).

You cannot fix OCR text without looking at the original.


With the spelling having been fixed, even if imperfectly, you could much more easily search for content and find relevant results, and then go on to look at the originals. What you want is still possible, unless you unreasonably make it a requirement that the transcriptions should be perfect.


Proper digital transcription means doing it with accuracy, not "close enough".


to quote myself, "every interesting data set will have inaccuracies in it"


There is a vast difference between a rare, honest mistake, and an attempt to mitigate them...

vs willingly knowing you are introducing corrections that are ridiculously wrong.

Advocating and being a champion for inaccuracy, really isn't a positive. You should find a new thing to quote about yourself.


This is not what this phrase is about. I came to it working on the structural data of just under 100k Chinese characters. I'd spend hours, days and weeks proofreading and correcting formulas, so your "advocating and being a champion for inaccuracy" doesn't stick. But absent an automated, complete coverage of all records against a known error-free data set, there will likely be a small percentage of errors and dubious cases.

And thanks, by the way, for the readiness to jump to conclusions and fire a salvo of allegations, viz. "willingly", "knowingly", "introducing", "ridiculous".


You're making statements supporting the concept that errors are unavoidable, with an air of "oh well!", in a thread where someone is claiming AI is a solution... right after demonstrating a 10x error!

AI is a ridiculous answer, with its hallucinations and absurd error rates. If you didn't intend to support that level of absurd error rate, you shouldn't be replying in defence.

It sounds like you did not want to give that impression, if so, I suggest you look at the chain of replies, and the context.

AI hype is literally a danger to us all.


oh well


"$2¢4,000,000" should be "$204,000,000" rather than ChatGPT's "$24,000,000".


Are you aware of any models that perform as well as an LLM on this task at lower cost?


Self hosted LLM?


This is a great project, thank you. I've installed the TestFlight app. FYI, right now it's saying, in response to "Who was the president in 1973", that it was "Gerald Ford", which is wrong.


Even non-quantized large LLMs (70B) have a lot of difficulty remembering facts. ChatGPT, despite being much larger, still hallucinates a ton. It seems this isn't the best use case for them right now, at least as a fact base.


It's fun to ask it questions about famous computer scientists like Ron Rivest. Who is apparently a professor at Harvey Mudd College.

