Hacker News | whistle650's comments

And imagine how much more you’d have to pay for each of those clicks if everyone could stop those fraudulent clicks. In equilibrium it shouldn’t change the total ad spend.


That’s the point, without the fraudulent clicks you would just move on to some other strategy because the pricing would not be worth it.

Fake clicks give the illusion that ads are working and instead you have to optimize your funnel or whatever else.


But if you're the only one doing it because the competition haven't figured it out, then you win until they do. You can outbid them on each ad.


That’s true. But you probably can’t. At least not any more than others. It’s a systemic issue in the ad network ecosystem which you don’t have much control over. If you can figure it out, odds are lots of others can too. People do assess the quality of traffic sources and do check the return on ad spend. It’s that system-wide process that keeps the return on ad spend roughly constant.

The point here, for me, is that a microeconomic perspective on this whole question is more salient than a purely technical one.


The ad spend itself is not the issue.

I am fine with spending $10,000 on ads (or whatever amount).

The issue is knowing that $5,000 of it was spent on clicks that had a 0% chance to convert. Every fraudulent click I can block/prevent is one more click that can be made by a real person who may actually make a purchase.


A properly informed market is a more efficient market.

The online ad market is extremely inefficient because it has no idea how much ad spend is even reaching people.


You should have a good idea. How to know whether an ad reaches people has been studied extensively since long before the internet existed. Newspapers don't have clicks, but you still need to know if your ad works. Even on the internet, a large part of the value of ads is people who see the ad and buy later without clicking on it. We can track this: do it.


This is the key point. Ads and clicks etc are priced in a competitive market. If they don’t deliver the ROI because of bots, then people (including the allegedly hopelessly confused e-commerce retailers) would pay less for the same amount of traffic. It may be annoying (and the cost of dealing with that annoyance would further drive down the price paid for the traffic). But what matters is that an e-commerce site is profitable (enough) after the ad spend, period. If they are not, why do they spend what they spend on the ads?


Google is a competitive ad market?


There are certainly competition concerns about Google's advertising programs.

But within their advertising market, you compete for placement with other advertisers. If everyone is getting lots of fraud traffic, presumably they adjust their bids for it. If you're being outbid consistently, it's reasonable to expect that the other advertisers are either getting a better ROI or have a lower ROI target than you do.

About a million years ago, I was on a team that ran a significant ad program, and it was primarily data-driven: we'd come up with keywords to advertise on, measure the results, and adjust our bids, with a little manual work to filter out inappropriate keywords or to allow a lower ROI on "important" keywords. Of course, our revenue was largely also from advertising, so it was a bit circular.


At Google, I would expect not. However, I don't understand why other "portals" are not running their own ads (some are, of course, but I think more should). If you are a portal, your value is the eyeballs you sell to advertisers, so why are you outsourcing this critical part of your business value? This needs to be a core competency you keep in house.


I thought you could set up an automatic Takeout export to run periodically, with the target set to your Google Drive. Then, via a web app's OAuth, you could pull the data that way. Frequency is limited (it looks like the auto export runs "every 2 months for 1 year"), so it's hardly realtime, but it seems useful and (relatively) easy? Does a method like that not work for your intentions?


Will have to look into that. Sounds like it could be expensive but maybe worth it.


You can schedule the takeout to Drive, then use a tool such as rclone (amazing tool) to pull it down.

It should not add any costs except the storage for the takeout zip on drive.

Look at supported providers in rclone and you might find easy solutions for some hard sync problems: https://rclone.org/#providers
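Something like the following, assuming you've already set up a Drive remote named "gdrive" via `rclone config` (the remote name and local path here are placeholders, and scheduling it via cron is just one option):

```shell
# One-time interactive setup of a Google Drive remote (here named "gdrive")
rclone config

# Pull the Takeout archives from Drive to a local folder
rclone copy "gdrive:Takeout" ./takeout-backup --progress

# Once verified locally, optionally free the Drive storage the zips consume
# (keep --dry-run until you're sure the copy is complete)
rclone delete "gdrive:Takeout" --dry-run
```

Deleting the archives after pulling them down also addresses the storage-cost complaint elsewhere in the thread, since the zips only occupy Drive quota between export and sync.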


> except the storage for the takeout zip on drive.

Yeah, that's the cost I'm talking about. It essentially amounts to paying an extra subscription to be able to download your data [on a regular basis].

I'm a big rclone fan btw :) I'm sure there's some future where we do something like this to automate Takeouts.


Downloading from Google Drive doesn't cost anything, does it? Although, I guess you would have to have enough empty space on your Google Drive to store the takeout zip, which I think is an acceptable cost.


It seems they use 70% of the benchmark query-answer pairs to cluster and determine which models work best for each cluster (by sending all queries to all models and looking at responses vs ground truth answers). Then they route the remaining 30% "test" set queries according to those prior determinations. It doesn't seem surprising that this approach would give you Pareto efficiency on those benchmarks.


It's OK if you can update the router over time; the more data you have, the better.


Looking at the home page of Meanwhile only made me think of how life insurance is such a different thing than, say, a mortgage. With life insurance, counterparty risk matters. You don't care about your mortgage counterparty. I'm not going to buy life insurance from an insurer with Youtube videos of Anthony Pompliano on their home page. Know your enemy.


Have you tried the Gemini Live audio-to-audio in the free Gemini iOS app? I find it feels far more natural than ChatGPT Advanced Voice Mode.


I don't know much about what it's like to do a PhD in physics at Berkeley, but many years ago I did a PhD in physics at Stanford starting out working in experimental quantum optics. I wound up doing something completely different, and felt supported in changing what I worked on. Stanford felt small in a good way, the grad student admin staff was wonderful. Stanford definitely has a different more suburban isolated vibe. Summers felt like you worked at a country club or something.

Who you work with really matters (obviously) and different PIs and labs can have very different cultures which you may or may not feel comfortable with. That alone can make your decision if you are very sure about what you want to do and who you want to work with.

Outside of that, I would say Stanford is a really great place to do graduate work, especially if you're not entirely sure what you want to do.

All of this is with the obvious caveat that my experience is from quite some time ago.


Interesting read with lots of good detail, thank you. A comment: if you are balancing the classes when you do one vs all binary training, and then use the max probability for inference, your probabilities might not be calibrated well, which could be a problem. Do you correct the probabilities before taking the argmax?
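To illustrate what I mean (a sketch, not the article's code: the standard Bayes prior-correction for a classifier trained on rebalanced data, with made-up class names and probabilities):

```python
def prior_correct(p, true_prior, train_prior=0.5):
    """Map a probability from a model trained at class frequency
    `train_prior` back to the deployment prior `true_prior`."""
    num = p * true_prior / train_prior
    den = num + (1 - p) * (1 - true_prior) / (1 - train_prior)
    return num / den

# One-vs-rest probabilities from balanced binary models, plus the true
# (imbalanced) prior of each class -- values are illustrative.
raw = {"cat": 0.80, "dog": 0.75}
priors = {"cat": 0.05, "dog": 0.40}

corrected = {c: prior_correct(raw[c], priors[c]) for c in raw}
# Note the argmax flips: "cat" wins on raw scores, "dog" after correction,
# because the balanced cat model was trained at 10x the true cat frequency.
winner = max(corrected, key=corrected.get)
```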


https://chatgpt.com/share/13f553a8-5cff-42a1-be95-4a9d33cd10...

May also be easy to correct a lot of it:

“For better safekeeping, Russia’s $24,000,000 collection of crown jewels, probably the finest array of gems ever assembled at one time,”


But are you correcting the OCR or miscorrecting the originals?

I want original text, including misspellings, and original regional / historical spellings, including slang (which may look like another word, but is not, and isn't in a dictionary).

You cannot fix OCR text without looking at the original.


With the spelling having been fixed, even if imperfectly, you could much more easily search for content and find relevant results, and then go on to look at the originals. What you want is still possible, unless you unreasonably make it a requirement that the transcriptions should be perfect.


Proper digital transcription means doing it with accuracy, not "close enough".


to quote myself, "every interesting data set will have inaccuracies in it"


There is a vast difference between a rare, honest mistake, and an attempt to mitigate them...

vs willingly knowing you are introducing corrections that are ridiculously wrong.

Advocating and being a champion for inaccuracy, really isn't a positive. You should find a new thing to quote about yourself.


This is not what this phrase is about. I came to it working on the structural data of just under 100k Chinese characters. I'd spend hours, days and weeks proofreading and correcting formulas, so your "advocating and being a champion for inaccuracy" doesn't stick. But absent an automated, complete coverage of all records against a known error-free data set, there will likely be a small percentage of errors and dubious cases.

And thanks, by the way, for the readiness to jump to conclusions and fire a salvo of allegations, viz. "willingly", "knowingly", "introducing", "ridiculous".


You're making statements supporting the concept that errors are unavoidable, with an air of "oh well!", in a thread where someone is claiming AI is a solution... right after demonstrating a 10x error!

AI is a ridiculous answer, with its hallucinations and absurd error rates. If you didn't intend to support that level of absurd error rate, you shouldn't be replying in defence.

It sounds like you did not want to give that impression, if so, I suggest you look at the chain of replies, and the context.

AI hype is literally a danger to us all.


oh well


"$2¢4,000,000" should be "$204,000,000" rather than ChatGPT's "$24,000,000".


Are you aware of any models that perform as well as an LLM on this task at lower cost?


Self hosted LLM?


This is a great project, thank you. I've installed the TestFlight app. FYI, right now it's saying, in response to "Who was the president in 1973", that it was "Gerald Ford", which is wrong.


Even non-quantized large LLMs (70B) have a lot of difficulty remembering facts. ChatGPT, despite being much larger, still hallucinates a ton. It seems this isn't the best use case for them right now, at least as a fact base.


It's fun to ask it questions about famous computer scientists like Ron Rivest. Who is apparently a professor at Harvey Mudd College.

