More

carlossouza · on June 17, 2024

The data used is Sharadar Core US Equities Bundle:

https://data.nasdaq.com/databases/SFA

It's a great survivorship-bias-free dataset.

Regarding tools, I use Python. I wrote the backtesting software many, many, many years ago during my Master's degree, and I've been refining it ever since.

It's an event-driven engine (they are slower than vector-based engines, but they are easier to write strategies for, understand, and debug) with all the bells and whistles, similar to the late Zipline. In fact, I tried most of the Python backtest engines that exist, and that's why I prefer to use what I built over the years: I have 100% understanding of what’s happening and 100% control.

I’m thinking about open-sourcing it… anyway, the logic is not that complicated.

spicyusername · on June 18, 2024

Thanks!

carlossouza · on June 9, 2024

Twitter/X and Substack are some options. Reddit doesn't work, indeed.

I just read your posts... good stuff, congrats! I'll follow your journey.

If you want to check some algorithms I have implemented and connect, check it out:

https://quantitativo.substack.com/

Cheers

mmarian · on June 10, 2024

Makes sense. Thanks! I like how you explain your algos in your Substack, I've subscribed to the newsletter. I'll send you an email if that's okay?

carlossouza · on April 25, 2024

The world is much larger than the bubble we live in.

spydum · on April 25, 2024

Or they are counting bots and other ai agents and ignoring the truth?

sahila · on April 25, 2024

Ultimately advertisers want a return on their spend and it's a closely watched metric for marketers. Them continuing to spend is indicative that there's growing value in ads, ie bots/ai agents cannot be the reason for their growth.

You could argue fb and particularly twitter are incentivized to include it in their DAU counts but market cares more for revenue for large companies.

ethbr1 · on April 25, 2024

"There is no alternative"

I wouldn't put it past Google/Meta coopting corporate advertisers. Everyone makes everyone look good, by stating the best possible numbers.

sangnoir · on April 26, 2024

SMEs spend a lot on advertising, in aggregate. These smaller businesses are more sensitive to lowered conversation rates and will bail early - Meta took a huge hit in earnings in the aftermath of Apple's privacy changes due to poor conversions. Advertisers didn't keep pumping money in - they stopped campaigns.

ajross · on April 25, 2024

Google and Meta literally represent alternative advertising products.

mangosteenjuice · on April 26, 2024

So do Android and iOS. They're still the primary indicators of the strength of the smartphone market. You probably can't imagine a world without advertisements or smartphones, but that's the point from the post you're responding to that you're missing. If the advertising industry looks healthy, it helps everyone by keeping investors satisfied and share prices up, even if it's a ruse. It is possible for smartphones and the concept of advertising to cease existing some day. There's a spectrum between the current reality and absolute zero, and staying as far away from zero is the goal, even once we enter an inescapable decline.

ajross · on April 26, 2024

Arguably true, but that's a rather different point than "there is no alternative". There are arguments to be made about market "health" and the relative sizes of the largest players, but what the upthread comment was doing was resorting to "anti trust" shorthand that clearly doesn't apply. There may be problems with this market, but lack of competition isn't one of them.

carlossouza · on April 24, 2024

That also sounds a lot like Blockbuster.

Google continues generating profits out of inertia and a lack of a better alternative.

It went for “don’t be evil” to “a necessary evil” (just until something a little better appears).

nothercastle · on April 24, 2024

I think they are just at attain median levels of evil now.

carlossouza · on April 19, 2024

Negative.

The point is not that he shouldn’t be allowed to unilaterally ban social media accounts.

The point is that he shouldn’t be allowed to do that in secrecy, without providing any public justification, and not respecting the right of the accused to defend themselves.

People are getting silenced without knowing why they are getting silenced, and without proper due process/right to respond.

mrkstu · on April 19, 2024

Yep, you don't 'save' democracy by trampling on its traditions.

noizejoy · on April 19, 2024

> you don't 'save' democracy by trampling on its traditions.

It’s maybe not that simple:

https://en.wikipedia.org/wiki/Paradox_of_tolerance

smsm42 · on April 19, 2024

Of course, it's ok to persecute my opponents, because they are the intolerant bad people, but it's not ok to prosecute me, because I am extremely tolerant and only prosecute my opponents, who, as we already established, are bad people and are ok to prosecute.

diegoholiveira · on April 19, 2024

You’re interpreting it wrongly.

lobocinza · on April 19, 2024

Popper's proposed remedy isn't a carte blanche for institutional censorship. Some excerpts from Popper's book that those who cite the paradox of tolerance typically ignore:

"I do not imply, for instance, that we should always suppress the utterance of intolerant philosophies; as long as we can counter them by rational argument and keep them in check by public opinion, suppression would certainly be most unwise."

"All these paradoxes can easily be avoided if we frame our political demands in the way suggested in section ii of this chapter, or perhaps in some such manner as this. We demand a government that rules according to the principles of equalitarianism and protectionism; that tolerates all who are prepared to reciprocate, i.e. who are tolerant; that is controlled by, and accountable to, the public. And we may add that some form of majority vote, together with institutions for keeping the public well informed, is the best, though not infallible, means of controlling such a government. (No infallible means exist.)"

Notes: By equalitarianism he meant the classic liberal notion of equal rights to everyone. By protectionism he meant that the state should ensure people's rights (protect people).

What the Brazilian supreme court is doing is the opposite, it's unaccountable and widespread censorship by non-elected judges.

And what if both side in this dispute "bolsonaristas" and "petistas" are intolerant?

Personally I saw worst cases of intolerance from "petistas" than from "bolsonaristas". Like accusing me of being a "bolsonarista" just because I didn't agree with his extremist political or was wearing and green & yellow shirt. Like leftists openly inciting violence against those they disagree with and receiving praises from their peers.

The worst I received from "bolsonaristas" was being called a "communist" (which I'm not and deem offensive). And ironically the groups they are mostly intolerant towards (besides leftists) are criminals, corrupt politicians, pedophiles (those that are intolerant). Were "petistas" are typically intolerant towards businessmen, policemen, Christians (specially from evangelic congregations), the wealthy and famous (but only those that don't share their views lol). But despite all the majority of "bolsonaristas" and "petistas" aren't bad, just normal people brainwashed with vicious ideologies.

braiamp · on April 19, 2024

Where in the Brazilian constitution you have right to access Twitter or any platform for that matter?

throwaway3298 · on April 19, 2024

From https://www.planalto.gov.br/ccivil_03/constituicao/constitui...

"IV - é livre a manifestação do pensamento, sendo vedado o anonimato;"

"IX - é livre a expressão da atividade intelectual, artística, científica e de comunicação, independentemente de censura ou licença;"

You can see translated versions under "Art 5" here: https://www.constituteproject.org/constitution/Brazil_2017

marcosdumay · on April 19, 2024

You know, during the entire 90s when the people that wrote it were around, they used to say that the Article 5 said this. But nowadays it's consensus over all the influential judges that it doesn't and only registered journalists can say things, and only congresspeople can have their opinions. So, who knows...

Anyway, it still says that decisions should be informed to the punished party and people should be able to defend themselves in a court.

carlossouza · on March 20, 2024

> most UI tests are pretty easy to write today with very natural DSLs that are close to natural language

Wouldn't it be a better/cheaper/faster solution to use LLMs to write UI/integration tests?

vercantez · on March 20, 2024

The issue with this approach is that for all but the most simple apps it is not possible to deduce the runtime element information needed to write traditional UI tests given just the source code. This can only be done reliably at runtime which is what we do. We run your app and iteratively build UI tests that can be reused later.

carlossouza · on March 18, 2024

Exactly what I thought.

Let’s see how long it will take them to collect enough data and train a model to distinguish AI-generated from user-generated videos.

carlossouza · on March 12, 2024

What do you mean by “they don’t do B2B”? They sell ads to companies, don’t they?

carlossouza · on March 10, 2024

> We observed that all the VLMs tend to be confident while being wrong. Interestingly, we observed that even when the entropy was high, models tried to provide a nonsensical rational, instead of acknowledging their inability to perform the task

It looks like all current models suffer from an incurable case of Dunning–Kruger effect cognitive bias.

All are at the peak of Mount Stupid.

nyrikki · on March 10, 2024

LLMs are trained to sound confident.

But they can also only do negation through exhaustion, known unknowns, future unknowns, etc...

That is the pain of the Entscheidungsproblem.

Even in Presburger arithmetic, Natural numbers will addition and equality, which is decidable, still has a double factorial time complexity to prove. That is worse than factorial time for those who've not dealt with it.

Add in multiplication then you are undecidable.

Even if you decided to use the dag like structure of transformers, causality is very very hard.

https://arxiv.org/abs/1412.3076

LLMs only have cheap access to their model probables which aren't ground truth.

So while asking for a pizza recipe could be called out as a potential joke if add a topping that wasn't in its training set, through exhaustion, It can't know when it is wrong in the general case.

That was an intentional choice with statistical learning and why it was called PAC (probably approximately correct) learning.

That was actually a cause of a great rift with the Symbolic camp in the past.

PAC learning is practically computable in far more cases and even the people who work in automated theorem proving don't try to prove no-instances in the general case.

There are lots of useful things we can do in BPP (bounded probabilistically polynomial time) and with random walks.

But unless there are major advancements in math and logic, transformers will have limits.

leereeves · on March 10, 2024

How can a neural network evaluate "confidence"?

The parameters don't store any information about what inputs were seen in the training data (vs being interpolated) or how accurate the predictions were for those specific inputs.

And even if they did, the training data was usually gathered voraciously, without much preference for quality reasoning.

habitue · on March 10, 2024

I don't know for sure, but here's a plausible mechanism for how:

Multiple sub-networks detect the same pattern in different ways, and confidence is the percent of those sub-networks that fire for a particular instance.

There's a ton of overlap and redundancy with so many weights, so there are lots of ways this could work

brookst · on March 10, 2024

That’s good. Also maybe an architecture that runs the query through multiple times and then evaluates similarity of responses, then selects (or creates) the most-generated one, along with a confidence level of how many of the individual responses were aligned.

XenophileJKO · on March 10, 2024

Actually you can get a very good proxy by looking at the probability distrobution of the "answer" tokens. The key here is you have to be able to identify the "answer" tokens.

https://arxiv.org/abs/2402.10200

pbhjpbhj · on March 10, 2024

Phind gives me ChatGPT answers with relatively authoritative references to works on the web that (usually!) support the answer. Could it have a post-filter to fact check against the references?

I guess that is a slight variation of the sibling (@habitue's) answer; both are checks against external material.

I wonder if best resources could be catalogued as the corpus is processed, giving a document vector space to select resources for such 'sense' checking.

montjoy · on March 10, 2024

IIRC confidence in video is related to predicting what happens next vs what actually happens. If the two seem to correlate to the model it would give it a higher confidence ranking, which would then be used further for self-reinforced learning.

astrange · on March 11, 2024

That's not how Dunning-Kruger works. There's never a point where incorrect people were more confident than correct people.

https://en.m.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effec...

carlossouza · on March 10, 2024

> We find inspiration not from the size of a market, but from the importance of the work. Because the importance of the work is the early indicator of a future market.

> You are probably one ArXiv paper away from figuring this thing out.

> I used to be a dishwasher. I’ve cleaned a lot of toilets. I’ve cleaned more toilets than all of you combined. And some of them you can’t unsee. That’s life.

Great story and speech. Many lessons.