Yeah, the article was painting with a bit too broad a brush IMO, though they did briefly acknowledge "special exceptions" such as satellite or medical imagery. It's very application-dependent.
That said, in my experience beginners often overestimate, for some reason, how much image resolution a given task needs. I often find myself asking them to retry their experiments at a lower resolution. There's a surprising amount of information in 128x128 or even smaller images.
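If you're skeptical, it's easy to eyeball: downscale an image and blow it back up with nearest-neighbor so you see exactly what 128x128 retains. A quick sketch with Pillow (the path is just a placeholder):

    # See how much survives aggressive downscaling.
    from PIL import Image

    img = Image.open("photo.jpg")  # placeholder path, use your own image
    small = img.resize((128, 128), Image.LANCZOS)
    # Upscale with nearest-neighbor so the preview shows the raw 128x128 content.
    preview = small.resize(img.size, Image.NEAREST)
    preview.show()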
I have a vivid memory of playing Rise of the Triad[1] against my buddy over a serial cable. Like most PC games from back then, it used mode 13h[2], i.e. 320x200 resolution with a 256-color palette.
I distinctly remember firing a rocket at him from far away because I thought one pixel had the wrong color, and killing him, to his great frustration. Good times.
You can play the shareware portion of the game here[3] to get an idea.
There's been a huge amount of work on image transformers since the original ViT. A lot of it has explored different schemes for slicing the image into tokens, and I've definitely seen some of it use a multiresolution pyramid. Not sure about the RL part - after all, the upper, lower-resolution levels of the pyramid add far fewer tokens than the full-resolution base level, so it doesn't seem that necessary. But given the sheer volume of work out there, I'd bet someone has already explored this idea or something pretty close to it.
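Back-of-the-envelope, assuming ViT-style 16x16 patches at every level (the sizes are just an example):

    # Token counts for a 3-level pyramid over a 1024x1024 image.
    patch = 16
    for level, side in enumerate([1024, 512, 256]):
        tokens = (side // patch) ** 2
        print(f"level {level}: {side}x{side} -> {tokens} tokens")
    # level 0: 1024x1024 -> 4096 tokens
    # level 1: 512x512 -> 1024 tokens
    # level 2: 256x256 -> 256 tokens

So the two extra levels only add about 31% more tokens on top of the base level.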
Slicing up images to analyze them is definitely something people do - in many cases, such as satellite imagery, there isn't much alternative. But it should be done mindfully, especially if there are differences between the training and testing setups. Depending on the architecture and the application, processing patches is not the same as processing the whole image at once. Some differences are fairly obvious (for example, border artifacts), but others are more subtle. For example, contrary to the positional equivariance you'd expect from convolutional nets, they can implicitly encode positional information based on where they see border padding during training. And for some types of normalization, such as instance normalization, the normalization statistics can vary significantly depending on whether they're computed over patches or over whole images.
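To make that last point concrete, a toy NumPy example (a synthetic gradient image standing in for real data):

    import numpy as np

    rng = np.random.default_rng(0)
    # Fake single-channel image: noise plus a left-to-right brightness gradient.
    img = rng.normal(size=(256, 256)) + np.linspace(0.0, 4.0, 256)[None, :]

    # Statistics instance norm would use over the whole image...
    print("whole image:", img.mean(), img.std())
    # ...vs. the statistics it would use on each half when run patch-wise.
    left, right = img[:, :128], img[:, 128:]
    print("left patch: ", left.mean(), left.std())
    print("right patch:", right.mean(), right.std())

The patch-wise means land around 1 and 3 instead of the global 2, so the same pixel gets normalized quite differently depending on how you sliced.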
$8000 also seems pretty cheap for 2PB of traffic? Looking at Google Cloud Storage egress rates, $0.02/GiB (and that's on the lower end, since it depends on the destination) would be about $40k for 2PB.
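Rough math, assuming decimal petabytes:

    pb = 2 * 10**15          # 2 PB in bytes
    gib = pb / 2**30         # ~1.86 million GiB
    print(round(gib * 0.02)) # 37253, i.e. roughly $40k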
Honestly, I think that if you had asked me in 2020 whether we would be able to do this by 2025, I would've guessed no, with fairly high confidence. And I was aware of GPT back then.
The "non-intrusive" part is interesting. I've bitten the bullet on AI assistance when coding - even when it feels like it gets in the way sometimes, overall I find it a net benefit. But I briefly tried AI in the shell with the Warp terminal and found it just too clunky and distracting. I wasn't even interested in the AI features, I just wanted to try a fancy new terminal. I'm not saying Warp can't be useful for some people; it just wasn't for me. So far, explicitly calling for assistance with a CLI command (I've used aichat for this, but there are several tools out there) has been more useful in those occasional instances where I can't remember some obscure flag combination.
Uninstalled Warp, as the whole thing felt clunky and slow, and I never even turned on the AI. You can accomplish everything it does with zsh + plugins without much fuss.
Same - it performs like an Electron app. And the whole mandatory-login thing soured me from the start. I don't need SaaS practices in my terminal, and I don't trust that they aren't snooping either.
Didn't expect to see the M8 when opening this. It's truly an amazing device, and the tables are a really awesome addition to the typical audio modulation options.
That would be pretty confusing - as it happens, the M8 was partially inspired by another tracker called LGPT (Little Game Park Tracker, named after the device it ran on).
I'm not an expert on GB chiptune, but from what I've heard from enthusiasts, different GB models sound different, and there are variations even within the same model. That said, it wouldn't surprise me if the GB waveforms alias, at least on the digital side, given that it was operating with pretty minimal synthesis capabilities. There's probably some extra low- and high-pass filtering that shapes the sound after the waveform itself is generated. Looking at some examples of actual GB PWM waveforms, some high-pass would definitely make a pure PWM waveform more GB-like, and some low-pass could help a bit with aliasing.
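If anyone wants to play with that, here's a minimal sketch (NumPy, naive PWM plus one-pole filters; the cutoff values are guesses, not measured GB hardware):

    import numpy as np

    sr = 44100
    t = np.arange(sr) / sr  # one second of samples

    # Naive 25%-duty PWM at 440 Hz - plenty of aliasing at this sample rate.
    duty, freq = 0.25, 440.0
    pwm = np.where((t * freq) % 1.0 < duty, 1.0, -1.0)

    def one_pole_lp(x, cutoff, sr):
        # Simple one-pole low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])
        a = 1.0 - np.exp(-2.0 * np.pi * cutoff / sr)
        y = np.empty_like(x)
        acc = 0.0
        for i, s in enumerate(x):
            acc += a * (s - acc)
            y[i] = acc
        return y

    # High-pass as signal minus its low-passed copy (a DC blocker);
    # the ~50 Hz corner is an assumption, not a measured value.
    hp = pwm - one_pole_lp(pwm, 50.0, sr)
    # Gentle low-pass on top to knock down some of the aliased highs.
    out = one_pole_lp(hp, 10000.0, sr)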