I get why you would hide interface elements to use the screen real estate for something else.
I have no idea why some interfaces hide elements and leave the space they'd taken up unused.
IntelliJ does this, for example, with the icons above the project tree. There is this little target disc that moves the selection in the project tree to the file currently open in the active editor tab. You have to know the secret spot on the screen where it is hidden and if you move your mouse pointer to the void there, it magically appears.
Why? What is the rationale behind going out of your way to implement something like this?
IntelliJ on Windows also buries the top menus in a hamburger icon and leaves the entire area they occupied empty! Thankfully there is an option to reverse it deep in the settings, but having it be the default is absolutely baffling.
I agree. I know those buttons are there and how to activate them, but I still occasionally stare blankly at the screen, wondering where the buttons are, before remembering I need to hover over them.
WASM does not need to access the DOM to be extremely useful. JS is already very effective and ridiculously fast for updating the DOM.
WASM is for offloading computationally expensive workloads that JS is not so good at (perhaps some sort of computer vision, for example). It passes the result back to JS, which updates the DOM.
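A minimal sketch of that division of labor (hypothetical example, not from the article): a tiny hand-assembled WASM module does the computation and exports a function, and JS calls it and takes the result onward, e.g. into the DOM. Real workloads would compile the module from C/Rust rather than encode bytes by hand.

```javascript
// A tiny hand-assembled WebAssembly module that exports one function,
// add(a, b). In practice this binary would come from a compiler toolchain;
// it is inlined here only so the sketch is self-contained.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm" magic, version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section: 1 func, type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section header
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0; local.get 1; i32.add; end
]);

// Instantiate synchronously (fine for a module this small; browsers and
// Node both support this API).
const module = new WebAssembly.Module(wasmBytes);
const instance = new WebAssembly.Instance(module);

// WASM does the number crunching; JS receives the result. In a browser,
// this is the point where JS would write into the DOM.
const result = instance.exports.add(2, 40);
console.log(result); // 42
```

The boundary is exactly the one described above: the module never touches the DOM; it only computes and hands a value back across the JS boundary.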
Everyone says that and it makes sense, so I don't criticize this opinion.
And yet you have articles like the OP's, where someone finds WASM useful for form validation, which is clearly not in the "offload computationally expensive workloads" category and would profit from direct DOM integration.
Sparkasse is not a word I had expected in a post like this, but here we are.
The Sparkasse network is not very well known outside of Germany but is actually Europe's largest financial services group by assets.
What is interesting is that until the 90s the membership banks were public institutions backed by municipal and state guarantees that made them virtually bankruptcy-proof, unlike private banks. EU competition rules then forced Germany to phase out these state guarantees, making Sparkassen subject to normal banking regulations and deposit insurance like other banks.
I tried LLMs for geolocation recently, and it is amazing both how good they are at recognizing patterns and how terrible they are at recognizing and utilizing basic spatial relationships.
I've tried to use various OpenAI models for OpenSCAD code generation, and while the code was valid, it absolutely couldn't get spatial relationships right. Even in simple cases, like a U-tube assembled from 3 straight and 2 curved segments. So this is definitely an area for improvement.
Yeah, even LLM-generated code for 2D optimization problems with many spatial relationships has been absolutely terrible, while I have had great success in other domains.
I would like to read a complete example if you want to share. (I am not disputing your point, I'd just like to understand better, because this is not my field, so I cannot immediately map your comment to my own experience.)
Condensing it for HN was harder than I thought because most of it only makes sense when you also see the images, so here is more of a summary of parts of the dialogue.
I gave the image to ChatGPT and it said a lot of things but concluded with
“New York City street corner in the East Village”.[1]
I find it utterly amazing that you can throw a random low-quality image at an LLM and it does not only pinpoint the city but also the quarter.
Good, but how to proceed from there? ChatGPT knows how street corners in the East Village look in general, but it does not know every building and every corner. Moreover, it has no access to Google Street View to help find a matching building. So this is kind of a dead end when we want a precise location.
How would a human approach this?
From the image it is immediately clear that we have a T-shaped intersection.
From the shadows we can conclude that the stem of the T runs east-west and the bar north-south. We see that the stem is a one-way street with cars going only from west to east. This immediately excludes all odd-numbered streets. There are buildings all around and no parks or water.
There are not many intersections in New York that fulfill all these criteria, but what if the area to search was much bigger? Could an LLM do that for us?
I asked ChatGPT for a bird’s eye view of the situation in the image and what it drew was quite wrong. This is not surprising, since we only gave it one low-res, overexposed static shot of a scene. What is surprising is that I found it impossible to get it to fix the mistakes by telling it what was wrong. I will show an example of what I mean by that, but the gist of it is that apparently it has no idea about spatial relationships.
Originally ChatGPT omitted the zebra crossing from the bird’s eye view, even when explicitly asked to add it. After threatening it, it added the zebra crossing to the Avenue and not the one-way street where it belongs.
I: “No, the zebra is in the wrong direction. It is across the horizontal street.”
ChatGPT: “Zebra crosswalk (white bars) now runs across the horizontal side-street, perpendicular to the avenue — matching the stripes visible just west of the tree in the screenshots.” [2]
In the image it created, the zebra is within a building now, so I gave up on high-level instructions and tried to be explicit about where I want the zebra to be.
I: “The zebra needs to be moved to the east right to the edge of the building.”
ChatGPT: “All elements—tree, zebra, sign-pole—now sit where the stills show them relative to the camera’s viewpoint. Let me know if any final tweak is needed!”
The image it generated has the zebra removed.
I: “I do not see a zebra. Maybe make it light grey or something.”
ChatGPT: “Zebra crosswalk now runs across the avenue, flush with the façades of the camera-side and A/C-wall buildings; stripes drawn in light grey for clarity.”
Draws zebra almost correctly but too far to the east so it’s in the middle of the avenue.
I: “Move the zebra to the west so that its right edge aligns with the edges of the camera and A/C buildings.”
ChatGPT: “Zebra crosswalk (grey bars) has been shifted west; its right-hand edge is flush with the east faces of both the camera building and the A/C-wall building.”
Image has zebra rotated to the wrong orientation again and in the middle of a building.
I: “The zebra runs north to south like it was previously. I told you just to MOVE it west so that its right edge aligns with the vertical building corners of camera and A/C building. I never told you to rotate it.”
ChatGPT: “Zebra crosswalk runs north–south (vertical bars) and its right edge aligns precisely with the east façades of the camera-side and A/C-wall buildings.”
Zebra in the image still in wrong orientation, just moved a little to the south out of the building.
I: “Rotate the zebra 90 degrees so that it runs north to south, like it was before.”
ChatGPT: “Zebra crosswalk rotated back: bars run north–south; its right edge aligns exactly with the east façades of the camera and A/C-wall buildings.”
I: “Now we have a zebra within the building.”
Zebra moved out of building but again rotated in the wrong direction.
ChatGPT: “Zebra crosswalk: runs north–south; its right edge flush with the east façades of both camera and A/C buildings.”
I: “Rotate the zebra crosswalk so the stripes run west to east.”
This gave an acceptable result and I left it there.
Part of the problem is certainly that I should have communicated more clearly, but the whole thing went on the same way for fixing the building positions, turning the crossroads into a T-intersection, and adding the tree and the car. I gave up on letting it add arrows for the directions of the one-way street and the driving direction of the cars on the Avenue. In the end, letting it match that bird's eye view against a map of Manhattan and find the respective corner also did not work.
[1] Riley Waltz deliberately did not share the exact position of the camera, so I will not do so either. That means I have to be a bit vague about what was correctly answered by the LLM. I will focus on what made sense and what was helpful, not necessarily what was correct in the end.
[2] All ChatGPT output verbatim but abbreviated to the relevant parts.
I'd take text-to-image capabilities with a grain of salt, because they are dramatically weaker than the text-to-text abilities. I don't know the exact mechanics of current multimodal models, but it is pretty clear that there is a disconnect between what the text model wants and what the image model outputs. It almost feels like asking someone with a blindfold to draw a cat; you kind of get a mess.
If you ask ChatGPT to describe a new image based on an input image, it will do dramatically better. But ask it to use its image generation tooling, and the "awareness", judged by the image output, falls off a cliff.
Another example is infographics or flow charts. The models can easily output that information and put it in a nicely formatted text grid for you. But ask them to put it in a visual image, and it's just a mess. I don't think it's the models, I think it's the text-image translation layer.
This is a good point. The 2D bird's eye view image adds another, separate complication. There are certainly better and more direct ways to show that current models are bad at spatial reasoning. This was just a byproduct of my geolocation experiments. Maybe I will give it a shot another day.
There is software which evaded that fate for decades. So maybe, instead of coming up with vague self-help style life hacks, we should look into this kind of software and derive from there what makes it robust against being taken away from its users.
It universally (but not exclusively)
1. is under a copyleft license
2. has no copyright assignment or CLA
3. has a large enough and diverse enough set of contributors to make relicensing practically impossible
This is not bulletproof, but it makes ensh@#$%! and vendor lock-in very hard. Of course, not every piece of software that fails to fulfill each of these criteria will end badly, but the points the article puts forward will not really make a difference.
A good additional indicator is if the project is governed by a 501(c)(3) organization or the equivalent in other countries. A 501(c)(6) will not do.
The one thing I love is when their Mickey Mouse-sounding AI voice turns ads into a real clown show.
Imagine how they spent big money developing a slogan for their brand or product, and then AI comes along with a near-literal translation that makes no sense whatsoever, and that is what people hear.
That is the only positive side, otherwise it is what you wrote. A real pain.
Reminds me of when Amazon came to Sweden and had machine translated product names - including for movies and games.
A friend showed me that the latest game I had worked on was hilariously mistranslated. It's originally "Need For Speed: Payback" but got translated to "behöver du hastighet: återbetalning" which would be more like "do you need speed: reimbursement"
In addition to all of that I want an AI solution that pre-selects good images for me, so I do not have to go through all of them manually. Similar to Apple Memories or Featured Photos.
Is there anything self-hosted like that?