> Unfortunately, today’s midrange cards like the RTX 4060 and RX 7600 only come with 8 GB of VRAM
Just a nit: one step up (RX 7600 XT) comes with 16GB memory, although in clamshell configuration. With the B580 falling in between the 7600 and 7600 XT in terms of pricing, it seems a bit unfair to only compare it with the former.
- RX 7600 (8GB) ~€300
- RTX 4060 (8GB) ~€310
- Intel B580 (12GB) ~€330
- RX 7600 XT (16GB) ~€350
- RTX 4060 Ti (8GB) ~€420
- RTX 4060 Ti (16GB) ~€580*
*Apparently this card is really rare and a bad value proposition, so it is hard to find
All RTX xx60 cards are really bad value propositions, though (especially in comparison to the xx80 series cards).
If the 4060 was the 3080-for-400USD that everyone actually wants, that'd be a different story. Fortunately, its nonexistence is a major contributor to why the B580 can even be a viable GPU for Intel to produce in the first place.
Not all of them. The 3060 Ti was great because it was actually built on the same underlying chip as the 3070 and 3070 Ti. Which ironically made those less valuable.
But the release of those cards was during Covid pricing weirdness times. I scored a 3070 Ti at €650, whilst the 3060 Ti's that I actually wanted were being sold for €700+. Viva la Discord bots.
I don’t think it’s accurate to say they’re copying NVidia’s lead. On the mid range it’s been segmented on memory and bus width for a very long time. Your 1060 is a good example, actually. The standard GDDR5 versions have a reduced die with six memory controllers vs eight on the 1070 and 1080. The 1060 GDDR5X version is a cut-down version of the same die as the 1080, with two memory controllers turned off. The odd sizes of 3 and 6 gigs of memory are due to the way they segmented their chips to have a 192-bit bus on the 1060 vs the 256-bit bus on the top end. The 5GB version is further chopped down to 160-bit.
Those parts competed with the RX480 with 8GB of memory so NVidia was behind AMD at that price point.
AMD had not been competing with the *80/Ti cards for a few generations at that point, and stuck with that strategy through today, though the results have gotten better SKU to SKU.
And you’re quite right that they don’t want these chips in the data center. At some point they also didn’t really want these cards competing in games with the top end when placed in SLI (when that was a thing), as they removed the connector from the mid range.
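The 1060's odd 3GB/6GB capacities fall straight out of that controller math; a quick sketch, assuming the 4Gbit and 8Gbit GDDR5 densities of that era and one chip per 32-bit controller:

```python
def gddr5_capacity_gb(memory_controllers: int, chip_gbit: int) -> float:
    """Each 32-bit controller drives one GDDR5 chip; capacity = controllers x chip density."""
    return memory_controllers * chip_gbit / 8

print(gddr5_capacity_gb(6, 8))   # 6.0 GB -> GTX 1060 6GB on a 192-bit bus
print(gddr5_capacity_gb(6, 4))   # 3.0 GB -> GTX 1060 3GB, same bus
print(gddr5_capacity_gb(8, 8))   # 8.0 GB -> 1070/1080-class 256-bit bus
```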
If you want to double the memory and double the total memory bandwidth, sure. That'd need twice as many data lines, or the same lines at twice the speed.
But if you just want to double the memory without increasing the total memory bandwidth, isn't it a good deal simpler? What's 1 more bit on the address bus for a 256 bit bus?
The GPU already has DMA to system RAM. If you're going to make the VRAM as slow as system RAM, then a UMA makes more sense than throwing more memory chips on the GPU.
Good point. I misunderstood the situation. I figured doubling the VRAM size at the same bus width would halve the bandwidth.
Instead, it appears entirely possible to double VRAM size (starting from current amounts) while keeping the bus width and bandwidth the same (cf. 4060 Ti 8GB vs. 4060 Ti 16GB). And, since that bandwidth is already much higher than system RAM (e.g. 128-bit GDDR6 at 288 GB/s vs DDR5 at 32-64 GB/s), it seems very useful to do so, though I'd imagine games wouldn't benefit as much as compute would.
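To sanity-check those bandwidth figures, here's a minimal sketch of the peak-bandwidth arithmetic (pin rates are assumptions: 18 Gbps GDDR6 matches the 288 GB/s figure above, and DDR5-4800 is used for the system RAM comparison):

```python
def peak_bandwidth_gbps(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gbit/s) / 8 bits per byte."""
    return bus_width_bits * pin_rate_gbps / 8

print(peak_bandwidth_gbps(128, 18))   # 288.0 GB/s -> 128-bit GDDR6 at 18 Gbps (4060 Ti class)
print(peak_bandwidth_gbps(64, 4.8))   # 38.4 GB/s  -> one channel of DDR5-4800
print(peak_bandwidth_gbps(128, 4.8))  # 76.8 GB/s  -> dual-channel DDR5-4800
```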
But having the VRAM allows you to run the model on the GPU at all, doesn't it? A card with 48GB can run twice as much model than a card with 24GB, even though it takes twice as long. Nobody is expecting to run twice as much model in the same time just by increasing the VRAM.
Without the extra VRAM, it takes hundreds of times divided by batch size longer due to swapping, or tens of times longer consistently if you run the rest of the model on the CPU.
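As a rough illustration of the capacity argument, a minimal fits-in-VRAM sketch (the parameter counts, quantization level, and overhead factor are all assumptions; real usage also needs room for KV cache and activations):

```python
def model_vram_gb(params_billions: float, bytes_per_param: float, overhead: float = 1.2) -> float:
    """Very rough VRAM estimate: weights only, plus ~20% for KV cache/activations."""
    return params_billions * bytes_per_param * overhead

for name, params in [("~13B model", 13), ("~30B model", 30), ("~70B model", 70)]:
    need = model_vram_gb(params, bytes_per_param=1.0)  # assume 8-bit quantized weights
    for vram in (24, 48):
        fits = "fits" if need <= vram else "doesn't fit"
        print(f"{name}: ~{need:.0f} GB needed -> {fits} in {vram} GB")
```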
(author here) When I checked the 7600 XT was much more expensive.
Right now it's still $360 on eBay, vs the B580's $250 MSRP, though yeah I guess it's hard to find the B580 in stock
Yeah I guess regional availability really works into it.. bummer
I wonder if the B580 will drop to MSRP at all, or if retailers will just keep it slotted into the greater GPU line-up the way it is now and pocket the extra money.
Prebuilders get priority access and volume discounts, so while it may not be a good value to buy individually, that doesn't necessarily apply to buying it in bulk.
Can confirm, bought mine for about 350 EUR in Latvia from a store that's known to add a bit of markup on things.
Though the market is particularly bad here, because an RTX 3060 12 GB (not Ti) costs between 310 - 370 EUR and an RX 7600 XT is between 370 - 420 EUR.
Either way, I'm happy that these cards exist because Battlemage is noticeably better than Alchemist in my experience (previously had an A580, now it's my current backup instead of the old RX 570 or RX 580) and it's good to have entry/mid level cards.
On Newegg the cheapest in stock is USD$370 as an example. This is consistent for Intel cards unfortunately.
The reviews will say "decent value at RRP" but Intel cards never ever sell anywhere near RRP meaning that when it comes down to it you're much better off not going Intel.
I feel like reviews should all acknowledge this fact by now: "Would be decent value at RRP, but not recommended since Intel cards are always 50% over RRP."
This card still seems like a bad proposition. It's roughly similar performance to the 11GB 2080 Ti for double the price. You'd have to really want that extra 5GB.
Most people who want the 4060 Ti 16GB want it because they need the 16GB for running LLMs. So yes, they really want that extra 5GB.
I'm actually tempted, but I don't know if I should go for a Mac Studio M1 Max 64GB for $1350 (ebay) or build a PC around a GPU. I think the Mac makes a lot of sense.
I have an M2 Max with 64GB of RAM. It handles everything I throw at it. Runs the 30-ish-gigabyte DeepSeek model fine. I will admit for gaming I pretty much just stick to Cyberpunk 2077, Minecraft, Stray, Viscera Cleanup Simulator and old games with open source engine options. I'm happy I can play Cyberpunk with my screen on full brightness using 30W, compared to my Xeon Windows machine taking 250W for lower frame rates.
A lot of commentators have pointed out that Intel is reaching nowhere near the performance/mm2 of Nvidia or AMD designs, though contrary to what I thought that might imply, it seems that power consumption is very much under control on Battlemage. So it seems the primary trade-off here is on the die cost.
Can anyone explain what might be going on here, especially as it relates to power consumption? I thought (bigger die ^ bigger wires -> more current -> higher consumption).
Increasing clocks tends to have a greater-than-linear cost in power: you need transistors to switch faster, so you often need a higher voltage, which causes more leakage and other losses on top of the switching cost itself (all of which turns into heat). Higher clock targets also have a cost for the design itself, often needing more transistors for things like extra redrivers to ensure fast switching speeds, or even more pipeline stages. Plus, not all area is "transistors" - it's often easier to place related units that need a lot of interconnectivity with short interconnects if an adjacent, less interconnected unit isn't also trying to be packed into the same space. Routing on modern chips is really difficult (and a place where companies can really differentiate by investing more).
For tasks that tend to scale well with increased die area, which is often the case for GPUs as they're already focused on massively parallel tasks so laying down more parallel units is a realistic option, running a larger die at lower clocks is often notably more efficient in terms of performance per unit power.
For GPUs generally that's just part of the pricing and cost balance: a larger, lower-clocked die would be more efficient, but would it really sell for as much as the same die clocked even higher to hit peak results?
>For tasks that tend to scale well with increased die area, which is often the case for GPUs as they're already focused on massively parallel tasks so laying down more parallel units is a realistic option, running a larger die at lower clocks is often notably more efficient in terms of performance per unit power.
I should've considered this, I have an RTX A5000. It's a gigantic GA102 die (3090, 3080) that's underclocked to 230W, putting it at roughly 3070 throughput. That's ~15% less performance than a 3090 for a ~35% power reduction. Absolutely nonlinear savings there. Though some of that may have to do with power savings using GDDR6 over GDDR6X.
(I should mention that relative performance estimates are all over the place, by some metrics the A5000 is ~3070, by others it's ~3080.)
Yeah the power consumption scales, to first order, with Vdd^2 (square of power supply voltage) but performance scales with Vdd. Though you cannot simply reduce the Vdd and clock rate and do more pipelining etc to gain back the performance. If you are willing to back off on performance a bit you can gain hugely on power. Plus thermal management of it is more manageable.
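A toy version of that trade-off, as a sketch (the exponents are the first-order model from the comment above; the 15% operating point is just an assumed example):

```python
def relative_dynamic_power(v_rel: float, f_rel: float) -> float:
    """First-order dynamic power model: P ~ V^2 * f (relative to a 1.0/1.0 baseline)."""
    return v_rel ** 2 * f_rel

# Assume backing clocks off ~15% lets you drop voltage ~15% as well
perf = 0.85                      # performance roughly tracks frequency
power = relative_dynamic_power(v_rel=0.85, f_rel=0.85)
print(f"~{(1 - perf) * 100:.0f}% less performance for ~{(1 - power) * 100:.0f}% less power")
# -> ~15% less performance for ~39% less power, same ballpark as the A5000 example above
```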
They are holding back the higher-VRAM models of this card. GPU makers always do some nerfing of their cards in the same product line. Oftentimes there's no good reason for this other than that they found specs they can market and sell simply by moving voltages around.
Anyway, expecting good earnings throughout the year as they use Battlemage sales to hide the larger concerns about standing up their foundry (great earnings for the initial 12GB cards, and so on for the inevitable 16/24GB cards).
I couldn't find any information regarding power consumption in the article. I'd love to upgrade my aging gaming rig, but all modern AMD/Nvidia graphics cards consume significantly more power than my current card.
> I thought (bigger die ^ bigger wires -> more current -> higher consumption).
I am not a semi expert, but a bigger die doesn't mean bigger wires if you are referring to cross-section; the wires would be thinner, meaning less current. Power is consumed pushing and pulling electrons onto and off of the transistor gates, which are all of the FET type (field effect transistor). The gate is a capacitor that needs to be charged to open the gate and allow current to flow through the transistor; discharging the gate closes it. That current draw then gets multiplied by a few billion gates, so you can see where the load comes from.
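To put very rough numbers on "multiplied by a few billion gates" - every figure below is an assumed ballpark, not a datasheet value:

```python
def switching_power_watts(n_transistors: float, c_gate_farads: float,
                          vdd_volts: float, freq_hz: float, activity: float) -> float:
    """Aggregate dynamic power: P ~ N * alpha * C * V^2 * f."""
    return n_transistors * activity * c_gate_farads * vdd_volts ** 2 * freq_hz

# Assumed ballpark: 20B transistors, ~0.1 fF effective capacitance each,
# 0.9 V supply, 2.5 GHz clock, ~5% of transistors switching per cycle
p = switching_power_watts(20e9, 0.1e-15, 0.9, 2.5e9, 0.05)
print(f"~{p:.0f} W of switching power")   # on the order of a couple hundred watts
```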
Actually the wires don't scale down like the transistors do. I remember taking VLSI circuit complexity theory in graduate school, and the conclusion was that for two-dimensional circuits the wires will end Moore's Law. However, I've seen articles about backside power delivery, and they are already using seven-plus metal layers, so the wires are going through three dimensions now. Copper interconnects were a one-time bonus in the late 90s, and after that wires just don't scale down; signal delay would go up too fast. Imagine taking a city with all its streets and houses: the houses shrink to the size of dog houses, but you can't shrink the streets - they have to stay the same size to carry signals quickly!
>I thought (bigger die ^ bigger wires -> more current -> higher consumption).
All things being equal, a bigger die would result in more power consumption, but the factor you're not considering is the voltage/frequency curve. As you increase the frequency, you also need to up the voltage. However, as you increase voltage, there's diminishing returns to how much you can increase the frequency, so you end up massively increasing power consumption to get minor performance gains.
If it's a similar number of transistors on a larger die then I can believe the power consumption is good. Less dense layout probably requires less design effort and may reduce hotspots.
If Intel is getting similar performance from more transistors that could be caused by extra control logic from a 16-wide core instead of 32.
This strikes me as not a particularly useful metric, or at least one only indirectly related to the stuff that actually matters.
Performance/watt and performance/cost are the only metrics that really matter both to consumer and producer - performance/die size is only used as a metric because die size generally correlates to both of those. But comparing it between different manufacturers and different fabs strikes me as a mistake (although maybe it's just necessary because identifying actual manufacturing costs isn't possible?).
What prevents manufacturers from taking some existing mid/toprange consumer GPU design, and just slapping like 256GB VRAM onto it? (enabling consumers to run big-LLM inference locally).
Would that be useless for some reason? What am I missing?
The amount of memory you can put on a GPU is mainly constrained by the GPU's memory bus width (which is both expensive and power hungry to expand) and the available GDDR chips (which generally require 32 bits of the bus per chip). We've been using 16Gbit (2GB) chips for a while, and 24Gbit (3GB) GDDR7 modules are just starting to roll out, but they're expensive and in limited supply. You also have to account for VRAM being somewhat power hungry (~1.5-2.5W per module under load).
Once you've filled all the slots your only real option is to do a clamshell setup that will double the VRAM capacity by putting chips on the back of the PCB in the same spot as the ones on the front (for timing reasons the traces all have to be the same length). Clamshell designs then need to figure out how to cool those chips on the back (~1.5-2.5w per module depending on speed and if it's GDDR6/6X/7, meaning you could have up to 40w on the back).
Some basic math puts us at 16 modules for a 512 bit bus (only the 5090, have to go back a decade+ to get the last 512bit bus GPU), 12 with 384bit (4090, 7900xtx), or 8 with 256bit (5080, 4080, 7800xt).
A clamshell 5090 with 2GB modules has a max limit of 64GB, or 96GB with (currently expensive and limited) 3GB modules (you'll be able to buy this at some point as the RTX 6000 Blackwell at stupid prices).
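The module math from the previous paragraphs, as a quick sketch (chip capacities as discussed above; clamshell doubles capacity, not bandwidth):

```python
def max_vram_gb(bus_width_bits: int, chip_capacity_gb: int, clamshell: bool = False) -> int:
    """Each GDDR chip occupies 32 bits of the bus; clamshell doubles chip count, not bandwidth."""
    modules = bus_width_bits // 32
    return modules * chip_capacity_gb * (2 if clamshell else 1)

print(max_vram_gb(512, 2, clamshell=True))   # 64  (5090-style bus, 2GB chips)
print(max_vram_gb(512, 3, clamshell=True))   # 96  (same bus, 3GB GDDR7 chips)
print(max_vram_gb(192, 2))                   # 12  (B580-style 192-bit bus)
```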
HBM can get you higher amounts, but it's extremely expensive to buy (you're competing against H100s, MI300Xs, etc), supply limited (AI hardware companies are buying all of it and want even more), requires a different memory controller (meaning you'll still have to partially redesign the GPU), and requires expensive packaging to assemble it.
What of previous generations of HBM? Older consumer AMD GPUs (Vega) and Titan V had HBM2. According to https://en.wikipedia.org/wiki/Radeon_RX_Vega_series#Radeon_V... you could get 16GB with 1TB/s for $700 at release. It is no longer use in data centers. I'd gladly pay $2800 for 48GB with 4TB/s.
Interesting. So a 32-chip GDDR6 clamshell design could pack 64GB VRAM with about 2TB/s on a 1024bit bus, consuming around 100W for the memory subsystem? With current chip prices [1], this would cost just about 200$ (!) for the memory chips, apparently. So theoretically, it should be possible to build fairly powerful AI accelerators in the 300W and < 1000$ range. If one wanted to, that is :)
Hardware-wise instead of putting the chips on the PCB surface one would mount an 16-gonal arrangement of perpendicular daughterboards, each containing 2-16 GDDR chips where there would be normally one, with external liquid cooling, power delivery and PCIe control connection.
Then each of the daughterboards would feature a multiplexer with a dual-ported SRAM containing a table where for each memory page it would store the chip number to map it to and it would use it to route requests from the GPU, using the second port to change the mapping from the extra PCIe interface.
API-wise, for each resource you would have N overlays and would have a new operation allowing to switch the resource overlay (which would require a custom driver that properly invalidates caches).
This would depend on the GPU supporting the much higher latency of this setup and providing good enough support for cache flushing and invalidation, as well as deterministic mapping from physical addresses to chip addresses, and the ability to manufacture all this in a reasonably affordable fashion.
GPUs use special DRAM that has much higher bandwidth than the DRAM that's used with CPUs. The main reason they can achieve this higher bandwidth at low cost is that the connection between the GPU and the DRAM chip is point-to-point, very short, and very clean. Today, even clamshell memory configuration is not supported by plugging two memory chips into the same bus, it's supported by having the interface in the GDDR chips internally split into two halves, and each chip can either serve requests using both halves at the same time, or using only one half over twice the time.
You are definitely not passing that link through some kind of daughterboard connector, or a flex cable.
You'd need memory chips with double the memory capacity to slap the extra vram in, at least without altering the memory bus width. And indeed, some third party modded entries like that seem to have shown up:
https://www.tomshardware.com/pc-components/gpus/nvidia-gamin...
As far as official products, I think the real reason another commentator mentioned is that they don't want to cannibalize their more powerful card sales. I know I'd be interested in a lower powered card with a lot of vram just to get my foot in the door, that is why I bought a RTX 3060 12GB which is unimpressive for gaming but actually had the second most vram available in that generation. Nvidia seem to have noticed this mistake and later released a crappier 8GB version to replace it.
I think if the market reacted to create a product like this to compete with nvidia, they'd pretty quickly release something to fit the need, but as it is they don't have to.
The 3060 with 12GB was an outlier for its time of release because the crypto(currency) hype was raging at that moment and scalpers, miners and everyone in between were buying graphics cards left and right! Hard times, those were! D:
You can actually get GPUs from the Chinese markets (e.g., AliExpress) that have had their VRAM upgraded. Someone out there is doing aftermarket VRAM upgrades on cards to make them more usable for GPGPU tasks.
Which also answers your question: The manufacturers aren't doing it because they're assholes.
These are a bit mythical, finding one for sale is no small feat.
I guess adding memory to some cards is a matter of completely reworking the PCB, not just swapping DRAM chips. From what I can find it has been done, both chip swaps and PCB reworks, it's just not easy to buy.
Software support is of course another consideration.
Bandwidth. GDDR and HBM, both used by GPUs depending on the application, are high bandwidth but low capacity, comparatively speaking. Modern GPUs try to fit more VRAM by adding memory channels, up to a 512-bit bus, but that requires more die space and hence gets expensive.
We will need a new memory design for both GDDR and HBM, and I won't be surprised if they are working on it already. But hardware takes time, so it will be a few more years down the road.
Not sure where I read this and am paraphrasing a lot, but: there's a point where `RAM bandwidth < processor speed` becomes `true`, and the processor becomes architecturally data starved.
As in, a 32-bit CPU that runs at 1 giga-instruction/second with a 16 Gbps memory bus could only reach 0.5 instructions per clock, and that's not very useful. For this reason there can't be an absolute potato with gigantic RAM.
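A minimal sketch of that starvation arithmetic, using the hypothetical CPU and bus from the comment above:

```python
def max_memory_bound_ipc(bus_gbit_per_s: float, word_bits: int, clock_ghz: float) -> float:
    """Instructions per clock the memory bus can feed, assuming one word fetched per instruction."""
    words_per_s = bus_gbit_per_s * 1e9 / word_bits
    return words_per_s / (clock_ghz * 1e9)

print(max_memory_bound_ipc(bus_gbit_per_s=16, word_bits=32, clock_ghz=1.0))  # 0.5
```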
There's some rumors of an Arc Pro, which would be a B580 in clamshell configuration with 24 GB of VRAM (which iiuc would be the same memory bandwidth unfortunately). Unless the price is absurd it would be the cheapest dollar/VRAM card at 24 GB.
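Using the street prices quoted near the top of the thread (and a purely made-up price for the rumored 24 GB card), the price-per-GB comparison would look something like this:

```python
cards = {
    "RX 7600 (8GB)": (300, 8),
    "Intel B580 (12GB)": (330, 12),
    "RX 7600 XT (16GB)": (350, 16),
    "RTX 4060 Ti (16GB)": (580, 16),
    "Arc Pro 24GB (assumed ~EUR 450)": (450, 24),   # hypothetical price, pure speculation
}
for name, (price, vram) in cards.items():
    print(f"{name}: ~{price / vram:.0f} EUR per GB")
```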
That would be nice for AI-curious users and hobby experimenters, however gamers won't find near-term value in VRAM beyond 16GB. My concern is that due to Intel's severe financial challenges, their CEO du jour will end up killing off the discrete GPU business.
Back when they (finally) got into dGPUs seriously, Intel (and everyone else) said it would take many years and the patience to tolerate break-even products and losses while coming up the learning curve. Currently, it looks pretty much impossible to sustain ongoing profitability in low-end GPUs. Given gamers' current performance expectations vs the manufacturing costs to hit those targets, mid-range GPUs ($500-$750) seem like the minimum to have broad enough appeal to be sustainably profitable. Unfortunately, Intel is still probably years away from a competitive mid-range product. Sadly, the market has evolved weirdly such that there's now a minimum performance threshold preventing price/performance from scaling down linearly below $300. The problem for Intel is they waited too long to enter the dGPU race, so now this profit gap coincides with no longer having the excess CPU profits to go in the red for years. Instead they squandered billions doing stupid stuff like buying McAfee.
Been running Linux on the A770 for about 2 years now. Very happy with the driver situation. Was a bit rough very early on, but it's nice and stable now. Recommend at least Linux 6.4, but preferably newer. I use a rolling release distro(Artix) to get up to date kernels.
ML stuff can be a pain sometimes because support in pytorch and various other libraries is not as prioritised as CUDA. But I've been able to get llama.cpp working via ollama, which has experimental intel gpu support. Worked fine when I tested it, though I haven't actually used it very much, so don't quote me on it.
For image gen, your best bet is to use sdnext (https://github.com/vladmandic/sdnext), which supports Intel on Linux officially, will automagically install the right pytorch version, and does a bunch of trickery to get libraries that insist on CUDA to work in many cases. Though some things are still unsupported due to various libraries still not supporting Intel on Linux - some types of quantization are unavailable, for instance. But at least if you have the A770, quantization for image gen is not as important due to plentiful VRAM, unless you're trying to use the Flux models.
I also have an A770. Don't use it for AI, but it runs fine for general 3D use (which mostly means either Minecraft, other similarly casual games, or demoscene shaders). I'm pretty sure I'm not utilizing it fully most of the time.
My main complaint is that the fan control just doesn't work. They stay at low speed or off no matter how hot the card gets, until it shuts down due to overheating. Apparently there's a firmware update to fix this, but you need Windows to flash it. You can zip-tie a spare fan somewhere pointing at the card...
Secondary complaint is that it's somehow not compatible with Linux's early boot console, so there's no graphical output until the driver is loaded. You'd better have ssh enabled while setting it up.
It's also incompatible with MBR/BIOS boot since it doesn't include an option ROM or whatever is needed to make that work - so I switched to UEFI (which I thought I was already using).
When I ran a shader "competition" some people's code with undefined behaviour ran differently on my GPU than theirs. That's unavoidable regardless of brand and not an Intel thing at all.
Yes, first party drivers are made. Upstream Linux and mesa project should have good support in their latest releases. If you're running a non-bleeding edge distro, you may need to wait or do a little leg work to get the newer versions of things, but this is not unusual for new hardware.
In fact, Intel has been a stellar contributor to the Linux kernel and associated projects, compared to all other vendors. They usually have launch day Linux support provided that you are running a bleeding edge Linux kernel.
Of all the god awful Linux GPU drivers Intel's are the least awful IME. Unless you're talking purely compute, then nvidia, have fun matching those cuda versions though...
AMD's Linux drivers are pretty good. I get better performance playing games through Proton on Linux than I do playing the same games on Windows, despite whatever overhead the translation adds.
The only really annoying bug I've run into is the one where the system locks up if you go to sleep with more used swap space than free memory, but that one just got fixed.
I have always associated Intel iGPUs with good drivers but people seem to often complain about their Linux dGPU drivers in these threads. I hope it is just an issue of them trying to break into a new field, rather than a slipping of their GPU drivers in general…
i915 is still the main kernel mode driver on Linux for every Intel GPU up to Alchemist. The xe KMD is used by Battlemage by default (as of 6.12).
There's a Mesa DRI driver, called i965 (originally made for Broadwater chipset, thus the 965 numbering), which has since been replaced by either:
- Crocus for anything up to Broadwell (Gen 8)
- Iris for anything from Broadwell and newer
Then there's a Video Acceleration driver, which is (also) called i965. I think this is what you're referring to. There are:
- i965 (aka Intel VAAPI Driver), which supports anything from Westmere (Gen 5) to Coffee Lake (Gen 9.5)
- iHD (aka Intel Media Driver), is a newer one, which supports anything from Broadwell (Gen 8)
- libvpl, an even newer one, which supports anything from Tiger Lake (Gen 12) and up
Battlemage users had to use libvpl until recently because Media Driver 2024Q4 with BMG support was only released two weeks ago. Using libvpl with ffmpeg may require rebuilding ffmpeg, as some distros don't have it enabled (due to a conflict with the legacy Intel Media SDK, so you have to choose).
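For what it's worth, a minimal sketch of driving a VAAPI encode through ffmpeg from Python (the device path and codec choice are assumptions, and your ffmpeg build needs VAAPI enabled):

```python
import subprocess

# Minimal VAAPI hardware-encode sketch; assumes /dev/dri/renderD128 is the Arc card
# and that the ffmpeg build has VAAPI support compiled in.
cmd = [
    "ffmpeg",
    "-hwaccel", "vaapi", "-vaapi_device", "/dev/dri/renderD128",
    "-i", "input.mkv",
    "-vf", "format=nv12,hwupload",   # upload frames to a GPU surface
    "-c:v", "h264_vaapi",            # hardware H.264 encoder exposed via VAAPI
    "output.mp4",
]
subprocess.run(cmd, check=True)
```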
I have a B580 in my Linux machine (running 6.12), and xe seems pretty stable/performant so far.
I am always confused about which drivers need installing to fully enable all hardware acceleration features on Broadwell. Also, not all distros maintain the drivers equally, resulting in mismatches between the VAAPI driver and some other driver.
Intel GPU drivers have always been terrible. There's so many features that are just broken if you try to actually use them, on top of just generally being extremely slow.
Hell, the B580 is CPU bottlenecked on everything that isn't a 7800x3d or 9800x3d which is insane for a low-midrange GPU.
Last year I was doing a livestream for a band. The NVidia encoder on my friend's computer (running Windows) just wouldn't work. We tried in vain to install drivers and random stuff from Nvidia. I pulled out my own machine with Linux and an Intel iGPU, and not only did it work flawlessly, it did so on battery and with charge to spare.
On the other hand, I have to keep the driver for the secondary GPU (also intel) blacklisted because last time I tried to use it it was constantly drawing power.
Hate futzing around with multi-GPU configurations. It's always a bit of a mess from a driver perspective, not to mention all the extra power connectors needed, even though this card only requires two.
This would have been a great card for a homelab if only they haven't decided to move away from SR-IOV in their consumer GPUs.
AFAIK it used to be possible to get some SR-IOV working on the previous Alchemists (with some flashing), but Battlemage seems like a proof of Intel abandoning the virtualization/GPU splitting in the consumer space altogether.
I don't really care about how it performs so long as it's better than a CPU. I just want to target the GPU myself and remove the vendor from the software equation. Nvidia has taught me there isn't any value that can't be destroyed with sufficiently bad drivers.
I really don’t see the argument for patents. It just slows down healthy competition in Western countries while China disregards them and surges ahead. How can we expect to compete when they don’t play by the same rules?
You can. You need a recent Linux kernel, but pytorch now officially supports Intel's extensions (xpu). These are actually a decent consumer proposition because the bottleneck for most people training models on their own hardware is VRAM. These have substantially more VRAM than anything in their price bracket, and are priced competitively enough that you could buy two and have a pretty solid training setup. Or one for training and one for inference.
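A minimal sketch of what the xpu path looks like (assumes a recent PyTorch with native XPU support, roughly 2.4+, plus Intel's compute runtime; the layer sizes are arbitrary):

```python
import torch

# Pick the Arc card if the XPU backend is available, otherwise fall back to CPU
device = torch.device("xpu" if torch.xpu.is_available() else "cpu")

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 10),
).to(device)

x = torch.randn(32, 4096, device=device)
loss = model(x).sum()
loss.backward()   # gradients land in VRAM, which is the scarce resource being discussed
print(device, loss.item())
```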
Yea Wordpress was a terrible platform and Substack is also a terrible platform. I don't know why every platform wants to take a simple uploaded PNG and apply TAA to it. And don't get me started on how Substack has no native table support, when HTML had it since prehistoric times.
If I had more time I'd roll my own site with basic HTML/CSS. It's not even hard, just time consuming.
The generations are all fantasy type names in alphabetical order. The first was Alchemist (and the cards were things like A310) and the next is Celestial. Actually when I think about product names for GPUs and CPUs these seem above average in clarity and only slightly dorkier than average. I'm sure they'll get more confusing and nonsensical with time as that seems to be a constant of the universe.
There is, it's Druid. Intel announced the first four codenames in 2021.
> [...] first generation, based on the Xe HPG microarchitecture, codenamed Alchemist (formerly known as DG2). Intel also revealed the code names of future generations under the Arc brand: Battlemage, Celestial and Druid.
It's their 2nd generation, the 'B' series. The previous was their 'A' / Alchemist.
> According to Intel, the brand is named after the concept of story arcs found in video games. Each generation of Arc is named after character classes sorted by each letter of the Latin alphabet in ascending order.
(https://en.wikipedia.org/wiki/Intel_Arc)
It's dorky but there isn't much else to say about it. Personal GPU enthusiasts are almost always video game enthusiasts so it's not really a particularly weird name in context.
It's just the code name for this generation of their GPU architecture, not the name for its instruction set. Intel's are all fantasy themed. Nvidia names theirs after famous scientists and mathematicians (Alan Turing, Ada Lovelace, David Blackwell)
A well-known commercial storage vendor gives their system releases codenames from beer brands. We had Becks, Guinnes, Longboard, Voodoo Ranger, and many others. Presumably what the devs drank during that release cycle, or something ;-)
It's fun for the developers and the end-users alike... So no, it's not limited to GPU enthusiasts at all. Everyone likes codenames :)
That’s why we make sure our codenames are sensible things, like Jimmy Carter and James Earl Jones.
We were actually told to change our internal names for our servers after someone named an AWS instance “stupid” and I rolled my eyes so hard, one dev ruined the fun for everyone.
I mean, sure, for a lot of the same reasons you can't file a defamation claim in defense of someone who's dead. The idea of them is in the public domain in a lot of ways.
So sure, pour one out to whoever's funeral is on the grocery store tabloids that week with your codenames.
The other major issue with regard to pricing is that intel needs to pay one way or another to get market penetration; if no one buys their cards at all and they don't establish a beachhead, then it's even more wasted money.
As I see it, AMD gets _potentially_ squeezed between intel and nvidia. Nvidia's majority marketshare seems pretty secure for the foreseeable future, and intel undercutting AMD, plus their connections to prebuilt system manufacturers, would likely grab them a few more nibbles of AMD's territory. If intel releases a competent B770 against AMD products priced a few hundred dollars more, even if Arc isn't as mature, I'm not sure AMD has solid answers for why someone should buy Radeon.
In my view AMD's issue is that they don't have any vision for what their GPUs can offer besides a slightly better version of the previous generation. It appears that back in 2018 the RTX offering must have blindsided them, and years later they're not giving us any alternative vision for what comes next for graphics to make Radeon desirable besides catching up to nvidia (who I imagine will have something new to move the goalposts if anyone gets close) - and this is an AMD that is currently well resourced from Zen.
I think this is a bad take because it assumes that NVidia is making rapid price/performance improvements in the consumer space. The RTX 4060 is roughly equivalent to a 2080 (similar performance, RAM, and transistor count). Intel isn't making much margin, but from what I've seen they're probably roughly breaking even, not taking a huge loss.
Also, a ton of the work for Intel is in drivers, which are (as the A770 showed) very improvable after launch. Based on the hardware, it seems very possible that the B580 could get an extra 10% (especially at 1080p), which would bring it clearly above the 4060 Ti in performance.
Hard to say why the density is that different, if those transistor numbers are accurate. A less dense design would allow for higher clocking, and while the clocks are fairly high, they aren't that far out there, but that's one factor (I'd hope they wouldn't trade half the area for a few extra MHz, when a GPU with 2x the transistors will just be better).
It could also be that the transistor counts each company provides aren't comparable, as they may count them differently (but I'm not convinced of this).
Customers pay by the wafer, so mm^2; though tr cost is a function of that so :3 .
> I am not too impressed with the "chips and cheese ant's view article" as they don't uncover the reason why performance is SO PATHETIC!
Performance on GPUs has always been about drivers. Chips and Cheese is only here to show the uArch behind it. This isn't even new; we should have learned all about it during the 3dfx Voodoo era. And 9 years have passed since a (now retired) Intel engineer said they would be competing against Nvidia by 2020, if not 2021. We are now in 2025 and they are not even close. But somehow Raja Koduri was supposed to save them, and now he's gone.
Intel seems to have deep-seated issues with their PR department writing checks their engineers can't cash on time.
Not that Intel engineers are bad - on the contrary. But as you pointed out, they've been promising they'd be further along than they are now for over 5 years, and even 10+ years ago, when I was working on HPC systems, they kept promising things you should build your systems around that would be "in the next gen" but were not, in fact, there.
It seems much like the Bioware Problem(tm) where Bioware got very comfortable promising the moon in 12 months and assuming 6 months of crunch would Magically produce a good outcome, and then discovered that Results May Vary.