Hacker News
An FPGA Is an Impoverished Accelerator (washington.edu)
75 points by samps on Nov 27, 2014 | 39 comments


What a useless article.

The big problem is software people thinking that they have any concept of actual hardware design.

If they understood hardware, they would understand that an FPGA is the least efficient way to accomplish anything.

Routing is sparser than in any fixed-function chip. You burn 10-100x the transistors to do the same task. FPGAs are hot and slow.

Even for signal processing, an FPGA is going to be quite hard pressed to beat a 2.0GHz ARM with NEON extensions unless it is very expensive and your algorithm is very dataflow oriented. How many ARMs can I put on a board for $10,000-$100,000 (the price of the very highest-end FPGAs)?

You use an FPGA because you have a low-volume application that you can't do any other way, and your application has enough margin that you can eat the cost of the FPGA. And you are always looking to wipe out that FPGA and replace it with a microprocessor because it is so much cheaper and easier to deal with.


"FPGA is the least efficient way to accomplish anything" Define 'efficient'. If you're talking about cost, it costs far less than an asic below a certain volume, and it certainly costs less in tooling and development.

"FPGA's are hot and slow" - compared to a full custom IC? Sure. FPGA's improve with every process generation (like all silicon devices) and an ASIC design won't intrinsically take advantage of those advances; an FPGA design that didn't meet it's power or thermal envelope 5 years ago might easily do so now, without incurring the NRE of the ASIC - add to that the fact that the first stage of the ASIC design can be prototyped via the FPGA, and you have a viable product without the risk of a bad ASIC.

I've been involved in converting several Virtex-2 designs to newer devices - the huge reduction in power and increase in available logic has led to some extremely impressive gains. There is work to do in such a conversion, but it is understood work - there's no real mystery to updating the CoreGen components.

Agree it is a useless article though, because digital logic design is not programming (it is architectural work). There is no 'abstracting' that away - all attempts thus far have failed miserably (Vivado HLS, for example, turns out designs that work but are HUGE compared to what even a passable designer can do).


>There is no 'abstracting' that away

While hardware design often has awkward constraints that make generalised abstractions tricky, there is still a lot that can be done to improve over Verilog or VHDL. I've been working in Bluespec for a couple of years now, and the difference is night and day. Having a modern type system in our HDL makes experimentation and iteration so much easier.


You can still abstract lots of it. For example, Chisel[1] does so through high-level abstractions and parametrized generators, while still offering results equal to Verilog. It's also used for the open-source RISC-V architecture[4], a recent highly efficient parallel CPU[2] done by a small startup (2 guys), and a floating-point ALU generator[3] that explores the design space of FPUs and finds optimal designs.

And reading about these projects, one gets the impression Chisel was critical for them.

[1]https://chisel.eecs.berkeley.edu/chisel-dac2012.pdf

[2]http://www.eetimes.com/document.asp?doc_id=1324759

[3]http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=654588...

[4]http://www.eetimes.com/author.asp?doc_id=1323406
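For readers who haven't seen the generator idea: a rough sketch in Python (hypothetical, not actual Chisel, which is a Scala DSL) of what a "parametrized generator" means in practice: one function that emits a whole family of Verilog adder trees instead of one hand-written module.

```python
def adder_tree(n_inputs, width, name="adder_tree"):
    """Emit Verilog for a balanced adder tree -- one generator,
    many concrete designs, which is the essence of the approach."""
    ports = ", ".join(f"input [{width-1}:0] in{i}" for i in range(n_inputs))
    lines = [f"module {name}({ports}, output [{width-1}:0] out);"]
    layer, tmp = [f"in{i}" for i in range(n_inputs)], 0
    while len(layer) > 1:
        nxt = []
        for a, b in zip(layer[::2], layer[1::2]):
            lines.append(f"  wire [{width-1}:0] t{tmp} = {a} + {b};")
            nxt.append(f"t{tmp}")
            tmp += 1
        if len(layer) % 2:  # odd element carries over to the next layer
            nxt.append(layer[-1])
        layer = nxt
    lines.append(f"  assign out = {layer[0]};")
    lines.append("endmodule")
    return "\n".join(lines)

print(adder_tree(4, 8))
```

Chisel does this natively, with a real type system and elaboration-time checks, rather than string pasting; the sketch only shows why a generator beats maintaining N hand-written variants.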


> it costs far less than an asic below a certain volume

And that's it. That is the ONLY place where an FPGA makes sense--"I'd like to have an ASIC, but I can't afford the NRE."

There is no technological axis in which an FPGA is superior to anything.


He is right in the sense that for someone from a programming background, using FPGAs is far harder than it could be. I have looked into it a few times and it is indeed horrible.

It looks to me like a typical situation of legacy tools and some degree of oligopoly. I imagine a hardware person going into programming would have a similar experience (in reverse).


Legacy tools, IMHO, is the biggest problem. This article got it right -- you can't just get rid of them.


Okay then. I wanna know of a better way to take four 1 GSPS signals, demodulate them, and pump out another two 1 GSPS signals which encode decisions made every four samples on the incoming signals. That's ONE of the problems in quantum control for stabilizing one qubit. We did it with an FPGA. If you know of a DSP or systolic array or processor or what have you for doing this, I'm ALL ears. Oh, and the timing must be COMPLETELY deterministic down to the nanosecond.

In fact, if you know of ANY general purpose hardware that will talk to gigasample ADCs/DACs, I'd love to know about it.
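For the curious, a toy Python model of the dataflow described above (the demodulation and decision rules here are placeholders, not the actual qubit-control algorithm): four sample streams in, one decision per four-sample block, two decision-rate streams out.

```python
import cmath

FS = 1_000_000_000          # 1 GSPS per channel
BLOCK = 4                   # one decision per four input samples

def demodulate(samples, f_if, fs=FS):
    """Placeholder IQ demodulation: mix each sample with a complex tone."""
    return [s * cmath.exp(-2j * cmath.pi * f_if * n / fs)
            for n, s in enumerate(samples)]

def decide(block):
    """Placeholder decision rule: threshold the summed I component."""
    return 1.0 if sum(b.real for b in block) > 0 else -1.0

def control_loop(channels, f_if=50e6):
    # channels: four equal-length sample streams -> two output streams
    mixed = [demodulate(ch, f_if) for ch in channels]
    iq = [sum(col) for col in zip(*mixed)]            # combine channels
    blocks = [iq[i:i + BLOCK] for i in range(0, len(iq), BLOCK)]
    decisions = [decide(b) for b in blocks]
    # each decision is held for BLOCK samples on two 1 GSPS outputs
    out_a = [d for d in decisions for _ in range(BLOCK)]
    out_b = [-d for d in decisions for _ in range(BLOCK)]
    return out_a, out_b
```

At 1 GSPS, each four-sample block gives you 4 ns to demodulate, decide, and drive the DACs, every time, with zero jitter; that hard real-time budget is why this lands on an FPGA.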


ASIC. The timing will be better. Less power will be burned.

The only thing an FPGA wins on is NRE (non-recurring engineering).

The real problem is that the giga-sample DACs/ADCs aren't willing to speak one of the actual high-speed interfaces or put a DSP directly on the ADC/DAC. So everybody needs to use an FPGA to shoehorn the data into a useful form.

If somebody put an actual DSP on their ADC/DAC, FPGAs would evaporate for this application like they have evaporated for so many others.

Any FPGA application with volume eventually gets subsumed by special-purpose hardware on a microcontroller. For example, people used to use FPGAs for PWM, motor control, etc. Now those blocks are standard on microcontrollers.


I'd LOVE to make an ASIC, but we're a small lab and don't have that much to spend.


Exactly. It's not that an FPGA is better, more flexible, faster, etc. It's that you don't have the money for the NRE on an ASIC.

That is EXACTLY the reason to use an FPGA. I use them all the time for projects like that.

This is different from the software people who want a magically flexible brain for generic tasks.


The grandparent is responding to the article, which is talking about general purpose computation. There's no question that your application is a prime use case for an FPGA.


I'm not sure FPGAs are the least efficient everywhere. Aren't there micro+FPGA chips (like Xilinx Zynq) used in high/medium-volume embedded systems? And wouldn't better tools increase volumes, leading to lower costs?


I've heard several computational physicists make this complaint to NVIDIA sales reps. The standard response, which I'm sure is correct, goes as follows.

Designing a fast processor is very expensive, far beyond the means of the research community. The only way anyone can afford it is to sell millions of the things to gamers. To put $1 of special hardware on your numerical card, we have to put it on 1000 graphics cards too, so you'd have to pay $1000 for it. Bad luck: scientists are destined to hack hardware that was designed for larger markets.
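The rep's arithmetic, spelled out (the numbers are from the anecdote, not real prices):

```python
def per_unit_premium(feature_cost_per_die, niche_units, total_units):
    """If a feature must ship on every die but only a niche market
    values it, the niche effectively pays for all of it."""
    return feature_cost_per_die * total_units / niche_units

# $1 of extra silicon on 1,000,000 dies, valued by 1,000 compute buyers:
print(per_unit_premium(1.0, 1_000, 1_000_000))  # -> 1000.0
```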


Yet somehow AMD manages to consistently offer better hardware (w.r.t. double-precision floating-point performance) for a lower price. I'm sure it's because the fine folks at AMD are silicon wizards, and not because of NVIDIA's cozy monopoly position due to shrewd marketing of CUDA plus their early-mover advantage in academic markets.


The tools are more valuable than raw performance, which can be bought with time and money.


Absolutely true, but there's better overlap between tooling required for the game industry and tooling required for academic compute than there is between the respective hardware (double vs single (or lower) float performance).


Well, buy AMD then...

But yeah, for games, maybe fp is going to make more sense than fixed-point with time, as things like ray-tracing, etc, begin to be used.


Games are basically 100% floating point on the graphics side already. Even the color shading is done on floating point quantities on modern engines.

The problem isn't that games are not FP, it's that for games, 32-bit precision is good enough, and for most problems actually way more than they need.


Yes, it's hard to justify a 48-bit or 64-bit FPU when you can instead have roughly 1.5x or 2x the number of computing units.

Still, "not as fast as we wanted" is a "modern researcher problem" ;) Some years ago they would have been converting it to run on integers so that it wasn't unbelievably slow.


> Well, buy AMD then...

I already mentioned why that wasn't feasible. I'm pretty much stuck paying $100 extra on each card in exchange for my predecessors' "free" CUDA lessons.

My point isn't that the economic tradeoff mentioned by the sales rep doesn't exist, my point is that the tradeoff can't be responsible for a price grade as steep as the one we see NVIDIA use. The real answer as to why they price-grade so heavily is "because they can" -- not that I would expect a sales rep to be honest about it.


> Bad luck: scientists are destined to hack hardware that was designed for larger markets.

Good luck: you get the performance of what was considered a supercomputer a little over two decades ago, by hacking a consumer-technology product.


> FPGAs are legacy baggage in the same way that GPGPUs are.

I hoped the author would expand on this point.

It is also my impression that GPGPUs are just "a hack": they should have been normal coprocessors to the main CPU, just like the FPU and the vector units are. It seems that now we are finally reaching that model (in Linux the graphics device is almost completely separated from the computational device, although they are on the same physical device most of the time), but we are still far from the "coprocessor extension" opcode space of MIPS processors or the "brain and arms" of CELL (1 generic CPU, many specialized coprocessors).


FPGAs would be more attractive if they weren't so overpriced... good thing that patents are around to almost completely eliminate competition in that space.


Or, from the other view, patents reduce entering the market from an extremely lengthy and risky R&D venture into a known fee, which you can account for and which drastically lowers risk.


On what grounds are they overpriced? Because they are expensive?


I wish he would comment more on what he finds wrong with HDLs.

I fail to understand why using an HDL for a digital ASIC is fine, but using one for an FPGA in the context of acceleration is not.


He's annoyed that the HDL doesn't describe the entirety of a typical FPGA "program". Some things just can't be emulated efficiently by the FPGA fabric (or they are common enough patterns that it would be wasteful to do so), and the workaround that has become the de facto standard is to include "ASIC chunks" in the middle of all the programmable gates. For instance, you might have a serial output that runs at 10Gbps while the rest of the FPGA runs at 500MHz. To bridge the gap between the slower programmable logic and the fast transceiver you need a shift register. The way you specify this in code is by importing a vendor-specific "library" -- except it's not really an HDL library at all, it's a black box that the proprietary back-end hooks up to the "ASIC chunks" at compile time.

It's like compiling against a binary library, except that the binary isn't another piece of software, it's an etched pattern on your FPGA's wafer. Even if you did have the "source code" it wouldn't do you any good unless you have a foundry in your backyard :-)

I'm skeptical of the calls for a higher level of abstraction. How are you going to abstract away the fact that the FPGA has exactly 2 embedded memory controllers that have precisely A, B, and C inputs and X, Y, and Z outputs? Either you come up with a solution that's effectively just as ugly as what we have now (because it exposes the FPGA's resources explicitly) or you come up with a solution that hides these details and as a result becomes enormously fragile because it's easy to accidentally change something that prevents the compiler from inferring which embedded ASIC chunk you meant to use. You need to be aware of limitations to work within them, and the limitations seem to be stuck with us for the foreseeable future.
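To make the serializer example above concrete: 10 Gbps out of a 500 MHz fabric means the hard gearbox must swallow 20 bits per fabric cycle. A behavioral sketch in Python (a toy model, not any vendor's primitive):

```python
LINE_RATE = 10_000_000_000       # 10 Gbps serial output
FABRIC_CLK = 500_000_000         # 500 MHz programmable-logic clock
WIDTH = LINE_RATE // FABRIC_CLK  # bits consumed per fabric cycle (20)

def serialize(words):
    """Behavioral model of the hard-macro gearbox: each fabric cycle
    accepts one WIDTH-bit word and shifts it out MSB-first at line rate."""
    bits = []
    for w in words:
        assert 0 <= w < (1 << WIDTH), "word wider than the gearbox"
        bits.extend((w >> i) & 1 for i in reversed(range(WIDTH)))
    return bits

out = serialize([0b1010_1010_1010_1010_1010, 0])
print(len(out))  # -> 40 (two fabric cycles' worth of line bits)
```

The fabric only ever sees the 20-bit parallel side; the 10 GHz shifting happens inside the etched "ASIC chunk", which is exactly why that part can't be expressed in portable HDL.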


"How are you going to abstract away the fact that the FPGA has exactly 2 embedded memory controllers that have precisely A, B, and C inputs and X, Y, and Z outputs?"

The same thing we do every time, Pinky: assume the existence of a sufficiently advanced compiler. ;)


Wouldn't an ASIC have hard macros anyway? It sounds to me more like the author is annoyed that "it's 2014 and we are still using HDL", and also that FPGAs are somehow supposed to be readily exploitable for general-purpose computation.


Just as we have 3D printers now, I dream of having a foundry in my backyard.


Me too, brother. Me too.


It sounds like he doesn't think the level of abstraction offered is high enough. If you are using an FPGA as a general purpose device, and not for prototyping, then a higher level of abstraction would be helpful. Otherwise, if you are prototyping, you may want a lower level of abstraction that may offer a closer approximation to your end goal.


Are there any attempts out there to build a better open standard than FPGA? I'd be interested to look into them if there were.


I believe Menta licenses FPGA cores (LUT architectures). But FPGAs have so much that isn't LUTs, and that is critical for performance.


Yes, the RTL level of abstraction is way too low, even for most ASIC things. Yes, we need higher-level HDLs (more abstract than the aforementioned Chisel and Bluespec). I'm working on it, stay tuned.

But what I cannot get from this article is what is exactly wrong with the current FPGAs design? They've got DSP slices (i.e., ALU macros), they've got block RAMs and all the routing facilities one can imagine. For the dataflow stuff it's more than enough.

Of course it would have been much better if the vendors published the detailed datasheets for all the available cells and the interconnect, for the bitfile formats, etc. - to make it possible for the alternative, open source toolchains to appear. Yes, their existing toolchains are, well, clumsy. But it is still quite possible to abstract away from the peculiarities of these toolchains.


Best of luck with your project. I'm curious about it and I'll wait. Instead I'll ask: what's your opinion regarding embedded/MCU software tools? Do you see something better than Rust that can automate the dev process?


Thanks. I've been a Forth fan, but recently, looking at the advances in static code analysis, I suspect that higher-level languages have a chance to become very useful in the resource-limited MCU environment too. Rust is a nice attempt, and there is also a possibility that something doing proper region analysis can kick in (looking at languages like Harlan, I would not say it's impossible).


I once had a vision, but now I see Rust, once matured, as the best contender here.



