Hacker News
deepnotderp on Sept 10, 2021 | on: YOLOv5 on CPUs: Sparsifying to Achieve GPU-Level P...
V100 GPUs have non-tensor-core fp16 operations too, I think.
woadwarrior01 on Sept 10, 2021
Yes. Non-tensor-core fp16 ops are the default. Tensor cores are essentially 4x4 fp16 matrix multiply-accumulate units, and there's a requirement that matrix dimensions be multiples of 8 [1] for them to be used.
[1]: https://docs.nvidia.com/deeplearning/performance/mixed-preci...
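The multiples-of-8 rule above can be sketched in plain Python. This is a hypothetical helper (not from the thread or NVIDIA's docs) showing how one might check a GEMM's M, N, K dimensions for fp16 tensor-core eligibility and round them up by padding:

```python
def pad_to_multiple(n: int, multiple: int = 8) -> int:
    """Round n up to the nearest multiple (8 for fp16 tensor-core eligibility)."""
    return ((n + multiple - 1) // multiple) * multiple

def tensor_core_eligible(m: int, n: int, k: int, multiple: int = 8) -> bool:
    """True if all GEMM dimensions are multiples of `multiple`,
    per the guideline cited in [1]."""
    return all(d % multiple == 0 for d in (m, n, k))

# A 65x64x64 matmul misses the requirement; padding M from 65 to 72 fixes it.
print(tensor_core_eligible(65, 64, 64))          # False
print(pad_to_multiple(65))                       # 72
print(tensor_core_eligible(pad_to_multiple(65), 64, 64))  # True
```

In practice, frameworks and cuBLAS handle the dispatch; this just illustrates why an odd-sized layer can silently fall back to the slower non-tensor-core fp16 path.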
ml_hardware on Sept 10, 2021
That's true. In fact, seeing V100 FP16 benchmark below T4 FP16 makes me believe you're right: the V100 should be much faster if the tensor cores were being used.