What makes volunteer compute resources impractical for training large-scale LLMs? Something similar to the SETI@home project or the Mersenne prime number search, which let users pool their spare compute to work together on one large problem.
It seems like compute is quickly becoming a bottleneck, and a moat, preventing ML researchers from training and using LLM-type language models.
Would be great to see a more publicly available solution to this, to break down the dam so to speak and give everyone access to SOTA LLMs.
As you mentioned, ML training can be parallelized, but this requires either data parallelism or model parallelism (or both).
Data parallelism means spreading the data over many different compute units and then synchronizing gradients somehow. The heterogeneous nature of @home computing makes this particularly challenging, since every synchronization step is gated by the slowest compute unit. I've personally only ever seen data (and model) parallelism done on a homogeneous compute cluster (e.g. 8x GPUs on one machine). A rough sketch of where the gradient sync happens is below.
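Just to make the sync step concrete, here's a toy numpy sketch (not a real distributed setup, just simulated "workers" in one process) of data-parallel SGD on a linear regression. The averaging line is the part that would have to cross the network every step in an @home cluster:

```python
# Toy data-parallel training: each "worker" computes gradients on its own
# shard, then gradients are averaged -- the step that needs synchronization.
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression problem: y = X @ w_true + noise
w_true = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(1024, 3))
y = X @ w_true + 0.01 * rng.normal(size=1024)

n_workers = 4
X_shards = np.array_split(X, n_workers)
y_shards = np.array_split(y, n_workers)

w = np.zeros(3)
lr = 0.1

for step in range(100):
    # Each worker computes the MSE gradient on its local shard only.
    local_grads = []
    for Xs, ys in zip(X_shards, y_shards):
        err = Xs @ w - ys
        local_grads.append(2 * Xs.T @ err / len(ys))

    # "All-reduce": average gradients across workers. Over the internet this
    # is the expensive, latency-bound step, and the slowest worker
    # (straggler) gates every iteration for everyone else.
    grad = np.mean(local_grads, axis=0)
    w -= lr * grad

print(w)  # converges close to w_true
```

In a real cluster the averaging is an all-reduce over NVLink/InfiniBand; over volunteer machines on home connections that same step dominates the iteration time.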
For model parallelism, we split the model itself across different compute units. However, that means the different parts of the model have to exchange activations and gradients on every pass, which gets very expensive when it happens across the internet. With 8x GPUs in one machine your latency is bounded by PCIe (or NVLink); in a distributed @home cluster it's bounded by TCP/IP over consumer connections. A toy illustration of where those transfers sit is below.
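Here's a minimal sketch, again just simulated in one numpy process, of a hypothetical two-layer model split across two "devices" pipeline-style. The points where activations move between stages are the transfers that would pay PCIe latency in one box or internet latency in an @home setup:

```python
# Toy model-parallel split: stage 0 holds layer 0, stage 1 holds layer 1.
# The activation handoff between stages is the synchronization point.
import numpy as np

rng = np.random.default_rng(0)

W0 = rng.normal(size=(8, 16)) * 0.1   # weights living on "device 0"
W1 = rng.normal(size=(16, 4)) * 0.1   # weights living on "device 1"

def stage0_forward(x):
    # Device 0: first layer + ReLU. The returned activation would have to
    # be shipped to device 1 (PCIe in one box, TCP/IP across the internet).
    return np.maximum(x @ W0, 0.0)

def stage1_forward(h):
    # Device 1: second layer produces the output.
    return h @ W1

x = rng.normal(size=(2, 8))
h = stage0_forward(x)     # transfer: activations, device 0 -> device 1
out = stage1_forward(h)   # backward pass would transfer grads 1 -> 0
print(out.shape)          # (2, 4)
```

Every forward and backward pass pays that handoff, so even modest network latency multiplied by thousands of steps per hour adds up fast.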
But I would say it's not impossible; someone clever could definitely figure it out.