Datasets tend to be really rough proxies for product goals. The initial spec is "feature smiling faces", so a "smiling/no-smiling" dataset gets built. But over the next year you realize people can be "smiling but ugly smiling", "neutral-faced but pleasant", and a bunch more. There are bugs to fix (false positives/negatives) and lots of tweaks to the goals. Any design nuance is lost in the chain: explain the product concept to the data science team, who write a spec for data collectors, who collect samples; DS trains a model, eng integrates it, and then folks (finally) try it in the product.
QA files one-off bugs, but not in a way that feeds back into datasets/training. Someone has to analyze them in bulk and make calls about which areas to care about (which is slow and expensive).
However, if the time-to-data is tiny, you can iterate more like software. A new model drops often (with fast evals), subjective feedback becomes synthetic data quickly, the issue gets fixed, and the results are evaluated.
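To make that concrete, here's a minimal sketch of the feedback-to-synth-data loop, assuming a simple text-style classifier; the names (generate_variants, evaluate, the lambda standing in for the model) are all hypothetical placeholders for whatever generator and eval harness you actually use:

```python
from dataclasses import dataclass

@dataclass
class Example:
    text: str    # or an image path, audio clip, etc.
    label: str   # expected output
    source: str  # "production_feedback" vs "synthetic"

def generate_variants(feedback: Example, n: int = 20) -> list[Example]:
    """Expand one piece of subjective feedback into n synthetic eval cases.
    Trivial templating here; in practice you'd prompt a generator model."""
    return [
        Example(text=f"{feedback.text} (variant {i})",
                label=feedback.label,
                source="synthetic")
        for i in range(n)
    ]

def evaluate(model_predict, eval_set: list[Example]) -> float:
    """Fast eval: fraction of cases where the model matches the expected label."""
    hits = sum(model_predict(ex.text) == ex.label for ex in eval_set)
    return hits / len(eval_set)

# One iteration: a bug report becomes an eval slice the same day.
bug = Example(text="person grimacing, teeth visible",
              label="not_smiling",
              source="production_feedback")
eval_slice = [bug] + generate_variants(bug)

# Stand-in for the current model; swap in the new model drop and re-run.
baseline = evaluate(lambda x: "smiling", eval_slice)
print(f"baseline pass rate on new slice: {baseline:.0%}")
```

The point is just that the slice exists the same day the feedback arrives, so every model drop can be scored against it without waiting on a collection cycle.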
Your product sounds a bit more like analysis pipelines for new problems? I'm looking more at zero-shot quality and performance.