
No, the kernel trick is something else: basically a nonlinear basis representation of the model. For example, fitting a polynomial model or using splines would effectively be using the "kernel trick" (though only ML people use that term, not statisticians, and they usually talk about it in the context of SVMs, but it applies to linear regression too). Transforming the data is just transforming the Y-outcome, most commonly with log(y) for things that tend to have a right-skewed distribution: house prices are the classic example, along with income, various blood biomarkers, and really anything that cannot go below zero but can (in principle) be arbitrarily large.
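
A rough sketch of both ideas (toy data I made up, using scikit-learn; the names and numbers are mine, not anything from upthread): regress log(y) on x for a right-skewed outcome, and separately expand x into a polynomial basis while the model itself stays linear.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 3, size=(200, 1))

    # Right-skewed, strictly positive outcome (price-like toy data).
    y = np.exp(1.0 + 0.8 * x[:, 0] + rng.normal(0, 0.3, size=200))

    # Transform the outcome: regress log(y) on x, predict back on the original scale.
    log_model = LinearRegression().fit(x, np.log(y))
    y_hat = np.exp(log_model.predict(x))

    # Transform the inputs instead: a cubic polynomial basis, still a linear model
    # in the expanded features.
    poly = PolynomialFeatures(degree=3, include_bias=False)
    poly_model = LinearRegression().fit(poly.fit_transform(x), y)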

In a few rare cases I have found sqrt(y) or 1/y to be a clever and useful transform, but they're very situational, usually arising when some physical law behind the data-generating process has that mathematical form.
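
For instance (a toy sketch of my own, not anything specific from my work): time to cover a fixed distance is distance/speed, so 1/time is roughly linear in speed even though time itself is not.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    speed = rng.uniform(20, 120, size=(300, 1))            # km/h
    time = 90.0 / speed[:, 0] + rng.normal(0, 0.02, 300)   # hours, fixed 90 km trip

    # The reciprocal of the outcome is approximately linear in the predictor.
    model = LinearRegression().fit(speed, 1.0 / time)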



To be fair, the "trick" part of the kernel trick involves implicitly transforming the data into a higher-dimensional space and then fitting a linear function in that space. I.e., you're transforming the inputs so that a linear function from inputs to outputs fits better than it would without the transform.

The "trick" allows you to fit a linear function in that higher dimensional space without any potentially costly explicit computation in the higher dimensional space based on the observation that the optimal solution's parameters can be represented as a sum of the higher dimensional representations of points in the training set.


No, actually I think you’re mistaken. Representing the model via a nonlinear transformation in which a linear model more closely captures what’s going on is precisely what the kernel trick does, although the situation being described is broader than the kernel trick; things like the power transform also fit the bill.


The kernel trick is a technique used in data classification that involves mapping the points into a higher-dimensional space and then finding a linear separation in that space.

It's not about finding a line of best fit or making the dataset appear linear; it's about being able to split a dataset into two classes using a linear function.
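
A hedged sketch of that picture (my own toy example with scikit-learn): two concentric circles can't be split by a line in the original 2-D space, but an SVM with an RBF kernel separates them by implicitly working in a higher-dimensional space.

    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

    linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)  # roughly chance level
    rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)        # near-perfect separation
    print(linear_acc, rbf_acc)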


Sure, it’s not about finding a line of best fit, but the principle is the same: it uses a transformed space where linear things work better.


Just keep in mind that describing the kernel trick as "a way to transform a data set so that linear things work better" is very vague; sure, it's passable, but it's also different from what was originally posted. The kernel trick doesn't transform your data into a space where that data becomes linear. It transforms your data into a space where it can be separated by a line/plane. The data is almost always still nonlinear in that transformed space, but it's transformed in a way that a plane can cleanly separate it.
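
To make the distinction concrete, here's a sketch (my own toy example) with the explicit quadratic feature map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), which corresponds to the polynomial kernel k(x, z) = (x . z)^2. The mapped circle data still lies on a curved surface, i.e. it hasn't "become linear", but a flat plane now separates the two classes.

    import numpy as np
    from sklearn.datasets import make_circles
    from sklearn.svm import LinearSVC

    X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

    # Explicit version of the quadratic-kernel feature space.
    phi = np.column_stack([X[:, 0]**2, np.sqrt(2) * X[:, 0] * X[:, 1], X[:, 1]**2])

    print(LinearSVC().fit(phi, y).score(phi, y))  # a flat plane separates the classes here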

Given that "the kernel trick" is pretty specific jargon used mostly in one specific circumstance, it's in your interest to use the term in that specific context. If you want the more general term for transforming the data so that some model (linear, Gaussian, or some other form) fits better, that term is "feature transformation".
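
In code, one reading of "feature transformation" (my own scikit-learn sketch, just to contrast it with the implicit kernel version): an explicit, user-chosen map applied to the inputs before a downstream linear model.

    from sklearn.datasets import make_circles
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures, StandardScaler

    X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

    # Explicit feature transformation: expand, rescale, then fit a linear classifier.
    clf = make_pipeline(PolynomialFeatures(degree=2), StandardScaler(), LogisticRegression())
    print(clf.fit(X, y).score(X, y))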



