
No, the kernel trick is something else: basically a nonlinear basis representation of the model. For example, fitting a polynomial model or using splines would effectively be using the "kernel trick" (though only ML people use that term, not statisticians, and they usually talk about it in the context of SVMs, but it applies to linear regression too). Transforming the data is just transforming the Y-outcome, most commonly with log(y) for things that tend to have a right-skewed distribution: house prices are the classic example, along with income, various blood biomarkers, and really anything that cannot go below zero but can (in principle) be arbitrarily large.
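
A rough sketch of both ideas (toy data I made up, using scikit-learn; the names and numbers are mine, not anything from upthread): regress log(y) on x for a right-skewed outcome, and separately expand x into a polynomial basis while the model itself stays linear.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 3, size=(200, 1))

    # Right-skewed, strictly positive outcome (price-like toy data).
    y = np.exp(1.0 + 0.8 * x[:, 0] + rng.normal(0, 0.3, size=200))

    # Transform the outcome: regress log(y) on x, predict back on the original scale.
    log_model = LinearRegression().fit(x, np.log(y))
    y_hat = np.exp(log_model.predict(x))

    # Transform the inputs instead: a cubic polynomial basis, still a linear model
    # in the expanded features.
    poly = PolynomialFeatures(degree=3, include_bias=False)
    poly_model = LinearRegression().fit(poly.fit_transform(x), y)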

In a few rare cases I have found sqrt(y) or 1/y to be a clever and useful transform, but they're very situational, usually arising when some physical law behind the data-generating process has that mathematical form.
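
For instance (a toy sketch of my own, not anything specific from my work): time to cover a fixed distance is distance/speed, so 1/time is roughly linear in speed even though time itself is not.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    speed = rng.uniform(20, 120, size=(300, 1))            # km/h
    time = 90.0 / speed[:, 0] + rng.normal(0, 0.02, 300)   # hours, fixed 90 km trip

    # The reciprocal of the outcome is approximately linear in the predictor.
    model = LinearRegression().fit(speed, 1.0 / time)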



To be fair, the "trick" part of the kernel trick involves implicitly transforming the data into a higher-dimensional space and then fitting a linear function in that space. I.e., you're transforming the inputs so that a linear function from inputs to outputs fits better than it would without the transform.

The "trick" allows you to fit a linear function in that higher dimensional space without any potentially costly explicit computation in the higher dimensional space based on the observation that the optimal solution's parameters can be represented as a sum of the higher dimensional representations of points in the training set.


No, actually I think you’re mistaken. Representing the model via a nonlinear transformation in which a linear model more closely captures what’s going on is precisely what the kernel trick does, although the situation being described is broader than the kernel trick; things like the power transform also fit the bill.


The kernel trick is a technique used in data classification that involves mapping the points into a higher-dimensional space and then finding a linear separation in that space.

It's not about finding a line of best fit or making the dataset appear linear; it's about being able to split a dataset into two classes using a linear function.
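
A hedged sketch of that picture (my own toy example with scikit-learn): two concentric circles can't be split by a line in the original 2-D space, but an SVM with an RBF kernel separates them by implicitly working in a higher-dimensional space.

    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

    linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)  # roughly chance level
    rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)        # near-perfect separation
    print(linear_acc, rbf_acc)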


Sure, it’s not about finding a line of best fit, but the principle is the same: it uses a transformed space where linear things work better.


Just keep in mind that describing the kernel trick as "a way to transform a data set so that linear things work better" is very vague; sure, it's passable, but it's also different from what was originally posted. The kernel trick doesn't transform your data into a space where that data becomes linear. It transforms your data into a space where it can be separated by a line/plane. The data is almost always still nonlinear in that transformed space, but it's transformed in a way that a plane can cleanly separate it.
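
To make the distinction concrete, here's a sketch (my own toy example) with the explicit quadratic feature map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), which corresponds to the polynomial kernel k(x, z) = (x . z)^2. The mapped circle data still lies on a curved surface, i.e. it hasn't "become linear", but a flat plane now separates the two classes.

    import numpy as np
    from sklearn.datasets import make_circles
    from sklearn.svm import LinearSVC

    X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

    # Explicit version of the quadratic-kernel feature space.
    phi = np.column_stack([X[:, 0]**2, np.sqrt(2) * X[:, 0] * X[:, 1], X[:, 1]**2])

    print(LinearSVC().fit(phi, y).score(phi, y))  # a flat plane separates the classes here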

Given that "the kernel trick" is pretty specific jargon used mostly in one specific circumstance, it's in your interest to use the term in that specific context. If you want the more general term for transforming the data so that some model (linear, Gaussian, or some other form) fits better, that term is "feature transformation".
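
In code, one reading of "feature transformation" (my own scikit-learn sketch, just to contrast it with the implicit kernel version): an explicit, user-chosen map applied to the inputs before a downstream linear model.

    from sklearn.datasets import make_circles
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures, StandardScaler

    X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

    # Explicit feature transformation: expand, rescale, then fit a linear classifier.
    clf = make_pipeline(PolynomialFeatures(degree=2), StandardScaler(), LogisticRegression())
    print(clf.fit(X, y).score(X, y))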



