A very natural explanation of "wikipedia proof 2" for differentiable functions seems to be missing:
By linearity of expectation, both sides are linear in f, and for linear f we have equality. Let's subtract the linear function whose graph is the tangent hyperplane to f at E(X). By above, this does not change the validity of the inequality. But now the left hand side is 0, and right hand side is non-negative by convexity, so we are done.
It's also now clear what the difference of the two sides is -- it's the expectation of the gap between f(X) an and the value of the tangent plane at X.
Now in general replace tangent hyperplane with graph of a subderivative, to recover what wiki says.
By linearity of expectation, both sides are linear in f, and for linear f we have equality. Let's subtract the linear function whose graph is the tangent hyperplane to f at E(X). By above, this does not change the validity of the inequality. But now the left hand side is 0, and right hand side is non-negative by convexity, so we are done.
It's also now clear what the difference of the two sides is -- it's the expectation of the gap between f(X) an and the value of the tangent plane at X.
Now in general replace tangent hyperplane with graph of a subderivative, to recover what wiki says.