http://stats.stackexchange.com/questions/187335/validation-error-less-than-training-error

Generally speaking, though, training error will almost always underestimate your validation error. However, it is possible for the validation error to be less than the training error. You can think of it in two ways:

  1. Your training set had many 'hard' cases to learn
  2. Your validation set had mostly 'easy' cases to predict

That is why it is important that you really evaluate your model training methodology. If you don't split your data properly, your results will lead to confusing, if not simply incorrect, conclusions.
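One common way to reduce the easy/hard imbalance between splits is a stratified split, so both sides carry the same class balance. A minimal sketch using scikit-learn (the dataset and split fraction here are illustrative choices, not from the original answer):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# stratify=y keeps the class proportions the same in both splits;
# an unstratified split can leave the validation set with mostly
# 'easy' (or mostly 'hard') cases, skewing the comparison.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```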

I think of model evaluation in four different categories:

  1. Underfitting – Validation and training error high
  2. Overfitting – Validation error is high, training error low
  3. Good fit – Validation error low, slightly higher than the training error
  4. Unknown fit – Validation error low, training error 'high'

I say 'unknown' fit because the result is counterintuitive to how machine learning works. The essence of ML is to predict the unknown. If you are better at predicting the unknown than what you have 'learned', AFAIK the data between training and validation must be different in some way. This could mean you either need to reevaluate your data splitting method, add more data, or possibly change your performance metric (are you actually measuring the performance you want?).
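The four categories above can be sketched as a simple decision rule. The thresholds here are purely illustrative (what counts as a 'low' error depends entirely on your problem and metric):

```python
def diagnose_fit(train_err, val_err, low=0.10):
    """Classify a train/validation error pair into the four categories.

    `low` is a hypothetical threshold for what counts as low error;
    tune it to your own task and metric.
    """
    if train_err >= low and val_err >= low:
        return "underfitting"   # both errors high
    if train_err < low and val_err >= low:
        return "overfitting"    # validation high, training low
    if val_err < low and val_err >= train_err:
        return "good fit"       # validation low, slightly above training
    return "unknown fit"        # validation error below training error
```

For example, `diagnose_fit(0.02, 0.25)` falls into the overfitting bucket, while `diagnose_fit(0.12, 0.05)` lands in the counterintuitive 'unknown fit' case discussed above.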
