
A Few Useful Things to Know about Machine Learning

Machine learning algorithms provide feasible and inexpensive solutions to ambitious tasks
by generalizing from examples. The purpose of this article is to highlight important points
to pay attention to when building machine learning applications.

1. Introduction
In machine learning, instead of manually constructing algorithms, systems learn from
data automatically. One type of machine learning is classification. A classifier takes a
vector of feature values as input and outputs a single value, the class; a spam filter is
an example of this. A learner takes as input a training set, which consists of observed
inputs paired with their outputs, and itself outputs a classifier.
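
To make the learner/classifier contract concrete, here is a minimal sketch in Python using scikit-learn (the library and the toy spam features are my choices; the article names neither):

```python
# The learner takes a training set of (feature vector, class) pairs and
# outputs a classifier; the classifier maps a feature vector to a class.
from sklearn.tree import DecisionTreeClassifier

# Toy "spam filter" features: [number_of_links, number_of_capital_letters]
X_train = [[0, 1], [5, 30], [1, 2], [7, 45]]
y_train = [0, 1, 0, 1]  # 0 = ham, 1 = spam

learner = DecisionTreeClassifier()
classifier = learner.fit(X_train, y_train)  # training set in, classifier out

# The classifier outputs a single value, the class, for a new input.
print(classifier.predict([[6, 40]]))  # expected: [1], i.e. spam
```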

2. Learning = Representation + Evaluation + Optimization

Numerous learning algorithms exist in machine learning, yet they all consist of three
components: representation (a formal language that the computer can handle), evaluation
(a function to distinguish good classifiers from bad ones), and optimization (a method to
search for the highest-scoring classifier). There is no simple rule or trick for choosing
these components, and many learners may even have both discrete and continuous
components.
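
As an illustration (my own toy construction, not from the article), the three components can be spelled out explicitly for a one-feature threshold classifier:

```python
import numpy as np

X = np.array([0.10, 0.40, 0.35, 0.80, 0.90, 0.70])
y = np.array([0, 0, 0, 1, 1, 1])

# Representation: the hypothesis space "predict 1 if x > t" for thresholds t.
def classify(t, x):
    return (x > t).astype(int)

# Evaluation: score each candidate classifier by its training accuracy.
def accuracy(t):
    return np.mean(classify(t, X) == y)

# Optimization: search the (discretized) space of thresholds for the best score.
best_t = max(np.linspace(0, 1, 101), key=accuracy)
print(best_t, accuracy(best_t))  # any threshold between 0.4 and 0.7 scores 1.0
```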

3. It's Generalization That Counts

The ultimate purpose of machine learning is to generalize beyond the training set. If
you test on your training data, you may have an illusion of success; this is the most
common mistake among beginners. Training set errors should be used carefully because,
unlike in most other applications, in machine learning we do not have access to the
function we actually want to optimize.
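
A minimal sketch of the standard remedy, holding out data the learner never sees (scikit-learn and the iris dataset are my assumptions for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier().fit(X_tr, y_tr)
print("train accuracy:", clf.score(X_tr, y_tr))  # often near 1.0: the illusion
print("test accuracy: ", clf.score(X_te, y_te))  # the honest estimate
```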

4. Data Alone Is Not Enough

To generalize beyond the training set, learners must combine knowledge with data.
Machine learning gets more from less, and very general assumptions are often enough
to make progress.

5. Overfitting Has Many Faces

If a classifier is successful on the training data but generalizes poorly, it is probably
overfitting. Strong false assumptions may be better than weak true ones. It is possible
to avoid overfitting only by risking the opposite error of underfitting, and there is no
single technique that works in every case.
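
One common way to spot overfitting (my sketch, assuming scikit-learn) is to watch training accuracy rise with model complexity while cross-validated accuracy stalls or falls:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
for depth in (1, 3, 10, None):  # None = grow the tree without limit
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    train_acc = clf.fit(X, y).score(X, y)
    cv_acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"depth={depth}: train={train_acc:.2f}  cv={cv_acc:.2f}")
```

A growing gap between the two numbers is the signature of overfitting.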

6. Intuition Fails In High Dimensions

Many systems that work well in low dimensions may become useless when the
dimensionality is high. As dimensionality grows, generalizing correctly becomes much
harder. Because we are used to a three-dimensional world, our intuitions break down in
high dimensions. In high dimensions, the benefit of having more features may even be
outweighed by the negative effect of the added dimensionality. Luckily, in most cases
the examples are concentrated near a lower-dimensional manifold.
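
A small numeric illustration of this failure of intuition (my own, not the article's): in high dimensions, the nearest and farthest of a set of random points end up almost equally far away, so distance-based notions of similarity lose their meaning:

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    pts = rng.random((500, d))                        # 500 random points
    dists = np.linalg.norm(pts - pts[0], axis=1)[1:]  # distances to the first
    print(f"d={d:4d}: min/max distance ratio = {dists.min() / dists.max():.2f}")
# The ratio climbs toward 1 as d grows: everything becomes "equally far".
```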

7. Theoretical Guarantees Are Not What They Seem


Theoretical guarantees are not a criterion for practical decisions, but they can be a
source of understanding and a driving force in algorithm design.

8. Feature Engineering Is The Key


If you have many features that correlate well with the class, learning is easy. You can
construct features from the raw data that are beneficial to learning. In practice, little
time is spent actually doing machine learning; most of it goes to gathering, cleaning,
and preprocessing the data, and to trial and error in feature design.
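
A hedged example of such construction (the raw data and the engineered feature are my own toy choices): raw timestamps are opaque to a learner, but a derived "sent at night" flag may correlate with the class:

```python
import datetime as dt

raw = ["2023-05-01T09:15:00", "2023-05-01T23:40:00"]  # raw log timestamps
hours = [dt.datetime.fromisoformat(s).hour for s in raw]
is_night = [int(h >= 22 or h < 6) for h in hours]  # engineered binary feature
print(hours, is_night)  # [9, 23] [0, 1]
```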

9. More Data Beats A Cleverer Algorithm


A weaker algorithm can outperform a cleverer one if it is given more data. But our two
main limited resources are time and memory, so even if you have plenty of data, you
may not have the time to process it. All learners essentially work by grouping nearby
examples into the same class. Different learners can produce different frontiers while
still making the same predictions; this is why powerful learners can be accurate even
though they are not stable.
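
As a sketch of this trade-off (the datasets and learners are my assumptions, not the article's), one can compare a simple learner given all the data against a cleverer one starved of data:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A cleverer learner limited to 100 examples vs. a simple one with all of them;
# which wins depends on the data, but data volume is clearly a lever too.
print("SVC, 100 examples:", SVC().fit(X_tr[:100], y_tr[:100]).score(X_te, y_te))
print("NB, all examples: ", GaussianNB().fit(X_tr, y_tr).score(X_te, y_te))
```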

10. Learn Many Models, Not Just One


There is no such thing as a single best learner; it varies from application to application.
Combining different learners usually gives better results. This is known as model
ensembling and involves techniques like bagging, boosting, and stacking.
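
A minimal ensemble sketch, assuming scikit-learn (bagging is one of the techniques named above; the article itself gives no code):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
single = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(single, n_estimators=50, random_state=0)  # 50 trees

print("single tree:", cross_val_score(single, X, y, cv=5).mean())
print("bagged x50 :", cross_val_score(bagged, X, y, cv=5).mean())
```

Bagging trains each tree on a bootstrap resample of the data and votes the results, which typically reduces variance.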

11. Simplicity Does Not Imply Accuracy


There is no direct connection between simplicity and accuracy, yet simpler hypotheses
should still be preferred, because simplicity is a virtue in its own right, not because it
implies accuracy.

12. Representable Does Not Imply Learnable


"Can it be represented?" and "Can it be learned?" are two different questions, and both
should be considered.

13. Correlation Does Not Imply Causation

Correlation may be a sign of a potential causal relation, but not necessarily. Some
learning algorithms can extract causal information from observational data, in which
the predictive variables are not under the control of the learner.

Conclusion

This paper gives a brief overview of the most commonly misunderstood ideas in
machine learning and aims to point out pitfalls that should be avoided.
