
Learning Path: Your mentor to become a machine learning expert

Machine learning is a complex topic to master!


Not only is there a plethora of resources available, they also age very fast. Couple this
with a lot of technical jargon and you can see why people get lost while pursuing machine
learning. However, this is only part of the story. You cannot master machine learning
without going through the grind yourself. You have to spend hours understanding the nuances
of feature engineering, its importance and the impact it can have on your models.

Through this learning path, we hope to provide an answer to this problem. We have
deliberately loaded this learning path with a lot of practical projects. You cannot master
machine learning without the hard work! But once you do, you will be one of the most
sought-after people around.

Since this is a complex topic, we recommend that you follow the steps strictly in sequential
order. Consider this your mentor for machine learning. Skip a step only if you already know
the subject matter it covers.

Warming up – how is machine learning useful?

If you are completely new to machine learning, here is a good talk from Jeremy Howard
on how machine learning is changing the world. Jeremy discusses various applications of
machine learning and deep learning, as well as a few ways in which machine learning can
impact the world.

Still not sure? Check out this shorter video on training a machine to play Super Mario.

Excited about what machine learning can achieve? Let’s look at a learning path to make
you a machine learning expert.

Optional read: Basics of Machine learning for a newbie

Step 0: Basics of R / Python


Many languages provide machine learning capabilities, and development work is happening
at a rapid pace across several of them. Currently “R” and “Python” are the most commonly
used languages, and there is enough support and community available for both. Before
entering the world of ML, I would recommend that you choose one of these two languages
(R or Python) so that you can focus on machine learning itself (Which is better – R or
Python?).

Keep your focus on understanding the basics of the language, its libraries and its data
structures. Here’s the step-by-step guide to learning R and Python:

a) Learning Path on R: Step 0 to Step 2

b) Learning Path on Python: Step 0 to Step 2

Other languages you can consider in the future: Scala, Go, Julia

Step 1: Learn basic Descriptive and Inferential Statistics


Let’s start or refresh our statistical learning. It is good to have an understanding of
descriptive and inferential statistics before you start serious machine learning
development. Udacity offers courses on descriptive statistics and inferential statistics.
Both courses use Excel to teach you the basics of statistics. If you already know them,
you can refresh or skip this step.

Assignment: You can do the assignments of both courses in the language of your choice
(R / Python). You can refer to the respective statistical libraries and methods for both
languages below.

R: Stats

Python: SciPy, NumPy, pandas
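To get a feel for what these assignments involve, here is a minimal sketch of both kinds of statistics in Python. It uses only the standard-library `statistics` module as a lightweight stand-in for the libraries listed above. The sample data, the helper names `describe` and `welch_t`, and the choice of Welch's t statistic as the inferential example are our own assumptions for illustration, not something taken from the Udacity courses.

```python
import math
import statistics

# Hypothetical sample data (made up for illustration): daily sales of two stores.
store_a = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0]
store_b = [10.0, 11.0, 9.0, 12.0, 10.0, 11.0, 10.0, 9.0]


def describe(sample):
    """Descriptive statistics: summarise a single sample."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "stdev": statistics.stdev(sample),  # sample standard deviation (n - 1)
    }


def welch_t(sample_1, sample_2):
    """Inferential statistics: Welch's t statistic for a difference in means."""
    mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
    var_1, var_2 = statistics.variance(sample_1), statistics.variance(sample_2)
    standard_error = math.sqrt(var_1 / len(sample_1) + var_2 / len(sample_2))
    return (mean_1 - mean_2) / standard_error


summary_a = describe(store_a)
summary_b = describe(store_b)
t_statistic = welch_t(store_a, store_b)

print(summary_a)
print(summary_b)
print(f"Welch t statistic: {t_statistic:.2f}")
```

A large absolute t statistic suggests the two means differ by more than chance alone would explain; turning it into a p-value needs the t distribution, which is where a library such as SciPy becomes useful.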

Must read: Difference between machine learning and statistical modeling?

Step 2: Data Exploration / Cleaning / Preparation


What differentiates a good machine learning professional from an average one is the
quality of the feature engineering and data cleaning applied to the original data. The
more quality time you spend here, the better. This step also takes the bulk of your
time, so it helps to put a structure around it. You can refer to the series of articles
below to learn the different stages of data exploration.

1. Variable Identification, Univariate and Multivariate analysis
2. Missing values treatment
3. Outlier treatment
4. Feature Engineering
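
As a rough illustration of stages 2 and 3 above, the sketch below fills missing values with the median and then caps outliers using the interquartile-range rule. The data, the function names and the choice of these two particular treatments are assumptions made for the example; the articles above discuss other options, and the right one depends on your data.

```python
import statistics

# Hypothetical column of values (made up); None marks a missing entry.
incomes = [42.0, 38.0, None, 45.0, 41.0, 400.0, 39.0, None, 44.0, 40.0]


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences from the interquartile-range rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cap_outliers(values, k=1.5):
    """Clip values that fall outside the IQR fences back onto the fences."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]


imputed = impute_median(incomes)
cleaned = cap_outliers(imputed)

print("imputed:", imputed)
print("cleaned:", cleaned)
```

Capping keeps the row but limits its influence; dropping the row or transforming the variable are alternatives worth comparing.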

You can also refer to data exploration methods in R and Python:

• Data Exploration in R
• Data Exploration in Python

Exercise / assignment:
1. Take up the Titanic survival problem from Kaggle, build a set of hypotheses, then
clean the data and add new features to the existing dataset. Think about the best
way to impute missing ages.
2. Similarly, take up the Bike sharing demand forecasting problem and repeat the
cycle mentioned above.
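
One hypothesis you could test for the missing-age question is that a passenger's title (Mr, Mrs, Master and so on) says something about their age. The sketch below extracts the title as a new feature and fills each missing age with the median age of passengers sharing that title. Treat it as one idea to compare against others, not as the answer. The rows are made up, and the column names `Name` and `Age` as well as the "Surname, Title. Given name" layout are assumptions about the data; check them against the actual Kaggle file.

```python
import re
import statistics
from collections import defaultdict

# Toy rows made up for illustration; they are NOT taken from the Kaggle data.
passengers = [
    {"Name": "Doe, Mr. John", "Age": 40.0},
    {"Name": "Doe, Mrs. Jane", "Age": 36.0},
    {"Name": "Roe, Master. Tim", "Age": 4.0},
    {"Name": "Roe, Master. Sam", "Age": None},
    {"Name": "Poe, Mr. Alan", "Age": None},
    {"Name": "Poe, Mr. Carl", "Age": 30.0},
    {"Name": "Lee, Miss. Ann", "Age": 22.0},
    {"Name": "Ray, Dr. Max", "Age": None},
]

# Assumed name layout: "Surname, Title. Given names"
TITLE_PATTERN = re.compile(r",\s*([A-Za-z]+)\.")


def extract_title(name):
    """Pull the honorific (Mr, Mrs, Miss, ...) out of a name string."""
    match = TITLE_PATTERN.search(name)
    return match.group(1) if match else "Unknown"


def impute_age_by_title(rows):
    """Return copies of rows with a Title feature and missing ages filled in."""
    ages_by_title = defaultdict(list)
    for row in rows:
        if row["Age"] is not None:
            ages_by_title[extract_title(row["Name"])].append(row["Age"])

    all_known_ages = [age for ages in ages_by_title.values() for age in ages]
    overall_median = statistics.median(all_known_ages)

    filled = []
    for row in rows:
        title = extract_title(row["Name"])
        age = row["Age"]
        if age is None:
            # Fall back to the overall median when a title has no known ages.
            known = ages_by_title.get(title)
            age = statistics.median(known) if known else overall_median
        filled.append({"Name": row["Name"], "Title": title, "Age": age})
    return filled


filled = impute_age_by_title(passengers)
for row in filled:
    print(row)
```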

Step 3: Introduction to Machine Learning


You should now open the doors to machine learning. There are various resources
available for getting started with machine learning techniques. I would suggest you pick
one of the following two options depending on your style of learning:

• Option 1: If you are someone who likes to learn in small steps and needs more
hand-holding, you should start with the Machine Learning course from Andrew Ng.
It is a good course for beginners and easy to understand. Professor Ng is amazing
at making difficult concepts come across smoothly. The course covers all the basic
algorithms and also introduces a few advanced topics like neural networks,
recommendation systems and the application of machine learning to large
databases using MapReduce. He chooses to use Octave / MATLAB instead of the
more popular R or Python for teaching machine learning. Once completed, you
should proceed to the exercises and homework provided in Option 2.
• Option 2: If you are more independent, like challenges and can battle through tough
assignments, you should take the Learning from Data course by Prof. Yaser
Abu-Mostafa. This course gives an amazing treatment of the concepts behind machine
learning, but beware: it is quite heavy on math and the theory behind ML (stuff like
the VC dimension). It also requires more programming knowledge and is thus more
advanced in that sense. This course is loaded with homework (which is not
necessarily a bad thing).
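
To give a flavour of the kind of basic algorithm an introductory course starts with, here is a minimal sketch of fitting a straight line by gradient descent. The text above does not list which algorithms each course covers, so treat the choice of linear regression as our assumption of a typical first example. The data, the function name `fit_line` and the hyperparameter values are made up for illustration.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = slope * x + intercept by batch gradient descent on squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training point with the current parameters.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to each parameter.
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2.0 / n * sum(errors)
        # Step against the gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.4f}, intercept={intercept:.4f}")
```

Writing this loop by hand once is worth the effort: the learning rate and the number of epochs stop being magic numbers when you have watched a fit diverge because the step was too large.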

Now you have a good understanding of the algorithms and techniques. Let’s look at the
libraries or packages available in R or Python. You can refer to the learning paths (step 6)
for R (additionally, ML Algorithms in R) and Python to explore these packages and
related options.

Step 4: Participate in Kaggle Knowledge competition

By now, you have all the tools you need to compete in Kaggle knowledge
competitions. These knowledge competitions are less difficult than the prize-winning
challenges. You can also find various related resources to kick-start your data science
journey. Below is the list of currently active knowledge competitions:

• Titanic: Machine Learning from Disaster
• San Francisco Crime Classification

Must Read: How do I start my journey on Kaggle?

Step 5: Advanced Machine Learning


Now that you have learnt most machine learning techniques, it’s time to explore
advanced techniques that deal with different structures of data, such as Deep
Learning and Machine Learning with Big Data.

Deep Learning
Are you aware of deep learning? If not, here is a brief introduction to it; for more
detail on deep learning, watch the video here. Below is a list of deep learning resources
that will help you get started:

• The most comprehensive resource is deeplearning.net. You will find everything
here – lectures, datasets, challenges, tutorials.
• Give another course, from Geoff Hinton, a try to understand the basics of Neural
Networks
• Pattern recognition using Python (Resource 1, Resource 2, Resource 3) and R
(Resource 1)
• Text Mining using Python (Resource) and R (Resource 1, Resource 2)

Ensemble modeling
This is where an expert differs from an average professional. Ensembling can add a
lot of power to your models and has been a very successful technique in various Kaggle
competitions. Here is one of the best guides on ensemble modeling we have come
across.
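
To make the idea concrete, here is a minimal sketch of the simplest kind of ensemble, a majority vote over the predictions of several classifiers. The three sets of model outputs are hypothetical and deliberately constructed so that the models make their mistakes on different rows; real models are rarely this cooperative, and the guide above covers richer schemes than plain voting.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by majority vote per row."""
    combined = []
    for votes in zip(*predictions_per_model):
        # most_common(1) returns [(label, count)] for the most frequent label.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(predicted, actual):
    """Fraction of rows where the prediction matches the true label."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


# Hypothetical true labels and model outputs (made up for illustration).
actual = [1, 0, 1, 1, 0, 1, 0, 0]
model_a = [1, 0, 1, 0, 0, 1, 1, 0]
model_b = [1, 1, 1, 1, 0, 0, 0, 0]
model_c = [0, 0, 1, 1, 1, 1, 0, 0]

ensemble = majority_vote([model_a, model_b, model_c])

for name, preds in [("a", model_a), ("b", model_b), ("c", model_c)]:
    print(f"model {name}: {accuracy(preds, actual):.2f}")
print(f"ensemble: {accuracy(ensemble, actual):.2f}")
```

The vote only helps when the individual models are wrong in different places, which is why diversity among the models matters as much as their individual accuracy. An odd number of models is used here so that binary votes cannot tie.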

Machine Learning with Big Data


As you know, the size of data is increasing at an exponential rate, but raw data is not
useful until you start getting insights from it. Machine learning is nothing but learning
from data: generating insights or identifying patterns in the available data set. There are
various applications of machine learning algorithms, like “spam detection”, “web document
classification”, “fraud detection”, “recommendation systems” and many others. Below is
a list of tutorials on dealing with big data using machine learning.

• Scalable Machine Learning
• Packages for Big Data in Python (Pydoop, PyMongo) and R
(Resource 1, Resource 2)

Step 6: Participate in mainstream Kaggle Competition


Now you have most of the technical and statistical skills. It’s time to start learning from
fellow data scientists while competing with them. Kaggle is the kind of place we want
for this: an active, engaged and competitive platform. Data scientists are passionate
about their rank and model performance. Go, dive into one of the live competitions
currently running on Kaggle and give everything you have learnt a try! Good luck!

Optional step: Text mining and databases


If you need to apply machine learning to text mining, you can look at the following guide
to cleaning text data and building models on it. You can also look at the following Kaggle
competitions:

• De-noising dirty documents
• Sentiment analysis on movie reviews
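
As a small taste of what cleaning text data means in practice, the sketch below lower-cases a review, strips markup and punctuation, drops a few stop words and counts what remains as a bag of words. The sample review, the tiny stop-word list and the function names are all made up for illustration; a real project would use a proper stop-word list and tokenizer from a text-mining library.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, far from complete).
STOP_WORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of"}


def clean_text(text):
    """Lower-case, strip HTML-like tags and non-letters, drop stop words."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)   # remove tags such as <br/>
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only letters and whitespace
    return [token for token in text.split() if token not in STOP_WORDS]


def bag_of_words(tokens):
    """Count how often each token occurs; word order is discarded."""
    return Counter(tokens)


# Hypothetical review text, not taken from any Kaggle dataset.
review = "This movie was GREAT!!! <br/> Great acting, and a great story."

tokens = clean_text(review)
counts = bag_of_words(tokens)

print(tokens)
print(counts.most_common(3))
```

Counts like these are the simplest features you can feed to a classifier for a task such as sentiment analysis.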

The Fun part


Now that you know what and where to learn to become a machine learning professional,
here is a small simulation of how a robot based on a genetic algorithm learns to walk.

And some serious stuff


Now that you know the potential of machine learning, imagine the impact it could have
on today’s world. The talk from Jeremy touches on this briefly. The following article
describes this evolution from a different perspective: part 1 & part 2

Hope you enjoyed this learning path on machine learning and the impact machine
learning can have on our future. If you have any suggestions to improve this learning
path, please feel free to share them through comments below.
