Split it up - Part 1
Machine Learning? How can a machine even learn?
For a long time I wondered what this fancy term meant and how it could actually be possible! Let's figure it out today. To get a sense of what happens when we say a machine is learning, why not first understand how we ourselves learn?
Well! Let us take a glance back at the days when we learnt mathematics as kids. I still remember how my maths teacher taught us basic addition and subtraction in class. Initially she helped us count on our fingers and then add or subtract those numbers. With a lot of practice, and sometimes a beating from her, we finally learnt addition and subtraction. Mathematically speaking, after a lot of practice (training), our brain learnt a function for addition: given numbers to add, it processes them and gives the sum.
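Similarly, Machine Learning is all about finding that magic function "f" which, given some input, returns the desired output. (There are exceptions: the KNN algorithm, for example, does not learn an explicit function; it stores the training data and compares new inputs against it.)
In Machine Learning we split our whole dataset into Train data and Test data. While training an ML model we use only the Train data and keep the Test data away from it, so that the Test data can later tell us how well the model generalizes to data it has never seen.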
Yeah! You guessed it right: by training a model, I mean learning the magic function "f".
Oh wait! Did I use the word dataset?
Let's see what it is. A dataset consists of features and labels. Briefly, features are inputs and labels are outputs. For example, we can say that a cat has short ears, light-coloured eyes, a flat mouth and long whiskers, whereas a dog has long ears, dark-coloured eyes, a bulging mouth and short whiskers. These are features because they describe the class, or label, which here is either dog or cat.
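To make the cat and dog example concrete, here is a minimal sketch of how such a dataset could be written down in Python. The key names (ears, eyes, mouth, whiskers) and the four rows are made up for illustration; a real dataset would have many more rows and usually numeric features.

```python
# Hypothetical toy dataset built from the cat/dog description.
# Each entry in `features` is the input for one animal;
# the entry at the same position in `labels` is its output (class).
features = [
    {"ears": "short", "eyes": "light", "mouth": "flat", "whiskers": "long"},
    {"ears": "long", "eyes": "dark", "mouth": "bulging", "whiskers": "short"},
    {"ears": "short", "eyes": "light", "mouth": "flat", "whiskers": "long"},
    {"ears": "long", "eyes": "dark", "mouth": "bulging", "whiskers": "short"},
]
labels = ["cat", "dog", "cat", "dog"]

# Features and labels always come in pairs: one label per row of features.
for x, y in zip(features, labels):
    print(x, "->", y)
```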
With the given training data we now learn the magic function "f", so that if some input x is fed to f, it returns the output y.
f(x) = y
Now, to evaluate our model, that is, to measure its performance, we feed the Test data to the model and compute the accuracy: the fraction of test examples for which the model's prediction matches the true label. If the model gives good accuracy we accept it; otherwise we make a few changes to the model and re-evaluate it.
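Here is a small sketch of that train-then-evaluate loop in plain Python. Everything in it is an assumption made for illustration: the single feature is an invented "ear length in cm", the numbers are made up, and the "model" is just a threshold halfway between the average cat and the average dog. Real models are more involved, but the flow is the same.

```python
def train(x_train, y_train):
    """Learn a function f from the Train data only (toy threshold model)."""
    cats = [x for x, y in zip(x_train, y_train) if y == "cat"]
    dogs = [x for x, y in zip(x_train, y_train) if y == "dog"]
    # Threshold halfway between the mean cat and the mean dog ear length.
    threshold = (sum(cats) / len(cats) + sum(dogs) / len(dogs)) / 2

    def f(x):
        return "dog" if x > threshold else "cat"

    return f


def accuracy(f, x_test, y_test):
    """Fraction of test examples where the prediction matches the label."""
    correct = sum(1 for x, y in zip(x_test, y_test) if f(x) == y)
    return correct / len(y_test)


# Made-up ear lengths in cm; Train and Test data are kept separate.
x_train = [3.0, 4.0, 3.5, 9.0, 10.0, 11.0]
y_train = ["cat", "cat", "cat", "dog", "dog", "dog"]
x_test = [3.2, 9.5, 6.0, 12.0]
y_test = ["cat", "dog", "dog", "dog"]

f = train(x_train, y_train)          # learning uses Train data only
acc = accuracy(f, x_test, y_test)    # evaluation uses Test data only
print("accuracy:", acc)
```

With these numbers the model gets three of the four test animals right, because the dog with 6.0 cm ears falls below the learned threshold.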
The train/test split can be in a ratio such as 80:20 or 70:30. The idea is that more data should be reserved for training, and relatively less is needed for testing the model.
In this way Train and Test data each have their own importance, and care must be taken to keep them separate and independent.
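An 80:20 split like the one described above can be sketched with nothing but the standard library. The helper name `split_dataset` and its parameters are my own choices for illustration, not part of any library. Libraries such as scikit-learn provide a ready-made `train_test_split`, but doing it by hand shows what is going on.

```python
import random


def split_dataset(features, labels, test_ratio=0.2, seed=42):
    """Shuffle the rows, then reserve `test_ratio` of them for testing."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)  # fixed seed: same split every run
    n_test = int(round(len(indices) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    x_train = [features[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_test = [features[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Ten made-up examples: the input is a number, the label is its double.
features = list(range(10))
labels = [2 * x for x in features]

x_train, x_test, y_train, y_test = split_dataset(features, labels, test_ratio=0.2)
print(len(x_train), "training rows,", len(x_test), "test rows")
```

Shuffling before cutting matters: if the rows were sorted (say, all cats first and all dogs last), a plain cut would put only one class in the Test data. Passing test_ratio=0.3 gives the 70:30 split instead.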
Takeaway points:
- Training and testing data must always be kept separate.
- The split can be in a ratio such as 80:20 or 70:30.
- A dataset consists of features and labels.
- Features are inputs and labels are outputs.
To be continued...