Dummy Variables & One Hot Encoding
Dummy Variables vs One Hot Encoding
Dummy variable:
- You replace the categorical variable with several boolean variables (each taking the value 0 or 1) that encode whether or not the variable takes a certain value. To encode a categorical variable that can take k values, you only need k-1 dummy variables: the remaining value is implied when all the dummies are 0.
- Often used in statistics, because it uses the “correct number of degrees of freedom”. In a regression with an intercept, k indicator columns would be perfectly collinear, so one of them is dropped.
One-hot encoding:
- You replace the categorical variable with a vector indicating “in which dimension” your variable lives: a single 1 at the position of its category and 0 everywhere else. This vector has k dimensions.
- Often used in computer science and machine learning.
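To make the k versus k-1 difference concrete, here is a minimal sketch in plain Python. The helper names `one_hot` and `dummy` and the three category labels are made up for illustration, and treating the first category as the baseline is just one possible convention.

```python
def one_hot(value, categories):
    """Return a list of k zeros and ones, with a single 1 at the position of `value`."""
    return [1 if value == c else 0 for c in categories]


def dummy(value, categories):
    """Return k-1 zeros and ones; the first category is the baseline (all zeros)."""
    return one_hot(value, categories)[1:]


# Hypothetical categories, so k = 3
towns = ["A", "B", "C"]

for t in towns:
    # One-hot gives 3 values per town, dummy coding gives 2
    print(t, one_hot(t, towns), dummy(t, towns))
```

With dummy coding, town "A" is represented by all zeros, so no information is lost by dropping its column.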
Let's code
First of all, we import the pandas library.
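As a rough sketch of this step, the snippet below imports pandas and builds a small stand-in data set. The town names, the `area` and `price` columns and all the numbers are invented for illustration; only the `town` column comes from this post, and the real data set may look different.

```python
import pandas as pd

# Hypothetical stand-in data: only the "town" column is taken from the post,
# the other columns and all values are invented.
df = pd.DataFrame({
    "town": ["Lakeside", "Lakeside", "Riverton", "Riverton", "Springfield", "Springfield"],
    "area": [2600, 3000, 2600, 2900, 2600, 3100],
    "price": [550000, 565000, 575000, 600000, 500000, 520000],
})

print(df)
```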
Here you can clearly see that our data set has a categorical column, the town column. Most machine learning models only accept numeric input, so we have to convert this column first, and the solution is dummy variables.
You can use the get_dummies function of the pandas library, which converts the town column into indicator columns of 0s and 1s, one per town.
Now we can use these dummy variables in our machine learning model.
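A minimal sketch of the call, again on invented data (the town names and the `area` column are assumptions, not the original data set). In recent pandas versions `get_dummies` returns True/False columns by default, so `dtype=int` is passed here to get 0 and 1; `drop_first=True` gives the k-1 dummy variable form.

```python
import pandas as pd

# Hypothetical data; only the "town" column name comes from the post.
df = pd.DataFrame({
    "town": ["Lakeside", "Riverton", "Springfield", "Riverton"],
    "area": [2600, 3000, 2900, 3100],
})

# One-hot encoding: one 0/1 column per town (k columns)
one_hot_df = pd.get_dummies(df, columns=["town"], dtype=int)

# Dummy variables: drop the first town column (k-1 columns)
dummy_df = pd.get_dummies(df, columns=["town"], drop_first=True, dtype=int)

print(one_hot_df)
print(dummy_df)
```

The dropped town becomes the baseline: a row with 0 in every remaining town column belongs to it.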
If you want to download the data set, click here.
You can download the code from my GitHub repository.