Dummy Variables & One Hot Encoding

 



Dummy Variables vs One Hot Encoding


Dummy variable:

  • You replace the categorical variable with several boolean variables (each taking the value 0 or 1) that encode whether or not the categorical variable took a certain value. To encode a categorical variable that can take k values, you only need k-1 dummy variables, because the remaining category is implied when all the dummies are 0.
  • Often used in more statistical domains, as it uses the “correct number of degrees of freedom”.

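One-hot encoding:

  • You replace the categorical variable with a vector indicating “in which dimension” your variable lives. For a variable that can take k values, this vector has k dimensions, with a 1 in the position of the observed category and 0 everywhere else.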
  • Often used in CS domains.
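
The difference between the two encodings can be seen on a tiny example. This is a minimal sketch, assuming a made-up `town` column with three hypothetical values; `drop_first=True` is one way to get the k-1 dummy form in pandas, not necessarily the only convention.

```python
import pandas as pd

# Hypothetical toy column: the town names are made up for illustration.
towns = pd.Series(["A", "B", "C", "A"], name="town")

# One-hot encoding: one 0/1 column per category, so k = 3 columns.
one_hot = pd.get_dummies(towns, dtype=int)

# Dummy encoding: drop the first category, leaving k-1 = 2 columns.
# A row with all zeros then stands for the dropped category "A".
dummies = pd.get_dummies(towns, drop_first=True, dtype=int)

print(one_hot)
print(dummies)
```

In the dummy version the rows for town "A" contain only zeros, which is how the dropped category is still recoverable.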

Let's code

First of all, we import the pandas library.


Now we read the data from a CSV file using the read_csv function of the pandas library.
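
The import and loading steps might look like the sketch below. The file name and contents of the original data set are not shown here, so the CSV text, the `area` and `price` columns, and every value are hypothetical stand-ins; only the `town` column name comes from the post.

```python
import io
import pandas as pd

# Stand-in for the CSV file. Everything except the `town` column name
# is a hypothetical placeholder.
csv_text = """town,area,price
alpha,2600,550000
beta,3000,565000
gamma,3200,610000
alpha,3600,680000
"""

# With a real file on disk you would pass its path instead,
# e.g. pd.read_csv("your_file.csv").
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```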


Our data set contains a categorical column, town. Most machine learning models cannot work with categorical text values directly, so the column has to be converted to numbers first. One solution is to use dummy variables.

You can use the get_dummies function of the pandas library, which converts the town column into columns of 0s and 1s, one column per town.
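
A minimal sketch of this step, assuming a small hypothetical frame with a `town` column (the town names and the `area` column are made up): the dummy columns are built with `pd.get_dummies` and then joined back in place of the original text column.

```python
import pandas as pd

# Hypothetical data: only the `town` column name comes from the post.
df = pd.DataFrame({
    "town": ["alpha", "beta", "gamma", "alpha"],
    "area": [2600, 3000, 3200, 3600],
})

# Build one 0/1 column per town.
town_dummies = pd.get_dummies(df["town"], dtype=int)

# Replace the text column with the new numeric columns.
encoded = pd.concat([df.drop(columns="town"), town_dummies], axis=1)
print(encoded)
```

If the k-1 dummy form is wanted instead, passing `drop_first=True` to `pd.get_dummies` drops one of the town columns.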



So, now we can use these dummy variables in our machine learning model.

If you want to download the data set, click here.

You can download code by downloading my github repository.











