How to manage missing values?


Below is a summary of Jacob Joseph's post.
The original article is here:
http://www.datasciencecentral.com/profiles/blogs/how-to-treat-missing-values-in-your-data-1

How do you deal with missing values: ignore them or treat them? The answer depends on the percentage of missing values in the dataset, which variables are affected, whether the missing values fall in the dependent or the independent variables, and so on. Missing-value treatment matters because the insights drawn from the data, or the performance of your predictive model, can suffer if missing values are not handled appropriately.
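
Since the decision depends on how much is missing and where, a first step is often to measure the share of missing values per variable. The sketch below assumes records stored as a list of dictionaries with None marking a missing value; the toy rows and the helper name missing_share are hypothetical, not taken from the original article.

```python
# Hypothetical toy records; None marks a missing value.
rows = [
    {"device": "mobile", "os": "android", "revenue": 12.0},
    {"device": None, "os": "ios", "revenue": 30.0},
    {"device": "desktop", "os": None, "revenue": None},
    {"device": "mobile", "os": "android", "revenue": None},
]


def missing_share(rows):
    """Return the fraction of missing (None) values for each column."""
    columns = rows[0].keys()
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}


shares = missing_share(rows)
for column, share in shares.items():
    print(f"{column}: {share:.0%} missing")
```
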
Technique 1: Deletion
Unless the data are 'missing completely at random', deletion is in many cases a method best avoided.
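
As a rough illustration of what deletion costs, the sketch below drops every record that has at least one missing value (listwise deletion) and reports how many records survive. The toy rows and the helper name drop_incomplete are assumptions made for this example.

```python
# Hypothetical toy records; None marks a missing value.
rows = [
    {"device": "mobile", "os": "android", "revenue": 12.0},
    {"device": None, "os": "ios", "revenue": 30.0},
    {"device": "desktop", "os": None, "revenue": None},
    {"device": "mobile", "os": "android", "revenue": None},
]


def drop_incomplete(rows):
    """Keep only the records in which no value is missing."""
    return [r for r in rows if all(v is not None for v in r.values())]


complete = drop_incomplete(rows)
print(f"kept {len(complete)} of {len(rows)} records")
```

In this toy data only one record out of four is complete, which shows how quickly deletion can shrink a dataset even when each variable is only partly missing.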

Technique 2: Imputation
a) Popular averaging techniques. The mean, median and mode are the most commonly used values for filling in missing values. Approaches range from a global average for the variable to averages computed within groups.
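
The difference between a global average and a group-based average can be sketched as follows. The toy data, the column names and the helper names impute_global and impute_by_group are assumptions for illustration; grouping revenue by OS is just one possible choice.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical toy records; None marks a missing revenue.
rows = [
    {"os": "android", "revenue": 10.0},
    {"os": "android", "revenue": 14.0},
    {"os": "android", "revenue": None},
    {"os": "ios", "revenue": 30.0},
    {"os": "ios", "revenue": None},
]


def impute_global(rows, column, stat=mean):
    """Fill missing values with one statistic computed over all observed values."""
    fill = stat([r[column] for r in rows if r[column] is not None])
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def impute_by_group(rows, column, group):
    """Fill missing values with the mean of the observed values in the same group."""
    observed = defaultdict(list)
    for r in rows:
        if r[column] is not None:
            observed[r[group]].append(r[column])
    fills = {g: mean(values) for g, values in observed.items()}
    # Fall back to the global mean when a group has no observed value at all.
    overall = mean(v for values in observed.values() for v in values)
    return [
        {**r, column: fills.get(r[group], overall) if r[column] is None else r[column]}
        for r in rows
    ]


global_filled = impute_global(rows, "revenue")            # uses the mean
median_filled = impute_global(rows, "revenue", median)    # uses the median
grouped = impute_by_group(rows, "revenue", "os")
```

With these rows the global mean fills both gaps with the same value, while the group-based version fills the android gap with the android average and the ios gap with the ios average.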

b) Predictive techniques. A predictive model can be used to impute the missing values for the Device, OS and Revenues variables of the original article's example. Candidate methods include statistical methods such as regression, machine learning methods such as SVM, and data mining methods.
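
A minimal sketch of predictive imputation, assuming the simplest possible model: a one-variable least-squares regression fitted on the complete records and then used to predict the missing ones. The predictor column sessions, the toy data and the helper names are hypothetical; a real case would more likely use several predictors and a library model.

```python
# Hypothetical toy records; None marks a missing revenue.
rows = [
    {"sessions": 1, "revenue": 5.0},
    {"sessions": 2, "revenue": 7.0},
    {"sessions": 3, "revenue": 9.0},
    {"sessions": 4, "revenue": None},
]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_by_regression(rows, target, predictor):
    """Fit on records where the target is observed, then predict the rest."""
    known = [r for r in rows if r[target] is not None]
    slope, intercept = fit_line(
        [r[predictor] for r in known], [r[target] for r in known]
    )
    return [
        {**r, target: slope * r[predictor] + intercept}
        if r[target] is None
        else dict(r)
        for r in rows
    ]


imputed = impute_by_regression(rows, "revenue", "sessions")
```

Unlike a global average, the filled value here follows the relationship between the predictor and the target, which is the motivation for predictive imputation.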


Conclusion
Imputation of missing values is a tricky subject. Unless the data are missing completely at random, imputing the missing values with a predictive model is highly desirable, since it can lead to better insights and an overall increase in the performance of your predictive models.

Read more in the original post. 
