Semi-Automated Text Cleaning in R
Cleaning real-life textual data is hard. Whether the inconsistencies stem from differing conventions, manual data entry mistakes, or a myriad of other causes, reaching a consistent representation is essential. This matters most for categorical variables (like city or job title), where inconsistent spellings of the same value give rise to superfluous categories that do not exist in reality.
Read the full post on Medium