Saher El-Neklawy A blog

Semi-Automated Text Cleaning in R

Cleaning real-life textual data is hard. Whether the cause is inconsistent conventions, manual data entry mistakes, or a myriad of other reasons, reaching a consistent representation is essential. This is most pronounced when dealing with categorical variables (like city or job title), where an inconsistent representation gives rise to superfluous categories that do not exist in reality.
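To make the "superfluous categories" problem concrete, here is a minimal base R sketch. The `city` vector is made-up illustration data, and trimming plus lowercasing is just one simple normalisation, not necessarily the semi-automated approach the full post describes.

```r
# Hypothetical city column: two real cities, six different spellings
city <- c("Cairo", "cairo ", "CAIRO", "Alexandria", "alexandria", " Alexandria")

# Raw values: every spelling variant becomes its own category
raw_levels <- nlevels(factor(city))

# A simple normalisation: strip surrounding whitespace and lowercase
city_clean <- tolower(trimws(city))
clean_levels <- nlevels(factor(city_clean))

print(raw_levels)    # 6 categories before cleaning
print(clean_levels)  # 2 categories after cleaning
```

Real data usually also contains typos and abbreviations that this kind of rule cannot catch, which is where a semi-automated workflow earns its keep.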

Read the full post on Medium

Writing your own dplyr functions

dplyr is awesome, like really awesome. The thing I like most about it is how readable it makes data processing code look. But how do you write your own functions that make use of dplyr conventions?
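As a taste of what such a function can look like, here is a minimal sketch using the `{{ }}` (embrace) operator, which assumes a reasonably recent dplyr (1.0.0 or later). The helper name `count_by` is hypothetical, and this is one common approach rather than necessarily the one the full post takes.

```r
library(dplyr)

# Hypothetical helper: count rows per group, largest group first.
# {{ group_col }} forwards the unquoted column name to group_by().
count_by <- function(data, group_col) {
  data %>%
    group_by({{ group_col }}) %>%
    summarise(n = n(), .groups = "drop") %>%
    arrange(desc(n))
}

# Called like any dplyr verb, with a bare column name
result <- count_by(mtcars, cyl)
print(result)
```

The point of embracing the argument is that callers keep writing bare column names, so the custom function reads like the rest of a dplyr pipeline.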

Read the full post on Medium