You'll need at least a naive stemming algorithm to process the text first (try the Porter stemmer; free implementations are available in most languages). Keep the stemmed text and the original, unstemmed text in two separate arrays, each split on spaces.

a+ opens for appending and reading, making