In text analysis, compound words are difficult to capture accurately, especially when performing term frequency analysis. To allow a multi-word idea or concept to be identified and counted as a single term, terms composed of two or more words are combined into one unified term.
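The idea can be illustrated with a minimal base-R sketch. It assumes a small, hypothetical lookup table of regular expressions mapped to camel-case replacements and a hypothetical helper, unify_terms_sketch(). This is not the package's implementation or its actual term list. Counting whitespace-separated tokens before and after unification shows why the step matters for term frequency.

```r
# Hypothetical lookup table: regex for a multi-word term -> unified camel-case term.
# Illustrative entries only; not the package's actual list.
unified_terms <- c(
  "\\bwi[- ]?fi\\b"    = "WiFi",
  "\\bburn[- ]?out\\b" = "burnOut"
)

# Hypothetical helper: replace every listed multi-word term with its unified form.
unify_terms_sketch <- function(comment, terms = unified_terms) {
  for (pattern in names(terms)) {
    comment <- gsub(pattern, terms[[pattern]], comment,
                    ignore.case = TRUE, perl = TRUE)
  }
  comment
}

comments <- c("the wi-fi keeps dropping", "burn out is real", "no burn out here")

# Token counts before unification: "burn" and "out" are counted separately.
before <- table(unlist(strsplit(comments, "[[:space:]]+")))

# Token counts after unification: "burnOut" is counted as a single term.
after <- table(unlist(strsplit(unify_terms_sketch(comments), "[[:space:]]+")))

before[c("burn", "out")]
after["burnOut"]
```

The patterns are applied in order, so overlapping terms in a real table would need to be listed from most to least specific.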
Arguments
- comment
character string (usually free text) to be evaluated and unified. Alternatively, a column of comments can be passed to the function, as is often done within dplyr::mutate().
- clean
logical indicating whether the function should also clean the comments being unified; default: TRUE. See clean.comment() for details on comment cleaning.
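Because comment is handled as a vector, a whole column can be processed in one call. The sketch below uses a hypothetical stand-in, unify_terms_sketch(), with a single illustrative pattern, and assigns the result to a new column with base R. With dplyr, the equivalent would be dplyr::mutate(survey, unified = unify_terms_sketch(comment)). The cleaning controlled by clean is not reproduced here.

```r
# Hypothetical stand-in for the documented function; one illustrative pattern only.
unify_terms_sketch <- function(comment) {
  gsub("\\bwi[- ]?fi\\b", "WiFi", comment, ignore.case = TRUE, perl = TRUE)
}

survey <- data.frame(
  id      = 1:3,
  comment = c("Wi-Fi in the library is slow", "great staff", "no wifi upstairs"),
  stringsAsFactors = FALSE
)

# Add a unified copy of the comment column (base-R analogue of dplyr::mutate()).
survey$unified <- unify_terms_sketch(survey$comment)
survey$unified
```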
Value
vector of comments in which predetermined terms are unified (and which are also cleaned unless clean is set to FALSE). NOTE: Unified terms are returned in camel case. For example, "wi-fi" is returned as "WiFi" and "burn out" is returned as "burnOut".
Author
Emilio Xavier Esposito emilio.esposito@gmail.com (https://github.com/emilioxavier)