Cleaning Posts

Functions for editing the text variable in place.

limpiar_accents()
Clean accented characters
limpiar_spaces()
Clean redundant spaces
limpiar_url()
Clean URLs from the text variable
limpiar_repeat_chars()
Clean repeated charaaaacters
limpiar_shorthands()
Clean shorthands and abbreviations
limpiar_tags()
Clean user handles and hashtags
limpiar_stopwords()
Clean stop words for visualisations
limpiar_slang()
Clean slang from multiple Spanish dialects
limpiar_recode_emojis()
Recode emojis with a textual description
limpiar_remove_emojis()
Completely Remove Most Emojis from Text
limpiar_emojis_es()
Replace emojis with a Spanish textual description
limpiar_pp_products()
Replace entities for the Peaks & Pits classifier
limpiar_pp_companies()
Remove known companies for the Peaks & Pits classifier
limpiar_non_ascii()
Remove non-ASCII characters except those with Latin accents
limpiar_alphanumeric()
Remove everything except letters, numbers, and spaces

Removing Posts

Functions for removing unwanted posts entirely (rather than cleaning).

limpiar_duplicates()
Clean the text variable of duplicate posts
limpiar_retweets()
Clean retweets from the text variable
limpiar_spam_grams()
Remove posts containing spam-like n-grams

Utility Functions

Miscellaneous functions designed to speed up aspects of cleaning text.

limpiar_inspect()
Inspect every post and URL which contains a pattern
limpiar_na_cols()
Clean NA-heavy columns from a data frame or tibble
limpiar_link_click()
Prepare a URL column to be clickable in Shiny/Data Table
limpiar_link_click_reverse()
Reverse (invert) limpiar_link_click()
limpiar_ex_subreddits()
Quickly extract subreddits from a link variable
limpiar_wrap()
Wrap strings for visual ease

Processing Parts of Speech

Functions that together make up a Parts of Speech (POS) analysis workflow.

limpiar_pos_import_model()
Import UDPipe models to begin Parts of Speech Analysis
limpiar_pos_annotate()
Annotate texts for Parts of Speech analysis using UDPipe models