library(DisplayR)
#> Warning in .setup_fonts(): Font 'Segoe UI' not found in system, download and
#> install individually to access full package functionality
#> Warning in .setup_fonts(): Font 'NeueHaasGroteskText Pro Md' not found in
#> system, download and install individually to access full package functionality
#> Warning in .setup_fonts(): Font 'GT Walsheim Pro' not found in system, download
#> and install individually to access full package functionality
library(dplyr)
#> Error in get(paste0(generic, ".", class), envir = get_method_env()) :
#> object 'type_sum.accel' not found
library(tidyr)
library(gt)
The functions in this package are built to mesh seamlessly with the {tidyverse} set of packages and with {gt}, both developed at Posit: https://posit.co/. {gt} gives R users the tools to create beautiful tables: https://gt.rstudio.com/
Introduction
Tables are an essential and often under-appreciated method of visualising data. Many data visualisation professionals eschew their usage as they are not considered visually engaging. However, they are often the ideal tool for the job. Many of us would do well to ask ourselves ‘Do I need these charts or should this be a table?’
You should consider using a table when:
- You need to display precise values
- You need to display multiple raw volumes and percentages in the same place
- You want users to compare individual elements
- Your data has multiple dimensions and plotting becomes convoluted
- You’re slide-limited, e.g. because you need to provide an executive summary of complex data
Apart from using tables to display or communicate findings, it’s important to be aware of how tables can aid hypothesis generation, showing you where to dig next in the data mining process.
Tabular Data
We’ll use the example data frame that ships with {DisplayR}, together with the {tidyverse}, to create some summary tables. Then we’ll use {gt} and some {DisplayR} functions to create aesthetically pleasing tables.
df <- DisplayR::disp_example
Let’s say we need to visualise volume (raw numbers and percentages) as well as sentiment counts for a categorical variable; in this case we’ll use topic. We can create a summary table like so:
(summary_table <- df %>%
  filter(!is.na(sentiment)) %>%
  count(topic, sentiment, name = "sent_n") %>%
  add_count(topic, wt = sent_n, name = "volume") %>%
  pivot_wider(names_from = sentiment, values_from = sent_n) %>%
  mutate(percent = round(volume / sum(volume) * 100, 2), .after = volume) %>%
  arrange(desc(volume))
)
#> # A tibble: 9 × 6
#> topic volume percent negative neutral positive
#> <chr> <int> <dbl> <int> <int> <int>
#> 1 Conversational AI 1559 31.2 412 784 363
#> 2 AI Ethics & Society 798 16.0 341 332 125
#> 3 AI-Powered Creativity 637 12.7 210 292 135
#> 4 Risks & Challenges 631 12.6 394 178 59
#> 5 AI Search 380 7.6 123 182 75
#> 6 Coding & Assistance 364 7.28 115 167 82
#> 7 AI Performance 280 5.6 58 122 100
#> 8 AI & Business 268 5.36 57 110 101
#> 9 AI & Security 82 1.64 24 45 13
We could use various means to visualise this table; let’s try a data table, which is fairly common. We’ll style it with bootstrap and allow filtering at the top of the table.
DT::datatable(summary_table, style = "bootstrap", filter = "top")
Whilst functional, the table is lacking in visual appeal; it’s better used for exploring data internally than it is for communicating findings or including in a deck.
Despite making it clear which topic has the highest volume, it would require considerable mental work to figure out which topic had the highest proportion of positive or negative sentiment. We’re also lacking vital elements such as titles, subtitles, and captions; as well as additional flourishes such as icons which can bring the table to life. Let’s see how we could visualise the output with gt.
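To make the point about mental work concrete, here is a minimal sketch that converts the sentiment counts into within-topic percentages and sorts by the negative share. The numbers are copied by hand from the summary table printed above so the snippet runs without {DisplayR}; the name sentiment_share is only illustrative and is not part of the package.

```r
library(dplyr)

# Counts copied from the summary table printed above.
summary_table <- tibble::tibble(
  topic = c(
    "Conversational AI", "AI Ethics & Society", "AI-Powered Creativity",
    "Risks & Challenges", "AI Search", "Coding & Assistance",
    "AI Performance", "AI & Business", "AI & Security"
  ),
  volume   = c(1559L, 798L, 637L, 631L, 380L, 364L, 280L, 268L, 82L),
  negative = c(412L, 341L, 210L, 394L, 123L, 115L, 58L, 57L, 24L),
  neutral  = c(784L, 332L, 292L, 178L, 182L, 167L, 122L, 110L, 45L),
  positive = c(363L, 125L, 135L, 59L, 75L, 82L, 100L, 101L, 13L)
)

# Replace each sentiment count with its share of the topic's volume,
# then put the most negative topics first.
sentiment_share <- summary_table %>%
  mutate(across(c(negative, neutral, positive), ~ round(.x / volume * 100, 1))) %>%
  arrange(desc(negative))

sentiment_share
```

Sorting by desc(positive) instead surfaces the most positive topics. The shares are computed per row, so they describe the sentiment mix within each topic rather than each topic's share of the overall volume.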
GT
library(gt)
(my_gt <- df %>%
  disp_gt(
    date_var = date,
    group_var = topic,
    sentiment_var = sentiment,
    sentiment_max_colours = list(
      "negative" = "#D83B01",
      "neutral" = "#FFB900",
      "positive" = "#107C10"
    ),
    time_unit = "day",
    table_title = "Topic Modelling Summary Table",
    source_note = "Data Source: xxx"
  ) %>%
  gt::cols_hide("Sentiment x Time")
)
#> Warning: Removed 1 row containing missing values or values outside the scale range
#> (`geom_col()`).
#> Warning: Removed 1 row containing missing values or values outside the scale range
#> (`geom_line()`).
Topic Modelling Summary Table
topic | Volume | Positive | Neutral | Negative | Volume x Time |
---|---|---|---|---|---|
Conversational AI | 1,559 | 23.3% | 50.3% | 26.4% | [chart] |
AI Ethics & Society | 798 | 15.7% | 41.6% | 42.7% | [chart] |
AI-Powered Creativity | 637 | 21.2% | 45.8% | 33.0% | [chart] |
Risks & Challenges | 631 | 9.4% | 28.2% | 62.4% | [chart] |
AI Search | 380 | 19.7% | 47.9% | 32.4% | [chart] |
Coding & Assistance | 364 | 22.5% | 45.9% | 31.6% | [chart] |
AI Performance | 280 | 35.7% | 43.6% | 20.7% | [chart] |
AI & Business | 268 | 37.7% | 41.0% | 21.3% | [chart] |
AI & Security | 82 | 15.9% | 54.9% | 29.3% | [chart] |
Total | 4,999 | — | — | — | — |
Data Source: xxx
This table is preferable to me for multiple reasons:
- The gradient fill for the Positive and Negative columns makes comparison easy.
- The Volume x Time charts are a nice addition: they allow us to compare the trends of multiple topics at a glance, whilst keeping volume and sentiment information an eye movement away.
- The overall styling is more pleasant than the data table.
*Note: there is also a Sentiment x Time chart, which has been hidden from view here; we’re still experimenting with which charts work and which don’t for the default outputs.
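For readers who want a similar gradient fill on their own tables, the sketch below shows one way to get it with plain {gt}. It is not how disp_gt() is implemented internally; it assumes gt 0.9.0 or later (where data_color() accepts a palette argument) and reuses the hex colours and a few percentages from the example above.

```r
library(gt)

# Within-topic sentiment shares, copied from the first three rows of the table above.
shares <- data.frame(
  topic    = c("Conversational AI", "AI Ethics & Society", "AI-Powered Creativity"),
  volume   = c(1559, 798, 637),
  positive = c(23.3, 15.7, 21.2),
  negative = c(26.4, 42.7, 33.0)
)

shaded <- gt(shares) %>%
  tab_header(title = "Topic Modelling Summary Table") %>%
  # Show volumes as whole numbers with thousands separators.
  fmt_number(columns = volume, decimals = 0) %>%
  # Shade each column from white up to the colour used for that sentiment.
  data_color(columns = positive, palette = c("white", "#107C10")) %>%
  data_color(columns = negative, palette = c("white", "#D83B01")) %>%
  tab_source_note("Data Source: xxx")

shaded
```

Each data_color() call scales its colours to the range of that column alone, so the darkest cell marks the highest share within the column rather than across the whole table.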
One of the drawbacks of gt is that we cannot easily output to PowerPoint. Step forward, flextable.
Flextable & Exporting to PowerPoint
The disp_flextable() function will not include the Volume x Time chart, but we will get a nice gradient fill for the Positive and Negative columns, and the result can be exported to PowerPoint.
(my_ft <- df %>%
disp_flextable(topic, sentiment))
Topic | Volume | Negative | Neutral | Positive |
---|---|---|---|---|
Conversational AI | 1,559 | 26.4 | 50.3 | 23.3 |
AI Ethics & Society | 798 | 42.7 | 41.6 | 15.7 |
AI-Powered Creativity | 637 | 33.0 | 45.8 | 21.2 |
Risks & Challenges | 631 | 62.4 | 28.2 | 9.4 |
AI Search | 380 | 32.4 | 47.9 | 19.7 |
Coding & Assistance | 364 | 31.6 | 45.9 | 22.5 |
AI Performance | 280 | 20.7 | 43.6 | 35.7 |
AI & Business | 268 | 21.3 | 41.0 | 37.7 |
AI & Security | 82 | 29.3 | 54.9 | 15.9 |
Total | 4,999 | - | - | - |
We can preview how this would look in PowerPoint with:
print(my_ft, preview = "pptx")
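The preview opens a temporary deck. To write a file you can keep, {flextable} provides save_as_pptx(). The sketch below assumes disp_flextable() returns a regular flextable object, so the same call should work with my_ft; to stay self-contained it builds a small flextable from a few rows of the table above. The slide title and file name are placeholders.

```r
library(flextable)

# A few rows copied from the table above, standing in for my_ft.
shares <- data.frame(
  Topic    = c("Conversational AI", "AI Ethics & Society", "Risks & Challenges"),
  Volume   = c(1559, 798, 631),
  Negative = c(26.4, 42.7, 62.4),
  Neutral  = c(50.3, 41.6, 28.2),
  Positive = c(23.3, 15.7, 9.4)
)

ft <- autofit(flextable(shares))

# Write to a temporary directory here; point `path` at your project folder in practice.
out_path <- file.path(tempdir(), "topic_summary.pptx")

# The argument name becomes the slide title; one slide is created per named flextable.
save_as_pptx("Topic summary" = ft, path = out_path)
```

Because the table is inserted as a native PowerPoint table rather than an image, it remains editable once the deck is opened.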