Analysis of the nouns attributed to ski champions (female and male) in the media —–

The words below were selected based on the results of the keyness statistics (i.e. feature scores):

\(\bullet\) favorit/topfavorit
\(\bullet\) konkurrent
\(\bullet\) spitzenfahr
\(\bullet\) spezialist
\(\bullet\) held
\(\bullet\) dominator
\(\bullet\) aussenseiter
\(\bullet\) teamleader
\(\bullet\) allrounder
\(\bullet\) hoffnungstrager
\(\bullet\) aufsteiger
\(\bullet\) nachfolger
\(\bullet\) uberflieger

These words stood out to me in the keyness statistics: a word that appears in the output is overrepresented for one gender relative to the other. I did not want to interpret the keyness output directly, however. For one, the word stemming did not work entirely correctly; for another, the proper-noun removal dropped some words that I still need here.
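To make "overrepresented for one gender relative to the other" concrete, here is a minimal sketch of a chi-squared keyness score for a single stem. The counts are hypothetical and not taken from the corpus, and the sketch only illustrates the idea; it is not necessarily the exact measure or correction used to produce the keyness output.

```r
# Hypothetical counts (not from the corpus): occurrences of one stem in the
# texts about women and men, and the total number of tokens in each group.
word_f <- 30; total_f <- 10000
word_m <- 10; total_m <- 10000

# 2x2 contingency table: the stem vs. all other tokens, by gender
tab <- matrix(c(word_f, total_f - word_f,
                word_m, total_m - word_m),
              nrow = 2, byrow = TRUE,
              dimnames = list(gender = c("f", "m"),
                              token  = c("stem", "other")))

# Chi-squared test of independence; a large statistic means the stem is
# distributed unevenly between the two groups
test <- chisq.test(tab)

# Sign the statistic by the direction of the difference:
# positive = relatively more frequent for women, negative = for men
rate_f <- word_f / total_f
rate_m <- word_m / total_m
keyness <- unname(test$statistic) * sign(rate_f - rate_m)
keyness
```

A stem with a large positive or negative score would be a candidate for the list above; the score says nothing about how the stem is used in context, which is why the words are re-counted below.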

Preliminaries —–

Data —–

The original dataset was not used for this analysis, because articles often mention other skiers as well, which makes it impossible to assign an article precisely to one gender. To make the analysis more precise, I use the results of the KWIC filtering instead. That is, for each occurrence of a ski champion's name I joined the 15 words before and the 15 words after it into one snippet and assigned that snippet to a gender.
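The windowing step can be sketched in base R as follows. The sentence, the name and the window of 3 are invented for illustration (the analysis uses 15), and whether the name itself is kept in the snippet is an assumption; here it is dropped.

```r
# Toy sentence and name; both are invented for illustration
text   <- "gestern gewann die favoritin anna muster den super-g vor der konkurrenz"
name   <- "anna"
window <- 3   # the analysis uses 15; 3 keeps the example readable

tokens <- strsplit(text, "\\s+")[[1]]
hits   <- which(tokens == name)

# For every hit, paste the tokens before and after the name into one snippet
snippets <- vapply(hits, function(i) {
  from <- max(1, i - window)
  to   <- min(length(tokens), i + window)
  paste(tokens[setdiff(from:to, i)], collapse = " ")
}, character(1))

# One row per mention, labelled with the gender of the named skier
kwic_toy <- data.frame(text = snippets, gender = "f", stringsAsFactors = FALSE)
kwic_toy$text
# "gewann die favoritin muster den super-g"
```

Because every row is one mention of a name, the number of rows is the denominator used for the percentages further down.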

# Packages used below: str_count(), reshaping, plotting, plot composition
library(stringr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(patchwork)

# Import the KWIC results (read.csv() already returns a data.frame)
kwic_results_f_nomen <- read.csv("Inputs/kwic_results_f3.csv")
kwic_results_m_nomen <- read.csv("Inputs/kwic_results_m3.csv")

Analysis of the nouns by gender —–

# dictionary of word stems
nomen <- c("favorit", "konkurrent", "spitzenfahrer", "spezialist",
           "held", "dominator", "aussenseiter", "leader", "allrounder", 
           "hoffnungsträger","aufsteiger", "nachfolger", "überflieger")

# create an empty data frame
df_occurences <- data.frame(nomen = character(), 
                            frauen.total = numeric(), frauen.perc = numeric(),
                            manner.total = numeric(), manner.perc = numeric())

# Loop counting how often each stem occurs in the KWIC snippets.
# str_count() treats the stem as a regex, so it also matches inside longer
# words (e.g. "favorit" in "topfavoritin", "leader" in "teamleader").
for (x in nomen) {
  ft <- sum(str_count(kwic_results_f_nomen$text, pattern = x))
  fp <- round(ft / nrow(kwic_results_f_nomen) * 100, 3)
  mt <- sum(str_count(kwic_results_m_nomen$text, pattern = x))
  mp <- round(mt / nrow(kwic_results_m_nomen) * 100, 3)
  df_occurences[nrow(df_occurences) + 1, ] <- list(x, ft, fp, mt, mp)
}

Data frame df_occurences:
\(\bullet\) nomen = list of the nouns in the dictionary
\(\bullet\) frauen.total = how often the word occurs near the name of a female skier
\(\bullet\) frauen.perc = frauen.total / observations * 100 (observations = number of mentions of women)
\(\bullet\) manner.total, manner.perc = the same quantities for male skiers
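A small worked example may help in reading the two kinds of columns. The four snippets below are invented and stand in for the `text` column of the KWIC results; the calculation mirrors the one in the loop.

```r
library(stringr)

# Four invented snippets standing in for kwic_results_f_nomen$text
toy_text <- c("die favoritin gewann das rennen",
              "topfavoritin und favoritin zugleich",
              "ein rennen ohne überraschung",
              "die siegerin jubelt")

# Same calculation as in the loop: total matches, then matches per 100 snippets
total <- sum(str_count(toy_text, pattern = "favorit"))   # 1 + 2 + 0 + 0 = 3
perc  <- round(total / length(toy_text) * 100, 3)        # 75

# For comparison: share of snippets that contain the stem at least once
share <- mean(str_detect(toy_text, "favorit")) * 100     # 50

c(total = total, perc = perc, share = share)
```

The percentage is therefore "occurrences per 100 mentions" rather than "share of mentions containing the word"; the two differ whenever a stem appears more than once in a snippet.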

Plot —–

Nouns in the immediate vicinity of the skiers' names, in percent (number of occurrences of the word divided by the number of mentions of female or male skiers, respectively)

gender_colors <- c("frauen.perc" = "#008081", "manner.perc" = "#FD7601")

df_occurences_long <- df_occurences %>%
  dplyr::select(-c(frauen.total, manner.total)) %>%
  pivot_longer(!nomen, 
               names_to = "sex", 
               values_to = "percent")

#write.csv(df_occurences_long, "Inputs/df_occurences_long.csv")
#df_occurences_long <- read.csv("Inputs/df_occurences_long.csv")

plot_nomen <- df_occurences_long %>%
  filter(!(nomen %in% c("konkurrent", "nachfolger"))) %>%
  mutate(name = reorder(nomen, desc(percent))) %>%
  ggplot(aes(x = name, y = percent, fill = sex)) +
  geom_bar(stat = "identity", position = "dodge") +
  # geom_text(aes(label = paste0(percent, "%")),
  #           vjust = 0.3,
  #           size = 4,
  #           angle = 90,
  #           hjust = -0.2,
  #           position = position_dodge(width = 0.9),
  #           alpha = 0.7) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 0.7),
        axis.title = element_text(size = 14, family = "Futura Bk BT"),
        axis.text = element_text(size = 12, family = "Futura Bk BT"),
        text = element_text(size = 12, family = "Futura Bk BT"),
        legend.title = element_blank(),
        legend.text = element_text(size = 12),
        legend.position = c(0.8, 0.9), 
        legend.justification = c(0.5, 1),  
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_line(color = "gray70", size = 0.5, linetype = "dashed"),
        panel.grid.major.x = element_blank(),
        plot.title = element_text(hjust = 0, size = 18),
        plot.subtitle = element_text(hjust = 0, size = 16),
        axis.title.y = element_text(margin = margin(r = 15))) +
  scale_y_continuous(labels = function(x) paste0(x , "%"), limits = c(0,1.5)) +
  labs(title = "Skimeister sind eher Helden, Skimeisterinnen eher Dominatorinnen",
       subtitle = "Vorkommen von typischen Bezeichnungen für SkifahrerInnen",
       x = "Bezeichnungen",
       y = "Vorkommen in % ",
       caption = "Daten: SWISSDOX@LiRI") +
  scale_fill_manual(values = gender_colors,
                    labels = c("manner.perc" = "Skifahrer", "frauen.perc" = "Skifahrerinnen"))
plot_nomen

In counts:

Caution: the number of observations differs by gender!
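The following arithmetic shows why raw counts are not directly comparable. The two totals are the numbers of mentions stated in the plot titles of this analysis; the count of 1000 is hypothetical.

```r
# Totals as stated in the plot titles of this analysis
n_m <- 153242   # mentions of male skiers
n_f <- 121276   # mentions of female skiers

# Hypothetical: the same noun is counted 1000 times for each gender
count <- 1000

perc_m <- round(count / n_m * 100, 3)
perc_f <- round(count / n_f * 100, 3)
c(manner.perc = perc_m, frauen.perc = perc_f)
# manner.perc = 0.653, frauen.perc = 0.825
```

An identical count thus corresponds to a noticeably higher rate for women, so the count plots should be read within each gender, and the percent plots used for comparisons across genders.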

gender_colors_count <- c("frauen.total" = "#008081", "manner.total" = "#FD7601")

df_occurences_long_count <- df_occurences %>%
  dplyr::select(c(nomen, frauen.total, manner.total)) %>%
  pivot_longer(!nomen, 
               names_to = "sex", 
               values_to = "count")

plot_nomen_count <- df_occurences_long_count %>%
  filter(!(nomen %in% c("konkurrent", "nachfolger"))) %>%
  mutate(name = reorder(nomen, desc(count))) %>%
  ggplot(aes(x = name, y = count, fill = sex)) +
  geom_bar(stat = "identity", position = "dodge") +
  # geom_text(aes(label = paste0(percent, "%")),
  #           vjust = 0.3,
  #           size = 4,
  #           angle = 90,
  #           hjust = -0.2,
  #           position = position_dodge(width = 0.9),
  #           alpha = 0.7) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 0.7),
        axis.title = element_text(size = 14, family = "Futura Bk BT"),
        axis.text = element_text(size = 12, family = "Futura Bk BT"),
        text = element_text(size = 12, family = "Futura Bk BT"),
        legend.title = element_blank(),
        legend.text = element_text(size = 12),
        legend.position = c(0.8, 0.9), 
        legend.justification = c(0.5, 1),  
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_line(color = "gray70", size = 0.5, linetype = "dashed"),
        panel.grid.major.x = element_blank(),
        plot.title = element_text(hjust = 0, size = 18),
        plot.subtitle = element_text(hjust = 0, size = 16),
        axis.title.y = element_text(margin = margin(r = 15))) +
  #scale_y_continuous(labels = function(x) paste0(x , "%"), limits = c(0,1)) +
  labs(title = "",
       subtitle = "",
       x = "Nomen",
       y = "Vorkommen",
       caption = "Daten: SWISSDOX@LiRI") +
  scale_fill_manual(values = gender_colors_count,
                    labels = c("manner.total" = "Skifahrer", "frauen.total" = "Skifahrerinnen"))

plot_nomen_count

Two count plots:

nomen_renaming_men <- c("favorit" = "Favorit",
                        "spezialist" = "Spezialist",
                        "leader" = "Leader",
                        "held" = "Held",
                        "hoffnungsträger" = "Hoffnungsträger",
                        "allrounder" = "Allrounder",
                        "dominator" = "Dominator",
                        "aussenseiter" = "Aussenseiter",
                        "spitzenfahrer" = "Spitzenfahrer",
                        "überflieger" = "Überflieger",
                        "aufsteiger" = "Aufsteiger")

plot_nomen_count_men <- df_occurences_long_count %>%
  filter(sex == "manner.total" & !(nomen %in% c("konkurrent", "nachfolger"))) %>%
  mutate(name = recode(nomen, !!!nomen_renaming_men)) %>%
  mutate(name = reorder(name, desc(count))) %>%
  ggplot(aes(x = name, y = count, fill = sex)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = count), vjust = -0.5, size = 4, color = "#FD7601") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 0.8),
        axis.title = element_text(size = 14, family = "Futura Bk BT"),
        axis.text = element_text(size = 12, family = "Futura Bk BT"),
        text = element_text(size = 12, family = "Futura Bk BT"),
        legend.title = element_blank(),
        legend.text = element_text(size = 12),
        legend.position = c(0.8, 0.9), 
        legend.justification = c(0.5, 1),  
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_line(color = "gray70", size = 0.5, linetype = "dashed"),
        panel.grid.major.x = element_blank(),
        axis.title.y = element_text(margin = margin(r = 15)),
        axis.title.x = element_text(margin = margin(t = -10))) +
  scale_y_continuous(limits = c(0,2200)) +
  labs(title = "Für Männer (Insgesamt 153'242 Erwähnungen von männlichen Skifahrern)",
       x = "Bezeichnungen",
       y = "Vorkommen") +
  scale_fill_manual(values = gender_colors_count,
                    labels = c("manner.total" = "Skifahrer"))
plot_nomen_count_men

nomen_renaming_women <- c("favorit" = "Favoritin",
                        "spezialist" = "Spezialistin",
                        "leader" = "Leaderin",
                        "held" = "Heldin",
                        "hoffnungsträger" = "Hoffnungsträgerin",
                        "allrounder" = "Allrounderin",
                        "dominator" = "Dominatorin",
                        "aussenseiter" = "Aussenseiterin",
                        "spitzenfahrer" = "Spitzenfahrerin",
                        "überflieger" = "Überfliegerin",
                        "aufsteiger" = "Aufsteigerin")

plot_nomen_count_women <- df_occurences_long_count %>%
  filter(sex == "frauen.total" & !(nomen %in% c("konkurrent", "nachfolger"))) %>%
  mutate(name = recode(nomen, !!!nomen_renaming_women)) %>%
  mutate(name = reorder(name, desc(count))) %>%
  ggplot(aes(x = name, y = count, fill = sex)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = count), vjust = -0.5, size = 4, color = "#008081") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 0.9),
        axis.title = element_text(size = 14, family = "Futura Bk BT"),
        axis.text = element_text(size = 12, family = "Futura Bk BT"),
        text = element_text(size = 12, family = "Futura Bk BT"),
        legend.title = element_blank(),
        legend.text = element_text(size = 12),
        legend.position = c(0.8, 0.9), 
        legend.justification = c(0.5, 1),  
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_line(color = "gray70", size = 0.5, linetype = "dashed"),
        panel.grid.major.x = element_blank(),
        axis.title.y = element_text(margin = margin(r = 15)),
        axis.title.x = element_text(margin = margin(t = -10))) +
  scale_y_continuous(limits = c(0,1500)) +
  labs(title = "Für Frauen (Insgesamt 121'276 Erwähnungen von Skifahrerinnen)",
       x = "Bezeichnungen",
       y = "Vorkommen") +
  scale_fill_manual(values = gender_colors_count,
                    labels = c("frauen.total" = "Skifahrerinnen"))

plot_nomen_count_women



combined_plot <- plot_nomen_count_men + plot_spacer() + plot_nomen_count_women +
  plot_layout(ncol = 1, heights = c(0.5, 0.04, 0.5))

combined_plot <- combined_plot +
  plot_annotation(
    title = "Skimeister sind eher Helden, Skimeisterinnen eher Dominatorinnen",
    subtitle = "Vorkommen von typischen Bezeichnungen für SkifahrerInnen",
    caption = "Daten: SWISSDOX@LiRI",
    theme = theme(plot.title = element_text(hjust = 0, size = 18, face = "bold",
                                            family = "Futura Bk BT"),
                  plot.subtitle = element_text(hjust = 0, size = 16,
                                               family = "Futura Bk BT",
                                               margin = margin(b = 20))
  ))

combined_plot

Both percent plots:

df_occurences_long

plot_nomen_percent_men <- df_occurences_long %>%
  filter(sex == "manner.perc" & !(nomen %in% c("konkurrent", "nachfolger"))) %>%
  mutate(name = recode(nomen, !!!nomen_renaming_men)) %>%
  mutate(name = reorder(name, desc(percent))) %>%
  ggplot(aes(x = name, y = percent, fill = sex)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = paste0(round(percent,2), "%")), vjust = -0.5, size = 4, color = "#FD7601") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 0.9),
        axis.title = element_text(size = 14, family = "Futura Bk BT"),
        axis.text = element_text(size = 12, family = "Futura Bk BT"),
        text = element_text(size = 12, family = "Futura Bk BT"),
        legend.title = element_blank(),
        legend.text = element_text(size = 12),
        legend.position = c(0.8, 0.9), 
        legend.justification = c(0.5, 1),  
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_line(color = "gray70", size = 0.5, linetype = "dashed"),
        panel.grid.major.x = element_blank(),
        axis.title.y = element_text(margin = margin(r = 15)),
        axis.title.x = element_text(margin = margin(t = -10))) +
  scale_y_continuous(limits = c(0,1.5)) +
  labs(title = "Für Männer (Insgesamt 153'242 Erwähnungen von männlichen Skifahrern)",
       x = "Bezeichnungen",
       y = "Vorkommen (%)") +
  scale_fill_manual(values = gender_colors,
                    labels = c("manner.perc" = "Skifahrer"))

plot_nomen_percent_men


plot_nomen_percent_women <- df_occurences_long %>%
  filter(sex == "frauen.perc" & !(nomen %in% c("konkurrent", "nachfolger"))) %>%
  mutate(name = recode(nomen, !!!nomen_renaming_women)) %>%
  mutate(name = reorder(name, desc(percent))) %>%
  ggplot(aes(x = name, y = percent, fill = sex)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = paste0(round(percent,2), "%")), vjust = -0.5, size = 4, color = "#008081") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 0.9),
        axis.title = element_text(size = 14, family = "Futura Bk BT"),
        axis.text = element_text(size = 12, family = "Futura Bk BT"),
        text = element_text(size = 12, family = "Futura Bk BT"),
        legend.title = element_blank(),
        legend.text = element_text(size = 12),
        legend.position = c(0.8, 0.9), 
        legend.justification = c(0.5, 1),  
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_line(color = "gray70", size = 0.5, linetype = "dashed"),
        panel.grid.major.x = element_blank(),
        axis.title.y = element_text(margin = margin(r = 15)),
        axis.title.x = element_text(margin = margin(t = -10))) +
  scale_y_continuous(limits = c(0,1.5)) +
  labs(title = "Für Frauen (Insgesamt 121'276 Erwähnungen von Skifahrerinnen)",
       x = "Bezeichnungen",
       y = "Vorkommen (%)") +
  scale_fill_manual(values = gender_colors,
                    labels = c("frauen.perc" = "Skifahrerinnen"))

plot_nomen_percent_women


combined_plot_percent <- plot_nomen_percent_men + plot_spacer() + plot_nomen_percent_women +
  plot_layout(ncol = 1, heights = c(0.5, 0.04, 0.5))

# Annotate the percent version under its own name so that the count version
# stored in combined_plot is not overwritten
combined_plot_percent <- combined_plot_percent +
  plot_annotation(
    title = "Skimeister sind eher Helden, Skimeisterinnen eher Dominatorinnen",
    subtitle = "Vorkommen von typischen Bezeichnungen für SkifahrerInnen",
    caption = "Daten: SWISSDOX@LiRI",
    theme = theme(plot.title = element_text(hjust = 0, size = 18, face = "bold",
                                            family = "Futura Bk BT"),
                  plot.subtitle = element_text(hjust = 0, size = 16,
                                               family = "Futura Bk BT",
                                               margin = margin(b = 20))
  ))

combined_plot_percent