State of the Union Addresses: Easier to Understand, Still Difficult to Interpret

The State of the Union address is probably the most analysed recurring speech, in the media as well as in academia. Countless journalists have given their take on the president’s yearly address, and every major news outlet feels obliged to publish a witty analysis of the speech before Congress. But how has the speech changed throughout history, and does it tell us something about Obama’s presidency?

First of all, let’s look at speech analysis for a moment. Anyone who says that State of the Union addresses are better analysed qualitatively than quantitatively is probably right. A lot of detailed contextual information is lost when we analyse text in a purely quantitative manner. However, classic qualitative analysis has been done many times, so if we stay aware of the limited scope of simple word frequency calculations, we might gain something by analysing the president’s choice of words. Word clouds will not be used here. Their widespread use, regardless of their limited informational value, has earned them a bad reputation among some data journalists. The reason for leaving them out is not that they offer no insight, but that they have been used to death by people who did not spend five minutes thinking about the validity of their approach, and alternative text analysis tools are readily available. So let’s look at the changing length and form of the SOTU address:


As is clearly visible, after a short period of very short addresses delivered in person, State of the Union addresses were submitted in writing throughout the 19th century and kept expanding in size. In 1913, things changed as Woodrow Wilson returned to a spoken form of the speech. Ever since, the length of the speech has stayed more or less constant, interrupted only in 1946 and 1973 by two written addresses, which fall a bit out of line.

Change of Audience

State of the Union addresses have undoubtedly undergone some change since they were first delivered in the late 18th century. Not only have they become shorter, but also easier to understand, according to the Flesch-Kincaid reading-level score. This test scores a text based on the length of its words and sentences and is meant to indicate complexity. It returns a grade level that corresponds to the years of education necessary to understand a text; a value above 17 is generally regarded as reading material for graduates. So while only a few educated men could understand James Madison’s 1815 address (score: 25.3), Obama’s latest speech can be understood by anyone who has made it past the 10th grade (score: 10.1).
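To make the score less of a black box, here is a minimal Python sketch of the standard Flesch-Kincaid grade-level formula. It is not the R script from the replication archive. The function names and the vowel-group syllable counter are assumptions made for illustration, and because syllable counting is only approximated, the results will differ somewhat from those of dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count groups of consecutive vowels in the word."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" (as in "state") usually adds no syllable.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Naive sentence split on ., ! and ? (abbreviations are not handled).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally joined by an apostrophe.
    words = re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Calling `flesch_kincaid_grade(speech_text)` on the full text of an address returns a single grade level. Long sentences and many-syllable words both push the value up, which is all the score measures.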

The Guardian has done an impressive visualisation of this phenomenon, but sadly gives it a typical clickbait title by calling the more recent addresses “dumber”, when all the score measures is word and sentence length, not logical consistency or complexity. Additionally, the Guardian’s graphic completely ignores the fact that from 1800 to 1913 the State of the Union addresses were delivered in written, not in spoken form, does not take the means of broadcast into account, and implies that the “smartest” SOTU address of the last 30 years was given by George W. Bush. Considering these factors, I decided to replicate the Guardian’s graphic and improve it a little:


(speech length in words is represented by circle area size)

As we can see, the most complex speeches were not delivered in person, but on paper. However, a clear trend remains: State of the Union addresses have generally become easier to understand. This might have to do with presidents trying to make their point not to an elite group of elected officials, who usually have no problem understanding complex texts, but to the broader public. The changing and ever-expanding means of communication and broadcasting might have helped in this regard:


This shows what we already suspected: with a greater audience comes a greater need for clarity and for simpler, shorter words and sentences. The State of the Union address has developed from an almost academic declaration of legislative goals into a speech to the people, and the easier it is to understand, the better.

Recovering from the Crisis

But for now, let’s take a look at the current president. Obama’s presidency has been defined by topics like the financial crisis, the subsequent recession and recovery, and his signature piece of legislation, the Affordable Care Act, or “Obamacare”. If we take the president’s addresses and analyse the tone of each speech, we get the following graph:


This graph shows the results of a sentiment analysis, which distinguishes between positive and negative sentiments in a text based on the Hu and Liu sentiment lexicon and scores the sentences according to the words used. Sentences with no sentiment score (i.e. those that did not contain any words relevant to sentiment analysis) were excluded from the analysis. We get similar results for all speeches: their tone is slightly positive most of the time and drifts into negative territory only in the first two addresses Obama gave. In fact, the speeches seem to be getting more and more positive:
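The procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the R script from the replication archive. The two word sets are tiny placeholders rather than the actual Hu and Liu lexicon, and scoring a sentence as positive hits minus negative hits is an assumed scheme, since the exact scoring rule is not spelled out here.

```python
import re

# Placeholder word lists for illustration only (NOT the Hu and Liu lexicon,
# which contains several thousand positive and negative words).
POSITIVE = {"good", "strong", "growth", "recovery", "hope"}
NEGATIVE = {"crisis", "recession", "debt", "loss", "fear"}


def sentence_scores(text):
    """Score each sentence as (positive hits - negative hits).

    Sentences without any lexicon word are skipped, mirroring the
    exclusion rule described in the text.
    """
    scores = []
    # Naive sentence split: whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[a-z]+", sentence.lower())
        pos = sum(w in POSITIVE for w in words)
        neg = sum(w in NEGATIVE for w in words)
        if pos + neg == 0:
            continue  # no sentiment-bearing words: exclude the sentence
        scores.append(pos - neg)
    return scores


def mean_sentiment(text):
    """Average sentence score of a speech, or None if nothing was scored."""
    scores = sentence_scores(text)
    return sum(scores) / len(scores) if scores else None
```

Note that a sentence with one positive and one negative word is kept with a score of 0, because it does contain sentiment words. Only sentences with no lexicon hits at all are dropped.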


The aftermath of the financial crisis might have left its mark on the president’s addresses: while the crisis has certainly had terrible consequences, which have no doubt been part of the president’s assessment of the situation America finds itself in, the focus might have shifted towards a recovery plan, improving economic indicators and creating new jobs. It should be noted, though, that any interpretation of these results remains largely speculative, and identifying the factors that influence the tone of a State of the Union address is not a topic that could be adequately addressed in a single blog post.

This blog post is part of the MA research seminar “Political Data Journalism” at the University of Zurich. The course is taught by Prof. Dr. Fabrizio Gilardi, Dr. Michael Hermann and Dr. des. Bruno Wüest. The article was written by Nikolai Thelitz (student nr: 09-724-626) and handed in on April 12, 2014, with a length of 882 words.

If you want to look at the analysis, here’s the replication archive (dataset & R-script)

