Twitter plays a crucial role in politics these days. Gone are the days of door-to-door campaigning and trying to reach the last man. Today, power is wielded by those who can tweet. A carefully worded 140-character phrase carries the ability to swing states and potentially change the course of an entire nation. The goal of our project is to analyse the impact of Twitter on society and understand how ideas are spread across a network.

To investigate this question, we propose to analyse the tweets posted by Trump over time to discover the main focus of his campaign, how it changed, and how society reacted to his statements and proposals.

The Trump Twitter Archive is a project that has been collecting Trump's tweets since 2009, but the main focus of this project will be the period of his campaign.

Data Overview

The Twitter API provides information about tweets in JSON format; the meaning of each field is described in Twitter's documentation. In this analysis, the main fields used are { text, created_at, source, retweet_count, favorite_count }. The fields { coordinates, place } were not used because most of their values are missing.
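As a minimal sketch of this field selection, assuming the tweets are available as a JSON array of tweet objects (the sample below is made up for illustration and is not archive data), the five fields can be kept and everything else dropped:

```python
import json

# Fields used in the analysis; coordinates and place are dropped on purpose.
FIELDS = ("text", "created_at", "source", "retweet_count", "favorite_count")


def select_fields(raw_json, fields=FIELDS):
    """Parse a JSON array of tweet objects and keep only the chosen fields."""
    tweets = json.loads(raw_json)
    # dict.get returns None when a field is absent instead of raising.
    return [{name: tweet.get(name) for name in fields} for tweet in tweets]


# Hypothetical sample record, invented for this example.
sample = json.dumps([
    {
        "text": "Example tweet",
        "created_at": "Tue Dec 01 03:00:00 +0000 2015",
        "source": "Twitter for Android",
        "retweet_count": 10,
        "favorite_count": 25,
        "coordinates": None,
        "place": None,
    }
])
rows = select_fields(sample)
```

The helper name `select_fields` is our own; any JSON loader that yields dictionaries would work the same way.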

The number of tweets aggregated by month is given in the chart below.

Count By Month

The spike at the end of 2015 corresponds to a live episode of a TV show that Trump used to host, during which he was tweeting.
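A monthly count like the one charted above can be sketched as follows. This assumes created_at uses the classic Twitter API timestamp layout (for example "Tue Dec 01 03:00:00 +0000 2015"); if the archive export uses a different layout, the format string would need to change.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout of the created_at field.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_month(created_at_values):
    """Count tweets per (year, month), sorted chronologically."""
    counts = Counter()
    for value in created_at_values:
        stamp = datetime.strptime(value, TWITTER_TIME_FORMAT)
        counts[(stamp.year, stamp.month)] += 1
    return dict(sorted(counts.items()))


# Hypothetical timestamps, for illustration only.
monthly = tweets_per_month([
    "Tue Dec 01 03:00:00 +0000 2015",
    "Wed Dec 02 03:00:00 +0000 2015",
    "Fri Jan 01 10:00:00 +0000 2016",
])
```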

Trump also tweets from different sources or devices, and there is evidence that the style of the tweets differs by device; a separate analysis of this is available as a blog post. The volume of tweets by device over time can be seen in the chart below.

Devices By Month

Nowadays the tweets come mainly from iPhone and Android, and David Robinson's analysis infers that Trump himself uses Android while his team uses iPhone.

The volume of tweets on a weekend day is around 40% lower than on any working day. The number of retweets, however, is about the same, as can be seen in the picture below, so the impact of a tweet does not seem to depend on whether it is posted on a weekend.

Devices By Month

If we look at the favorite counts instead of retweets, the behaviour is the same, and the Spearman correlation between retweet and favorite counts is about 0.87.
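For readers unfamiliar with it, Spearman correlation is the Pearson correlation computed on the ranks of the values rather than on the values themselves. The sketch below is a plain-Python illustration of that definition, not the code used to obtain the figure quoted above; the function names are our own.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(retweets, favorites):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(retweets), average_ranks(favorites))
```

Because only ranks matter, a few extremely viral tweets do not dominate the result the way they would with a plain Pearson correlation.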

Writing Style

The way a person expresses thoughts is characteristic, but it can change over time and with the resources available to convey the message. Some examples of what might influence this behaviour: it is easier to write longer sentences when tweeting from a computer than from a cellphone, Twitter's text box is limited to 140 characters, and depending on the historical moment more verbs than adjectives may be used.

The chart below shows the average number of markers used to express opinions or sentiments, such as { !, #, ?, url, emojis }, over time.

Writing style over time

The only conclusion that can be drawn from this chart is that the number of URLs tracks the number of hashtags (#), which might reflect references to others.
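The per-tweet counts behind a chart of this kind can be sketched with regular expressions. This is an illustration under our own assumptions: URLs are taken to be anything starting with http:// or https://, hashtags are # followed by word characters, and emojis are left out because detecting them reliably needs more than a short pattern.

```python
import re

# Assumption: a URL is any run of non-space characters after http(s)://.
URL_PATTERN = re.compile(r"https?://\S+")
HASHTAG_PATTERN = re.compile(r"#\w+")


def style_counts(text):
    """Count exclamation marks, question marks, hashtags and URLs in one tweet."""
    urls = URL_PATTERN.findall(text)
    # Remove URLs first so that '?' or '#' inside a link is not counted.
    without_urls = URL_PATTERN.sub(" ", text)
    return {
        "!": without_urls.count("!"),
        "?": without_urls.count("?"),
        "#": len(HASHTAG_PATTERN.findall(without_urls)),
        "url": len(urls),
    }


def average_style(texts):
    """Average each marker count over a non-empty list of tweet texts."""
    totals = {"!": 0, "?": 0, "#": 0, "url": 0}
    for text in texts:
        for marker, count in style_counts(text).items():
            totals[marker] += count
    return {marker: total / len(texts) for marker, total in totals.items()}
```

Grouping the tweets by month (or by device) before calling `average_style` would give one point per group.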

The same analysis over time was done with the number of verbs, adjectives and adverbs, extracted with the pos_tag function from the nltk library, but the proportions are fairly constant over time.

However, when we look at the writing style split by device, it shows large stylistic differences. The chart below shows metrics such as the number of { urls, words, verbs, adjectives }. On the Android device there are almost no URLs, but the numbers of words and verbs are slightly higher than for other channels.

Writing style over time

Word2Vec

Word2vec is a method that learns vector representations of words by predicting which words co-occur in a document, which means it is able to capture the contexts in which words appear. After running word2vec, it is possible to calculate the cosine similarity of two word vectors and find which words are closest to each other. The table below shows the words closest to hot topics regarding Trump.

As can be seen in the table, the word Hillary is related to { beat, crooked, obama } and war is related to { hillary, attacks, act }.
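The cosine-similarity lookup described above can be sketched as follows. The vectors here are tiny made-up examples, not embeddings learned from the tweets, and `closest_words` is a hypothetical helper rather than part of any word2vec library.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest_words(target, vectors, top_n=3):
    """Return the top_n words whose vectors are most similar to the target's."""
    scores = [
        (word, cosine_similarity(vectors[target], vector))
        for word, vector in vectors.items()
        if word != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scores[:top_n]]


# Toy 2-dimensional vectors, invented for illustration only.
toy_vectors = {
    "hillary": [1.0, 0.0],
    "crooked": [0.9, 0.1],
    "jobs": [0.0, 1.0],
}
nearest = closest_words("hillary", toy_vectors, top_n=1)
```

Cosine similarity ignores vector length and compares direction only, which is why it is the usual choice for comparing word embeddings.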

Sentiment Analysis

Humans are "generally" sentimental and emotional beings. Analysing the sentiment of someone's tweets could tell us a great deal about the general thought process of the person. Here, we will present an in-depth analysis of the sentiments expressed by Trump on Twitter.

Let us look at the distribution of the number of tweets classified based on the sentiments expressed in them. The raw sentiment scores lie between -1 and 1, with -1 denoting the most negative sentiment and 1 the most positive.

Tweet Count by Sentiment
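One way to turn raw scores in [-1, 1] into the three classes is a small band around zero that counts as neutral. The band width of 0.05 below is purely an assumption for illustration; the cut-offs actually used for the chart are not stated here.

```python
from collections import Counter


def sentiment_label(score, neutral_band=0.05):
    """Map a score in [-1, 1] to 'negative', 'neutral' or 'positive'.

    neutral_band is a hypothetical threshold: scores within
    [-neutral_band, neutral_band] are treated as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("sentiment score must lie between -1 and 1")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def sentiment_shares(scores, neutral_band=0.05):
    """Fraction of tweets in each class, for a non-empty list of scores."""
    counts = Counter(sentiment_label(score, neutral_band) for score in scores)
    total = sum(counts.values())
    return {label: counts[label] / total for label in ("negative", "neutral", "positive")}
```

Widening the band moves weakly scored tweets into the neutral class, so the class shares depend directly on this choice.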

Although the largest share (~43%) of tweets have been classified as neutral, a significant share (~20%) are negative. Let us look at some of these tweets:

Some of the positive ones are as follows:

It would also be interesting to see the temporal pattern in the distribution of the number of tweets based on the sentiment expressed in them.

Tweet Count by Sentiment per quarter

While the number of tweets by Trump has decreased in general, the proportion of negative tweets has gone up, even more so after he was elected president. Let us now look at how these tweets were perceived by the public. We will use the total number of retweets and likes (favorites) as a metric.

Tweet Count by Sentiment per quarter Tweet Count by Sentiment per quarter

It is interesting to note that while negative tweets make up only about 20% of the total, the numbers of retweets and likes are almost the same across the three sentiment categories; in other words, each negative tweet attracts more retweets and likes on average. This could indicate that people are more receptive and supportive towards negative tweets by Trump.
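The comparison between totals and per-tweet engagement can be made explicit with a small aggregation. This is a sketch with invented numbers; the "sentiment" key is a hypothetical field holding the class assigned to each tweet.

```python
from collections import defaultdict


def engagement_by_sentiment(tweets):
    """Total and per-tweet retweets and favorites for each sentiment label."""
    totals = defaultdict(lambda: {"tweets": 0, "retweets": 0, "favorites": 0})
    for tweet in tweets:
        bucket = totals[tweet["sentiment"]]
        bucket["tweets"] += 1
        bucket["retweets"] += tweet["retweet_count"]
        bucket["favorites"] += tweet["favorite_count"]
    summary = {}
    for label, bucket in totals.items():
        summary[label] = dict(bucket)
        summary[label]["retweets_per_tweet"] = bucket["retweets"] / bucket["tweets"]
        summary[label]["favorites_per_tweet"] = bucket["favorites"] / bucket["tweets"]
    return summary


# Made-up numbers: one negative tweet matches the total of two neutral ones.
example = engagement_by_sentiment([
    {"sentiment": "negative", "retweet_count": 300, "favorite_count": 900},
    {"sentiment": "neutral", "retweet_count": 150, "favorite_count": 450},
    {"sentiment": "neutral", "retweet_count": 150, "favorite_count": 450},
])
```

In this toy case the totals are equal across the two classes, while the per-tweet average for the negative class is twice as high, which is the pattern described above.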

Source Analysis

It is well known that Trump's official Twitter handle @realDonaldTrump is also accessible to his media and other supporting teams. Trump is known to personally use an "Android" phone, while his team uses either the Twitter Web Client or an iPhone. Since the Twitter API allows us to differentiate between the "source" of a tweet, it would be interesting to see if there are any differentiating patterns based on the source.

Trump and his team have clearly used numerous devices/sources for connecting with the world on Twitter. Here is a comprehensive list ordered according to total number of tweets originating from them : 'Twitter for Android', 'Twitter Web Client', 'Twitter for iPhone', 'TweetDeck', 'TwitLonger Beta', 'Instagram', 'Facebook', 'Twitter for BlackBerry', 'Twitter Ads', 'Mobile Web (M5)', 'Twitlonger', 'Twitter for iPad', 'Media Studio', 'Twitter QandA', 'Vine - Make a Scene', 'Periscope', 'Neatly For BlackBerry 10', 'Twitter Mirror for iPad', 'Twitter for Websites'.

Let us take a look at the activity of these different sources, which can be quantified by the number of tweets in a given time period.

Devices By Month

"Twitter for Android" was the most active source from Q2 2013 to Q2 2016. Note that Trump's presidential campaign officially started in Q2 2015 and he won the election in Q4 2016.

Since we are aware that Trump personally uses Android for tweeting, it would be interesting to analyse the statistics of sentiments in tweets based on the source.

Devices By Month Devices By Month

It is not surprising that, compared to the other sources, Android has a greater proportion of negative tweets.

Topic Modelling

Trump has had many phases in his life, ranging from being a judge at a reality T.V. show to being the President of the United States of America. We thought it would be an interesting exercise to figure out what the major thematic elements in his tweets were through the years. Given his vastly different backgrounds throughout the years, we hoped to find significant differences in the themes of his tweets, and we weren't disappointed. The technique used for topic modelling is Latent Dirichlet Allocation (LDA). Others, like Non-Negative Matrix Factorization (NMF) and Hierarchical Dirichlet Process (HDP), were also considered, but LDA yielded the most coherent results.

LDA outputs the 'n' most significant words for each of 'm' topics. Both parameters are chosen by the user. Initially, we took all the tweets to create the input corpus for LDA. The resulting output was missing some major tweet elements that Trump is known for. For example, 'Hillary Clinton' and 'Immigration' were entirely missing!

Incoherent Results with Full Corpus

We thought it might be because of the huge time range that we had chosen. Next, we narrowed it down to the last three months before the election, when the competition between Hillary and Trump was at its peak. And voila! The infamous Crooked Hillary, BigLeagueTruth and MAGA make their appearance.

Coherent Results with Full Corpus

A better visualization technique for the entire corpus is to divide it into quarters for each year, and then have a look at the word-cloud for the topic models. We will be visualizing both the word-cloud and the corresponding news stories from that period of time. Google News was used to find relevant news stories.

Topic Word-Cloud :

Topics by Quarter

After having a look at the word-cloud above, let's validate the results by looking at a few news stories from those quarters:

2017 Q3
2017 Third Quarter, Key words - FraudNews, FakeNews

2016 Q4
2016 Fourth Quarter, Key words - Big League Truth

2016 Q1
2016 First Quarter, Key words - Ted Cruz, Iowa

2015 Q2
2015 Second Quarter, Key words - O'Reilly Factor

Timing

Trump - The Night Owl

Being an early riser has always been associated with success. While, yes, the early birds do catch some worms, night owls have perks of their own.

Trump, by his own admission, is a restless sleeper. But instead of tossing and turning in his bed to find just the right posture to sleep, he's more likely to indulge in some binge tweeting.

Using the Twitter archive, we have plotted some interesting results on his late-night rendezvous. First up, we have plotted the frequency of his late-night tweeting, with late night defined as the time from 0000 to 0359 hours. As is clearly visible in the graph below, he has been a consistent late-night tweeter since early 2014, when he either bragged about 'Celebrity Apprentice' or questioned Obama's tax reforms. After announcing his candidacy, he became more involved in making incendiary remarks about his rivals in his late-night goof-ups.

Trump's Late Nights On Twitter
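The late-night window (0000 to 0359 hours) can be checked per tweet as sketched below. Two assumptions are made here: created_at uses the classic Twitter API layout in UTC, and local time is approximated with a fixed UTC-5 offset (US Eastern standard time, ignoring daylight saving). A full time-zone database would be more accurate.

```python
from datetime import datetime, timedelta, timezone

# Assumed timestamp layout of the created_at field.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"
# Assumption: fixed UTC-5 offset; daylight saving time is ignored.
ASSUMED_LOCAL_ZONE = timezone(timedelta(hours=-5))


def is_late_night(created_at, local_zone=ASSUMED_LOCAL_ZONE):
    """True if the tweet was posted between 00:00 and 03:59 local time."""
    stamp = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
    local = stamp.astimezone(local_zone)
    return 0 <= local.hour < 4


def late_night_tweets(tweets, local_zone=ASSUMED_LOCAL_ZONE):
    """Keep only the tweets whose created_at falls in the late-night window."""
    return [t for t in tweets if is_late_night(t["created_at"], local_zone)]
```

Counting the filtered tweets per month, or per day and hour, would give the inputs for the frequency plot and the heatmap discussed in this section.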

Another interesting visualization informs us about his diminishing early-morning posts. Compared to 2014 and 2015, 2016 was the year when Trump decided to take it slow, posting far fewer tweets past the 3:00 A.M. mark. Maybe the presidential campaign's grueling schedule tempered his infamous insomnia. The heatmap below shows tweet timings for various days of the year, from midnight to 3:59 A.M.

Over Three Years, Trump Often Stayed Up Late

And finally, a bit of thematic analysis on the tweet data, filtered down to the late-night time slot and the Android device. The resulting word-cloud provides much more interesting insights into the kinds of topics he preferred indulging in late at night, revealing previously hidden 'top-words' related to, for example, the infamous spat with Megyn Kelly, Sean Hannity and FoxNews. This topic word-cloud provides better insights mainly because we narrow down both the timeframe (i.e. after midnight, when Trump prefers to talk about the topics he cares about most) and the device (i.e. Android, from which Trump personally tweets).

What does Trump Talk About Late at Night?

Most retweeted tweet:
Most favorited tweet:

Final Comments

In this data story, we dug deeper into Trump's tweeting behaviour and found a lot of new and interesting information. Trump's tweeting style has varied greatly throughout the years. As we saw in our analyses, a plethora of different devices are used to post on his behalf, with an equally varied writing style. We captured many nuances associated with his Twitter account, including his infamous sentiments about his rivals and his questionable sleeping patterns. What we found in the end is that, as befits one of the most influential and controversial personalities, his Twitter account is very striking in its nature.

Since he is one of the most powerful people in the world, it is good practice to keep his power in check by continuously assessing his behaviour on social media and how people are influenced by it. This can help us anticipate any kind of danger this powerful person might pose in the future, thereby making his actions more accountable and predictable.