In a previous post, we saw that the number of retweets a tweet receives shortly after Guy Kawasaki posts it is a good predictor of the tweet's final number of retweets. We also learned that the longer we wait to make a prediction, the more accurately we can predict the final number of retweets. But how long should we wait? In this post I will cover how we can make this decision by balancing two priorities: how long Twitter followers are willing to wait for a prediction, and how much accuracy we gain for waiting an additional minute. We will see that waiting 25 minutes after a tweet is posted best balances Twitter users' need for recency against prediction accuracy.

Twitter users are not willing to wait too long for a prediction. Although good data on how frequently people check Twitter every day is not available at the time of writing, we can use __Kawasaki's reposting strategy__ as a reference. Kawasaki reposts some of his tweets every 8 hours, so it is reasonable to assume that he believes people check their Twitter feeds about once every 8 hours. Therefore, we can't wait more than 8 hours to predict the final number of retweets, a pretty low bar as we will see later on.
In contrast to Twitter users' interest in recency, my prediction model produces more accurate results the longer we wait to make a prediction. However, accuracy doesn't increase proportionally with waiting time: the longer we wait, the less accuracy we gain for each additional minute. To better understand these diminishing returns, I measured the accuracy of waiting 5, 15, 25, and 35 minutes using three different measures.

The first measure is the out-of-sample proportion of variance explained (PVE), which tells us how well we can predict the final number of retweets on data we haven't seen yet. Instead of a single number, we compute the PVE on different held-out subsets of our data and take the average. The second and third measures are the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Much like PVE, both measures assess how well the model explains the data we want to predict, but they do so a little differently. We will use the three measures to triangulate the diminishing returns of waiting to make a prediction.
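To make the three measures concrete, here is a minimal Python sketch of how they can be computed for a simple linear model. This is not the code behind the post's results; the data, fold count, and helper names are illustrative, the model is a plain OLS fit, and AIC/BIC use the standard Gaussian log-likelihood formulas.

```python
import numpy as np

def fit_ols(X, y):
    # Ordinary least squares with an intercept term.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def out_of_sample_pve(X, y, k=5, seed=0):
    # K-fold cross-validated proportion of variance explained:
    # 1 - SSE/SST on each held-out fold, averaged over folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    pves = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef = fit_ols(X[train], y[train])
        resid = y[test] - predict(coef, X[test])
        sse = np.sum(resid ** 2)
        sst = np.sum((y[test] - y[test].mean()) ** 2)
        pves.append(1.0 - sse / sst)
    return float(np.mean(pves))

def aic_bic(X, y):
    # Gaussian log-likelihood of the OLS fit on the full data;
    # k counts the regression coefficients plus the noise variance.
    n = len(y)
    coef = fit_ols(X, y)
    resid = y - predict(coef, X)
    sigma2 = np.sum(resid ** 2) / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = len(coef) + 1
    aic = 2 * k - 2 * loglik
    bic = k * np.log(n) - 2 * loglik
    return aic, bic
```

Running `out_of_sample_pve` and `aic_bic` on each candidate model (5, 15, 25, and 35 minutes of data) yields exactly the numbers compared below: higher PVE is better, lower AIC/BIC is better.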

The graph below shows the results of the three measures for models that wait 5, 15, 25, and 35 minutes. It shows the distribution of the out-of-sample PVE calculations (the orange boxes and whiskers) and the average PVE (the thick black line inside each box). The table at the bottom shows the AIC and BIC values for each model. We also include a null model, which uses the mean final number of retweets to make predictions.

The graph shows that as we wait longer to make a prediction, the average PVE for each model increases and the distribution of PVE values gets tighter. However, the rate of average PVE improvement decreases with time: the average increase in PVE for waiting 15 minutes instead of 5 is 73%, while the average increase for waiting 35 minutes instead of 25 is just 6%. So it looks like we reach diminishing returns in accuracy at around 25 to 35 minutes.

The AIC and BIC values tell a similar story. For both AIC and BIC, a lower value indicates a better model, so the table shows that the longer we wait, the more accurate we will be. However, with AIC and BIC we do see an inflection point: the AIC and BIC for waiting 25 minutes are marginally lower than for waiting 35 minutes. This inflection point matches the diminishing returns seen with the PVE.

So the data has spoken, and we have been lucky to find a Goldilocks answer: waiting 15 minutes is too short, waiting 35 minutes is too long, but 25 minutes seems just right. In future posts we will talk about how we can use more data and better tools to get even more accurate retweet predictions.
