Looking at our chart, we can see that time-series data can be quite noisy, with lots of spikes up and down. This can make it difficult to see the underlying trend.
A useful technique for making a trend apparent is to smooth out the observations by taking an average. By averaging, say, 6 or 12 observations, we can construct something called the rolling mean. Essentially, we calculate the average over a window of time and move that window forward one observation at a time.
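To make the idea concrete, here is a minimal sketch in plain Python that computes a rolling mean by hand. The function name `rolling_mean` and the sample numbers are made up for illustration; they are not taken from our dataset.

```python
def rolling_mean(values, window):
    """Average each run of `window` consecutive values,
    sliding forward one observation at a time."""
    means = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]   # the current window
        means.append(sum(chunk) / window)  # its average
    return means


# Hypothetical noisy observations (illustrative only)
observations = [10, 30, 20, 40, 30, 50]
smoothed = rolling_mean(observations, window=3)
print(smoothed)
```

Notice that 6 observations with a window of 3 give only 4 averages, because a window has to be full before it can produce a value.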
Since this is such a common technique, Pandas actually has two handy methods already built in: rolling() and mean(). We can chain these two methods to create a DataFrame made up of the averaged observations.
# The window is the number of observations that are averaged
roll_df = reshaped_df.rolling(window=6).mean()

plt.figure(figsize=(16,10))
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.xlabel('Date', fontsize=14)
plt.ylabel('Number of Posts', fontsize=14)
plt.ylim(0, 35000)

# Plot roll_df instead of reshaped_df
for column in roll_df.columns:
    plt.plot(roll_df.index, roll_df[column],
             linewidth=3, label=roll_df[column].name)

plt.legend(fontsize=16)
Now our chart looks something like this:
Play with the window argument (use 3 or 12) and see how the chart changes!
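If you want a rough feel for the effect before re-plotting, the sketch below compares two window sizes on a small made-up DataFrame. The name `toy_df`, the `posts` column, and the numbers are assumptions for illustration only; swap in reshaped_df to try it on the real data.

```python
import pandas as pd

# Hypothetical monthly counts: an upward trend with a zigzag on top
values = [10 * i + (15 if i % 2 else 0) for i in range(24)]
toy_df = pd.DataFrame(
    {"posts": values},
    index=pd.date_range("2020-01-01", periods=24, freq="MS"),
)

results = {}
for window in (3, 12):
    rolled = toy_df.rolling(window=window).mean()
    # The first (window - 1) rows are NaN: the window is not full yet
    missing = int(rolled["posts"].isna().sum())
    # Spread of the step between consecutive averages:
    # a smaller value means a smoother line
    step_spread = rolled["posts"].diff().std()
    results[window] = (missing, step_spread)
    print(f"window={window}: {missing} empty rows, "
          f"step spread {step_spread:.2f}")
```

A larger window gives a smoother line, but it also leaves more empty rows at the start of the chart and reacts more slowly to real changes in the data.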