Our ability to get by in everyday life relies on all sorts of estimates relating to time. We estimate the time it takes to get ready in the morning and set the alarm clock accordingly. When walking, we make judgments about how long it will take us to reach the other side of the road and how long it will take any approaching vehicles to reach the crossing point. And so on.
Our estimates may be comparative in nature (I can cross the road in half the time it takes the car to reach the crossing) or they may rely on specific units of measurement (I need 45 minutes to get ready for work in the morning). Our choice of units in the latter case, or when making more formal measurements, varies depending on context and scale. The second is used as a standard measurement of time across the world for periods of the order of, well, a second. Milliseconds (thousandths of a second) may also be used for really short times. In everyday life there's not a lot of motivation to think about shorter time scales, but for really fine measurements in scientific circles we have microseconds (millionths of a second), nanoseconds (billionths of a second) and more.
While conventional measurements for short periods of time all fit nicely into a decimal system, measurement systems over longer periods of time are less elegant. As I'm sure you're aware, we have 60 seconds in a minute, 60 minutes in an hour, 24 hours in a day and 365 (and a quarter) days in a year. And if that wasn't awkward enough, there are also seven days in a week, 28 to 31 days in a month and 52 weeks and one day in a year. The concepts of days and years come from simple astronomical considerations (the rotation period of the Earth about its own axis and its orbital period around the Sun, respectively), while months originally related to lunar cycles. The seven days in a week may have come about from the astronomical mysticism of the Babylonians. The division of days into hours, minutes and seconds is more esoteric, and traces back to the ancient Babylonians, Greeks and Egyptians. Attempts by the French in the late eighteenth century to introduce decimal time were not successful.
I've recently discussed the use of connected scatter plots, animation (in the form of GIFs, though other possibilities are available) and small multiples as means of displaying datasets with a time component. However, the most common method is the line chart. Line charts can be great for highlighting trends and anomalies, but the peculiarities of the time-keeping conventions discussed above can add a layer of complexity to any analysis. To illustrate this, take a look at the line chart showing 35 days' worth of sales data for (fictional) "Product A" below:
Ticks every five days may be more conventional, but switching to ticks every seven days makes things a little clearer.
What are we seeing here? Evidence of an upward trend with regular spikes and troughs relating to the day of the week? 35 days means 5 Mondays, 5 Tuesdays and so on. We can divide the data into its five weeks (we'll assume that day 1 is a Monday) and see how things change over the course of each week:
Sales are always lower on Sundays than on any other day and higher on Saturdays than on weekdays. Comparing weekdays is a little less clear-cut. Looking at the mean by day for all weeks combined is helpful here:
From this chart it seems safe to conclude that any difference between weekday sales is small in comparison to other fluctuations. Perhaps the weather, which doesn't care about our seven-day-week convention, is more important?
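For readers who want to reproduce the weekly split and the mean-by-day calculation described above, here is a minimal sketch in plain Python. The `sales` numbers are invented for illustration (a gentle upward drift plus a Saturday bump and a Sunday dip) and are not the Product A data; the names `weekday_effect`, `weeks` and `mean_by_day` are likewise just labels chosen for this example.

```python
from statistics import mean

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Made-up daily sales for 35 days (NOT the article's data); day 1 is a Monday.
# A gentle upward drift plus a Saturday bump and a Sunday dip.
weekday_effect = [0, 0, 0, 0, 0, 15, -20]
sales = [100 + day + weekday_effect[day % 7] for day in range(35)]

# Split the series into five consecutive weeks of seven days each.
weeks = [sales[start:start + 7] for start in range(0, len(sales), 7)]

# Mean for each day of the week across all five weeks.
mean_by_day = {
    name: mean(week[i] for week in weeks)
    for i, name in enumerate(DAY_NAMES)
}

for name, value in mean_by_day.items():
    print(f"{name}: {value:.1f}")
```

Each row of `weeks` corresponds to one line in the week-by-week chart, and `mean_by_day` holds the seven values that would be plotted in the mean-by-day chart.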
We can use this data to alter the original line chart above by subtracting the corresponding mean for that day of the week from each data point:
There's a definite upward trend to start with. After around day 15 that trend seems to disappear.
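The day-of-week adjustment used for this chart can be sketched in a few lines of plain Python. As before, the `sales` series is invented for illustration and is not the Product A data, and `day_means` and `adjusted` are names made up for this example.

```python
from statistics import mean

# Made-up daily sales for 35 days (NOT the article's data); day 1 is a Monday.
weekday_effect = [0, 0, 0, 0, 0, 15, -20]
sales = [100 + day + weekday_effect[day % 7] for day in range(35)]

# Mean of all values that share a day of the week (index 0 = Monday).
# sales[d::7] picks out every seventh value starting from day d.
day_means = [mean(sales[d::7]) for d in range(7)]

# Subtract the matching day-of-week mean from every data point.
adjusted = [value - day_means[i % 7] for i, value in enumerate(sales)]

print(adjusted[:7])   # first week, now free of the Saturday/Sunday pattern
print(adjusted[-7:])  # last week
```

With this synthetic series the Saturday bump and Sunday dip disappear from `adjusted`, leaving only the week-to-week drift. Note that the adjusted values are centred on zero, so the vertical axis no longer shows sales in their original units.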
An alternative to subtracting a day-dependent mean from each data point is to plot the seven-day rolling mean. As well as allowing us to keep the raw data in place, this also smooths out some of the short-term fluctuations.
Again we see a rise in the early days that appears to flatten out around halfway through the period.
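A seven-day rolling mean is straightforward to compute by hand. The sketch below assumes a trailing window (each value is the mean of that day and the six before it); a centred window would be an equally reasonable choice, and the text above does not say which was used for the chart. The `sales` series is again invented, and `rolling_mean` is a helper written for this example rather than a library function.

```python
from statistics import mean

def rolling_mean(values, window=7):
    """Trailing rolling mean; positions without a full window are None."""
    result = [None] * min(window - 1, len(values))
    for end in range(window, len(values) + 1):
        result.append(mean(values[end - window:end]))
    return result

# Made-up daily sales for 35 days (NOT the article's data); day 1 is a Monday.
weekday_effect = [0, 0, 0, 0, 0, 15, -20]
sales = [100 + day + weekday_effect[day % 7] for day in range(35)]

smoothed = rolling_mean(sales)

# The first six days have no complete seven-day window behind them.
print(smoothed[:8])
```

Because every seven-day window contains exactly one of each day of the week, the weekly pattern cancels out and only the slower movement remains. The raw `sales` values are left untouched, so both series can be drawn on the same chart.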
There are other ways of "correcting" temporal data to remove short-term effects (see here for example), but in Part 2 I will instead introduce a variant of the line graph, the cycle plot, that allows us to keep these short-term cycles visible while still showing longer-term patterns.