The most common kind of analysis, let’s call it man-on-the-street analysis, revolves around means (or averages, as they are commonly called). The problem with averages is that they can lead to badly wrong conclusions. A good illustration of this widespread error is the recent debate over whether night travel should be banned. As usual, the conversation degenerated, at least on the Twittersphere, into the hilarious and totally irrelevant #EngKamauLogic. Even so, it exposed possible flaws in the belief that banning night travel would reduce road accidents, and consequently deaths and injuries. Proponents of the ban have cited examples such as Tanzania, where incidents of road carnage are relatively few, and have attributed this to Tanzania’s policy of banning night travel.
I want to explore this question by testing another common assumption: that more people are killed and injured in road accidents at weekends than on weekdays. The intention is to show, statistically, why our ‘common statistics’ can be wrong, and why banning night travel (and weekend travel, while we are at it) could be a mistake.
The data used in this analysis is taken from the #opendatake portal (someone needs to provide more recent data). It comprises 1,354 recorded road accident cases from across the country.
Standard descriptive analysis
Looking at the histogram above, it is tempting to conclude that Fridays and Saturdays are black days, given how many accidents occur on those days. When you compare it with a strip chart (below), a few things come into perspective.
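The histogram's bar heights are simply counts of accidents per day of the week. The sketch below shows that counting step. The record layout (a date plus a death count), the function name `accidents_per_day` and every value in `accidents` are hypothetical and do not come from the #opendatake dataset.

```python
from collections import Counter
from datetime import date

# date.weekday() returns 0 for Monday through 6 for Sunday.
DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical records: (accident date, deaths). Illustrative values only,
# NOT rows from the #opendatake dataset.
accidents = [
    (date(2011, 9, 16), 2),  # Friday
    (date(2011, 9, 16), 1),  # Friday
    (date(2011, 9, 17), 3),  # Saturday
    (date(2011, 9, 17), 1),  # Saturday
    (date(2011, 9, 18), 2),  # Sunday
    (date(2011, 9, 19), 1),  # Monday
    (date(2011, 9, 20), 1),  # Tuesday
]


def accidents_per_day(records):
    """Count accidents per day of the week (the histogram's bar heights)."""
    counts = Counter(DAY_ORDER[d.weekday()] for d, _ in records)
    # Include days with no accidents so every bar is present.
    return {day: counts.get(day, 0) for day in DAY_ORDER}
```

A count like this shows only how often accidents happen on each day. It says nothing about how many people died in them, which is why the strip chart adds information.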
The strip chart puts the histogram into context. It suggests that while Friday and Saturday register roughly the same number of accidents (histogram), Saturday registers more deaths on average than Friday. It also shows that Mondays generally register fewer deaths (and accidents) than most other days, except for one outlier: the accident in Kitui on Monday, 19th September 2011 (http://www.standardmedia.co.ke/?id=2000043141&cid=159&articleID=2000043141). Extreme cases like this are outliers. The distribution of deaths on the strip chart suggests that Saturdays and Sundays have relatively more deaths than most weekdays. But is it right to conclude that Saturday and Sunday are dark days? Note the outliers, and the spread of the data points.
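The short example below illustrates how a single outlier distorts an average. It compares the mean and median number of deaths on a made-up set of Monday accidents. The final value stands in for one catastrophic crash. All numbers and the helper name `summarise` are hypothetical, chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical deaths per accident on Mondays. The last value stands in for
# a single catastrophic crash. Illustrative only, not from the dataset.
monday_deaths = [1, 1, 2, 1, 30]


def summarise(deaths):
    """Return the mean and the median number of deaths per accident."""
    return {"mean": mean(deaths), "median": median(deaths)}


with_outlier = summarise(monday_deaths)          # mean 7, median 1
without_outlier = summarise(monday_deaths[:-1])  # mean 1.25, median 1.0
```

In this toy example, one crash pushes the mean from 1.25 to 7 deaths per accident, while the median stays at 1. This is the kind of distortion that averages alone hide.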
Distribution analysis (inferential statistics)
The analysis of frequencies (the most basic form of analysis) is not wrong in itself, and it is useful when applied correctly. However, a more robust approach is to examine the distribution of the data and test it formally (inferential statistics), rather than just summarising frequencies and averages (descriptive statistics). Analysing the distribution allows more general and reliable conclusions to be drawn about the data. Non-parametric distribution tests work on ranks or cumulative proportions rather than raw values, so they are far less sensitive to outliers and extreme values. Distribution analysis can therefore be used to test hypotheses and assumptions about the data.
I tested the null hypothesis that there is no statistically significant difference between the number of deaths on weekdays and at weekends (that is, the number of deaths is the same whether it is a weekday or the weekend).
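Written out formally, the hypotheses can be sketched as follows. This assumes ‘weekend’ means Saturday and Sunday, which the post does not state. Here $F_{\text{wd}}$ and $F_{\text{we}}$ denote the cumulative distribution functions of deaths per accident on weekdays and at weekends:

$$
H_0:\; F_{\text{wd}}(x) = F_{\text{we}}(x) \ \text{for all } x
\qquad \text{vs.} \qquad
H_1:\; F_{\text{wd}}(x) \neq F_{\text{we}}(x) \ \text{for some } x
$$

At significance level $\alpha = 0.05$, $H_0$ is rejected only if the test's p-value is below $0.05$. A larger p-value means the data give no evidence of a difference. It does not prove that the two groups are identical.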
A crosstab of the data used is shown below.
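The original crosstab is not reproduced here. As a rough sketch of how such a table can be built, the code below cross-tabulates day type (weekday vs. weekend) against bands of deaths per accident. Several details are assumptions: the weekend is taken to be Saturday and Sunday, the death bands (0, 1, 2, 3+) are an arbitrary illustrative choice, and both the helper names and the records are hypothetical.

```python
from collections import Counter
from datetime import date

BANDS = ["0", "1", "2", "3+"]  # hypothetical death bands


def day_type(d):
    """Assumption: the weekend is Saturday (weekday 5) and Sunday (weekday 6)."""
    return "weekend" if d.weekday() >= 5 else "weekday"


def deaths_band(n):
    """Group death counts so that 3 or more deaths share one column."""
    return "3+" if n >= 3 else str(n)


def crosstab(records):
    """Count accidents for every (day type, deaths band) cell."""
    counts = Counter((day_type(d), deaths_band(n)) for d, n in records)
    return {t: {b: counts.get((t, b), 0) for b in BANDS} for t in ("weekday", "weekend")}


# Hypothetical records: (accident date, deaths). Not the #opendatake data.
records = [
    (date(2011, 9, 16), 0),  # Friday
    (date(2011, 9, 17), 2),  # Saturday
    (date(2011, 9, 18), 1),  # Sunday
    (date(2011, 9, 19), 4),  # Monday
    (date(2011, 9, 20), 1),  # Tuesday
]

table = crosstab(records)
```

Whether Friday counts as a weekday or as part of the weekend changes every cell of such a table, so the definition matters.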
A non-parametric test for independent samples (run in SPSS) produced the following results.
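The SPSS output is not reproduced here. As a sketch of what the Mann-Whitney U test computes, here is a pure-Python version using the textbook normal approximation with a tie correction. The exact settings SPSS applied are not stated in this post, so its figures may differ slightly. The function names are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U for sample x, z score, two-sided p-value). Applies the usual
    tie correction to the variance but no continuity correction.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    if var == 0:  # every observation identical: no evidence of a difference
        return u1, 0.0, 1.0
    z = (u1 - mu) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p
```

Usage would look like `u, z, p = mann_whitney_u(weekday_deaths, weekend_deaths)`, where both lists are hypothetical per-accident death counts. In practice, `scipy.stats.mannwhitneyu` provides a tested implementation.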
NB: The Mann-Whitney test suggests that we reject the null hypothesis. However, the data violates at least two of the assumptions of the Mann-Whitney test. Acting on its result would therefore risk a type I error: rejecting a null hypothesis that is actually true. The K-S (Kolmogorov-Smirnov) test, on the other hand, makes no assumptions about the shape of the distributions, and is consequently the better test in this setup. Since the K-S test finds no significant difference, we fail to reject the null hypothesis. There is no statistically significant difference between deaths at weekends and deaths on weekdays.
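For comparison, here is a minimal pure-Python sketch of the two-sample K-S test. It finds the largest gap $D$ between the two empirical cumulative distribution functions. It then converts $\sqrt{n_1 n_2/(n_1+n_2)}\,D$ into an asymptotic p-value using the Kolmogorov distribution. The function names are hypothetical. With discrete data such as death counts, this asymptotic p-value tends to be conservative.

```python
import math
from bisect import bisect_right


def kolmogorov_q(lam, terms=100):
    """Asymptotic tail: Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    if lam < 0.3:
        # The series converges slowly here, and Q(lam) exceeds 0.9999 anyway.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * s))


def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov test.

    Returns (D, asymptotic p-value). D is the largest vertical gap between
    the two empirical cumulative distribution functions.
    """
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    d = 0.0
    # The ECDFs only change at observed values, so checking those suffices.
    for v in set(xs) | set(ys):
        f1 = bisect_right(xs, v) / n1  # share of x that is <= v
        f2 = bisect_right(ys, v) / n2  # share of y that is <= v
        d = max(d, abs(f1 - f2))
    lam = math.sqrt(n1 * n2 / (n1 + n2)) * d
    return d, kolmogorov_q(lam)
```

At $\alpha = 0.05$, a p-value of 0.05 or above means we fail to reject the null hypothesis. `scipy.stats.ks_2samp` is a well-tested alternative that also offers exact p-values for small samples.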
In short, at the 5% significance level, the test finds no evidence that accidents on weekdays and at weekends differ in the number of people killed or injured.
In conclusion, the above analysis shows how intuition and common statistics can lead to flawed conclusions. This conclusion comes with a few caveats. First, the authenticity and correctness of the data cannot be verified. Second, the cases used in this analysis represent only a fraction of all road accidents in 2011. According to police records, there were some 3,000 road deaths that year, while our data accounts for only about 1,600. Third, no statistical analysis is perfectly accurate; every result carries some margin of error.
At the very least, this analysis shows the danger of drawing conclusions from simple averages and means.
Elvis Bando (@levisdoban)