Traffic in Nairobi is a big problem. Traffic jams cost an estimated Kshs 50M a day, and everyone wants a go at solving them. The approaches differ, but they all converge on the same goal: solving the traffic problem. Ma3Route, Twende twende, Here Maps and OverlapKe are just some of the solutions. Each approach has its strengths, and its weaknesses.
Still, the power of crowd-sourced information cannot be overstated, and here Twitter wins. Apps that post to Twitter, such as @ma3routes and @overlapKe, are a great help if you need information. More importantly, Twitter is like a dartboard at which people throw their frustrations about traffic and other issues, so if you listen keenly you can often hear about the traffic situation on most roads. The problem comes when you have to search through volumes of tweets, some of them irrelevant, to find the information you need. With the huge number of #KOT (Kenyans on Twitter) tweeting about traffic, this can be a real problem. Yet the abundance of tweets also presents a valuable opportunity to build a filter.
I have been interested in how people tweet about traffic, and over a three-month period I routinely collected 50K+ tweets about the traffic situation around Nairobi (some of the tweets can be found here). I will post the detailed analysis here sometime in the near future. The biggest takeaway was that most people tweet about traffic to ask for updates or to vent. When there is no jam, people tweet less (unless that in itself is uncommon). Most traffic apps in Kenya today only loosely satisfy these needs. The growth of these apps leaves a trail of big data that is ripe for text mining and analysis, and arguably more useful for it.
Over the Easter weekend I decided to build such a filter. My reasoning was quite simple: build a Twitter app that can be queried (via mentions) for traffic updates, including accidents, on a particular road. I was also interested in tracking accidents on Nairobi roads. I decided to start with a road I use every day: Thika Road.
In a nutshell:
From an analysis of over 15k tweets, I created a hash table of common words used to describe traffic, each scored on a scale from 1 to 10 reflecting the “badness” of the traffic situation.
Then I created an ordered list of all bus stops from Ruiru to town (Ruiru = 1, KU = 2, …).
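The two lookup tables described above can be sketched as plain Python dictionaries (the app itself was written in R). This is a minimal sketch, not the author's data: the words, their scores and the Githurai/Roysambu stops are hypothetical placeholders in an assumed order, taking the highest score when several words match is an assumption, and `direction` only illustrates one thing an ordered stop list makes possible.

```python
# Hypothetical, illustrative data: the words, scores and intermediate stops
# are placeholders, not the author's actual tables.

# Hash table of traffic words -> "badness" on the post's 1-10 scale.
SEVERITY = {"clear": 1, "moving": 3, "slow": 5, "jam": 8, "standstill": 10}

# Ordered stops from Ruiru towards town; only Ruiru = 1 and KU = 2 come from the post.
STOPS = ["ruiru", "ku", "githurai", "roysambu", "town"]
STOP_INDEX = {stop: i for i, stop in enumerate(STOPS, start=1)}


def tweet_severity(tokens):
    """Highest severity among exactly matching words in a tweet, or None."""
    scores = [SEVERITY[t] for t in tokens if t in SEVERITY]
    return max(scores) if scores else None


def direction(from_stop, to_stop):
    """Use the stop ordering to tell which way a FROM -> TO report points."""
    a, b = STOP_INDEX[from_stop], STOP_INDEX[to_stop]
    if a < b:
        return "towards town"
    if a > b:
        return "towards Ruiru"
    return "same stop"


print(tweet_severity(["slow", "jam", "at", "ku"]))  # 8
print(direction("ku", "roysambu"))                  # towards town
```

Dictionary lookups are constant time, so scoring a tweet costs one lookup per token regardless of how large the word table grows.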
The logic of the algorithm behind the app works as follows:
1. Using the twitteR library, search for and extract 200 tweets mentioning “thika road” or any of the bus stops.
2. Clean the data by removing periods, commas, #, $ and any other special characters, as well as links and mentions of other users, then save it.
3. Break each tweet into tokens and loop through them searching for the word “accident”, treating any token within a cosine distance of 0.15 as a match (so close variants and misspellings also count), and code these tweets as Accident.
4. Search the tokens for mentions of bus stops: if a tweet mentions two stops, record the first as FROM and the second as TO; if it mentions only one, record it as AT.
5. Search the tokens (using cosine or Jaccard distance) for words describing the traffic situation, and code the nearest match using the Pseudo and Likert columns of the guide traffic data frame.
6. Take all tweets coded as Accident, along with their location (AT) and the time they were tweeted. Tweet the accident if less than 2 hours have passed since it was reported and it is not similar (in location and time) to an accident already reported.
7. Take all tweets with a traffic ranking and tweet them if they were posted less than 1 hour ago and have not already been reported.
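Step 2 above can be sketched with a few regular expressions (in Python here; the app itself used R). The patterns are assumptions about what “links”, “people mentioned” and “any other characters” cover: anything that is not a lowercase letter, digit or whitespace is dropped, so `#Roysambu` keeps its word but loses the `#`.

```python
import re

URL_RE = re.compile(r"https?://\S+")      # links
MENTION_RE = re.compile(r"@\w+")          # people mentioned (@handles)
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")  # periods, commas, #, $ and other symbols


def clean_tweet(text):
    """Remove links, @-mentions and punctuation, then split into lowercase tokens."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text.lower())
    return text.split()


print(clean_tweet("@ma3routes Jam at #Roysambu, $ http://t.co/x1 ..."))
# ['jam', 'at', 'roysambu']
```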
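For step 3, the post does not say how the cosine distance between a token and “accident” was computed. A minimal sketch, assuming the distance is taken between character q-gram count profiles with q = 1 (single characters): under that assumption the misspelling “accidnt” is about 0.05 from “accident” and passes the 0.15 threshold, while unrelated words do not.

```python
import math
from collections import Counter


def qgram_profile(s, q=1):
    """Count the overlapping q-character substrings of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def cosine_distance(a, b, q=1):
    """1 - cosine similarity of the q-gram count vectors of a and b."""
    pa, pb = qgram_profile(a, q), qgram_profile(b, q)
    dot = sum(count * pb[g] for g, count in pa.items())
    norm = math.sqrt(sum(c * c for c in pa.values())) * math.sqrt(sum(c * c for c in pb.values()))
    return 1.0 - dot / norm if norm else 1.0


def is_accident(tokens, threshold=0.15):
    """Code a tweet as Accident if any token is within `threshold` of 'accident'."""
    return any(cosine_distance(tok, "accident") < threshold for tok in tokens)


print(is_accident(["accidnt", "at", "ku"]))  # True
print(is_accident(["slow", "traffic"]))      # False
```

With q = 1 the profile ignores letter order, which makes it tolerant of dropped or swapped letters; a larger q would be stricter.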
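Step 4 can be sketched as a small function, assuming single-word stop names. The stop list is an illustrative subset, not the author's; collapsing repeated mentions of the same stop and ignoring any stops after the second are assumptions the post does not spell out.

```python
# Illustrative subset of stops; the author's full ordered list is not given in the post.
STOPS = ["ruiru", "ku", "githurai", "roysambu", "town"]


def locate(tokens, stops=STOPS):
    """Return FROM/TO for two or more stop mentions, AT for exactly one, else {}."""
    stop_set = set(stops)
    # Keep the first occurrence of each stop, in the order the tweet mentions them.
    mentioned = list(dict.fromkeys(t for t in tokens if t in stop_set))
    if len(mentioned) == 1:
        return {"AT": mentioned[0]}
    if len(mentioned) >= 2:
        return {"FROM": mentioned[0], "TO": mentioned[1]}
    return {}


print(locate(["jam", "from", "ku", "to", "roysambu"]))  # {'FROM': 'ku', 'TO': 'roysambu'}
print(locate(["accident", "at", "githurai"]))           # {'AT': 'githurai'}
```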
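For step 5, a sketch of the nearest-match lookup using Jaccard distance on character bigram sets. Only the `Pseudo` and `Likert` column names come from the post; the guide rows, the bigram size and the 0.3 cut-off are hypothetical. With these values the misspelling “sloow” is 0.25 from “slow” and picks up its score.

```python
# Hypothetical guide rows: "Pseudo" is a describing word, "Likert" its 1-10 score.
GUIDE = [
    {"Pseudo": "moving", "Likert": 3},
    {"Pseudo": "slow", "Likert": 5},
    {"Pseudo": "jam", "Likert": 8},
    {"Pseudo": "standstill", "Likert": 10},
]


def bigrams(s):
    """Set of overlapping two-character substrings of s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| over the bigram sets of a and b."""
    ga, gb = bigrams(a), bigrams(b)
    union = ga | gb
    if not union:  # both strings shorter than two characters
        return 0.0 if a == b else 1.0
    return 1.0 - len(ga & gb) / len(union)


def rank_traffic(tokens, guide=GUIDE, max_dist=0.3):
    """Likert score of the guide word nearest to any token, or None if none is close."""
    best = None  # (distance, likert)
    for tok in tokens:
        for row in guide:
            d = jaccard_distance(tok, row["Pseudo"])
            if d <= max_dist and (best is None or d < best[0]):
                best = (d, row["Likert"])
    return best[1] if best else None


print(rank_traffic(["thika", "road", "sloow"]))  # 5
```

The step mentions cosine or Jaccard distance; the Jaccard version is shown here, and swapping in a cosine distance would only change `jaccard_distance`.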
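Steps 6 and 7 come down to time-window filters. A sketch assuming candidates are plain tuples and that two accident reports are “similar” when they name the same stop within a hypothetical 60-minute gap (the post does not give this threshold):

```python
from datetime import datetime, timedelta

ACCIDENT_WINDOW = timedelta(hours=2)       # step 6: tweet accidents under 2 hours old
UPDATE_WINDOW = timedelta(hours=1)         # step 7: tweet updates under 1 hour old
SAME_ACCIDENT_GAP = timedelta(minutes=60)  # hypothetical "similar in time" threshold


def accidents_to_tweet(candidates, reported, now):
    """candidates, reported: lists of (location, time). Return fresh, non-duplicate accidents."""
    selected = []
    for loc, t in sorted(candidates, key=lambda c: c[1]):  # oldest report first
        if now - t >= ACCIDENT_WINDOW:
            continue
        already = reported + selected
        if any(loc == r_loc and abs(t - r_t) <= SAME_ACCIDENT_GAP for r_loc, r_t in already):
            continue  # same stop, close in time: treat as the same accident
        selected.append((loc, t))
    return selected


def updates_to_tweet(candidates, reported_ids, now):
    """candidates: (tweet_id, time, likert). Keep updates under an hour old not yet reported."""
    return [c for c in candidates if now - c[1] < UPDATE_WINDOW and c[0] not in reported_ids]


now = datetime.now()
recent = [("ku", now - timedelta(minutes=30)), ("ku", now - timedelta(minutes=10))]
print(len(accidents_to_tweet(recent, [], now)))  # 1
```

Processing the oldest report first means the earliest tweet about an accident is the one that gets posted, and later reports of the same stop are treated as duplicates.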
Finally, the output is posted at @RoadStats.
I am currently working on a trigger for mentions so that anyone can request a traffic update by tweeting a hashtag and a road name.
Feedback and opinions are highly welcome!!!