I know how it feels to captain a doomed ship

I know how it feels
When you cannot feel the rudder bite under you
When the rudder chains flail while you are at the helm
When the wheel loses its scaffolding in a dark stormy sea.

I know the helplessness of command in a tossing ship
the nauseating ruckus in your stomach pit
When the seams on deck are slowly opening up
And the seas roar right under your feet

I know the terror that dissolves bones
The gnawing fear that rattles even sturdy knees
When your command, having taken in too much water
lists dangerously on the edge of doom
Miles from succor, leagues from land

I know the loneliness of an abandoned ship
The menacing growls of deck beams ripping apart
When you alone stand at the bridge
Long after cargo holds have been emptied
And the alleys and cabins are desolately soul-less.
When metals and timber scream and groan
Creaking and bending, submitting to the spell of the dark seas.

I know how it feels
To run, hat in hand, into the howling sea
With only dear life in tow, tossed by wind and sea
I know the pain of the gripping cold water
Darkness and uncertainty
When feet behind you, waters swallow your command

I know how it feels to captain a doomed ship

AD 2013







How Popular is your music video

If you love listening to music, then over time you might have started to think that you can recognize a hit song. There are some songs that just hit you, and you know this is a winner. Given that music is probably the oldest form of entertainment, I am quite sure there are a few people out there who can spot ‘a hit’ song. Sometimes I think I might be one such person.

In fact, this spotting of hit songs is so hot right now that computers are also trying it out. The craze is currently a new ‘science’ in machine learning called ‘Hit Song Science’ that tries to predict whether a song will be a hit or a flop based on such things as energy, tempo, danceability, loudness and other higher-level features such as harmonic simplicity (how simple the chord sequence is). There is even a formula out for a hit song. There are, however, those who believe that this is not yet a science, and that a few things such as year, musical culture age, etc., have a marked influence on the popularity of music yet cannot be gleaned from features in the music.

A lot of that discussion is bound to continue, but I believe we can help the machines predict how popular music can be by telling them how we see popular music. Sort of like training the machines. One of the ways we do this is by asking loads of people whether they like some music or not. You are already doing this every time you watch an online video on YouTube. If someone is listening, like I am currently, we can have tons of decisions/intelligence from swarms of people (swarm intelligence) to tell the machines what popular music looks like.

The way it works is this: every time you watch a YouTube video, like it, or dislike it, you are telling the world your biases. The most basic metric for rating video or song popularity is the number of views (or listens). It is logical to assume that the video with the highest number of views is also the most popular.

However, so that the popularity score reflects the present, it is important to add a decay function so that as the video ages (the length of time since it was published), you penalize old views more than newer views (old views are ranked lower than newer views).


This allows you to track the popularity of a video over time.
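As a rough illustration of the decay idea, here is a minimal Python sketch. It is not the scoring the app actually uses; it assumes a simple exponential decay in which a view loses half its weight every `half_life_days` (a made-up parameter), and the function name and input layout are hypothetical.

```python
def popularity_score(daily_views, half_life_days=7.0):
    """Time-decayed view count.

    daily_views[0] is today's view count, daily_views[1] is yesterday's,
    and so on. half_life_days is an assumed tuning knob: a view that is
    half_life_days old counts for half as much as a view from today.
    """
    return sum(
        views * 0.5 ** (age_days / half_life_days)
        for age_days, views in enumerate(daily_views)
    )


# Two videos with the same total views, watched at different times.
fresh_video = [100, 0, 0, 0, 0, 0, 0, 0]   # all 100 views came today
stale_video = [0, 0, 0, 0, 0, 0, 0, 100]   # all 100 views came 7 days ago

fresh_score = popularity_score(fresh_video)
stale_score = popularity_score(stale_video)
```

With these assumptions the fresh video scores twice as high as the stale one, even though raw view counts would rank them equally.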


Y-Rater is a web app that allows you to track the popularity of YouTube videos. YouTube has become, sort of, a de facto distribution channel for all artists. When you release a track, everyone rushes to YouTube to listen to it.


The Y-Rater app tracks video viewership over time and applies the decay function to give a score of the ‘hottest’ videos people are currently watching. The app uses the score to rank the videos (you can read more about how the scoring works here: http://amix.dk/blog/post/19574). At the moment, most of the videos are by Kenyan artists, though the app can work with any video on YouTube.

It is still in development, so if you do not find a video you want to track, just suggest it and I will add it to the tracking list.

The data collected in this way can then be used in combination with features of music to determine what flops and hits look like.


Reproducible Reports: Server performance reporting with brew and R

One of the hardest tasks, sometimes, is combing through millions of lines of server logs to find out just how your server is performing. I have always hated it, so I decided to do a one-off reproducible reporting template so I do not have to go through access logs again. I just completed the skeleton template. Now all I have to do is drag in the access.log file and hit run, and I get a complete report with what I need to know about my server performance at any time.

Basically, it gives you a rundown of your access statistics: how many queries are hitting your servers at a particular time of day, how many different IPs, what the average access time is (how bad your latency is), and which scripts are causing you the latencies. These, and other metrics you might find useful, as I did, can be found on my GitHub repo. (The code is very poorly commented, as I wanted to use it to generate some piled-up reports.)

If you are like me, then feel free to use and extend it. If you are on Windows, you will have to download and install MiKTeX to be able to print the PDF reports.
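To give a feel for the kind of metrics described above, here is a small Python sketch (the actual template uses brew and R, and its code is in the repo, not here). It assumes a common access-log layout with the request time in seconds appended as the last field; that layout, the regular expression, the sample lines and all names are assumptions for illustration only.

```python
import re
from collections import Counter, defaultdict

# Assumed line layout (not taken from the original template):
# ip - - [day/month/year:HH:MM:SS zone] "METHOD /path?query PROTO" status bytes seconds
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^:]+:(?P<hour>\d{2}):[^\]]+\] '
    r'"\S+ (?P<path>[^ ?"]+)[^"]*" \d{3} \S+ (?P<secs>[\d.]+)$'
)


def summarize(lines):
    """Return requests per hour, distinct IPs, mean latency and slowest scripts."""
    per_hour = Counter()
    ips = set()
    times = defaultdict(list)          # script path -> list of response times
    for line in lines:
        m = LINE.match(line.strip())
        if not m:                      # skip lines that do not fit the assumed layout
            continue
        per_hour[m["hour"]] += 1
        ips.add(m["ip"])
        times[m["path"]].append(float(m["secs"]))
    all_times = [t for ts in times.values() for t in ts]
    slowest = sorted(
        ((path, sum(ts) / len(ts)) for path, ts in times.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return {
        "requests_per_hour": dict(per_hour),
        "distinct_ips": len(ips),
        "mean_seconds": sum(all_times) / len(all_times) if all_times else 0.0,
        "slowest_scripts": slowest,
    }


# Made-up sample lines, using documentation-only IP addresses.
SAMPLE = [
    '192.0.2.1 - - [21/Apr/2014:10:15:32 +0300] "GET /search.php?q=x HTTP/1.1" 200 512 0.2',
    '192.0.2.2 - - [21/Apr/2014:10:40:01 +0300] "GET /index.php HTTP/1.1" 200 1024 0.1',
    '192.0.2.1 - - [21/Apr/2014:11:02:11 +0300] "GET /search.php?q=y HTTP/1.1" 200 512 0.6',
]
report = summarize(SAMPLE)
```

On a real log you would pass an open file instead of `SAMPLE`, after adjusting the pattern to whatever format your server actually writes.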

Fill your boots!


How it all began; kwetha and other stories

I recently found a document we once did at Doban Africa. It is an interesting document because it is probably the only surviving how-to document of my very first startup, Kwetha: a cyborg answer engine, part human, part artificial intelligence. In those days, the computing environment, power, and access were all quite limiting. And so was our knowledge of artificial intelligence.

All hail @ishuah_, @blackorwa, @kevoh(Fungus), @yusaf4, @albert, @steven(starges) ++++ the guys who made it happen.

All hail the bird watchers! All hail hammerhead, fire-in-the-hole, and other beautiful algorithms that ran the Kwetha cyborg!

Here is the kwetha how to






Marriage Bill and the Law of Diminishing Marginal Utility

Utility is a very interesting concept in economics. Essentially it can be used to mean preferences. We prefer some things over others. At the shop, an individual will pick one brand of, say, bread over another, or a loaf of bread over cake. In economics you can say that, for that person, the utility of bread is more than the utility of cake.

Now, if you have not eaten bread for some time, you will enjoy the first loaf much more than if you had eaten some earlier. Each time you eat bread after eating it previously you will probably enjoy the bread less and less. You can generalize this for anything you like. In economics this is called the law of diminishing marginal utility. It can be represented by the graph below.

New Picture

Red line: represents preference

Blue line: represents enjoyment derived from each additional Q

You will note that while you prefer more loaves of bread to fewer (red line), you enjoy each additional loaf less (you might even end up throwing some away).

Suppose that the cost of a loaf of bread goes up, everything else remaining constant. The most likely reaction will be to reduce spending on bread. If you were buying two loaves, you might consider buying one, and so on.

Put differently: it would be cool to have a loaf of bread if you do not have one already. If you do, then it is also great to have another, but having the one already, you do not enjoy the second one as much as you did the first. And if, perchance, the cost of buying a loaf increases, then you are less likely to buy a second.

Looking at this concept in the light of the new law on polygamous marriages, from an economics point of view, it would be preferable for men to marry many wives. However, for each additional wife, the man enjoys less of the women, and the additional lady is slightly less valued. And if the cost of living rises, the man is likely to dispense with one of the ladies.

And there it is; my three pence thought!


How many Kenyans earn upwards of 100K per month

Once in a while you come across a tweet that you read twice, then favourite. This was one of those.

Then this came through yesterday

And it immediately hit me that maybe this could answer, once and for all, how many Kenyans earn above 100K per month. I think that the original post on this issue was not empirical, and though the assumptions they made were plausible, their figure of “just under 20,000” seemed somewhat presumptuous to me. This post, hopefully, settles this (and I can now unfavourite the tweets!).

I will make a few assumptions of my own though. First, that banks, the main financiers of mortgages, will look at the applicant’s ability to repay the loan (bank account details). And second, that the report the second tweet draws on is (I think) reliable.

Okay, now let's start with this tweet:

Kenya has about 8 million urban dwellers. Doing the math, that means about 80,000 Kenyans can finance a 5.7M mortgage. To finance that kind of mortgage, you need to put down a 20% deposit and need to have a salary of at least 100,000 per month.

There you have it: about 80,000 Kenyans may be earning 100K a month (what they actually carry home is another thing altogether).
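For anyone who wants to check the arithmetic, here is a tiny Python sketch using only the figures quoted above. The share of urban dwellers is back-calculated from the two numbers in this post (80,000 out of 8 million), and the deposit is assumed to be 20% of the full 5.7M; the variable names are mine.

```python
URBAN_DWELLERS = 8_000_000     # "about 8 million urban dwellers"
CAN_FINANCE = 80_000           # "about 80,000 Kenyans can finance" the mortgage
MORTGAGE_KSHS = 5_700_000      # the 5.7M mortgage
DEPOSIT_PERCENT = 20           # the 20% deposit

# Share of urban dwellers implied by the two figures above.
implied_share = CAN_FINANCE / URBAN_DWELLERS

# Deposit, assuming the 20% applies to the full 5.7M.
deposit_kshs = MORTGAGE_KSHS * DEPOSIT_PERCENT // 100
```

So the 80,000 figure corresponds to 1% of urban dwellers, and the deposit alone comes to Kshs 1,140,000 under that assumption.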

My three pence thought



Traffic filter project: road updates from twitter

Traffic in Nairobi is a big issue. Traffic jams are a roughly Kshs 50M-a-day problem, and everyone wants a go at it. The approaches are disparate, but they all converge on one thing: solving the traffic problem. Ma3Route, Twende twende, Here Maps and OverlapKe are just some of the solutions. Each approach has its strengths. And weaknesses.

There is no undervaluing the power of crowd-sourced information though. And here Twitter wins. Apps that post to Twitter, like @ma3routes and @overlapKe, are a great help if you need information. And more importantly, as Twitter is like a dartboard where people throw their frustration about traffic and other issues, if you listen keenly, maybe you can hear about the traffic situation on most roads. The problem comes when you have to search through volumes of tweets (some irrelevant) to get the information you need. And with the huge numbers of #KOT and tweets about traffic this can be a real problem. Yet the abundance of tweets presents a valuable opportunity to build a filter.

I have been interested in how people tweet about traffic. I have routinely collected some 50K+ tweets on the traffic situation around Nairobi over a three-month period (some of the tweets can be found here). I will post the detailed analysis here sometime in the near future. The biggest takeaway was that most people tweet about traffic either to ask for updates or to vent. When there is no jam, people tweet less (unless that is uncommon). Most traffic apps in Kenya today loosely satisfy these needs. The growth of these apps leaves a trail of big data which is ripe for text mining and analysis.

Over the Easter weekend I decided to build a filter. My reasoning was quite simple: build a Twitter app which can be queried (via mentions) to give traffic updates about a particular road (and accidents). I was also interested in tracking accidents on Nairobi roads. I decided to start this app with a road I use every day: Thika Road.

In a nutshell:

From the analysis of over 15k tweets, I created a hash table of some of the common words used to describe traffic with a scale from 1 to 10 reflecting the “badness” of the traffic situation.



Then I created a list of all bus stops (ordered) from Ruiru to town (Ruiru = 1, KU = 2 … etc).

The logic of the algorithm behind the app works as follows:

1. Using library(twitteR), search for and extract 200 mentions of the phrase "thika road" or of any bus stop in the list.

2. Clean the data by removing periods, commas, #, $, and any other such characters, as well as links and mentioned users, and save.

3. Break up each tweet and loop through the tokens, searching for the word "accident" using a cosine distance of less than 0.15, and code the matches as accident.

4. Search through the tokens for all the mentions of the bus stops. If it is the first mention, record it as FROM; if the second, TO; if there is only one mention, record it as AT.

5. Search through the tokens (cosine/Jaccard distance) for the ranking of the traffic situation and code the nearest match using the Pseudo column and the Linkert column in the guide traffic data frame.




6. Get all the tweets that are coded as Accident and get the locations (AT) and the time tweeted. Tweet the accident if it has been less than 2 hours since the accident and it is not similar to another reported accident (similarity in terms of location and time).

7. Get all the tweets that have a traffic ranking and tweet them if they are updates from less than 1 hour ago and have not already been reported.
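Steps 2 to 5 can be sketched roughly as follows. This is a Python illustration, not the app's R code. It assumes the cosine distance is computed over character bigrams, which is my guess at the method. The stops after Ruiru and KU and every entry in the severity table are made-up placeholders, not the real lists.

```python
import re
from collections import Counter
from math import sqrt

# Ordered stops from Ruiru to town; entries after "ku" are illustrative only.
STOPS = ["ruiru", "ku", "githurai", "roysambu", "allsops", "pangani"]
# Hypothetical word -> "badness" table on a 1 to 10 scale.
SEVERITY = {"clear": 1, "moving": 3, "slow": 5, "heavy": 7,
            "standstill": 9, "gridlock": 10}
MAX_DISTANCE = 0.15  # threshold from step 3


def bigrams(word):
    """Count the character bigrams in a word."""
    return Counter(word[i:i + 2] for i in range(len(word) - 1))


def cosine_distance(a, b):
    """1 - cosine similarity of the bigram count vectors of a and b."""
    va, vb = bigrams(a), bigrams(b)
    dot = sum(va[g] * vb[g] for g in va)
    norm = (sqrt(sum(c * c for c in va.values()))
            * sqrt(sum(c * c for c in vb.values())))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def clean(tweet):
    """Step 2: drop links, mentions and punctuation; return lowercase tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.sub(r"[^a-z0-9\s]", " ", text).split()


def nearest_severity(tokens):
    """Step 5: score of the severity word closest to any token, if close enough."""
    best = None  # (distance, score)
    for token in tokens:
        for word, score in SEVERITY.items():
            d = cosine_distance(token, word)
            if d < MAX_DISTANCE and (best is None or d < best[0]):
                best = (d, score)
    return None if best is None else best[1]


def code_tweet(tweet):
    """Steps 2 to 5 for a single tweet."""
    tokens = clean(tweet)
    coded = {"accident": False, "from": None, "to": None, "at": None,
             "severity": nearest_severity(tokens)}
    # Step 3: fuzzy match on "accident" so misspellings still count.
    coded["accident"] = any(
        cosine_distance(t, "accident") < MAX_DISTANCE for t in tokens)
    # Step 4: first stop mentioned is FROM, second is TO, a lone stop is AT.
    mentioned = [t for t in tokens if t in STOPS]
    if len(mentioned) == 1:
        coded["at"] = mentioned[0]
    elif len(mentioned) >= 2:
        coded["from"], coded["to"] = mentioned[0], mentioned[1]
    return coded


jam = code_tweet("Heavy traffic from Ruiru to Githurai #ThikaRoad")
crash = code_tweet("Acident at Roysambu, total gridlock http://t.co/abc @ma3route")
```

Under these assumptions the misspelt "Acident" is still coded as an accident at Roysambu, while the first tweet is coded as a jam from Ruiru to Githurai. Steps 6 and 7 (the time windows and duplicate checks) would sit on top of this output.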

Finally, the output is posted at @RoadStats.

I am currently working on a trigger for mentions so that anyone can request a traffic update by using a hashtag and a road.

Feedback and opinions; highly welcome!!!