Uncategorized

I know how it feels to captain a doomed ship

I know how it feels
When you cannot feel the rudder bite under you
When the rudder chains flail while you are at the helm
When the wheel loses its scaffolding in a dark stormy sea.

I know the helplessness of command in a tossing ship
the nauseating ruckus in your stomach pit
When the seams on deck are slowly opening up
And the seas roar right under your feet

I know the terror that dissolves bones
The gnawing fear that rattles even sturdy knees
When your command, having taken in too much water
lists dangerously on the edge of doom
Miles from succor, leagues from land

I know the loneliness of an abandoned ship
The menacing groans of deck beams ripping apart
When you alone stand at the bridge
Long after cargo holds have been emptied
And the alleys and cabins are desolately soul-less.
When metals and timber scream and groan
Creaking and bending, submitting to the spell of the dark seas.

I know how it feels
To run, hat in hand, into the howling sea
With only dear life in tow, tossed by wind and sea
I know the pain of the gripping cold water
Darkness and uncertainty
When feet behind you, waters swallow your command

I know how it feels to captain a doomed ship

AD 2013

How popular is your music video

If you love listening to music, over time you might have started to think that you can recognize a hit song. Some songs just hit you, and you know this is a winner. Given that music is probably the oldest form of entertainment, I am quite sure there are a few people out there who can spot a hit. Sometimes I think I might be one such person.

In fact, spotting hit songs is so hot right now that computers are also trying it out. There is currently a new ‘science’ in machine learning called ‘Hit Song Science’ that tries to predict whether a song will be a hit or a flop based on features such as energy, tempo, danceability, and loudness, along with higher-level features such as harmonic simplicity (how simple the chord sequence is). There is even a formula out for a hit song. There are, however, those who believe that this is not yet a science, and that a few things, such as the year, the musical culture, and the era, have a marked influence on the popularity of music yet cannot be gleaned from features in the music itself.

A lot of that discussion is bound to continue, but I believe we can help the machines predict how popular music will be by telling them how we see popular music; sort of like training the machines. One of the ways we do this is by asking loads of people whether they like some music or not. You are already doing this every time you watch an online video on YouTube. If someone is listening, like I am currently, we can gather tonnes of decisions from swarms of people (swarm intelligence) to tell the machines what popular music looks like.

The way it works is: every time you watch a YouTube video and like or dislike it, you are telling the world your biases. The most basic metric for rating video or song popularity is the number of views (or listens). It is logical to assume that the video with the highest number of views is the most popular.

However, so that the popularity score reflects the present, it is important to add a decay function: as the video ages (measured by the time since it was published), older views are penalized more than newer views (old views are weighted lower than new views).
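As a minimal sketch of this decay idea (not the actual Y-Rater code; the function name, the daily sampling, and the 7-day half-life are all my own assumptions), suppose we record a video's view gains once a day and apply an exponential half-life weight:

```python
from datetime import date

def decayed_score(daily_views, today=None, half_life_days=7.0):
    """Sum daily view gains, discounting each day's views by age.

    daily_views maps a date to the views gained on that day. A batch
    of views half_life_days old counts half as much as today's, so
    the score tracks what people are watching *now*.
    """
    today = today or date.today()
    score = 0.0
    for day, views in daily_views.items():
        age_days = (today - day).days
        score += views * 0.5 ** (age_days / half_life_days)
    return score
```

With a 7-day half-life, 100 views gained a week ago contribute 50 to the score, while 100 views gained today contribute the full 100.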


This allows you to track the popularity of a video over time.

ENTER Y-RATER

Y-Rater is a web app that allows you to track the popularity of YouTube videos. YouTube has become, sort of, a de facto distribution channel for all artists: when you release a track, everyone rushes to YouTube to listen to it.


The Y-Rater app tracks video viewership over time and applies the decay function to give a score of the ‘hottest’ videos people are currently watching. The app uses the score to rank the videos (you can read more about how the scoring works here: http://amix.dk/blog/post/19574). At the moment, most of the videos are by Kenyan artists, though the app can work with any video on YouTube.

It is still in development, so if you do not find a video you want to track, just suggest it and I will add it to the tracking list.

The data collected in this way can then be used in combination with features of music to determine what flops and hits look like.


Reproducible Reports: Server performance reporting with brew and R

One of the hardest tasks, sometimes, is combing through millions of lines of server logs to find out just how your server is performing. I have always hated it, so I decided to do a one-off reproducible reporting template so I never have to comb through access logs again. I just completed the skeleton template. Now all I have to do is drop in an access.log file and hit run, and I get a complete report with what I need to know about my server performance at any time.

Basically, it gives you a rundown of your access statistics: how many queries are hitting your servers at a particular time of day, how many different IPs, what the average access time is (how bad your latency is), and which scripts are causing the latency. These, and other metrics you might find useful, as I did, can be found on my GitHub repo. (The code is very poorly commented, as I wanted to quickly generate some piled-up reports.)
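The template itself is built with brew and R, but the parsing idea can be sketched in a few lines of Python. Everything below is an assumption about your log layout, in particular that it is a combined-format access log with the response time in milliseconds as the final field:

```python
import re
from collections import Counter

# Assumed layout: combined access log with response time (ms) as
# the last field. Adjust the pattern to match your own access.log.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[\d+/\w+/\d+:(?P<hour>\d+):\d+:\d+ [^\]]+\] '
    r'"[^"]*" \d+ \S+ (?P<ms>\d+)'
)

def summarize(lines):
    """Return (hits per hour of day, distinct IP count, mean latency in ms)."""
    per_hour, ips, times = Counter(), set(), []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that don't fit the assumed layout
        per_hour[m["hour"]] += 1
        ips.add(m["ip"])
        times.append(int(m["ms"]))
    avg_ms = sum(times) / len(times) if times else 0.0
    return per_hour, len(ips), avg_ms
```

From these three aggregates you can already plot queries per hour, unique visitors, and average latency, which is most of what the report covers.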

If you are like me, then feel free to use and extend it. If you are on Windows, you will have to download and install MiKTeX to be able to print the PDF reports.

Fill your boots!


How it all began; Kwetha and other stories

I recently found a document we once did at Doban Africa. It is an interesting document because it is probably the only surviving how-to document of my very first startup, Kwetha: a cyborg answer engine, part human, part artificial intelligence. In those days, the computing environment, power, and access were all quite limiting. And so was our knowledge of artificial intelligence.

all hail @ishuah_, @blackorwa, @kevoh(Fungus), @yusaf4, @albert, @steven(starges) ++++ the guys who made it happen.

All hail the bird watchers! all hail hammerhead, fire-in-the-hole, and other beautiful algorithms that ran the kwetha cyborg!

Here is the Kwetha how-to



How many Kenyans earn upwards of 100K per month

Once in a while you come across a tweet that you read twice, then favourite. This was one of those:

Then this came through yesterday

And it immediately hit me that maybe this could answer, once and for all, how many Kenyans earn above 100K per month. I think that the original post on this issue was not empirical, and though the assumptions it made were plausible, its figure of “just under 20,000” seemed somewhat presumptuous to me. This post, hopefully, settles the matter (and I can now unfavourite the tweets!)

I will make a few assumptions of my own, though. First, that banks, the main financiers of mortgages, will look at an applicant’s ability to repay the loan (bank account details). And second, that the report the second tweet is from (I think) is reliable.

Okay, now let’s start with this tweet:

Kenya has about 8 million urban dwellers. Doing the math, that means about 80,000 Kenyans can finance a 5.7M mortgage. To finance that kind of mortgage, you need to put down a 20% deposit and have a salary of at least 100,000 per month.

There you have it: about 80,000 Kenyans may be earning 100K a month (what they actually carry home is another thing altogether).
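Spelling out the arithmetic (a back-of-envelope sketch only; the one-in-a-hundred share is my reading of the figures above, and the deposit is the quoted 20%):

```python
urban_dwellers = 8_000_000
mortgage_kshs = 5_700_000

# Roughly 1 in 100 urban dwellers can finance a 5.7M mortgage,
# per the figures quoted above; the 20% deposit is the quoted
# lending condition.
can_finance = urban_dwellers // 100        # people
deposit_kshs = mortgage_kshs * 20 // 100   # up-front deposit
```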

My three pence thought

Traffic filter project: road updates from Twitter

Traffic in Nairobi is a big issue. Traffic jams are roughly a Kshs 50M-a-day problem, and everyone wants a go at it, in disparate ways, but all converging on one thing: solving the traffic issue. Ma3Route, Twende Twende, Here Maps, and OverlapKe are just some of the solutions. Each approach has its strengths. And weaknesses.

There is no undervaluing the power of crowd-sourced information, though. And here Twitter wins. Apps that post to Twitter, like @ma3routes and @overlapKe, are a great help if you need information. More importantly, since Twitter is like a dartboard where people throw their frustrations about traffic and other issues, if you listen keenly, maybe you can hear about the traffic situation on most roads. The problem comes when you have to search through volumes of tweets (some irrelevant) to get the information you need. With the huge numbers of #KOT and tweets about traffic, this can be a real problem. Yet the abundance of tweets presents a valuable opportunity to build a filter.

I have been interested in how people tweet about traffic. I have routinely collected some 50K+ tweets on the traffic situation around Nairobi over a three-month period (some of the tweets can be found here). I will post the detailed analysis here sometime in the near future. The biggest takeaway was that most people tweet about traffic situations to ask for updates, or to vent. When there is no jam, people tweet less (unless something uncommon happens). Most traffic apps in Kenya today loosely satisfy these needs. The growth of these apps leaves a trail of big data that is ripe, and all the more useful, for text mining and analysis.

Over the Easter weekend I decided to build a filter. My reasoning was quite simple: build a Twitter app which can be queried (via mentions) to give traffic updates about a particular road (and accidents). I was also interested in tracking accidents on Nairobi roads. I decided to start with a road I use every day: Thika Road.

In a nutshell:

From the analysis of over 15k tweets, I created a hash table of some of the common words used to describe traffic, with a scale from 1 to 10 reflecting the “badness” of the traffic situation.
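A sketch of what such a table looks like; the actual words and scores came from the tweet analysis, so the entries below are illustrative placeholders, not the real data frame:

```python
# Illustrative stand-in for the traffic "badness" hash table.
# The real words and 1-10 scores came from the 15k-tweet analysis.
TRAFFIC_BADNESS = {
    "clear": 1,
    "moving": 2,
    "slow": 5,
    "heavy": 7,
    "gridlock": 9,
    "standstill": 10,
}

def badness(token):
    """Score a tweet token for traffic badness; None if unknown."""
    return TRAFFIC_BADNESS.get(token.lower())
```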


Then I created an ordered list of all the bus stops from Ruiru to town (Ruiru = 1, KU = 2 … etc.).

The logic of the algorithm behind the app works as follows:

1. Using library(twitteR), search for and extract 200 mentions of the phrase “thika road” or of the bus stops in the list.

2. Clean the data by removing periods, commas, #, $, and any other special characters, as well as links and mentioned users, and save.

3. Break up each tweet and loop through the tokens, searching for the word “accident” using a cosine distance of less than 0.15, and code these tweets as accidents.

4. Search through the tokens for all mentions of the bus stops: if it is the first mention, record it as FROM; if the second, TO; if there is only one mention, record it as AT.

5. Search through the tokens (cosine/Jaccard distance) for the ranking of the traffic situation, and code the nearest match using the Pseudo and Likert columns in the guide traffic data frame.

 


 

6. Get all the tweets coded as accidents, with their locations (AT) and the time tweeted. Tweet the accident if it has been less than 2 hours since the accident and it is not similar to another reported accident (similar in terms of location and time).

7. Get all the tweets that have a traffic ranking and tweet them if they are less than an hour old and have not already been reported.
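Steps 2-5 above can be sketched as follows. The original implementation is in R with library(twitteR); this is a Python sketch in which difflib's similarity ratio stands in for the cosine/Jaccard distances, and the bus-stop list and traffic words are placeholders:

```python
import re
from difflib import SequenceMatcher

# Placeholder data: the real app uses the full ordered Thika Road
# bus-stop list and the scored word table from the tweet analysis.
BUS_STOPS = ["ruiru", "ku", "githurai", "roysambu", "allsops", "town"]
TRAFFIC_WORDS = {"clear": 1, "slow": 5, "heavy": 7, "gridlock": 9}

def clean(tweet):
    """Step 2: strip links, mentions, and punctuation; lowercase and tokenize."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.sub(r"[^a-z0-9\s]", " ", tweet).split()

def close(a, b, cutoff=0.85):
    """Fuzzy token match (stand-in for cosine distance < 0.15)."""
    return SequenceMatcher(None, a, b).ratio() >= cutoff

def parse(tweet):
    """Steps 3-5: flag accidents, locate FROM/TO/AT, score the traffic."""
    tokens = clean(tweet)
    accident = any(close(t, "accident") for t in tokens)
    stops = [t for t in tokens if t in BUS_STOPS]
    if len(stops) >= 2:
        location = {"FROM": stops[0], "TO": stops[1]}
    elif stops:
        location = {"AT": stops[0]}
    else:
        location = {}
    scores = [s for t in tokens for w, s in TRAFFIC_WORDS.items() if close(t, w)]
    return {"accident": accident, "location": location,
            "badness": max(scores) if scores else None}
```

The fuzzy match is what lets misspellings like “accidnet” still be coded as accidents, which matters a lot with hurried traffic tweets.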

Finally, the output is posted at @RoadStats.

I am currently working on a trigger for mentions, so that anyone can request a traffic update by tweeting a hashtag and a road.

Feedback and opinions: highly welcome!


He’s dead, Jim! … musing on Open Data in the age of peta-data

I made a startling realization today. An aha moment, if you like. In a few years, the clamor for open data will die out.


Human behavior and social economics, or at least my three pence thought on this issue, tell me as much. My reasoning is simple: we only throw away what we no longer need (or never will).

Over the past few years, the talk around open data, data science, big data … and all things data has grown louder. Everyone wants a piece of the action. Everyone wants to use their data to optimize processes, to make better decisions, to illuminate spending … to be better. Inadvertently, what was previously trash (non-usable) is now being hoarded in abundant stores and servers. Suddenly, data has become that diamond in the rough. And with reactions reflective of its newly acquired status, the grip on data has tightened, and it is likely to grow tighter with every new sparkle of the data diamond. We have collectively become compulsive hoarders.

In 2012, 2.5 quintillion bytes of data (a 1 followed by 18 zeros) were created every day, with 90% of the world’s data created in the last two years … more

The question that arises is just how much of this data is open, or will ever be open. The reality is that an almost negligible percentage will ever be open, and even then, the data will have been too hopelessly summarized and anonymized to be realistically usable. Human behavior suggests that we only throw out what we consider garbage: what we do not hope to use, ever. And with such growth, the data dumps will grow at an unprecedented rate; a rate that will never be matched by releases.

Watching the BBC documentary Trashopolis gives a glimmer of hope: that even in the putrid mess of outdated, highly summarized, and distorted data, there will be those who will dive in and make some semblance of order, some meaning, some ideas. Maybe someone will be able to build a beautiful island out of trash data, like these three amazing trash islands. Maybe, just maybe, some George Waring(s) will help make order in the chaos of trash data.
