November | 2007 | Taking Initiative

I don’t know if I mentioned this before, but my father is a maths professor at the University of Pretoria. I’ll be doing my honors year next year, and I can maybe get credit for one of my subjects by helping my dad develop an algorithm/program to do the discrete pulse transform in a 2D space. The inventor behind these transforms (also referred to as LULU operators) is a maths professor at the University of Stellenbosch.

My role isn’t an important one: my task is to take my dad’s semi-working (or not) MATLAB programs, rework and optimize them in C++, and then maybe focus on porting them over to a GPGPU format. So far this has been a tiring task, since MATLAB programming is basically scripting, and really badly formatted scripting at that. Now I have to take script code written by my dad, who has almost no experience in programming, and convert it into a fast and efficient C++ program.

Architecture-wise this is proving to be a nightmare, as I don’t know C++ intimately (by that I mean knowing every little trick in the language), even though I’ve been working in C++ for years. I now have to think about the smallest performance problems. For example, will a lookup table really be faster, or will the resulting cache misses make it slower? If you remember, I tried using a lookup table in a previous project and it turned out to be a lot slower; I think cache misses might be the reason.

The most embarrassing thing is that I work in the computer science department; this department produces almost as many computer science research papers as every other department countrywide put together. And there isn’t really anyone that I can talk to about extreme C++ optimization and techniques. There is way too much focus on software engineering and documentation, and not enough on the serious in-depth topics concerned with programming and its techniques. I’ve completed my undergraduate degree and the term cache miss has never cropped up; I think this is absolutely unacceptable. And now the department has moved almost all the undergraduate courses to Java, including the data structures and algorithms courses. The degree might as well be a BSc “java monkey” or a BSc “I can’t program anything complicated”.

The entire degree program and evaluation system here has just been one frustration after another. Everything I’ve learnt I’ve had to teach myself, and in the courses that interested me and where I had a thorough understanding of the material (namely AI and graphics), my marks were bad. I’m just bad at memorizing material verbatim. I’m sorry, but I’m really angry about this: I’m probably in the top 10 programmers in my year group, and my marks are sitting towards the bottom of the spectrum just because my brain doesn’t work well with remembering exact phrases and dry theory. And the worst part is that I’m one of the top programmers not because I’m a great programmer or super smart, but because everyone else here is an idiot; it’s easy to memorize class notes and vomit them back up at exam time. I thought it was just here, but I’m starting to think that it might be the same everywhere…

Anyways I’ve gotten off topic with that rant about the sorry state of computer science in academia and my disillusionment with it.

I had a meeting with my dad and that professor from Stellenbosch and I felt like such an idiot. At least it seems that in mathematics there are still people who have talent and know what they are talking about; that discussion was crazy intense. It’s awesome to actually feel like an idiot for a change, as I unfortunately don’t run into many humbling experiences in my field. The 2D application of the LULU operators / DPT transforms hasn’t really been explored, and we don’t really know if we’ll even be able to apply it to a 2D image, but I guess that’s the goal of research. At least I’m learning things bit by bit…

Now I’m still carrying on with my conversions, but I’m stuck here staring at a profiler. I have a lot of ideas for different implementations, but I’m so tired of having to write 6 different versions, test them to see how they perform, and then sit in front of Google and hope to find out what causes the performance differences. I’d love to just meet someone that I can ask a ton of technical questions about compiler optimizations, cache techniques, how values get kept in registers, etc. I’m so tired of self study…

Guess it’s just wishful thinking at this stage. At the end of the day the cold reality is that I can’t rely on anyone here for advice.

I’m tired now and this rant has gone completely off topic. I’ll write up my initial progress in the weeks to come… I’ve also started with my game dev project: it’s going to be a total conversion for the UE3 engine, which should be tons of fun and a nice challenge. I’m pretty much living my life one challenge after another…

Wow, it’s been a rough couple of weeks: I had to hand in my graphics project, study for a statistics test, fight off my allergies (I hate spring) and then study for my finals. At least I have my degree now.

Finally!

Anyways, I promised I’d write about the radon transformation I used to convert from the extracted images to a numerical format suitable for input into our neural network. This technique is extremely effective and is already used in industry for just such purposes. We tested it on demo day with very minimal data and it worked remarkably well.

Before I get knee deep in the technical aspects of the system, I need to mention this: due to the preprocessing done on the detected motion, there is no need for a complicated AI system; the radon transformation and the recursive feature extractor together remove a lot of noise and problems that may have been present otherwise. The radon transform especially helps as it has scaling built in, so this does not have to be taken into account later on. Also, judging from the results of the transformation, objects similar in shape have extremely similar radon transformations, so the training time of the neural network was reduced, as was the number of hidden neurons necessary.

In the final demo we used a neural network with 408 inputs and only 4 hidden neurons, scary isn’t it. 😛

Introduction:

Now back to the nitty gritty: the radon transform. If you Google the “radon transform” you’ll probably get the Wikipedia page with a scary looking equation. I also got a fright the first time I saw this but after some research it’s really simple.

The basic idea of the radon transform (or my modified version thereof) is simple: if you look at your 2D image in the XY plane, you simply flatten the image onto the X axis (figure 1), then divide the X axis into several beams and work out the number of pixels within each beam. Your output will be the pixel contributions of the object to each beam. Then you rotate the object and flatten it once again. Doing this for multiple angles will give you a very good representation of the object’s shape.

Figure 1
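To make the flatten-and-count idea a bit more concrete, here is a rough C++ sketch. It is not taken from the linked source: `Pixel`, `projectToBeams` and all the parameter names are made up for illustration, and it assumes the object is given as a list of pixel coordinates. It also cheats by treating every pixel as a single point at its center, so it ignores the sub-beam refinement described further down.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical pixel type: the coordinates of a pixel center.
struct Pixel { double x, y; };

// Counts how many pixel centers land in each beam when the object is
// projected onto an axis rotated by `angle` radians around (cx, cy).
// The beams cover the range [-radius, +radius] along that axis.
std::vector<double> projectToBeams(const std::vector<Pixel>& pixels,
                                   double cx, double cy, double radius,
                                   double angle, int numBeams)
{
    assert(numBeams > 0 && radius > 0.0);
    std::vector<double> beams(numBeams, 0.0);
    const double beamWidth = 2.0 * radius / numBeams;
    const double c = std::cos(angle);
    const double s = std::sin(angle);

    for (const Pixel& p : pixels) {
        // Position of the pixel center along the rotated axis.
        const double t = (p.x - cx) * c + (p.y - cy) * s;
        int idx = static_cast<int>(std::floor((t + radius) / beamWidth));
        // Clamp so pixels sitting exactly on the border still get counted.
        if (idx < 0) idx = 0;
        if (idx >= numBeams) idx = numBeams - 1;
        beams[idx] += 1.0;
    }
    return beams;
}
```

Calling this once per angle and concatenating the results would give a crude version of the multi-angle representation described above.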

The most basic (and unfortunately most commonly used) technique for image classification is to simply get the centroid of the object and then trace the outline of the object, giving you a silhouette. Now this doesn’t sound so bad, does it? Well, it is. Firstly, it doesn’t handle broken up images well (not without major preprocessing or modification), and it also loses a lot of detail and can provide false matches. In figure 1 below we have the radon transform of a solid circle and a hollow circle. A standard outline trace would provide exactly the same result for these obviously different shapes, while as you can see the radon transform (in one projection) provides completely different results. Again, this pre-processing will take the strain off the neural network (or whatever other AI technique we use for classification).

Figure 2

Okay, now for the technical details: as you remember, we flatten the image according to some projection. Figure 2 shows some of these projections. Now if you look at figure 2 you might notice that the flattened image’s top border can be seen as the graph of some function, so the number of pixels in a beam is approximately the area under the graph between the left and right end points of the beam.

Now that picture is misleading, as you might think that it is a square object that we’ve rotated and flattened, but it is in fact a single pixel. The algorithm works on a per pixel basis. Instead of actually flattening the object, we simply work out the equation of the graph for a single rotated pixel, then run through all the pixels in the object, work out the left most and right most points of each one on the axis, and add them to their respective beams.

Now some of you are screaming that if we just rotate the pixels it will be wrong as we aren’t rotating the entire object but that is taken into account later on.

Now how do we calculate the area under the graph for each pixel and how do we figure out what beam to add it to since a beam will have lots and lots of pixels in it? Also the beam widths will differ per object.

Figure 3

What we do is simply divide the beams into lots of sub-beams, so that multiple sub-beams pass through each pixel. Then for each pixel we work out the left most sub-beam and the right most sub-beam that pass through the pixel. This range then becomes the domain for the equation of the graph we had earlier, and we loop through each sub-beam, calculate the pixel’s contribution to it (the area under the graph) and then add it to the sub-beam total. This is shown in figure 3. What you’ll also notice from figure 3 is that there is a small degree of approximation to reduce the calculations required for the area, but remember that we’re talking about fractions of a pixel here, so the total error in approximation can easily be ignored.
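The sub-beam loop might look something like the C++ sketch below. This is only my reading of the description, not the code from the linked repository: the function name `addPixelToSubBeams`, its parameters, and the choice of approximating each slice by a rectangle sampled at the middle of the sub-beam are all assumptions. The graph of the rotated pixel is passed in as a callable so the sketch does not depend on any particular equation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Adds one pixel's contribution to every sub-beam it overlaps.
//   center    : the pixel's projected center, measured from the left edge
//               of sub-beam 0
//   halfWidth : half the width of the pixel's shadow on the axis
//   height    : callable giving the graph height at an offset from center
template <typename HeightFn>
void addPixelToSubBeams(std::vector<double>& subBeams, double subBeamWidth,
                        double center, double halfWidth, HeightFn height)
{
    assert(subBeamWidth > 0.0 && !subBeams.empty());
    const int last = static_cast<int>(subBeams.size()) - 1;

    // Left most and right most sub-beams that pass through the pixel.
    int left  = static_cast<int>(std::floor((center - halfWidth) / subBeamWidth));
    int right = static_cast<int>(std::floor((center + halfWidth) / subBeamWidth));
    left  = std::max(left, 0);
    right = std::min(right, last);

    for (int i = left; i <= right; ++i) {
        // Approximate the area under the graph inside this sub-beam by a
        // rectangle whose height is sampled at the middle of the sub-beam.
        const double mid = (i + 0.5) * subBeamWidth;
        subBeams[i] += height(mid - center) * subBeamWidth;
    }
}
```

With a sub-beam width of 0.25 and an unrotated pixel (a box of height 1 and width 1), four sub-beams each pick up 0.25, so the pixel contributes a total area of 1.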

Now for each projection we run through each pixel and add it to the appropriate sub-beams. Once this is complete we sum the sub-beams up into the initial number of beams and then divide each beam by the scaling factor. The scaling factor is simply the total pixels over the beam width; this reduces the total area for the beams to 1. So every object gets reduced to an n-beam representation where the sum of all beams is equal to 1.

Okay, my explanation is very basic and I’m sure mathematicians would point out various mistakes and so on, but I’m trying to make this easy to understand and follow. It is not meant as a 100% mathematically accurate explanation; obviously if you wish to implement something like this, you wouldn’t use only my guide here as a reference. I’ve also left out some details, but they should become apparent from the explanations below.

I’m struggling to find a good way to structure this guide so I’m just going to run through the algorithm simply just to finish off.

Preparation:

The first steps we need to take before we can process the object are to get the total number of pixels and work out the centroid and the approximate radius of the object. Using the radius we work out the number of beams and sub-beams we need for the transformation. Remember that we want several sub-beams to pass through each pixel.
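A small C++ sketch of this preparation step, under a few assumptions of my own: the object is a list of pixel centers, the radius is taken as the distance from the centroid to the farthest pixel center, and the number of beams is fixed while the number of sub-beams per beam grows with the object. `ObjectInfo`, `measureObject` and `subBeamsPerBeam` are hypothetical names, and the post does not say exactly how the counts are derived from the radius.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pixel { double x, y; };

struct ObjectInfo {
    std::size_t numPixels;
    double cx, cy;   // centroid
    double radius;   // distance from the centroid to the farthest pixel center
};

ObjectInfo measureObject(const std::vector<Pixel>& pixels)
{
    assert(!pixels.empty());
    ObjectInfo info{pixels.size(), 0.0, 0.0, 0.0};
    for (const Pixel& p : pixels) {
        info.cx += p.x;
        info.cy += p.y;
    }
    info.cx /= static_cast<double>(pixels.size());
    info.cy /= static_cast<double>(pixels.size());
    for (const Pixel& p : pixels) {
        info.radius = std::max(info.radius,
                               std::hypot(p.x - info.cx, p.y - info.cy));
    }
    return info;
}

// Picks how many sub-beams each beam is split into so that at least
// `minPerPixel` sub-beams cross every pixel (assumed rule, see above).
int subBeamsPerBeam(double radius, int numBeams, int minPerPixel)
{
    assert(radius > 0.0 && numBeams > 0 && minPerPixel > 0);
    const double beamWidth = 2.0 * radius / numBeams;  // measured in pixels
    return std::max(1, static_cast<int>(std::ceil(beamWidth * minPerPixel)));
}
```

For example, an object with a radius of 10 pixels split into 5 beams has beams 4 pixels wide, so asking for 4 sub-beams per pixel gives 16 sub-beams per beam.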

Projection:

Now we run each of our projection functions to calculate the sub-beam totals. I’ll run through the basic procedure for a projection:

1. Work out the center of the pixel on the new axis (this is where the rotation of the object comes into play).
2. Work out the left most and right most sub-beams that pass through the pixel.
3. For each sub-beam, add the pixel’s contribution to it.

Note: For some equations there is an incline to the graph, so this needs to be calculated too and processed separately, i.e. work out the left most and right most sub-beams for the increasing incline and for the decreasing incline, and then use those to work out each section separately.
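To show where the inclines come from, here is a C++ sketch of the graph for a single rotated pixel. This is my own derivation for a 1x1 square pixel and not necessarily the equation used in the linked source; `pixelProfile` is a made-up name. When a unit square is rotated by some angle and flattened, its shadow is a trapezoid: a rising incline, a flat top, and a falling incline. At 0 degrees the inclines vanish and it is a plain box; at 45 degrees the flat top vanishes and it is a triangle. In every case the area under the graph is 1, i.e. one pixel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Height of the flattened unit-square pixel at distance `t` from its
// projected center, for a projection axis rotated by `angle` radians.
double pixelProfile(double t, double angle)
{
    const double a = std::fabs(std::cos(angle));
    const double b = std::fabs(std::sin(angle));
    const double outer = 0.5 * (a + b);            // where the shadow ends
    const double inner = 0.5 * std::fabs(a - b);   // where the flat top ends
    const double peak  = 1.0 / std::max(a, b);     // height of the flat top
    const double d = std::fabs(t);

    if (d >= outer) return 0.0;   // outside the shadow
    if (d <= inner) return peak;  // on the flat top
    // On one of the two inclines: fall linearly from peak down to zero.
    return peak * (outer - d) / (outer - inner);
}
```

The rising incline covers offsets from `-outer` to `-inner` and the falling one from `inner` to `outer`, which matches the idea of handling the two inclined sections separately from the flat part.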

Combine Sub-beams:

Now once all the sub-beams have been calculated, we work out the scaling factor, which is beamWidth / numPixels. We then sum all the sub-beams into beams (per projection) and multiply each one by the scaling factor. And that’s it. We have our complete numerical representation of our image.
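The combine step is short enough to sketch in a few lines of C++. As before, this is an illustration rather than the code from the repository: `combineSubBeams` and its parameters are hypothetical names, it handles one projection at a time, and the caller is expected to pass in the scaling factor (the post gives it as beamWidth / numPixels).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums consecutive groups of sub-beams into beams and scales the result.
std::vector<double> combineSubBeams(const std::vector<double>& subBeams,
                                    std::size_t subBeamsPerBeam,
                                    double scalingFactor)
{
    assert(subBeamsPerBeam > 0);
    assert(subBeams.size() % subBeamsPerBeam == 0);

    std::vector<double> beams(subBeams.size() / subBeamsPerBeam, 0.0);
    for (std::size_t i = 0; i < subBeams.size(); ++i) {
        beams[i / subBeamsPerBeam] += subBeams[i];  // sub-beam i belongs to beam i / n
    }
    for (double& b : beams) {
        b *= scalingFactor;
    }
    return beams;
}
```

Running this for each projection and appending the results one after another would give the final vector that gets fed to the neural network.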

Note: I used only 8 projections as I had very limited CPU time left at this stage of the project and had to limit the amount of processing that needs to be done, obviously more projections will be better but then again too many would be worse. A fine balance needs to be found, I personally think that 8 projections are more than sufficient for my needs. Again GPGPU programming would be so useful here!

C++ Source Code: https://github.com/BobbyAnguelov/RadonTransformer

Bobby Anguelov's Tech Blog

Discrete Pulse Transforms and Rants…

Radon Transform