Geometric Mean

So what exactly is geometric mean. and where and why is it useful.

The definition of what it is rather simple.
If you want to find geometric mean of n numbers (positive integers), you multiply them all and take the nth root of the resulting product. Now, why would it be useful, and what’s the point of doing this. The positive integers is not really big difficulty in real life, (as in the worst case we can just shift/translate the origin for the axis with negative numbers)

Ok, Now imagine you have standard graph with the axes having very different limits. i.e: x axis varies from 0-0.5 while y axis varies from 0-100.
Now suppose you want to compare two(or more) objects/distribution both of which have measures along x and y. You can plot points in different colours(for diff objects) on the x and y, and then try to make a decision, based on what you want to pick.

That sounds fine till you think there are very few(say 5-10) different objects to be compared. What if you have say (100 laptops and 10 features) you want to compare them across?.
Ah now we’re in real trouble. How do we know which ones are which among the 100 colours, on top of that you have 5 graphs(for 10 features).

What we need is a way to combine these axes into one axis. Then we can go back to simple bar charts.

Here comes geometric mean to the rescue. If you look at the definition it multiplies the feature values which gives us a area(if 2 features), volume (if 3) or a n-dimensional volume value.
We can’t simply use this because, at the moment this value is biased towards features that have a higher range of values.
i.e: in the previous example y axis which ranged from 0-100 will simply wash out any differences in x.

So we take the (2 or 3 or n)th root of this value. In effect we have found normalized the axis range itself.

Note: the cool part here is we don’t need to know anything about the actual range itself. The nature of the operations on the valiues (i.e: product and nth root) itself ensures the final geometric mean value is equalized.

For an example I’ll pick laptop CPU Speed, Hard disk size, and RAM here’s a link..
If you look at it closely, while in the examples i have picked, while all the three pythagorean means don’t change ordinality/ranking of the laptop being compared, the Arithmetic mean gets dominated/boosted simply by raising Hard Disk space.
On the other hand the geometric mean, doesn’t get raised (as much simply) by raising the attribute with higher values.

It’s not really surprising, since the geometric mean is a exponential function and arithmetic mean is a linear function.

You can ignore the Harmonic mean for now, as it’s not at all clear what’s common among the laptops. I’ll later make another post/update detailing how harmonic mean can be used for this case.

One case where it is used is in finding F-Score for comparing predictive algorithms, statistical tests etc.

Share: Brisingr

You will have to do better than that, Rider, if you want to change my mind.”

“I don’t want to change your mind,” said Eragon. “I only want to make sure you have given due consideration to the implications of your decision and that you are not being overly hasty.”

(Moon+ Reader Pro v2.6.5, Brisingr)

Share: Brisingr

Good hunting, O Wild Ones. May the wind rise under your wings, may the sun always be at your backs, and may you catch your prey napping. And, Wolf-Eyes, I hope that when you find the one who left your paws in his traps, you do not kill him too quickly.
(Moon+ Reader Pro v2.6.5, Brisingr)

DRY, YAGNI and other principles.

Better version of those randomly thrown about terms. Taken from here

That the Temple is in need;

that the need is best met by code;

that the code does not yet exist;

that the existence may be achieved with reasonable effort;

that the effort is best expended now and by myself.

Share: Brisingr

“I have not the slightest idea. Tenga always had a question he was trying to answer. If he succeeded, he immediately chose another one, and so on. He may have answered a hundred questions since I last saw him, or he may still be gnashing his teeth over the same conundrum as when I left him.”

(Moon+ Reader Pro v2.6.5, Brisingr)

Share: Brisingr

To while away the day contemplating evils that might have been is to poison the happiness we already have

(Moon+ Reader Pro v2.6.5, Brisingr)

Bloom Filter — How the code looks and reads

And here comes bloom filter. This one’s not from any OSS project, but found here.

<<bloom.h>>=
#ifndef __BLOOM_H__
#define __BLOOM_H__

#include<stdlib.h>

typedef unsigned int (*hashfunc_t)(const char *);
typedef struct {
size_t asize;
unsigned char *a;
size_t nfuncs;
hashfunc_t *funcs;
} BLOOM;

BLOOM *bloom_create(size_t size, size_t nfuncs, ...);
int bloom_destroy(BLOOM *bloom);
int bloom_add(BLOOM *bloom, const char *s);
int bloom_check(BLOOM *bloom, const char *s);

#endif

Again, the data structure doesn’t tell the story, but hints some things. Infact, I think this is one of those “more algorithms than data structure” category math object, even though it’s more often called a data structure.
Here’s the code that actually does the clever bit.

#define SETBIT(a, n) (a[n/CHAR_BIT] |= (1<<(n%CHAR_BIT)))
#define GETBIT(a, n) (a[n/CHAR_BIT] & (1<<(n%CHAR_BIT)))
int bloom_add(BLOOM *bloom, const char *s)
{
size_t n;

for(n=0; n<bloom->nfuncs; ++n) {
SETBIT(bloom->a, bloom->funcs[n](s)%bloom->asize);
}

return 0;
}

int bloom_check(BLOOM *bloom, const char *s)
{
size_t n;

for(n=0; n<bloom->nfuncs; ++n) {
if(!(GETBIT(bloom->a, bloom->funcs[n](s)%bloom->asize))) return 0;
}

return 1;
}

Let’s take closer look at those two functions above.
1. bloom_add, Let’s ignore the for loop, as it seems to apply a function to the given value before setting the BF’s bit.
For the moment assume it’s one function.
The macro definition, finds the character corresponding(i.e: by taking modulo of the value/systems’ default CHAR_BIT value, as defined by limits.h.)

What does SETBIT do?(It takes in a pointer(array of chars, a value(after applying one of the hash functions defined to original value and taking a modulo of asize(filter size )
So far it has all been stuff, that’s unrelated to the interesting math part, but necessary for the <side effects galore> real world computation.
At the core of the logic, it’s straightforward:
a, Left shift the Char value( after finding modulo, for splitting the characters inside the char array)
b, Or it with the already existing char value.

2. For the bloom_check function,Ignoring the for loop, we find the GETBIT macro as the core functionality.
The GETBIT, simply returns the result of ‘AND’ operation on the existing value, and the new value.

So this is all fine and dandy, but why is this special? or for that matter, how’s it different from some other scheme.

We the modulo,(i.e splitting the input value into <sizeof(char)> sized parts ) does nothing to limit the size of the input value.
Here’s where that for loop we ignored comes in. It applies one hash function(or many) to the input. These hash functions do the job of transforming/mapping the actual unlimited sized input into a limited sized char array.

Here’s the hash function code that comes along with the example program at the site.

unsigned int sax_hash(const char *key)
{
unsigned int h=0;

while(*key) h^=(h<<5)+(h>>2)+(unsigned char)*key++;

return h;
}

unsigned int sdbm_hash(const char *key)
{
unsigned int h=0;
while(*key) h=(unsigned char)*key++ + (h<<6) + (h<<16) - h;
return h;
}

So how exactly does this hash function, take in a random set of characters and always produces a finite size/limited number of characters? And what are the trade offs? How does it maintain unique mapping under that condition? when does it fail?
Stay Tuned.

Share: Dune: House Corrino

All proofs inevitably lead to propositions that have no proof. All things are known because we want to believe in them.

(Moon+ Reader Pro v2.6.5, Dune: House Corrino)

Share: Dune: House Corrino

Shaddam needed someone he could trust to act without question, because there would be many questions

(Moon+ Reader Pro v2.6.5, Dune: House Corrino)

Quotes:House Harkonnen

The capacity to learn is a gift;

The ability to learn is a skill;

The willingness to learn is a choice.

(Moon+ Reader Pro v2.6.5, Dune: House Harkonnen)