zaterdag 10 maart 2012

Four nines going quantum.

Four nines

A really interesting problem is the four nine problem. The problem is this:
You have to generate as much integers as possible. You have at your disposal 4 nines (9) and 5 operators (+,-,*,/,%). You may combine these as you please and you can use parenthesis at your hearts content. But you can only use four nines. You have to find all the integers and give the expression to generate them.
The most straightforward solution is to make a small language with four operators, create an abstract syntax tree and then generate all the solutions by rotating, shifting etc the tree. This works fine. It is a brute force method, but if you do it right, you won't generate invalid solutions. It works and it is extensible. But generation is difficult. Although testing is simple.
Normally one would search for a smarter method, but today I choose for a dumber, but interesting method to search a space for solutions, which is bigger than the solution space of the four nine problem. This space is the space of unsigned integers up to a size of 17 bits.

Encoding

How can we translate the original problem to a pile of natural numbers? Actually it is very easy. We encode every expression as follows:
A expression of the form (with arbitrary parenthesis) 9 o 9 o 9 o 9 can be represented by a bitstring.
First we enumerate the operators:
+ -> 000
- -> 001
* -> 010 
/ -> 011
% -> 100 

We can easily see, why this space is bigger than the actually space with solutions. Not all operators are valid. 111 for example is not an operator.
We can encode the numbers and the parenthesis together:
9 -> 00 
9) -> 01 
(9 -> 10 
(9) -> 11

Together we can write an expression as a bitstring:
9  |  +  | (9 | / | 9) | %  | 9
00 | 000 | 10 |011| 01 |100 | 00

Testing solutions

Generation is now easy. We just need to add one to the previous found potential solution. The first potential solution is 0, the second potential solution is 1. One bitstring has the size of 17 bits. We are essentially going to code this pipeline: generate new potential solution >>= test potential solution >>= print out solution. I have chosen perl6 for the task, because hyperoperators (see: http://perl6advent.wordpress.com/2009/12/06/day-6-going-into-hyperspace/ for a good explanation) are very handy when working with bit slicing code and because they have a coding contest running with this problem in it and to show the elegance of perl6. To help the code a bit, we create a special type to ensure we are not going below zero and we write the generation function.
subset Int::Positive of Int where { $_ >= 0 };

sub nextBitString (Int::Positive $x) {
        return $x + 1;    
}
This is essentially enough to start attacking the problem. Only have to print out and done. But not all solutions are valid solutions. So we need a function to weed them out. For example the bitstring:
)9  +  (9  /  9)  +   9 
10 000 10 011 01 000 00
Is not a valid expression. So we need a mechanism to check these strings.

Things we can check

The operators are always on a fixed position. They are one position 2, 7, 12. They are always 3 bits long and we know the invalid operators (101, 110, 111). We can check this in three lines of code with the help of hyperoperators:
    # for some solution $x
    my Int @inv = (5,6,7);
    my Int @pos = ($x <<+><< (2,7,12)) >>+&>> 7;

    if @pos >>===>> any(@inv) {
            return False;
    }
This looks very dense. So what happens here? First I make an array with the three invalid operators (line 1). The second line is the hard one. First I do a numeric shift to the right, while I distribute scalar on the left over the array on the right yielding a array again. ($x <<+><< (2,7,12) Then I am selecting only the first three bits, for 111 is 7, by distributing the scalar on the right over the array on the left, which yields an array again. Maybe spreading out is a better word. Now I have selected all the operators of the expression in one line of code. That is pretty cool. Then I do the same trick, I can use the hyperoperator version of the identity operator to check if any of the operators (@pos) are one of the invalid ones. If they are, I reject this solution.

Are we balanced?

The next thing we have to check is the parenthesis. For this we need a simple finite state machine with states: n = 0..Inf and the following transition function:
( n -> n + 1
) n -> n - 1
* n -> n 
The state machine has the following output functon:
f 0 = True
f _ = False 
And begin state n = 0. With this machinery, we can check if the parenthesis are balanced or not. Again in perl6 we can use for elegance hyperoperators to retrieve all the parenthesis: The parenthesis are on a fixed position (0,5,10,15) and have a fixed size (2 bits), thus reusing the same trick we yield the code:
@pos = ($x <<+><< (15,10,5,0)) >>+&>> 3;
The state machine is easily coded and we can even add a small short cut, to make it faster, because n < 0 is a invalid state and the state machine should immediately reject all the input.
    for @pos -> $pos {
        given $pos {
            when 1 {
                $n -= 1; 
            }
            when 2 {
                $n +=1;
            }
        }
        if $n < 0 {
            return False;
        }
    }
    if $n == 0 {    
        return True;
    } else {
        return False;
    }
Remember that '(' was encoded as 10 == 2 and ')' is encoded as 01 == 1.

Evaluating the bitstrings

The evaluator have to do two things. Evaluate the expression. And give a human readable version back. The easiest way is to generate a valid perl6 expression. Run it through eval and give back the string. The strategy to do this is simple. We select every parenthesis and every operator. Zip them together and output them as string. First an parenthesis then an operator.
    my Int @nrs = ($x <<+><< (15,10,5,0)) >>+&>> 3;
    my Int @ops = ($x <<+><< (12,7,2)) >>+&>> 7;
    my Str $out = "";

    for @nrs Z (@ops) -> Int $nrs , Int $op {
        given $nrs {
            when 0 {
                $out ~= "9";
            }
            when 2 {
                $out ~= "(9";
            }
            when 1 {
                $out ~= "9)";
            }
            when 3 {
                $out ~= "(9)";
            }
        }
        given $op {
            when 0 {
                $out ~= " + ";
            }
            when 1 {
                $out ~= " - ";
            }
            when 2 {
                $out ~= " * ";
            }
            when 3 {
                $out ~= " / ";
            }
            when 4 {
                $out ~= " % ";
            }
        }
    }
    return ($out, eval($out));

Testing and oops, it didn't work. It will only output expressions of the form 9 o 9 o 9. And more horror comes. Some of them are not valid. Why does this happen? The answer is simple. If you zip two arrays the following law always holds:
length (Z xs ys) == min(length xs, length ys)
So I need to lengthen the @ops array with one member. And it has to be a member, which doesn't evaluate to a string. SO the correct code must be:
    my Int @ops = ($x <<+><< (12,7,2)) >>+&>> 7, -1;
This solves the problem, because the length of the zipped array is now 4. The last operator gets never printed and we get valid expressions of the form: 9 o 9 o 9 o 9.

Patching it all together

The last thing we have to do is to generate all possible solutions. And then test them and print the valid ones out.
for 0..(2**17 - 1) -> $i {
    my Int $string = $i;
    my Bool $valid = validBitString($string);
    if $valid {
        my ($res, $un) = evaluator($string);
        "$res -> $un".say;
    }
    if $i % 1000 == 0 {
        "Done: $i".say;
    }
}
I hope you enjoyed this post. The whole script is on: http://hpaste.org/65101

donderdag 8 maart 2012

Using FFI to use a LibExiv2 in haskell

Haskell has a very well developed Foreign Function Interface. But one of the problems with the FFI is that it is specialized for C.

For a project in our company we needed an alternative to libexif, because the library has problems with reading binary blobs embedded in the exif data.

LibExiv2 is a perfect candidate to use, but gave rise to some technical challenges to overcome. The biggest challenge was, how to deal with auto_ptr's. We couldn't use the auto_ptr to create a StablePtr, which we can use with FFI in haskell.

First some basics of LibExiv2:

Exiv2 can open any image format. They provide a general function for opening files. This is Exiv2::ImageFactory::open. After that we can read the metadata with an object method. readMetaData. This reads the exif data from the image file. After we have done this, we can get an Iterator (some sort of Foldable-like structure) to search for certain keys and their values.

Exiv2::ImageFactor::open gives back an auto_ptr. An auto_ptr is a template class, which behaves like a pointer. But differs when a pointer leaves it scope, then it will be destroyed automagically and more important it is not a real pointer. It only behaves as a pointer, it memory address is stored in the template class.

So when we tried to pass the auto_ptr as StablePtr to the haskell side, we got an segmentation fault. After some research we concluded the following happened:

We retrieved the pointer from the auto_ptr. Then we passed this to the haskell side. But then the auto_ptr vanished from the scope, destroying the pointer in the process. Haskell then passed the original pointer back, but at that time it didn't exist anymore.

The solution was really simple. We created a wrapper around the auto_ptr:



struct Context {
Exiv2::Image::AutoPtr autoptr;

};


By carefully code, we let the auto_ptr never escape an existing scope. And converting only the pointer of the context to a haskell stableptr. This way we could safely send it to the haskell side.

As an example of how to use this trick, I will take some excerpts from the library:


/* This function opens a new image and creates and Exiv2::Image::AutoPtr
then wrap it in a Context to protect it from vanishing
*/
HsPtr openContext(const char * f){
std::string file(f);
Exiv2::Image::AutoPtr image = Exiv2::ImageFactory::open(file);
Context * c = new Context;
assert(image.get() != 0);
// assign the auto_ptr to the new context
c->autoptr = image;
c->autoptr->readMetadata();
// Cast it to a haskell pointer
return (HsPtr)c;
}

/* Later in the code we can restore the haskell pointer back to a context */

Context * ImagefromStablePtr(HsPtr p){
return ((Context *)p);
}

/* And then use it to get all the keys from the exif data (from haskell to haskell) */

HsPtr getExifKeysIterator(HsPtr p){
assert(p != 0);
Context * ctx = ImagefromStablePtr(p);
// As long as the autoptr is not leaving his context it will exists.
Exiv2::ExifData data = ctx->autoptr->exifData();
ExifIterator * it = new ExifIterator();
size_t size = data.count();
it->data = (char **)calloc(sizeof(char *), size);
it->size = size;
it->it = 0;
size_t counter = 0;
for(Exiv2::ExifData::const_iterator i = data.begin(); i != data.end(); i++){
std::string key = i->key();
it->data[counter] = new char[key.size()+1];
strcpy(it->data[counter], key.c_str());
counter++;
}
return (HsPtr)it;
}




The rest of the job was straightforward, except for the problem to let cabal handle the build process, but I am working on that. Advice would be appreciated.


You can view the code at bitbucket: https://bitbucket.org/eklerks/libexiv2bindings/src

I haven't include the build code. But manual building is not difficult. I will update the repos if I figured out, how to let it work with cabal.