Tuesday 13 August 2013

Get a B::Deparse piggy back through Gearmany

Gearman is a great tool to run asynchronous and/or distributed jobs over a cluster of machines. It's not a full general message queue ala RabbitMQ, but a rather minimalist piece of software that is very simple to use, does only one thing and does it very well.
To use gearman from your favourite language, the cpan provides two modules. The original Gearman and the more recent Gearman::XS. For this post we're going to use the original Gearman, but you should be able to implement the example using Gearman::XS without much trouble.
In a nutshell, writing for gearman is a two sides process, writing some client code and some worker code to actually do the job, with gearman just sitting in the middle and distributing the jobs:

Client(s) <--> Gearman Deamon <--> Worker(s)


Vanilla gearman

Let's have a quick pseudo code look at an asynchronous example inspired by the Perl package Gearman:

Worker:

 my $worker = Gearman::Worker->new(...);
 $worker->register_function('sum' => sub {my $job = shift; calculate sum of decoded($job->arg) and store it somewhere });
 $worker->work();

Client:

 my $client = Gearman::Client->new(...);
 my $handler =  $client->dispatch_background('sum', encoded([1, 2, 3]), { uniq => some string unique enough });
 ## wait for the job to be done
 while( my $status = $client->get_status() && $status->running() ){
   sleep(1);
 }
 .. Retrieve sum from the storage and print it ..

By the way 'encoded' and 'decoded' are entirely up to you, as long as they encode to bytes and decode from bytes. I personally use JSON, but it's a matter of taste (and performance but that's another story).
Also, I deliberately skip the task sets and synchronous mechanisms as this is not supposed to be a gearman tutorial :)
So here we go. You make sure that your gearman daemon is running, you fire up your worker script and while it's running, each time you run your client, it will print 6.
You feel great. You've written your first minimalist gearman application and you're ready to gearmanize the rest of your long running and/or easily distributable code.


Why this approach is a pain

So following the example, the temptation is great to just extend the worker like that:

 $worker->register_function('my_long_specific_thing', \&gm_do_long_specific_thing);

I guess that if you're reading this post, you probably already have something in your model code that is long and already packed into a function:

  package MyApp::Object;

  sub do_long_specific_thing{
    my ($self, $arg1 , ... ) = @_;
    ...
  }


So in reality your gearman specialised 'gm_long_specific_thing' will probably look like that:

sub gm_long_specific_thing{
  my($job) = @_;

  my $args = decoded($job->arg());

  my $application = .. Build or get application ..;

  $application->get_object( build object getting from args )->do_long_specific_thing( build arg1 , arg2 from the args);
}

Then you know the rest of the story. Every time you need something else to be gearmanized or every time you need to make a change to the arguments of one of your gearmanized method, you have to propagate your changes to the specific gearman registered functions. Your code has become a bit less maintainable, just because you want the benefits of gearman.

Fixing it

But what if.. you could write something like that:

$client->application_launch(sub{ my $app = shift;
                                  my ($oid) = @_;
                                  $app->get_object($oid)->do_long_specific_thing(1, 'whatever');
                                },
                              [ $oid ] );


No more worker specific code to write. Everything is done from the application's point of view, leaving the back-end details out of the way.
Want to change the API of do_long_specific_thing? No problem, just apply parameter changes where they appear in the code. No more headaches propagating the API change through the gearman specific methods.
Want to make your gearmanized process longer without changing do_long_specific_thing? No problem:

$client->application_launch(sub{ my $app = shift;
                                  my ($oid) = @_;
                                  $app->get_object($oid)->do_long_specific_thing(1, 'whatever');
                                  .. and something else ..
                                },
                              [ $oid ] );

What if you have another long thing to gearmanize? Well, you get the picture:

$client->application_launch(sub{ my $app = shift;
                                  my ($oid, $arg1) = @_;
                                  $app->get_object($oid)->do_another_thing($arg1);
                                  .. and something else ..
                                },
                              [ $oid, $arg1 ] );

Implementing it

The client 'application_launch' method

Thanks to B::Deparse, we can turn any sub into a plain string. The rest is trivial, so
here we go:

## In some object that wraps the $gearman_client
## you can also inherit if you prefer. But I like wrapping more, cause you can store utilities.
sub application_launch{
   my ($self, $code, $args ) = @_;

   $code //= sub{};
   $args //= [];

   my $gearman_client = $self->gearman_client() OR just $self;
   my $deparser = B::Deparse->new('-sC'); ## C style
   my $json = JSON::XS->new()->ascii()->pretty(); ## I like pretty. I know it's larger but well..

   my $code_string = $deparser->coderef2text($code);

   my $gearman_arg = $json->encode({ code => $code_string,
                                     args => $args });

   my $uniq = Digest::SHA::sha256_hex($gearman_arg);

   my $task = Gearman::Task->new('gm_application_do', \$gearman_arg, { uniq => $uniq });

   my $gearman_handler =  $gearman_client->dispatch_background($task);
   unless($gearman_handler){
    confess("Your task cannot be launched. Is gearman exposing the gm_application_do function?");
   }
   return $gearman_handler;
}

And that's pretty much it.

The worker 'gm_application_do' code

The worker code is very similar.

$worker->register_function('gw_aplication_do', sub{ _gm_application_do($application, shift) });
sub _gm_application_do{
    my ($app , $job) = @_;

    my $gm_args = $json->decode($task->arg()); ## Note you need a $json object.
    my $code_string = $gm_args->{code};
    my $code_args = $gm_args->{args};
   
    my $code = eval 'sub '.$code_string;
    $code || confess("EVAL ERROR for $code_string: ".$@); ## That shouldnt happen but well..

    ## This is not supposed to return anything, as we call that asynchronously.
    &{$code}($app, @$code_args);
    return 1;
}

Adapting to synchronous calls

Adapting that to synchronous calls is quite straight forward.
Remember gearman exposed functions should always return a single scalar.

Conclusion

gotchas:


  • This doesn't work with closures, so really your sub's should be pure functions and all parameters should be given as such.
  • As far as the magic goes, the parameters can ONLY be pure Perl structures; something that's serializable in vanilla JSON.
  • If you try to pass bless objects, bad things will happen.


Disadvantages:


  • Insecure. What if anyone injects sub{ destroy_the_world(); } in your gearman server. That's kind of easily fixed. Just implement some secure signing of the code in transit.
  • No strict control of what can be done through gearman. Developers enlightenment is the key here.
  • No strict control about what 'flavour' of gearman worker is running your code. Some people like to have specialised gearman workers exposing only a subset of functions. This can easily be fixed by adding a 'target' option to the application_launch method and exposing the same general purpose gm_application_do under different names on different machines. But again, choosing the right target falls under the developers responsibility.


To sum up the advantages:


  •  Stable gearman worker code that's decoupled from the application code itself.
  •  Flexible gearmanization of any application code you like.
  •  Clarity of what's going on. No more parameter encoding/decoding to write, and no API change propagation through what should be infrastructure only code.


Perl offers us the flexibility that empowers us to clearly separate code that deals with different concerns.
As developers, we should take advantage of it and build reactive, flexible and generic enough business components.
As an infrastructure developer, I don't really know what people are going to do with gearman, nor should I care too much. This approach let me concentrate on what is important: the stability, scalability and the security of my gearman workers.
As an application developer, I don't care that I have to encode my functions parameters in a certain way. And I don't want to bother changing code in some obscure gearman module when I make changes to my business code. What I want is to use gearman as a facility that helps me design the best possible application without getting on my way too much.

Hope you enjoyed this post.

Until next one, happy coding!

Jerome.

1 comment:

  1. This comment has been removed by a blog administrator.

    ReplyDelete