One of the best things about Parrot is that it's not just for Perl implementors. Parrot 0.0.3 came with support for extensible data types that can be used to implement the types used in your favorite language. The mechanism by which these types are extensible is called the PMC.
The PMC, or Parrot Magic Cookie, type is a special data container for
user-defined data types. Because these user-defined types are
essentially implementations of a set of methods, we refer to them as PMC
classes. Currently, the legal PMC classes are the PerlInt,
PerlNum, PerlString, PerlArray, and PerlHash types. The
PerlInt, PerlNum and PerlString data types combine to form the
PerlScalar data type.
PMC registers, unlike the basic Integer, Number, and String registers,
must be specially allocated with the new P0, PMCType instruction.
Other operations like set P0,5 are handled by special functions that are implemented by the PMC class. The rest of this article is about how
to create your own PMC class implementation, alongside the PerlInt and
PerlHash data types.
For our example, we're going to implement a simple queue data
structure. Our queue will be a set of integers; the queue will grow when
an integer is assigned to it, and will shrink when an integer is read
from it. We'll use the PerlInt class as a basis, so it may be helpful
to look at some examples of operations that use it:
new P0, PerlInt # Create a new PMC in the 'PerlInt' class set P0, 1234 # Set the value of the PMC to 1234 set P0, "4567" # Set the value of the PMC to 4567 set P0, 12.34 # Set the value of the PMC to 12 set I0, P0 # Set I0 to the current value of the PMC print P0 print "n"
Note that no special instructions like set_string or set_float
were required to assign data of different types to the PMC. Each
instruction does the Right Thing given the initial type of the PMC. This
has several important consequences when designing new data types, the
largest of which is that it generally isn't necessary to add special
instructions to access data contained within a PMC.
On the other side, this means that PMCs should attempt to behave rationally in all situations. It's not an onerous requirement, but in some cases, rational behavior is hard to define. Queues are fairly simple to define though, in terms of behavior. A queue has one way to get data in and one way to get data out.
Since we can use one instruction for multiple classes, we'll use set Pn,In
to add an integer to the queue, and set In,Pn to get an
element out of the queue. The last operation we need to perform on a
queue is to determine whether the queue is empty. The PerlArray class uses
set In,Pn to return the length of the array into In, but we've
already decided to use that to get an integer out of the queue.
Instead of set In,Pn to determine how many elements are in the queue,
all we really need to know is whether the queue is empty or in use. For that,
we can use the handy boolean operator, if Pn,In. Here, the integer
register is actually the number of instructions to skip over if the
condition is true. We'll have it branch if the queue is empty.
So, our IntQueue data type will implement three instructions. First, the set Pn,In instruction will add an integer to the queue.
Second, when the queue is empty, if Pn,In will branch to the appropriate
offset. Finally, set In,Pn will dequeue the last integer in the queue
and place it into the appropriate integer register.
Some sample source using IntQueue may come in handy at this time:
new P0, IntQueue # Create the queue set P0, 7 # Enqueue 7 set P0, -43 # Enqueue -43 set I0, P0 # Dequeue 7 print I0 # Should print '7'. if P0, QUEUE_emPTY # Goto label 'QUEUE_emPTY'
Core Operations
Before forging ahead with the IntQueue, let's take a look at the core
operations file. Within your CVS tarball, open parrot/core.ops and
search for the set operations. While there are files such as
parrot/core_ops.c and parrot/Parrot/OpLib/core_ops.pm, this is the
master file. Changes in parrot/core_ops.c will be overwritten the
next time you build, so make your edits to parrot/core.ops.
Having said that, let's look at a sample PMC operation.
inline op set(out PMC, in NUM) {
$1->vtable->set_number(interpreter,$1,$2);
goto NEXT();
}
Since core.ops is split into a Perl and C source file, the syntax is,
of necessity, a mixture of Perl and C. The 'inline' declaration is a
hint to the JIT compiler, which is beyond the scope of the article.
Parameters also have hints for the JIT compiler, but the most important
bits here are the PMC and NUM tags, because these let the compiler
know what types this operation can take.
When preprocessing into Perl, the prototype is the only piece of interest, as the assembler only needs to know the name and parameter list in order to build the assembly code.
C preprocessing is a bit more complicated, but still fairly
straightforward. Tokens like $2 are replaced with the appropriate
code to access the declared parameter, and a few keywords like NEXT()
are replaced with code to return the next instruction in the stream.
With the exception of those tags, the rest of the code is pure C, with access to all of the Parrot internals. Of course, you shouldn't access such things as the register internals, but the rest of the C API is available, the most common APIs being located in parrot/include/parrot/string.h and parrot/include/parrot/key.h, the latter primarily being used for aggregate data structures.
The preprocessor, while slightly confusing, is much more flexible than
the current system of nested CPP macros that Perl currently uses, and
hopefully easier to understand.
Virtual Tables
The code above used a curious construct:
$1->vtable->set_number(INTERP,$1,$2);
Parameter $1 is a PMC, and since these are user-defined types, the
code simply can't assign $2 to $1, as the non-PMC operations would
do. Instead, each PMC has a table of function pointers assigned to it,
and the interpreter calls the appropriate function.
For example, assuming that the P0 register is being initialized by the
new P0,IntQueue instruction, the above code would run the set_number
member of the IntQueue class. Since the type of P0 is decided on at
runtime, the dispatch mechanism is completely independent of the parameter type.
What this means in the case of the IntQueue type is that no modifications
need to be made to the parrot/core.ops file.
Parrot Class Files
parrot/classes contains all of the PMC classes used by Parrot. Like the parrot/core.ops file, this too is preprocessed before final compilation, so all edits should be made to the parrot/classes/*.pmc files.
Creating a new class file from scratch is somewhat daunting, so we'll use an
existing class file to base IntQueue on. While IntQueue is an aggregate
type like PerlHash, the interface matches PerlInt closest, in that it
only deals with one element at a time.
Start by copying parrot/classes/PerlInt.pmc to
parrot/classes/IntQueue.pmc, and replace all instances of PerlInt
with IntQueue. There will be some additional C code necessary that
will be available in the sample source at the end of the article, but
not discussed beyond the API.
Registering the parrot/classes/IntQueue.pmc is done in two files. Add
the appropriate lines to parrot/global_setup.c to initialize the new
PMC type, and add the new vtable entry to
parrot/include/parrot/pmc.h. This is only done in the case of types
that are intended to be part of Parrot itself; when Parrot has the
ability to dynamically load PMC classes at runtime, a more flexible
mechanism will be derived for registering classes, but for now, we'll
pretend that IntQueue is going to be a core interpreter data type.
Within parrot/core.ops, the instructions the IntQueue type uses look like
this:
op new(out PMC, in INT) {
PMC* newpmc;
if ($2 <0 || $2 >= enum_class_max) {
abort(); /* Deserve to lose */
}
newpmc = pmc_new(interpreter, $2);
$1 = newpmc;
goto NEXT();
}
inline op set(out PMC, in INT) {
$1->vtable->set_integer_native(interpreter, $1, $2);
goto NEXT();
}
inline op set(out INT, in PMC) {
$1 = $2->vtable->get_integer(interpreter, $2);
goto NEXT();
}
op if(in PMC, in INT) {
if ($1->vtable->get_bool(interpreter, $1)) {
goto OFFSET($2);
}
goto NEXT();
}
Naturally, each of these call PMC vtable entries, and each one of these has to be implemented. As of this writing, the appropriate vtable entries as they are in parrot/classes/perlint.pmc look like this:
void init () { /* This is called from pmc_new() */
SELF->cache.int_val = 0;
}
void set_integer_native (INTVAL value) {
SELF->cache.int_val = value;
}
INTVAL get_integer () {
return SELF->cache.int_val;
}
BOOLVAL get_bool () {
return pmc->cache.int_val != 0;
}
Any code before the pmclass declaration in a parrot/classes/*.pmc file
is literally copied into the C source, so we'll use this area to store our data
structures and APIs. In order to make matters simple, we'll assume that the
following API is available for our use:
static CONTAINER* new_container ( void ); static void enqueue ( CONTAINER* container, INTVAL value ); static INTVAL dequeue ( CONTAINER* container ); static INTVAL queue_length ( CONTAINER* container );
The API should be fairly straightforward to use. Initializing the container is
done with new_container, which returns a pointer to our new queue data type.
Adding a new queue element is done with enqueue, and deleting an element is
done with dequeue. The queue's length can be found with queue_length.
The CONTAINER data type has to be stored somewhere, and we look into
parrot/include/parrot/pmc.h to find out where to store it. We find
the definition of the PMC structure to be:
struct PMC {
VTABLE *vtable;
INTVAL flags;
DPOINTER *data;
union {
INTVAL int_val;
FLOATVAL num_val;
DPOINTER *struct_val;
} cache;
SYNC *synchronize;
};
There are two areas we can store data: data is used as a general
dumping ground for a data type's internal data structures, and the
cache union is used for fast access to simpler data structures.
data is the right place to hang our CONTAINER structure.
Like most of the other files within Parrot, the IntQueue class is
also preprocessed. The major preprocessing done here is to replace the
SELF tag with a reference to the current PMC. In the rare case that
you need a reference to the current interpreter, that tag is INTERP.
Initializing the IntQueue class is done with the init member. Since we're
storing our queue in the data, we'll let the new_container
function hand us a pointer to our new queue, and save that.
void init () {
SELF->data = new_container();
}
Getting an integer out of the queue is done with the get_integer member.
This isn't meant to be production-quality, so we won't worry about error
checking. So, we'll simply return the integer from the container.
INTVAL get_integer () {
return dequeue((CONTAINER*)SELF->data);
}
Adding an integer to the queue is done with the set_integer_native member.
We'll simply use the enqueue function to place the integer onto the queue
like so:
void set_integer_native (INTVAL value) {
enqueue(SELF->data,value);
}
The final function we need to support is being able to determine whether the queue
is empty, and we use the queue_length function for that. The PMC member
function that does this is get_bool, and the code to access this is pretty
straightforward:
BOOLVAL get_bool () {
return queue_length(SELF->data) != 0;
}
This code has been checked in to the Parrot CVS, so feel free to look at the full version there. We've now walked through the major files needed to implement a Parrot Magic Cookie. Next time, we'll explore the functions needed to implement aggregate data types like hashes and arrays, and learn about the new garbage collection system.
In the meantime, if you want to play with implementing your own data types for Parrot, then take a look at docs/vtables.pod in the Parrot source tree for more information about the members that you can implement and how to design your own classes from scratch.



