
Planet Parrot is an aggregation of select Parrot related blogs. You won't find anything about birdseed or molting here. The list of contributors changes periodically. You might also like to visit Planet Perl or Planet Perl Six for more focus on their respective topics.
Planet Parrot provides its aggregated feeds in Atom, RSS 2.0, and RSS 1.0, and its blogroll in FOAF and OPML.
There is life on other planets. A heck of a lot, considering the community of Planet sites forming. It's the Big Bang all over again!
This site is powered by Venus.

Planet Parrot is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. Individual blog posts and source feeds are the property of their respective authors, and licensed independently.
The Black-headed Parrot is ... most often found in pairs or small noisy flocks of up to 10 individuals, but sometimes up to 30.
--http://en.wikipedia.org/wiki/Black-headed_Parrot
On behalf of the Parrot team, I'm proud to announce Parrot 4.1.0, also known as "Black-headed Parrot". Parrot is a virtual machine aimed at running all dynamic languages.
In my post a few days ago I mentioned Google Summer of Code 2012 and gave a lightning list of simple project ideas that might be worth pursuing. Today I’m going to expand on one of these ideas because it’s fertile ground for many possible GSOC projects, including the possibility of several projects concurrently if we have multiple students interested in it.
Parrot has a lot of introspection ability, but we don’t really have the tools necessary to introspect bytecode. We need some kind of tool that, given a Sub PMC or a PackfileView PMC or similar will be able to provide a disassembled representation of the actual opcodes. Here’s a basic code example of what I am talking about:
function foo() { var x = 2; ... }

function bar() {
    var disassem = new Parrot.PACT.Disassembler();
    using foo;
    var raw = disassem(foo);
    var reg = raw.registers();       // Get register counts
    var lex = raw.lexicals();        // Get info about lexicals
    var constants = raw.constants(); // Referenced constants
    var ops = raw.opcodes();         // Symbolic Opcodes
    say(ops[0]);                     // "set_p_i, $P0, 2", etc
}
These are just some random ideas and not all of them are necessary to implement. The most important part, in my mind, is getting a list of symbolic Opcode PMCs. Each Opcode PMC would have this general form:
class Parrot.PACT.Opcode {
    var opname;     // The name or short name of the op
    var opnumber;   // The number of the opcode
    var oplib;      // The oplib which owns it
    var args;       // Array of Arguments
                    // (either Register or Constant)
    ...
}
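To make the shape concrete, here is the same structure sketched as a Python dataclass. The field names follow the pseudocode above; the stringification and the example values are my own invention, not Parrot's actual disassembler output format.

```python
from dataclasses import dataclass, field

@dataclass
class Opcode:
    opname: str                  # the name or short name of the op
    opnumber: int                # the number of the opcode
    oplib: str                   # the oplib which owns it
    args: list = field(default_factory=list)  # registers or constants

    def __str__(self):
        # Render the symbolic form, e.g. "set_p_i, $P0, 2"
        return ", ".join([self.opname] + [str(a) for a in self.args])

op = Opcode("set_p_i", 42, "core_ops", ["$P0", 2])
assert str(op) == "set_p_i, $P0, 2"
```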
I prefix the disassembler classname with the namespace “Parrot.PACT” because eventually this should be an integral component of the PACT library. When we use PACT to assemble packfiles (and, ultimately, bytecode files) we’ll be constructing a list of these Opcode PMCs and then using a serializer to write them down to raw bytecode.
                     Serializer
Array of Opcodes --------------> Packfile Bytecode Segment

             Deserializer
Bytecode ----------------> Array of Opcodes
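A minimal sketch of that round trip, with an invented byte layout (each op as a little-endian opcode number, an argument count, then integer arguments). Real packfile segments are far more involved; this only illustrates the round-trip property.

```python
import struct

def serialize(ops):
    """Pack a list of (opnumber, args) pairs into a flat byte string."""
    out = bytearray()
    for opnum, args in ops:
        out += struct.pack("<II", opnum, len(args))
        for a in args:
            out += struct.pack("<i", a)
    return bytes(out)

def deserialize(data):
    """Recover the list of (opnumber, args) pairs from the byte string."""
    ops, i = [], 0
    while i < len(data):
        opnum, argc = struct.unpack_from("<II", data, i)
        i += 8
        args = list(struct.unpack_from("<%di" % argc, data, i))
        i += 4 * argc
        ops.append((opnum, args))
    return ops

ops = [(12, [0, 2]), (33, [1]), (7, [])]
assert deserialize(serialize(ops)) == ops   # faithful round trip
```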
An excellent proof-of-concept system would combine these two mechanisms into a faithful round-trip assembly/disassembly mechanism. In fact, there are multiple small potential projects here that can be arranged and prioritized to create one summer-long project or many.
There are lots of ideas here and I’ve still only scratched the surface. My goal with this post is to show how fertile this ground is, how much available work there is to be had and how many new features we desperately need.
Here’s a basic flow graph of things I’m envisioning as eventual parts of PACT or its close cousins. This will show the kinds of components that PACT may either eventually contain or serve as the common substrate for:
                        PIR and PASM Code
                                |
   Optimizers   Analyzers <-+   |        +-> Debugger/Live
    ^  |          ^  |      |   |        |   Interpreter
    |  V          |  V      |   V        |
HLL code -> AST -> Control Flow Graph -> Opcode stream -> Packfile
 ^           |      ^               |     ^   ^               |
 |           |      |               |     |   |               |
HLL <--------+      +---------------+     |   +---------------+
(Decompiled)                              |
                                          |
PIR Code <--------------------------------+
(Disassembly)
One day when I have more time I may try to put this together into a real image of some variety. ASCII graphics were good enough for our digital ancestors, and they will suffice here for a first draft. As you can see, this graph contains several components, any one of which (or any small subsection) might make for an interesting and extremely rewarding project over the summer. This also ignores the inherent complexity and layered architecture possible in things like the AST transformations and optimizations, register allocation, etc. My point is that even the blocks on the graph above can be further decomposed into a variety of smaller but still interesting projects. If any of this stuff looks interesting to you, please get in touch ASAP so we can start talking and planning. Obviously this is more work than one person will do in one summer, so we want to make sure we are coordinating between all interested parties.
I think that if we start on the left side of this chart and implement the routines for reading from and writing to packfiles first, we can start building layers of additional functionality on top of them. This gives us an ability to break such a big system up into manageable parts, to complete some of those parts in small summer-sized chunks, and to be able to use intermediate implementations to solve real problems while we wait for the rest of the system to grow and mature.
If we had multiple students interested in working on PACT in one capacity or another it would be an awesome way to maximize developer resources and help push forward the idea of code reusability. I’m really excited about this whole area and would love it if some students were interested in it too.
You can compile Winxed code with Winxed itself! What’s that you say? The Winxed compiler is bootstrapped and self-hosted, is written in Winxed, and already compiles Winxed? Well, that’s all true. Sort of. However, there is one small caveat: the Winxed driver program historically has not been able to perform the last step of compilation. The driver compiles Winxed code down to PIR, but then uses the spawnw opcode to invoke an instance of Parrot to compile the PIR down to PBC.
I’m pleased to say that this last step is no longer necessary. At least, not in Winxed master (which has not yet been snapshotted into Parrot core).
Here’s a small toy compiler driver that uses Parrot’s PackfileView PMC to compile a .winxed file down into .pbc without spawning any child processes:
function get_winxed_compiler(string pbc_name = "winxedst3.pbc")
{
    var wx_pbc = load_packfile(pbc_name);
    for (var load_sub in wx_pbc.subs_by_tag("load"))
        load_sub();
    return compreg("winxed");
}

function main[main](var args)
{
    var wx_compreg = get_winxed_compiler();
    string winxedcc_name = args.shift();
    string infile_name = args.shift();
    string outfile_name = args.shift();
    string code = (new 'FileHandle').readall(infile_name);
    var pf = wx_compreg.compile(code);
    pf.write_to_file(outfile_name);
}
That’s less than 20 lines of Winxed code to get the Winxed compiler object loaded, to compile the code, and to output the PBC to a file. We can make this better, of course, by being more flexible in the handling of arguments and printing out basic help and error messages and all that stuff. Eventually we are going to update the winxed executable itself to use this trick instead of spawning the child process. This should, I hope, have a noticeably beneficial effect on compiling with Winxed from the command line. For large, long builds like Rosella has, any speed improvements are appreciated.
One particularly interesting tidbit to notice is the very first line: A new syntax for handling optional parameters. I put a patch for that feature together last week and NotFound decided he could do the same thing but better than I did. So, the latest Winxed compiler (again, not yet snapshotted into Parrot Core) supports this syntax for providing default values to optional arguments. I hope that this feature is included with the 4.1 release next week. Here are some examples of the new feature in action:
// This...
function foo(var bar [optional], int has_bar [opt_flag])
{
    if (!has_bar)
        bar = default_bar_value();
    ...
}

// ...is the same as this
function foo(var bar = default_bar_value())
{
    ...
}
The initializer can be any expression value, including expressions involving previous arguments:
function foo (var bar, var baz = bar.some_method(bar))
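For comparison, the same desugaring can be mimicked in Python. Python evaluates defaults once at definition time rather than per call, so a sentinel is needed to emulate the [optional]/[opt_flag] pattern; the names here are illustrative only.

```python
_MISSING = object()   # sentinel standing in for the opt_flag test

def foo(bar, baz=_MISSING):
    if baz is _MISSING:        # equivalent of "if (!has_baz)"
        baz = bar.upper()      # default may reference earlier arguments
    return baz

assert foo("abc") == "ABC"     # default computed from bar at call time
assert foo("abc", "z") == "z"  # explicit argument suppresses the default
```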
This new syntax probably has a few kinks to work out still, but it’s a very cool and very appreciated addition. I’m hoping to use this new syntax to clean up a lot of code in Rosella and hopefully in Jaesop too.
I had a precious few hours to myself yesterday and was able to do some updating work on Rosella’s Template library. I used the time to implement a feature I had wanted for a while: template compilation. You can now compile a template into Winxed code or even compile it all the way down to a packfile. Actually, that’s sort of a lie. The Winxed compiler’s .compile() method returns an Eval PMC, not a PackfileView PMC. I’m going to submit a patch for that soon, and when I do you’ll be able to save the template to a .pbc file.
Here’s how to use the Template library in the basic, interpreted way:
var engine = new Rosella.Template.Engine();
string output = engine.generate(template, context);
The template is a string with a format that I’ve demonstrated before, and the context is any user-defined data structure that you want to use to populate the variables in the template. I won’t go into detail about those things in this post. Now, after my recent changes and additions, you can compile your template to an executable Sub:
var engine = new Rosella.Template.Engine();
var sub = engine.compile(template);
string output = sub(context);
Or, if you really want to see the generated winxed code, you can get that:
string wx_code = engine.compile_to_winxed(template);
The compilation process does take some time; there’s no way to deny that. There are ways to mitigate that expense, of course. You can compile ahead of time, save the code to a file or even a .pbc, and execute that later. There are several strategies if you’re really interested; I won’t go into too much detail here. Once the code is compiled, which can and should be done ahead of time, the time savings during execution are significant. Here’s some benchmarking I’ve done to time a relatively simple template with a ten thousand iteration <$ repeat $> loop:
Interpreted:
0.969569s - %100.000000
0.796563s - %82.156419 (-%17.843581 compared to base)
0.900937s - %92.921402 (-%7.078598 compared to base)
Compiled:
0.365500s - %37.697161 (-%62.302839 compared to base)
0.498571s - %51.421914 (-%48.578086 compared to base)
0.367616s - %37.915399 (-%62.084601 compared to base)
In some cases, the pre-compiled template executes in as little as a third of the time taken to interpret the template directly. Almost every timing I’ve seen runs at 50% of the interpreted time or better.
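The shape of the win is easy to see in miniature. This toy engine (with an invented "{name}" substitution syntax, nothing like Rosella's template language) re-parses the template on every interpreted call, while the compiled path parses once and returns a callable:

```python
import re

TOKEN = re.compile(r"\{(\w+)\}")

def interpret(template, context):
    # Re-scans the template string on every invocation.
    return TOKEN.sub(lambda m: str(context[m.group(1)]), template)

def compile_template(template):
    # Parse once: literals land at even indices, variable names at odd.
    parts = TOKEN.split(template)
    def run(context):
        return "".join(str(context[p]) if i % 2 else p
                       for i, p in enumerate(parts))
    return run

t = "Hello {name}, you have {n} messages"
ctx = {"name": "parrot", "n": 3}
run = compile_template(t)
assert interpret(t, ctx) == run(ctx) == "Hello parrot, you have 3 messages"
```

The compiled closure can be called repeatedly with different contexts without paying the parsing cost again, which is the same trade the Template library makes.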
This is a very new feature and I haven’t added it to the test suite yet. Expect rough edges as I play with it and optimize it. If you’re doing a lot of templating with Rosella, this feature could help save some time for you and plus it was a really fun thing to work on.
You might not see it, but I’m starting to get very excited. Discussions about the Google Summer of Code program are starting up for the 2012 summer. Projects in years past have led to some awesome developments in Parrot, either directly or indirectly, and 2012 could easily deliver more.
For prospective students, here are some blog posts I’ve written in years past about the process, and what you need to do to get started:
If you’re interested in participating in GSoC this year, with Parrot or any other organization, I suggest you read those two posts. They’re important. Many students in school or freshly graduated may know how to code but might not know much about some of the tangential topics: source control (git, in the case of Parrot), documentation, unit testing, refactoring, etc. Now would be a great time to start brushing up on all those topics, so you don’t waste time during the summer getting your tools in place.
As usual I will try to post some ideas for projects here on this blog. If you are an eligible student and you are interested in one or more of these ideas please get in touch with me or other interested Parrot developers. If you have other ideas that I don’t mention, that’s cool too. Get in touch anyway and we can start talking about those ideas. The important thing is to talk to us. Seriously, it’s important.
Here are a few ideas off the top of my head that might be worth some more investigation. I might write additional posts about these ideas if people want more information about them:
These are just a few of the ideas I have on the top of my head this morning. Some, I’m sure, are too big. Others are too small. But in each is the kernel of a good idea and if anybody reading this is interested we should start the conversation now to get these vague ideas focused into compelling proposals.
Earlier this month I released the new Reflect library in Rosella. I hadn’t mentioned it before, but the library is sufficiently interesting that I want to talk about it at least a little bit. The Reflect library adds in tools for reflection. Somewhere, an etymologist weeps a tear of joy for the creative naming, I’m sure.
The Reflect library adds in wrappers for classes and packfiles that make them easier to work with for many operations. First, I’d like to use a couple of code examples to show the most basic API:
// Get the Sub PMC that we're currently executing
var s = Rosella.Reflect.get_current_sub();
// Get the current context
var c = Rosella.Reflect.get_current_context();
// Get the current object, if the current Sub is a method call
var obj = Rosella.Reflect.get_current_object();
// Get the class of the current object, if the current Sub is a method call
var c = Rosella.Reflect.get_current_class();
// Get a Module object for the packfile where the current Sub is defined
var m = Rosella.Reflect.get_current_module();
// Get a reflection wrapper object for the given Parrot Class PMC
var r = Rosella.Reflect.get_class_reflector(myClass);
// Get a Module object for the packfile in "foo/bar.pbc", loading it as
// necessary
var m = Rosella.Reflect.Module.load("foo/bar.pbc");
That’s the basic API that the library provides to get basic information about where execution is happening at the moment when the call is made. Once you have a Module object or a Class reflector object, you can do all sorts of cool things that used to be a pain in the butt to do manually:
var m = Rosella.Reflect.get_current_module();
say(m);           // Stringified, produces the name and version of the packfile
m.load();         // Execute all :tag("load") and :load functions
m.init();         // Execute all :tag("init") and :init functions
say(m.version()); // Get the version string of the packfile "X.Y.Z"
say(m.path());    // The on-disk path to the current packfile

// Get a hash of all Class PMCs defined at compile-time (using the :method
// flag on Subs) defined in the packfile, keyed by name
var c = m.classes();

// Get a list of all non-:anon functions defined in the packfile
var f = m.functions();

// Get a hash of all non-:anon functions in the packfile, organized into
// a hash keyed by namespace
var f = m.functions_by_ns();

// Get a hash of all NameSpace PMCs defined at compile-time
var ns = m.namespaces();
Once you have Class and NameSpace PMCs from the packfile, you can start to do all sorts of cool operations and analyses on them. Once you have a Class reflector object, you can do even more stuff with that:
var c = Rosella.Reflect.get_current_class();
// Create a new object of the current type
var o = c.new();
// Say the name of the class
say(c.name());
// Attributes are encapsulated as objects. You can get an Attribute
// reflector and use it later to get and set values on objects of this
// type or subclasses
var attr = c.get_attr("foo");
var value = attr.get_value(o);
attr.set_value(o, "whatever");
// Methods are also encapsulated. You can get a method reflector now and
// invoke it on objects later (including objects of different types)
var method = c.get_method("bar");
var result = method.invoke(o);
var meths = c.get_all_methods();
// Basic capability detection. Determine if objects are members of the
// class or their subsets, and determine if the class can perform certain
// methods
if (c.isa(o)) { ... }
if (c.can("bar")) { ... }
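For readers more familiar with other runtimes, these operations map closely onto Python's built-in reflection. This is only an analogy for orientation, not a mapping of Rosella's API; the Foo class is a made-up example type.

```python
class Foo:
    def __init__(self):
        self.foo = 1
    def bar(self):
        return "bar called"

c = Foo
o = c()                                   # c.new()
assert c.__name__ == "Foo"                # c.name()
assert getattr(o, "foo") == 1             # attr.get_value(o)
setattr(o, "foo", "whatever")             # attr.set_value(o, "whatever")
method = getattr(c, "bar")                # c.get_method("bar")
assert method(o) == "bar called"          # method.invoke(o)
assert isinstance(o, c)                   # c.isa(o)
assert callable(getattr(c, "bar", None))  # c.can("bar")
```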
I hope the code examples make up for the terse explanations.
The Reflect library is currently focused on reading data from things like Classes and Packfiles, not on creating these things like the new PACT project is supposed to do. I want to extend this library even further with abilities to further introspect functions down to the opcode level and then… Well, when we have a stream of opcodes to analyze, the possibilities are endless. I’d also like the ability to get better introspection of the interpreter and global state, through a cleaner interface than the hodge-podge of interpinfo opcodes, ParrotInterpreter PMC methods, and whatever else we currently use.
As always, using the interface Rosella provides will help to insulate you from changes to the various underlying mechanisms when we finally get around to cleaning them up and making them sane. There isn’t a huge push to make such cleanups on a large scale yet, but I wouldn’t be surprised if a few things started getting prettified in the coming months at a slow pace.
I’ve already started using the new library in several of the Rosella utility programs such as those that create a winxed header file or a test suite from an existing packfile. In all cases the updated programs are both cleaner and have more functionality than the previous incarnations. Expect to see this library improve and grow in 2012 and beyond, and expect to see it work closely with PACT, once that project gets moving forward.
The first half of this month was dominated with some epic illnesses between my family members and myself, family functions and home maintenance. What little spare time I’ve had otherwise has been devoted to writing code, as opposed to writing blog posts about writing code. The blog has suffered.
The past couple days I’ve been working on Rosella’s Test library. It’s an old but good library and is, as far as I am aware, the most full-featured and easy to use testing tool in the Parrot ecosystem. With some of these most recent changes the library is better still.
Kakapo had a series of Matcher routines and objects as part of its testing facilities, and for a long time I’ve been wanting to port some of those ideas over to Rosella. As of last week, I have a simple version of them. Matchers allow you to ask the question “is this thing like that thing?”, with a custom set of rules. Let me give a basic example.
Previously in Rosella if you were unit testing a method which returned an array and you wanted to check that the array contained the right values, you would have to do something like this:
var result = obj.my_method();
var expected = [1, 2, 3, 4, 5];
self.assert.is_true(result != null);
self.assert.equal(elements(result), elements(expected));
for (int i = 0; i < elements(expected); i++) {
    self.assert.equal(result[i], expected[i]);
}
That’s a lot of work, although you can cut it down a little bit if you know for certain that the array isn’t null. With the new matcher functionality, you pass in two arrays and the Test library will match them for you:
var result = obj.my_method();
self.assert.is_match(result, [1, 2, 3, 4, 5]);
Internally the Test library maintains a list of matchers by name. When you pass in two objects, it loops over the list looking for a matcher that can handle the pair. In this case, one of the default matchers the library provides looks for objects which implement the "array" role, and then does element-wise matching on them. Another similar matcher does the same for hash-like objects that implement the "hash" role.
Another matcher checks to see if one or both of the two objects are strings, and then does a string comparison on them (converting the other, if it isn’t a string already) and the last of the default matchers is used to compare floating point values with a certain error tolerance.
Since matchers are stored in a hash, you can access them by name, delete them, add your own, and replace existing ones if you want new matching semantics. This is especially useful in something like Parrot-Linear-Algebra, where I can say
$!assert.is_match($matrix_a, $matrix_b);
…and the library will automatically compare the dimensions of the matrices and the contents of them without needing nested loops and other distractions.
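The dispatch scheme described above can be sketched in a few lines of Python. The registry, the can-handle probe, and the recursive element-wise match follow the description in the text; the names and the two sample matchers are invented for illustration, not Rosella's actual implementation.

```python
matchers = {}   # matchers stored in a hash, keyed by name

def matcher(name):
    def register(cls):
        matchers[name] = cls()
        return cls
    return register

@matcher("array")
class ArrayMatcher:
    def can_match(self, a, b):
        return isinstance(a, list) and isinstance(b, list)
    def matches(self, a, b):
        # Element-wise matching, recursing through is_match for each pair
        return len(a) == len(b) and all(is_match(x, y) for x, y in zip(a, b))

@matcher("float")
class FloatMatcher:
    def can_match(self, a, b):
        return isinstance(a, float) or isinstance(b, float)
    def matches(self, a, b, tolerance=1e-9):
        return abs(a - b) <= tolerance   # compare within an error tolerance

def is_match(a, b):
    # Loop over the registry looking for a matcher that handles the pair
    for m in matchers.values():
        if m.can_match(a, b):
            return m.matches(a, b)
    return a == b   # fallback: plain equality

assert is_match([1, 2.0, 3], [1, 2.0000000001, 3])
assert not is_match([1, 2], [1, 2, 3])
```

Because the registry is an ordinary hash, replacing `matchers["array"]` with your own object is all it takes to install new matching semantics, which is the extension point the text describes.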
Another item I’ve had on my wishlist for a while now has been nested TAP. I’ve always wanted to support it, and in theory at least the system was designed modularly enough to generate it without too much hassle. Last weekend I put on the finishing touches and now am proud to say that Rosella.Test can run nested tests and generate nested TAP. At the moment the interface to use it is a little ugly (I’m actively soliciting feedback!), but the capabilities are all there:
function my_test_method()
{
    self.status.suite().subtest(class MySubtestClass);
}

function my_vector_test_method()
{
    self.status.suite().subtest_vector(
        function(var a, var b) { ... },
        [1, 2, 3, 4, 5]
    );
}

function my_list_test_method()
{
    self.status.suite().subtest_list(
        function(var test) { ... },
        function(var test) { ... },
        function(var test) { ... }
    );
}
Here’s an example of output from a similar test file in the Rosella suite:
1..4
    1..2
    ok 1 - test_1A
    ok 2 - test_1B
    # You passed all 2 subtests
ok 1 - test_1
    1..3
    ok 1 - test 1
    ok 2 - test 2
    ok 3 - test 3
    # You passed all 3 subtests
ok 2 - test_2
    1..5
    ok 1 - test 1
    ok 2 - test 2
    ok 3 - test 3
    ok 4 - test 4
    ok 5 - test 5
    # You passed all 5 subtests
ok 3 - test_3
    1..1
    not ok 1 - test 1
    # failure
    # Called from 'fail' (rosella/test.winxed : 481)
    # Called from '' (t/winxed_test/Nested.t : 40)
    # Called from '' (rosella/test.winxed : 1589)
    # Looks like you failed 1 of 1 subtests run
ok 4 - test_4
# You passed all 4 tests
The fourth test expects a failure in the subtest, which is why it is reported as passing even though failure diagnostics clearly appear. This brings me to my next point…
Before when you ran a test and had a failure, you might see something like this:
not ok 2 - ooopsie_doopsies
# objects not equal '0' != '1'
# Called from 'throw' (rosella/test.winxed : 851)
# Called from 'internal_fail' (rosella/test.winxed : 1853)
# Called from 'fail' (rosella/test.winxed : 481)
# Called from 'equal' (rosella/test.winxed : 577)
# Called from 'ooopsie_doopsies' (t/core/Error.t : 18)
# Called from 'execute_test' (rosella/test.winxed : 1455)
# Called from '__run_test' (rosella/test.winxed : 1483)
# Called from 'run' (rosella/test.winxed : 1392)
# Called from 'test' (rosella/test.winxed : 1747)
# Called from '_block1000' (t/core/Error.t : 7)
# Called from '_block1177' ( : 158)
# Called from 'eval' ( : 151)
# Called from 'evalfiles' ( : 0)
# Called from 'command_line' ( : 0)
# Called from 'main' ( : 1)
# Called from '(entry)' ( : 0)
That’s a huge mess, and it’s a mess from two sides. At the top of the backtrace, you see all sorts of Rosella internal functions involved in the assertion and error handling. The bottom half of the backtrace is devoted to entry-way stuff. In this case there’s NQP-related entry code and then the Rosella entry code. You, as the test writer, don’t care about any of that. All you care about is the code you wrote and where it’s broken. If you have to dig through a huge backtrace to figure out where the error is, that’s a big waste of time and effort.
Now, Rosella filters that crap out for you. Here’s the same exact failure with the new backtrace reporting:
not ok 2 - ooopsie_doopsies
# objects not equal '0' != '1'
# Called from 'equal' (rosella/test.winxed : 577)
# Called from 'ooopsie_doopsies' (t/core/Error.t : 18)
Here you see the important parts of the backtrace only: The parts you wrote and the one assertion that failed. You don’t see the internal garbage, you don’t see the entry-way garbage, because those things aren’t of interest to the test writer.
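The trimming amounts to cutting frames off both ends of the backtrace. Here is a sketch of one way to do it, assuming each frame is a (sub name, file name) pair and that library frames are identifiable by file name; the heuristics are mine, not Rosella's actual implementation.

```python
INTERNAL = "rosella/test.winxed"   # assumed marker for library frames

def is_user(filename):
    return filename not in (INTERNAL, "")   # "" marks entry-way frames

def filter_backtrace(frames):
    """frames: list of (sub, file) pairs, innermost first."""
    # Find the innermost user frame; keep the single library frame just
    # above it, since that is the assertion the user actually called.
    first = next(i for i, (_, f) in enumerate(frames) if is_user(f))
    start = max(first - 1, 0)
    kept = []
    for i in range(start, len(frames)):
        sub, f = frames[i]
        if i > first and not is_user(f):
            break   # re-entered library/entry code: discard the rest
        kept.append((sub, f))
    return kept

frames = [
    ("throw", INTERNAL), ("internal_fail", INTERNAL), ("fail", INTERNAL),
    ("equal", INTERNAL), ("ooopsie_doopsies", "t/core/Error.t"),
    ("execute_test", INTERNAL), ("main", ""), ("(entry)", ""),
]
assert [s for s, _ in filter_backtrace(frames)] == ["equal", "ooopsie_doopsies"]
```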
Another small project I did a few days ago was getting the PLA test suite working again. It’s a testament to how stable both BLAS and Parrot’s extending interfaces are. Recent Rosella refactors removed some of the special-purpose features that existed only for PLA and for no other reason (and which were a pain in the butt to maintain). I fixed up the test suite, and PLA now builds and runs perfectly.
That’s what I’ve been up to this month. I’m mostly done with my cleanups to the Test library now, barring a few more interface improvements I want to make. After that I’ve got a few projects to tackle inside libparrot itself. I’ll write more about those topics when I have something to say.
At one extreme, it is possible to approach the subject on a high mathematical epsilon-delta level, which generally results in many undergraduate students not knowing what’s going on. At the other extreme, it is possible to wave away all the subtleties until neither the student nor the teacher knows what’s going on.
-Stanley J. Farlow, Preface to Partial Differential Equations for Scientists and Engineers
On behalf of the Parrot team, I’m proud to announce Parrot 4.0.0, also known as “Hyperstasis”. Parrot is a virtual machine aimed at running all dynamic languages.
Parrot 4.0.0 is available on Parrot’s FTP site, or by following the download instructions at http://parrot.org/download. For those who would like to develop on Parrot, or help develop Parrot itself, we recommend using Git to retrieve the source code to get the latest and best Parrot code.
Parrot 4.0.0 News:
- Core
+ Several cleanups to the interp subsystem API
+ Cleanups and documentation additions for green threads and timers
+ Iterator PMC and family now implement the "iterator" role
+ A bug in Parrot_ext_try was fixed where it was not popping a context correctly
- Documentation
+ Docs for all versions of Parrot ever released are now available
at http://parrot.github.com
- Tests
+ Timer PMC tests were converted from PASM to PIR
The SHA256 message digests for the downloadable tarballs are:
a1e0bc3de509b247b2cea4863cc202cdceeaa329729416115d3c20a162a0dd88 parrot-4.0.0.tar.bz2
a63d45f50f7dd8ba76395cd2af14108412398ac24b8d827db369221cdb37fada parrot-4.0.0.tar.gz
Many thanks to all our contributors for making this possible, and our sponsors for supporting this project. Our next scheduled release is 21 February 2012.
Enjoy!
The Advent calendar idea ended up terribly. Let us never speak of it again. The holiday season was particularly busy this year with a higher-than-average load of family-, friend-, and work-related activities. Combine that with an unexpected (and absolutely unappreciated) invasion of a particularly unpleasant and long-lasting stomach bug, and you have something of a perfect storm. I won’t go into the details any further on this particular blog, but I will mention in passing that I’ve become extremely suspicious about the basic hygiene habits of the other little turdbags that my son goes to daycare with.
To start 2012 off, and to get back into the swing of coding-for-pleasure, yesterday I went through and got the new Rosella Date library ready for prime time. The library is imperfect and incomplete, but those are things that can be fixed using the patented Andrew Style (tm) of meandering iterative development. For now, however, the library does seem to work well enough for basic tasks and it’s already proven to be a valuable tool for some tasks. Here I’m going to introduce the new library and talk about some of the things I changed in Rosella to support it and some of the ways I’ve already integrated it into the rest of the collection.
When you use sprintf to print out an integer (for instance) you can use some basic modifiers to control how the value is printed. %d prints out the basic version, but you can specify field width, padding, alignment and a few other details by using a format specifier like %-02d. Or, if you want to print the same value out as hex instead of base-10, you can use %x or %X, and use modifiers with those as well.
The problem with something more complex like a date/time representation is that sprintf is not able to handle them natively and some kind of mapping is needed.
Rosella has added a new StringFormatter type to the Core library to help with this problem. A StringFormatter is a type that takes an object and a format string and outputs a new string according to the two. The default StringFormatter uses sprintf internally, but other formatters may use different mechanisms and syntaxes.
var sf = new Rosella.StringFormatter();
string s = sf.format(my_obj, "This is %s");
As an aside, this does demonstrate the fact that our get_string vtable really is insufficient for a lot of purposes. I suggest that get_string should take a parameter for a format string. We could easily incorporate that into the default sprintf implementation like this:
$S0 = sprintf "%{foobar}p", $P0
In that invocation, the get_string vtable on the first parameter would be called with the string argument “foobar”. A normal invocation like this:
$S0 = sprintf "%p", $P0
…would call get_string with a null format and the behavior could be whatever the default string representation for that type is. By overriding get_string in your types to take different formats or to respond to common formats differently, you could have pretty detailed control over stringification at all levels.
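For comparison, Python's format protocol works the way this proposal describes: format() hands the spec string to the object's __format__ method, and an empty spec falls back to the default stringification. The Matrix class here is just a made-up example type:

```python
class Matrix:
    def __init__(self, rows):
        self.rows = rows
    def __format__(self, spec):
        if spec == "flat":     # custom format understood by this type
            return " ".join(str(x) for r in self.rows for x in r)
        if spec == "":         # empty/null format: default stringification
            return str(self)
        raise ValueError("unknown format: " + spec)
    def __str__(self):
        return "\n".join(" ".join(map(str, r)) for r in self.rows)

m = Matrix([[1, 2], [3, 4]])
assert format(m, "flat") == "1 2 3 4"
assert "{:flat}".format(m) == "1 2 3 4"
assert format(m) == "1 2\n3 4"
```

Overriding the format hook per type gives exactly the kind of detailed control over stringification at all levels that the proposed get_string parameter would enable.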
The new Date library provides a Date type for working with dates and times. You can create one in three ways:
var d = new Rosella.Date(t);
var d = new Rosella.Date(year, month, day);
var d = new Rosella.Date(year, month, day, hour, minute, second);
The first uses a system-specific time integer value to represent a time since the system epoch. This is the kind of value you get from the time opcode, or from stat calls on filesystem objects, for instance. In the third option, hours are specified on a 24-hour clock.
There are also a few functions you can call to get particular dates:
var d = Rosella.Date.now();
var n = Rosella.Date.min();
var x = Rosella.Date.max();
The first value should be the current date/time (as read from Parrot’s time opcode). The second is a minimum date object which corresponds to the minimum possible date value and is guaranteed to compare less than any other date. The third, similarly, is a maximum date value which corresponds to the maximum possible date and is guaranteed to compare greater than any other date.
Dates are immutable. Once you create them, they cannot be modified in-place. Instead, several operations are provided to perform operations and return new Date objects with the results. Here are some examples:
var d = Rosella.Date.now();
var e = d.add_seconds(20);
e = d.add_hours(15);
e = d.add_months(24);
e = d.add_years(1000);
In each case, the variable e becomes a new date value and d is left unmodified. Two other methods let you pick out just the date components or just the time components:
var n = Rosella.Date.now();
var d = n.date();
var t = n.time();
Finally, you can use a new DateFormatter type to format the date value into a proper stringification:
var d = Rosella.Date.now();
string s = d.format_string("yyyy - MM - dd and the time is: hh:mm:ss");
At the moment the formatter is dirt simple and only supports a few formatting codes such as yyyy, MM, dd, hh, mm, and ss. I will be making it much more useful in the coming days, if I can settle on an algorithm which doesn’t completely stink.
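For the curious, one dirt-simple approach is literal token replacement over the supported codes, longest codes first. This Python sketch illustrates the idea only; it is not Rosella's actual algorithm, and it shares the obvious weakness that codes appearing in literal text get replaced too.

```python
import datetime

# Map the format codes above to strftime equivalents (illustrative only).
CODES = {
    "yyyy": "%Y", "MM": "%m", "dd": "%d",
    "hh": "%H", "mm": "%M", "ss": "%S",
}

def format_date(d, fmt):
    # Replace longer codes first so "yyyy" is consumed before shorter codes.
    for code in sorted(CODES, key=len, reverse=True):
        fmt = fmt.replace(code, d.strftime(CODES[code]))
    return fmt

d = datetime.datetime(2011, 12, 25, 13, 5, 9)
print(format_date(d, "yyyy - MM - dd and the time is: hh:mm:ss"))
# → 2011 - 12 - 25 and the time is: 13:05:09
```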
The FileSystem library now returns Date objects from certain file-time methods:
var f = new Rosella.FileSystem.File("t/harness");
var ct = f.change_time();
var at = f.access_time();
var mt = f.modify_time();
In each case, the value returned from the stat call on the file is used to create a new Date object.
Dates are completely comparable, so you can sort them and work with them like other values in an iterable:
var a = [
Rosella.Date.now(),
Rosella.Date.max(),
Rosella.Date.min()
];
Rosella.Query.iterable(a)
.sort()
.foreach(function(d) { say(d); });
This toy example sorts the three Date values, with the min first, then now, then max, and prints them out to the console. The stringified versions of the special min/max dates aren’t particularly instructive, but they do get the point across.
So that’s the new Date library. It does need more functionality and definitely needs more tests, but it is working pretty well for me now and has already proven itself useful for a number of purposes. Expect to see more of it in 2012.
This post announces the completion of placing all of Parrot's documentation on parrot.github.com. The documentation ranges from the present version (v3.11.0) back to Parrot's release v0.0.6(0)[1]. To view it, point your preferred browser at http://parrot.github.com and click the "Parrot Documentation Releases (3.10.0 - 0.1.1)" link.
Thank you.
----------
Bailador is growing better and bigger, and is starting to resemble a real tool more and more. Let’s see what new features it has gained recently.
Remember the first example in perldoc Dancer? It’s not really different in Bailador:
use Bailador;
get '/hello/:name' => sub ($name) {
return "Why, hello there $name"
}
baile;
Aside from being a little Spanished, what did it give us? We have subroutine signatures in Perl 6, so we can pass the :name parameter straight to the sub; there’s no need to use param() now: it’s gone.
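Under the hood, a router has to turn a pattern like /hello/:name into something matchable and extract the named pieces. A minimal sketch of that compilation step (Python for illustration; this is not Bailador's implementation):

```python
import re

def compile_route(pattern):
    # Turn each :name segment into a named capture group that
    # matches exactly one path segment (anything except "/").
    regex = re.sub(r":(\w+)", r"(?P<\1>[^/]+)", pattern)
    return re.compile("^" + regex + "$")

route = compile_route("/hello/:name")
m = route.match("/hello/world")
print(m.group("name"))  # → world
```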
You don’t need to pass everything using GET, of course; the post keyword is also supported.
post '/' => sub {
return request.params.perl
}
The above will print something like ("foo" => "bar").hash if fed an appropriate request.
any() is a reserved keyword in Perl 6, and while you can use it, it means something completely different. Instead of any('get', 'post') you can just do it like this:
get post '/' => sub {
if request.is_get {
return "I am GET"
} else {
return request.params.perl
}
}
post, as well as get, returns its arguments, so you can chain them as in the example above. The example also shows the joy of the request object, which you can use to inspect the request being processed. It’s not as cool as Dancer::Request, but it does the job, being quite small and simple.
What else do we have? Let’s show off a bit and write a simple-simple pastebin webapp.
use Bailador;
unless 'data'.IO ~~ :d {
mkdir 'data'
}
get '/' => sub {
template 'index.tt'
}
post '/new_paste' => sub {
my $t = time;
my $c = request.params<content>;
unless $c {
return "No empty pastes please";
}
my $fh = open "data/$t", :w;
$fh.print: $c;
$fh.close;
return "New paste available at paste/$t";
}
get /paste\/(.+)/ => sub ($tag) {
content_type 'text/plain';
if "data/$tag".IO.f {
return slurp "data/$tag"
}
status 404;
return "Paste does not exist";
}
baile;
Holy cow, what’s that! Let’s go through it piece by piece. First, we create a data directory if it doesn’t already exist. No black magic here, so let’s proceed. What’s next? Templates! Here we just load index.tt without passing any parameters; that works too, and some example apps use it in their templates.
The handler for new_paste uses our well-known request object again, and creates a new file for each paste, identified by the current time.
The last get block uses some nifty features, so let’s take a look. It uses regexes, and you can see that they also cooperate with subroutine parameters without black magic. We then set a content_type as we would in Dancer, and send status 404 if no paste has been found. Easy peasy? I suppose so. That’s it; it works like a charm.
Thus we’ve covered all the features in Bailador as of now. I don’t think that’s too shabby for about 100 lines of code.
What’s next? What’s missing? You tell me. Or you contribute; the code is dead simple and implementing stuff like before(), after(), before_template() etc should be a matter of 3-5 lines, I think. Feel encouraged to look into the code and hack on it. If you have any questions, suggestions or criticism, don’t hesitate to tell, or poke me on #perl @ Freenode. Have fun!
Over the summer of 2011 we had a GSOC project that was trying to build a JavaScript compiler for Parrot. That project made some interesting progress but ultimately I think the approach ended up being flawed. That project attempted to convert JavaScript into PIR code, which is a pretty big gap in terms of both syntax and semantics.
Towards the end of the summer, after it was too late to go back and start from the beginning, I had a different idea: What if we tried to generate Winxed code instead of PIR code? Winxed would handle things like register allocation, and the Winxed syntax is so similar in some places that the translation could almost be verbatim copying. I put those ideas together with node.js and the cafe compiler and Jaesop was born.
Jaesop intends to be a full JavaScript-on-Parrot compiler. The plan is to write as much of it in JavaScript itself as possible, and bootstrap upwards. The first component, “stage 0”, uses node.js to run a converter that compiles JavaScript code into Winxed. That’s the only part we have working right now, but what we do have is working pretty well.
In 2012 I want to get the next component, “stage 1” working. Stage 1 will use the stage 0 compiler to compile itself. It will be written in JavaScript (translated to Winxed en passant) and will be running entirely on Parrot. My goal for stage 1 is to be generating bytecode directly instead of generating either Winxed or PIR as an intermediary. That’s going to require some serious help from a compiler toolkit of some variety and I have very high hopes for using PACT for this purpose when it’s ready.
Stage 0 works very well and passes a small but interesting test suite I’ve set up for it. It does not self-compile, but there are only a few relatively small things standing in the way. For instance, regular expression support and pcre bindings are not complete yet, and the grammar currently requires semicolons at the end of statements but the code generated from Jison does not always contain semicolons. I also haven’t built in the require() function, which is used by the stage 0 compiler to load in the various code files. These are all small issues and with a small amount of work I expect stage 0 to be able to self-host. Whether I want to expend that effort or focus attention on stage 1 instead is a different question entirely.
In 2012, once PACT has matured and 6model has been integrated into Parrot I expect to get back to work aggressively on Jaesop. I’m looking for helpers too, in case anybody reading this wants to get involved in the development of a new JavaScript compiler. I’ll put out more of a call to arms when the prerequisites are in place and we’re ready to get the wheels turning on stage 1. I estimate that by spring or early summer we should be ready to get started on the next phase of Jaesop development.
I had already written a post about Rosella for my woefully inadequate advent calendar, but it got lost to the sands of time (I think it’s on my other computer, committed but not pushed). Considering that I should have been on schedule to be on post #21 or #22 by now but am instead only on #8, I can’t really afford to not write a post when I have the ideas inside me. This post is also going to be about Rosella, and if I ever find the first one I may post that as well. I clearly haven’t had enough spare writing time this month to be even remotely picky about what I post or when.
I’m thinking I might like to try this little experiment again later in the year, when I’ve had more time to prepare and have fewer other things in real life demanding my undivided attention. Maybe we’ll shoot for some kind of Christmas-in-July thing. Until then, I’ll happily concede the “best Advent Calendar” crown back to moritz and the Perl peoples.
Rosella is a library project I started as a way to let me work on lots of different project ideas I had without having to duplicate build and test infrastructure. The goals from the project were solidified pretty early in the project lifecycle and have remained unchanged ever since: To provide solutions to common developer problems, in pure Parrot, in ways that are portable, reusable, and configurable. I’ve already talked about three of the oldest and most important libraries in the set: Test, Harness and MockObject on Day 5 of this Advent calendar. My other written-but-not-published post talked about three additional libraries as well: Query, FileSystem and Proxy. To avoid much duplicate effort in case I do ever find that post again, I’ll write about a few different libraries: Container, Random and Template.
The Container library is one of the oldest libraries in Rosella. It is an implementation of a dependency injection container for Parrot, and originally lived in its own Parrot-Container repository on github. My other library, for unit testing, was also a separate repo. One day I was thinking about how I might like to use the Parrot-Container project to manage dependency injection in the Harness library, and trying to figure out how I wanted to manage the software dependencies. I wanted to use the Container library in the implementation of the unit test Harness, but I wanted to use the Harness and Test libraries to implement the test suite for the Container library repository. Combine that with a bunch of other ideas for new libraries brewing in the back of my mind, and Rosella was quickly born.
The Container library is not quite as easy to use as a counterpart in the .NET or JVM environments, because things are not always strongly typed, especially not at compile time or early in runtime when the container is initializing but the various bits of initialization logic in the program have not yet run. The Rakudo folks, with their fancy object system and notion of gradual typing, might have a different idea, but in terms of pure-Parrot code, as Parrot exists right now, we can’t query a Sub PMC and ask it what types of parameters it expects to receive. For people who might be familiar with other dependency injection (or “inversion of control”) systems like Unity or Ninject, the syntax for setting up Rosella’s container type might seem a little verbose. It’s plenty powerful (and I have plenty of plans to upgrade things as Parrot’s threading support and object system improve in 2012 and beyond), but it is verbose:
var container = new Rosella.Container();
container
.register(class Foo)
.register(class Bar,
new Rosella.Container.Resolver.TypeConstructor(class Bar, "Bar", 1, 2, 3),
new Rosella.Container.LifetimeManager.Permanent()
)
.register(class Baz,
new Rosella.Container.Resolver.TypeConstructor(class Baz, "BUILD",
new Rosella.Container.Argument.Resolve(class Foo),
new Rosella.Container.Argument.Resolve(class Bar)
),
new Rosella.Container.Option.Attribute("my_attr", "value"),
new Rosella.Container.LifetimeManager.Thread()
)
.alias(class Baz, "Baz");
var b = container.resolve(class Bar);
The part that is significantly more verbose here than you would expect in Unity, for example, is where I have to specify the types of arguments to pass to the constructor. In a platform like .NET, the container can read the type metadata from the constructor object itself and make that decision. In Parrot, since we currently don’t make that kind of information available, you must specify it yourself. In Rosella’s offering you do have many of the features you would expect from one of the more well-known containers: the ability to specify registration lifetimes, the ability to initialize objects by making method calls and setting attribute values after construction, the ability to specify global singleton instances, etc. It is a pretty cool tool, but I haven’t yet made as much use of it as I would have liked. In 2012 I expect to start integrating the Container library into some of the other libraries, such as Harness, CommandLine and Template, to make user configuration easier and more straightforward.
The Random library is a relative newcomer to Rosella, but is already demonstrating its usefulness in a variety of ways. The Random library was born from a few ideas I had turned into GCI tasks. An intrepid young student wrote an implementation of the Mersenne Twister algorithm for me in Winxed, and I set about writing up several other components such as a Box-Muller transform, a UUID generator, a Fisher-Yates array shuffler and a few other things. Now, you can do cool things with random numbers:
var prng = Rosella.Random.default_uniform_random();
int r = prng.get_int();
var uuid = Rosella.Random.UUID.new_uuid();
string id = string(uuid);
var a = [1, 2, 3, 4, 5, 6, 7, 8];
Rosella.Random.shuffle_array(a);
The implementations aren’t all perfect and there are a handful of bugs to be worked out (especially bugs resulting from arithmetic differences between 32-bit and 64-bit machines), but the library is already very usable and reliable for the most part. If you need a random number generator, a UUID generator, or other random-related things, this is a very nice tool to have available.
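Of the pieces above, the Fisher-Yates shuffle is small enough to sketch here. This is the general algorithm in Python, not Rosella's Winxed code:

```python
import random

def fisher_yates(a, rng=random):
    """Unbiased in-place shuffle: walk backwards through the array,
    swapping each slot with a uniformly chosen slot at or below it."""
    for i in range(len(a) - 1, 0, -1):
        j = rng.randrange(i + 1)
        a[i], a[j] = a[j], a[i]
    return a

print(fisher_yates([1, 2, 3, 4, 5, 6, 7, 8]))
```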
The Template library is something I wanted to write for a long time, but had to wait until all the prerequisites were in place first. Template is a text-templating engine library. You create an Engine object and feed it two basic pieces of information: The string template to use and the data context object to fill in the blanks. Presto-chango, out comes the complete text.
The Template engine can execute basic logical operations depending on the values in the data context, it can compile and execute inline snippets of code, it can load and assemble pieces of template from separate files, and do several other things that you would expect a templating library to do. I could write many examples of templates and their use, but I’ll stick with only one small example here for brevity:
Template:
Let's learn about <%
var bar = context["animals"];
return elements(bar);
%> new animals!
<$ for sound in animals $>
The <# __KEY__ #> says <# sound #>
<$ endfor $>
Code:
string template = ...;
var context = {
"animals" : {
"Cow" : "Moo",
"Bear" : "Growl!",
"Cat" : "I CAN HAZ CHEESEBURGER"
}
};
var engine = new Rosella.Template.Engine();
string output = engine.generate(template, context);
And the end result would be something like this (minus hash ordering concerns):
Let's learn about 3 new animals!
The Cow says Moo
The Bear says Growl!
The Cat says I CAN HAZ CHEESEBURGER
Rosella eats plenty of its own dogfood here. Several test files in the Rosella test suite and a few boilerplate source code files are generated using the Rosella Template library. I’m planning to start using it to generate skeleton files for Rosella documentation as well. In the future, I may also use it to help generate content for this very blog!
In 2012 Rosella is going to be adding a bunch of cool new stuff. Considering how far Rosella has come by now and the fact that it’s less than a year old, it’s kind of hard to speculate where we will be next year around this time. I’m planning to add a new Date/Time library within the next few weeks. I’m also planning a new reflection/packfile library, a benchmarking library, a code assertions library, and rewrites to several of the existing libraries to add new functionality and optimize performance in some key ways. Those are only my plans for the first two or three months of 2012!
When you want to find something on the internet, Google is a pretty popular tool to use. When you want to get some code written, Google turns out to also be a pretty good idea.
Google has been running its Summer of Code (GSOC) program for several years now, and every summer it’s a smashing success. Last year they started a new program called the Google Code-In (GCI). That program, while wildly different from GSOC, is also incredibly awesome. Every year Parrot receives a huge amount of code from these programs, and we would not be where we are today without them.
The Summer of Code program is very straightforward: every year we receive project applications from college-aged and some high-school-aged students. Over the course of the summer the accepted applicants work diligently, and at the end, if they are successful, they get paid. Also, there are cool T-shirts. The GCI program is aimed at younger students, mostly those in high school. Instead of one large project spanning several months, the GCI students work on a large number of bite-sized tasks from many organizations. Each task is worth points, and the students who have the most points at the end of the program receive a prize. Also, I think there are monetary rewards for completing a certain number of tasks.
Last year we had a huge number of tasks completed by several GCI students. We had many bits of code written or re-written, some new documentation written, and many many tests added. Our project-wide test coverage metrics increased dramatically, since we had several tasks devoted to test coverage and several talented young coders who were chasing them down. Once you figure out how to write tests, subsequent tasks go much more quickly. We had some students who, after getting the hang of things, were able to complete several tasks per day at the end.
I heard one or two accusations that Parrot was inflating point values by offering tasks that could be completed so quickly. I pointed out that many tasks were considered very hard by students at the beginning, but as their familiarity with the code increased, the relative difficulty decreased. It wasn’t a problem of Parrot’s tasks being too easy, but of the students learning and improving much faster than we could keep up with. By the end of the program we were learning that the difficulty really needed to ramp up over the course of GCI, because students would be capable of much more toward the end than at the beginning.
This year things are going a little more slowly. We have fewer of the “increase test coverage” tasks, because our test coverage in the core VM is still so high from last year. I’ve scoured several potential sources of tasks, including searching for TODO notes in the Parrot source code, scanning through our myriad Trac tickets, digging into ecosystem projects (Plumage and Rosella especially) and begging other contributors for task ideas. Already we’ve seen some very useful bits of code contributed to the project, including new features and impressive cleanups and refactorings of old crufty code.
GCI this year is essentially closed off. There were two opportunities to publish new tasks and those have both passed. Students are slowly working their way through the remaining tasks in the queue until the program ends in a few weeks. However, next year I’m sure we’re going to see both another GSOC and another round of GCI (At least, I hope we see them, since these are both so awesome!).
Despite being months away, we’re already starting to look for project ideas and potential applicants for GSOC. Picking good ideas early gives us time to refine them, and getting familiar with potential applicants now helps us learn more about them and be more comfortable with them when the time comes to make final selections. GCI doesn’t require as much preparatory work, but it would be nice to go into next year with a larger pile of varied tasks for students to work on, instead of needing to scramble to create tasks in sufficient numbers at the last minute.
GSOC and GCI have both been amazingly successful programs in the past and I am hoping that trend continues into the future. Parrot has benefited from both programs to an amazing degree, and with a little bit of luck and a lot of planning we can keep the train moving into 2012 and beyond.
If you’re a highschool or college-aged coder, or know somebody who is, and would like to talk about getting involved with either program in 2012 (especially if you would like to work with Parrot specifically!) please let me know and I can make sure you get pointed in the right direction.
In late 2010 and early 2011 we spent a good amount of effort building a new embedding API for Parrot. I would like to say that the new API replaced an older, inferior API, but that’s not really the case. We didn’t really have an old embedding API per se. We had a mishmash of functions in a file called embed.c, but they hardly represented a consistent API, much less a complete set of things that an embedder would need. If anything, the old embedding API was the entirety of all publicly exported functions from libparrot, combined with a handful of utility functions that embedders in the past also needed.
In short, it was a mess. By early 2011 we had a much nicer API around to play with. Now that 2011 is almost over, the new API is considered to be extremely stable and robust.
Last December I started a project called ParrotSharp which embeds a Parrot Interpreter into the .NET CLI with C#. I haven’t shown that project too much love in recent months, but as of today it’s still building and seems to run correctly (although my IDE is telling me it can’t find NUnit on my system, so it won’t run my tests). That has to tell you something, when code I wrote months ago with the embedding API still works correctly even though so many things have changed in Parrot since then.
Parrot’s embedding API is a little bit verbose but very easy and straightforward to use. Also, all API functions return a true/false status value, so calls can easily be chained together. Here is an example of the embedding API in action:
int main(int argc, char** argv) {
    Parrot_PMC interp, bytecodepmc, args;
    Parrot_Init_Args *initargs;
    Parrot_String filename;
    Parrot_Int exit_code = 0; /* declared here so it is in scope for exit() below */
    GET_INIT_STRUCT(initargs);
    if (!(
        Parrot_api_make_interpreter(NULL, 0, initargs, &interp) &&
        Parrot_api_set_executable_name(interp, argv[0]) &&
        Parrot_api_pmc_wrap_string_array(interp, argc, argv, &args) &&
        Parrot_api_string_import(interp, "foo.pbc", &filename) &&
        Parrot_api_load_bytecode_file(interp, filename, &bytecodepmc) &&
        Parrot_api_run_bytecode(interp, bytecodepmc, args)
    )) {
        Parrot_String errmsg, backtrace;
        Parrot_Int is_error;
        Parrot_PMC exception;
        Parrot_api_get_result(interp, &is_error, &exception, &exit_code, &errmsg);
        if (is_error) {
            Parrot_api_get_exception_backtrace(interp, exception, &backtrace);
            // Print out exception information to the console, or whatever
        }
    }
    Parrot_api_destroy_interpreter(interp);
    exit(exit_code);
}
This is, essentially, a simple program to execute a bytecode file. However, it does show some of the basics of the embedding API. Every function returns a true/false, pass/fail status bit. All data types passed around are properly wrapped Parrot_PMC or Parrot_String types and it almost never uses any other raw pointer types. Also since we’re using PMC and STRING types, Parrot’s GC manages all the memory for you and you don’t need to be freeing things or cleaning things up (except for the interpreter itself).
This example above shows only a handful of API functions, but there are several dozen of them in the API, and more can be easily added. We have API routines for performing a variety of actions on Strings and PMCs. We have API routines for loading, executing and writing bytecode. The API has decent defaults so you can get an interpreter up and running quickly if you want, but it also has a variety of routines for tweaking and configuring the interpreter. And, like I said (and I’ll say it a million times more if I have to), we can always add new methods to the embedding API if there is a need.
We have at least the basics of a C# wrapper project in the wings, and I’ve been planning a proper C++ wrapper for a while too, but I haven’t gotten around to it yet. That would make an excellent, smallish project for an intrepid newcomer, especially one who knows C++. I like to think it should be easy to embed Parrot as a plugin for things like text editors or other pluggable unixy programs, but I haven’t taken the time to really dig into any of them yet. This might make another great project for an eager new Parrot hacker.
Ever since I started working with Parrot I’ve noticed something interesting about the community: they are very interested in unit testing. Parrot alone has a test suite with over ten thousand tests (and I still feel like some portions of the VM are heavily under-tested). When I first joined Parrot I had never written a unit test, nor did I really understand the value of testing. I was a newbie fresh out of school, where practical topics like that were never covered. Despite my unfamiliarity with testing, I very quickly decided it was a good idea and that more tests can definitely make software more awesome.
Writing tests for Parrot and Parrot-related projects is quite easy because we have the infrastructure for it. The easiest way to write tests, in my opinion, is with Rosella, but we also have a Test::More library that can be used to great effect and is the primary testing tool used in Parrot’s own test suite.
Several months ago we had Tapir, a simple TAP harness project written by dukeleto and others in NQP. There was also a project called Kakapo written by Austin Hastings which included several unit test and mock object utilities, also written in NQP. I absorbed a lot of ideas from both projects (and eventually rewrote those ideas in Winxed) for the Rosella Test, Harness and MockObject libraries.
Writing tests for Parrot is a great way to get involved in the project if you’re a new user, a great way to get familiarized with the capabilities of the VM, and a very big benefit to the project in any case. First let’s talk about Test::More as it’s used in the Parrot test suite and then I’m going to talk about Rosella’s test offerings.
Test::More is a very simple TAP producer library that implements a few standard test functions like plan(), is(), ok(), and a few others. You use Test::More like this:
.sub main :main
.include 'test_more.pir'
plan(5);
ok(1, "This test passes");
ok(0, "This test fails");
is(1, 1, "These things are equal");
isnt(1, 0, "These things aren't equal, test passes");
is(1, 0, "These things aren't equal, so the test fails");
.end
The test harness reads the TAP output from the test file, checks the plan, checks the results of each individual tests, and gives you a readout of the overall pass/fail status of the test.
Test::More is very simple, and if you’ve used a TAP library before the basics of it should be very easy to grasp. Plus, if you look through the Parrot test suite (t/ directory and subdirectories) you’ll see plenty of examples on usage.
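TAP itself is simple enough that the consuming side fits in a few lines. A toy sketch (Python, illustrative only) of the bookkeeping any harness has to do: find the plan line, then tally ok/not ok results:

```python
import re

def parse_tap(output):
    """Return (plan, passed, failed) from a TAP stream."""
    plan, passed, failed = None, 0, 0
    for line in output.splitlines():
        m = re.match(r"1\.\.(\d+)", line)
        if m:
            plan = int(m.group(1))
        elif re.match(r"not ok\b", line):  # check "not ok" before "ok"
            failed += 1
        elif re.match(r"ok\b", line):
            passed += 1
    return plan, passed, failed

tap = """1..5
ok 1 - This test passes
not ok 2 - This test fails
ok 3 - These things are equal
ok 4 - These things aren't equal, test passes
not ok 5 - These things aren't equal, so the test fails
"""
print(parse_tap(tap))  # → (5, 3, 2)
```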
Rosella offers several test-related libraries as part of its collection: Test (a unit testing library), Harness (a library for building test harnesses) and MockObject (a mock-object testing extension) are all standard parts of the Rosella lineup. There’s also an experimental Assert library that adds some new testing features as well.
Rosella ships with a default testing harness called rosella_harness, which is available when you install Rosella. You can run it at the command line with a list of directories like this:
$> rosella_harness t/foo t/bar t/baz
The harness will run through all *.t files in the given directories, reading the shebang line in each file and using that program to execute it. This is the fastest way to get started with testing. Of course, you can also use the Harness library to build your own harness in only a few lines of Winxed code:
$include "rosella/harness.winxed";

function main[main]() {
    var harness = new Rosella.Harness();
    harness
        .add_test_dirs("Automatic", "t/foo", "t/bar", "t/baz", 1:[named("recurse")])
        .setup_test_run(1:[named("sort")]);
    harness.run();
    harness.show_results();
}
The listing for a harness written in NQP is almost as short, and I’ve shown it several times on this blog before.
Writing a unit test file with Rosella Test is similarly easy:
$include "rosella/test.winxed";

class MyTest {
    function test_one() {
        self.assert.equal(1, 1);
    }
}

function main[main]() { Rosella.Test.test(class MyTest); }
Each method in the MyTest class will be run as a test. Each test method may contain zero or more assertions. If all assertions pass, the test passes. If any assertion fails the whole test method immediately aborts and is marked as having failed. Unlike Test::More above, we don’t need to explicitly count the number of tests for plan(). Instead, the library counts the number of methods for us automatically.
Rosella’s MockObject library can be used together with the Test library to add MockObject support to your tests. Mock Objects, as I’ve said before and I’ll say a million times again in the future, are tools that can do as much harm as good, especially if they are used incorrectly. Here’s an example of a test using MockObject:
$include "rosella/test.winxed";
$include "rosella/mockobject.winxed";

class MyTest {
    function test_one() {
        var c = Rosella.MockObject.default_mock_factory()
            .create_typed(class MyTargetClass);
        c.expect_method("foo")
            .once()
            .with_args(1, 2, "test")
            .will_return("foobar");
        var o = c.mock();
        var result = o.foo(1, 2, "test");
        self.assert.equal(result, "foobar");
        c.verify();
    }
}

function main[main]() { Rosella.Test.test(class MyTest); }
You have to do more setup for a test with mock objects, but you get a lot more flexibility to do black-box testing and unit testing with proper component isolation. I won’t try to sell mock-object testing here in this post, only demonstrate that it is both possible and easy with Rosella.
Parrot has several thousand tests in its suite. Winxed has a small but growing test suite. Rosella currently runs “728 tests in 116 files”. parrot-libgit2 has a growing test suite. Rakudo has a gigantic spectest suite. Jaesop has a growing suite of tests. NQP has tests. PACT is going to have extensive tests, once it has code worth testing. Plumage has tests (though not nearly enough!). PLA has a relatively large suite of tests. Testing is a hugely important part of the Parrot ecosystem, and we currently have several tools to help with testing. Expect the trend to continue, with more tests being written for more projects in 2012.
In GSOC two years ago my student Chandon started working on a very cool new project for Parrot: Hybrid Threads. I hadn’t really put too much thought into the future of Parrot’s concurrency architecture until that point and the idea certainly seemed novel and interesting to me. We went ahead with the project not quite sure how far he would get but very eager to see something happen with our ailing threading system.
The problem, needless to say, ended up being much larger than a project for a single summer and by the end of it Chandon had most of a green threads implementation set up, but nothing that was quite mergable to master. His branch sat, un-merged and un-molested, for a year before anybody decided to take a second stab at it.
Several months after Chandon’s work ended I sat down and started seriously thinking about what I wanted Parrot’s concurrency system to look like in the future. If we could have anything we wanted, what is it exactly that we would want? I started by reading up on all sorts of other technologies: Erlang, node.js, stackless threads in python, and others. I even read a few tangentially related materials, like some of the problems with common string optimizations. Ideas in hand, I began writing a short series of blog posts describing my personal conclusions. My plan for concurrency was very close to Chandon’s hybrid threads idea but with a few extra details filled in. I suggested that we should stick with the hybrid threads approach, explicitly restrict cross-thread data contention by using messages and read-only proxies, avoid locking as much as possible, and rely on the immutability of certain data to keep things as simple as possible.
I didn’t have the time to work on all this myself, at least not until several other projects cleared my TO-DO list. This is where hacker nine comes in. Nine wanted to work on adding threading to Parrot and liked the hybrid threads approach and some of the ideas I had been working on. So he checked out a copy of Chandon’s green_threads work, updated to master, and started fixing things. A few weeks later green_threads was merged to master with Linux support only. I said I would work to port the system to Windows as well and nine would push forward with the hybrid portion of the hybrid threads design.
We’re not ready with that yet, but we are getting painfully close to a working system. I’ve been stymied by the house hunt and the fact that I don’t have a windows system at home to play with. Nine has had a few infrastructural difficulties, especially with GC, but he’s making great progress regardless.
Here’s a brief overview of the threading system we’re working towards. We will have two layers of concurrency. The first is the Task, or “green thread”. Each Task represents an individual unit of executing work, and multiple Tasks can execute together on a single thread. They do this through preemptive multitasking: the Parrot scheduler occasionally fires alarms and switches Tasks on the current thread if there is more than one in the queue. Since Parrot uses continuation-passing style internally, this mechanism is relatively simple to implement (it’s the alarms that are surprisingly difficult and not cross-platform, but the actual Task switching is quite simple after that).
Multiple Tasks running on a single thread give more of an illusion of concurrency than the real thing, because you aren’t making use of multiple processor cores or exploiting any context-switching optimizations at the lowest levels of threads. What will take things to the next level is the OS thread implementation. Internally Parrot will maintain a pool of worker threads. When you create a Task you will have options about where to dispatch it: on the current thread (useful where we need safe read/write access to PMCs on the same thread without locking), on a specific target thread, or on a completely new thread. Or, if you don’t want to specify, you can let the scheduler dispatch the Task to the best thread.
When you think about what this kind of system enables, it’s actually pretty impressive: easy asynchronous IO (schedule an IO request Task on a dedicated IO thread), easy event handling (schedule a new Task on the current thread to keep data locality), easy threading (schedule a new Task, the scheduler will set up the Thread for you), easy eventing loops (main thread reads event sources and schedules Tasks), easy library callbacks (library callback schedules a Task in the owner thread), auto-threaded array operations (schedule tasks for subsets of the data, the scheduler puts the tasks on the threads with lowest latency) and a variety of other modern techniques. This system, once it’s completed and all the bugs are ironed out, will really be a big boost for Parrot in terms of feature set and usability.
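To make the idea concrete, here is a purely hypothetical PIR sketch of what dispatching a unit of work might look like under this design. None of these names or semantics were final at the time of writing; the Task type, schedule and wait behavior shown here are illustrative only:

```
# Hypothetical sketch: wrap a sub in a Task and hand it to the scheduler
.sub 'main' :main
    $P0 = get_global 'do_work'
    $P1 = new ['Task'], $P0    # create a Task around the sub
    schedule $P1               # let the scheduler pick where it runs
    wait $P1                   # block until the Task completes
.end

.sub 'do_work'
    say "running inside a Task"
.end
```

The interesting part is the dispatch decision: the same schedule step could target the current thread, a named worker thread, or a brand new one, without the calling code changing shape.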
I don’t know when we will be ready to merge, but I can’t imagine it will happen before the 4.0 release. Sometime in early 2012 expect to see the new hybrid threading implementation for Parrot, even if it only works on Linux initially. By the 4.6 release I expect we will have a pretty robust hybrid threading system available on all our target platforms, and several of our HLLs and libraries will be making use of it.
Already this advent series of posts is an unmitigated disaster. The last two days of ommissions can be squarely blamed on my wife and her parents, and last night I will blame on Septa (for eating 5 hours of my day because of “mechanical problems”). Here, finally, is day 3 of my miserable failure of an advent calendar.
For the third post in my almost-advent calendar, I’m going to talk about Winxed.
Winxed, written by long-time Parrot hacker NotFound, is quite an interesting project and is definitely worth learning more about. It serves as something of a counterpoint to NQP, the other lower-level Parrot language in the ecosystem, but the two aren’t competing; I think they are very complementary. Where NQP is designed to help build compilers (Rakudo Perl6 especially), Winxed seems much more geared towards writing libraries and utilities. It’s for this reason that I’ve written my library project, Rosella, in Winxed.
In return for making such an awesome language, I’ve turned around and started writing some comprehensive documentation about Winxed on the Rosella website.
Winxed is written in itself using a home-brewed recursive descent parser. To perform the bootstrapping, the stage 0 Winxed compiler is written in C++. Stage 0 implements a pared-down subset of the language, which is used to compile the stage 1 compiler, itself written in that subset. Stage 1 is used to compile stage 2, which is a more full-featured version of the language. Stage 2 compiles itself into stage 3. Stage 3 is what you and I use when we install Winxed.
It sounds much more complicated than it is, but the net result is clear and simple: Winxed written in Winxed. If you know Winxed, or are familiar with some of the big languages it is inspired by (C++, Java, C#, JavaScript), you’ll be able to not only write software for Parrot, but also be able to hack on the Winxed compiler itself. I’ve submitted a few patches and feature additions, and I’ve found it to be a pleasure to work on.
It’s not anything-goes on the compiler, however. NotFound maintains pretty tight editorial control over the software. The consistency of vision and planning does produce refreshingly nice results, though. He’s very good about taking requests and suggestions, so if you see something that’s missing definitely drop him a line.
Since being bundled with Parrot in the 3.6.0 release, which incidentally is when NotFound started keeping track of version numbers, the language has come a long way: various optimizations, a debugging mode with optional asserts and conditionals, several new built-in functions, support for multiple dispatch, and most recently a new “inline” feature which allows certain types of code to be inlined for performance improvements by avoiding extra PCC calls.
Here is a random-ish sample of Winxed code that I’ve been playing around with recently, testing some new ideas I’m thinking of adding to Rosella:
function main[main](var args)
{
    var rand = Rosella.Random.default_uniform_random();
    var mutator = new Rosella.Genetic.Mutator(
        function() {
            float value = float(rand.get_float()) * 1000.0;
            return new Data(value);
        }
    );
    var e = new Rosella.Genetic.Engine([10, 3, 0], mutator,
        function(var d) {
            int x = int(d.value) - 200;
            ${ abs x };
            return x;
        }
    );
    var w = e.run(500).first().data();
    say(float(w.value));
}
This sample is a driver program for a new Genetic Algorithms library that I’m playing with. This toy example uses the Genetic Engine type to pick some random numbers at each generation, for five hundred generations, to try to get a random number which is closest to 200.0. It’s a trivial example to be sure, and I’ll write more about that library in the future if anything ever comes of it. That’s not the important part of this example, however. The important part is the Winxed syntax. The syntax should be immediately familiar to anybody who has used JavaScript or C++ and any of their descendants. Here we can clearly see closures created with the function keyword, objects created with new (Winxed is class-based OO, not prototype-based like JavaScript), low-level types like float and int, Parrot Sub flags like [main] (AKA :main in PIR), and direct calls to PIR opcodes (the ${ ... } syntax). Winxed in a nutshell is an answer to this question: what would it look like if I took a language like C++ or JavaScript and bent it to work closely with Parrot?
Winxed creates a nice mix of dynamic behavior and static analysis. There is plenty of syntax and semantics in the compiler that can be used to inline code, propagate constants, statically link functions by name instead of doing named lookups, check known types at compile time and issue warnings, and perform other tricks that would be more familiar to users of statically typed languages.
I don’t know where NotFound is planning to take Winxed in 2012. His relentless pursuit of new features, better internals, more optimizations, better diagnostics and other improvements leaves open many potential avenues for him to travel down.
I have a few ideas for features I might like to propose and provide patches for, many of which will be intended to mirror changes that need to be made in Parrot. I have been thinking about submitting a patch to add some cleaner syntax for parameter and argument flags instead of using PIR flags by name directly. I’m also keen to get Winxed updated to use some of my new PCC changes as soon as they are available. It will make both a great test case for the new functionality and a good demonstration of any performance benefits to be had. Eventually I would like to get Winxed updated to generate .pbc packfiles directly instead of generating PIR and using IMCC as an intermediary. That’s a pretty big project, and might have to wait until we get more work done on PACT. Again, this would make an excellent demonstration of the functionality, when we have it ready to test.
I personally use Winxed to implement my Rosella project and the first stage of my JavaScript compiler “Jaesop”. Other people are using it as well. Hacker plobsing has used it for a few projects, including an OMeta parser port. Hacker benabik is using it for PACT, a re-design of the Parrot Compiler Toolkit. Dukeleto is writing bindings for libgit2 in Winxed. NotFound is writing xlib bindings in Winxed called Guitor (and some of the example programs are actually quite impressive already). There are many other examples of projects that are or will be written in Winxed, and I’m sure the number will only increase in 2012.
If you’re doing systems-level library or utility work on Parrot in 2012, Winxed is probably the language you are going to be using. I’ll talk about compilers and NQP in later posts.
I asked for some ideas about topics to write about for this advent calendar thing, and PerlJam (Perl hacker and long-time Parrot ecosystem contributor and well-wisher) suggested I devote a post to how to contribute to Parrot. Say no more! I think it’s an excellent idea for a second post. This, the second day of my lazy, late pseudo-advent calendar for Parrot will be devoted to contributions.
In the olden days of Parrot, like last year, we used SVN. These were dark and dangerous times when new contributors were forced to submit their proposed changes in patch files via email. There was much weeping and gnashing of teeth. However, we finally made the switch, not only to version control with git but also to hosting on github. Now contributing to Parrot or any of its many ecosystem projects is a snap. Create a fork of the repository you want to contribute to, make your edits and commit them, and then open a github pull request. It’s so easy, you’ll wish you had started doing it earlier. Seriously, what are you waiting for?
If you submit enough cool changes, we’ll probably do one or both of the following:
We may also try to rope you into doing other stuff, like being one of our monthly release managers (looks great on a resume, especially if you’re entry-level), writing blog posts, mentoring GSoC and GCI students, and writing even more awesome code. If you ever read the book The Giving Tree, it’s kind of like that except everybody wins and the ending is much happier.
Parrot is a relatively rare kind of project because we have so much happening at so many levels of abstraction. If you’re a nuts and bolts kind of coder and like doing stuff “on the metal”, we have plenty of internals work that needs doing: Threading and Concurrency, Garbage Collection, Object Model, and more optimizations than you can shake a stick at. If you’re more of a middle-ware person we have lots of libraries and infrastructure projects to work on. Then we have HLL compilers like Rakudo that need help and finally end-user code and programs written in those HLLs.
Want to write games? We have xlib and opengl bindings available. Sure they may need a little bit of love, but we have them and they work very well.
Like compilers? We have a handful in development and are always looking for more. Winxed is a fun and familiar (for C++ and Java lovers) systems language. I’m working on a new bootstrapped JavaScript compiler. In the past we’ve had Ruby, Python and Tcl compilers under development.
Like Perl6? RAKUDO RAKUDO RAKUDO! Also, Rakudo.
Like writing documentation or making cool new websites to hold our docs? We need you! We’ve got lots of documentation that we need to expose to the users better, and we are missing lots of documentation that needs to be written. Also, since code is changing at a break-neck pace we need to update all the docs we have.
If you like solving problems, fixing bugs or implementing cool new code then we have tons of jobs for you! Sign up for a free account on Github if you don’t have one already. Search around the various Parrot repositories or search for “Parrot” to find a repository you want to hack on. Create a fork and get to work! If you want to contribute to a particular project or particular type of project and aren’t sure where or how is the best place, come talk to us and we’ll try to get you pointed in the right direction.
Speaking of talking to us, there are three good ways to get in contact with us Parrot folks if you want to chat: IRC (#parrot on irc.parrot.org), the parrot-dev mailing list (parrot-dev at lists.parrot.org), or comments and pull requests on github. Leave a comment on this blog too, and I’ll help you personally.
Parrot is a big open-source project with lots of work to be done at every level of abstraction. We’re always looking for new contributions and new contributors. If you’re interested in getting involved, don’t hesitate to get in contact with us and start writing some great code.
I had intended to publish this advent post yesterday, but even though I had it mostly pre-written I couldn’t find 5 minutes in the day to do it. An inauspicious start to this little sequence! It is already painfully clear that this advent calendar of mine isn’t going to be nearly as reliable or successful as moritz’s. To (finally) kick off this pseudo-advent calendar I’m going to jump right in and talk about the main course, the bird, Parrot. This post is going to be a short retrospective about the developments in 2011 and some clues about where we are heading (or, where I hope we are heading) next year. I’ll very likely post a more in-depth yearly retrospective around the 4.0 release in January.
Parrot, as we all know, is a virtual machine aimed at running dynamic languages. Originally it was envisioned as the backend VM for the new Perl6 language, but Parrot quickly deviated from that path. The idea quickly became to create a language-agnostic platform for hosting a variety of languages in a common, interoperable way.
I don’t want to attempt a complete retelling of the history of the project. I wasn’t around for most of it and at best I’m going to give a faulty recount. Regardless, it doesn’t really matter how we got to where we are now. What matters is what our current trajectory is. Prior to 2011 Parrot had been trying to do many things well but did no single thing well enough to really drive adoption. I think we’ve made up our minds to refocus our efforts on supporting Perl6 specifically, and many of the biggest developments planned for 2012 are going to be headed in that direction. Almost all of the work I personally am planning to do next year will be directly tied to Rakudo, trying to make their system even better, and to give them a better and more compelling infrastructure to do their thing on.
2011 has seen a lot of changes to Parrot, though the vast number of them are internal and involve code that is cleaner and more maintainable, even if not much more functional or better-performing. This continues a trend that was happening in 2010 and even earlier: the biggest tasks we’ve been working on in the last few years involve trying to get some of our older, uglier, and more brittle systems up to a decent level of quality to support additional future work. It’s the nature of the beast, and as much as I would love to say that the code we had was perfectly pretty to begin with, in many cases it was not. This is not to disparage any prior contributors. If the history of Parrot tells us anything, it’s that Parrot hackers historically didn’t (and honestly, might still not completely) understand exactly what the goals were. So many decisions intelligently made with the best of intentions led in exactly the wrong directions. Such is life. So many systems were prototyped, and those prototypes silently became “the real thing” without anybody explicitly giving a stamp of approval.
Starting several years ago we had many subsystems either deleted outright because they were unsalvageable, or dramatically rewritten. Some of those events were a little painful, but in all cases we did what was necessary. Because of all the hard work from our development team we are really starting to get to a point where big system improvements and feature additions are not only possible but very plausible. Things that would have been near impossible to implement in the 2009 code base are very reasonable to consider doing in 2011 and 2012. That’s a big step up, even if many of these changes aren’t visible to the end user.
Much of my personal work has been focused on getting some of the essentials into place with regards to the core execution pathway: IMCC, Embedding API, packfiles, etc. The new embedding API was added in January. IMCC’s new public interface was likewise improved and several bits of related code were cleaned in April. The new PackfileView PMC type, the load_bytecode_p_s opcode, the new bootstrapping frontend, and the new :tag syntax for PIR all came in the second half of the year. Starting in early 2012 I’m going to be ripping out all the old cruft that these things were designed to replace, and we are going to see some improvements in code quality to go along with that.
Previously, users had the :load, :init and :main flags to try and schedule when certain subroutines should be executed. The rules for all these flags were messy and overlapping, and they still weren’t considered enough for all necessary use-cases. Some people had suggested adding even more subroutine flags with more special-case semantics throughout the Parrot codebase. This, in my mind, was nonsensical.
Now in Parrot you can use the new :tag syntax to tag a sub with any flag you want:
# Almost same as old :load
.sub 'foo' :tag("load")
# Almost the same as old :init
.sub 'bar' :tag("init")
# Couldn't do this before!
.sub 'baz' :tag("SomethingNew")
And now with the new PackfileView PMC you can find and execute subroutines by tag at any time you want without having to hope and pray that the magical Parrot behavior will do what you need when you need it:
$P0 = load_bytecode "foo.pbc"
$I0 = $P0.'is_initialized'("SomethingNew")
if $I0 goto already_initialized
$P1 = $P0.'subs_by_tag'("SomethingNew")
...
$P0.'mark_initialized'("SomethingNew")
already_initialized:
...
Yes, it is a little bit more code for the end user (or intrepid HLL developer) to write, but the increase in control and flexibility more than makes up for it, in my mind. Plus, this kind of code is very easy to abstract away into a new function, so you only need to write it once. For the record, and I know I’ve discussed this topic at length on my blog in the past, the above code snippet actually executes faster than the old magical semantics, despite the fact that we have more explicit PIR code and more code total running in the runloop instead of at the C level.
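For example, the lookup-and-run pattern above might be wrapped in a small helper like this. This is only a sketch using the PackfileView methods shown above; the helper name run_tag_once is mine:

```
# Sketch of a reusable helper: run all subs with a tag exactly once
.sub 'run_tag_once'
    .param pmc pfview
    .param string tag
    $I0 = pfview.'is_initialized'(tag)
    if $I0 goto done
    $P0 = pfview.'subs_by_tag'(tag)
    $P1 = iter $P0
  loop:
    unless $P1 goto mark
    $P2 = shift $P1
    $P2()
    goto loop
  mark:
    pfview.'mark_initialized'(tag)
  done:
.end
```

With a helper like this in a utility library, HLL code only ever writes the one-liner call and never repeats the bookkeeping.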
The packfile loader, PIR compilation semantics, and the way things like Classes, Namespaces and Multisubs are created will all be changing in 2012, for the better. Expect to see performance improvements and feature additions in these areas in 2012. If you’re being smart and writing your code in Winxed or NQP we will try our best to keep you shielded from almost all of these changes. If you’re writing code in PIR, I feel bad for you, son. I’ve got 99 problems and PIR is like 97 of them.
On a related note, a selection of other packfile-related PMCs were refactored in January, and now we have the ability to create usable bytecode from a running Parrot program. The interface isn’t great and we haven’t made much use of this ability yet, but I expect big things to happen in 2012 with PACT (benabik’s rewrite of PCT), and maybe a few details coming to Rosella as well.
In 2012 I expect that we are going to finish the work we started in 2011 and have IMCC removed as a permanent built-in part of Parrot. It will still be available to people who want it as an optional library, but it will not be a part of the libparrot binary itself. This is going to have some big ramifications throughout some of the oldest and ugliest parts of the code base and will allow us to start pursuing several goals: Decreasing total code size, especially for embedded environments, being much more flexible about compilers and cleaning up the compreg system, divorcing packfile creation from packfile execution, giving us a much cleaner and more usable interface for executing packfiles, and breaking up some of the syntactic quirks of PIR from the inner semantics of Parrot. It really doesn’t seem like much but trust me when I say that removing IMCC represents more than just some theoretical edification, it opens many doors that we do want and need to travel through soon.
In March bacek added the new generational garbage collector (GMS), and it became the default collector in May. Performance jumped considerably, especially for Rakudo, although I think some other optimizations can push the numbers even higher. Sometime in 2012 I want to test out an idea I have for cutting out indirect function calls in the GC, and then I want to dig through some profiles to see where else I can squeeze out a few percentage points. I don’t know if there will be much, but it never hurts to look. With threading on the horizon, I also want to look into a few concurrency-friendly GC algorithms that we can use to cut GC overhead and maybe improve GC performance for heavily concurrent workloads.
Parrot hacker plobsing has been doing a lot of NCI-related work through the year, adding the StructView, Ptr, PtrBuf and PtrObj PMC types to replace some of the older types we had for working with raw pointers. These tools are pretty low-level and don’t always expose a very friendly interface (as we would expect at such a low level of abstraction), but they have a lot of potential that we can definitely build on. Not all of the NCI changes were met with great enthusiasm (especially those that broke some older signature syntaxes), but getting NCI cleaned up and fixed is a major hurdle that we have to leap over in spite of the pain. Being able to share common native library bindings among multiple HLLs is a huge deal for Parrot, and has the potential to become a major selling point if we can get it done correctly. The Rakudo/NQP folks have also been doing some cool-looking NCI work that we may want to copy in whole or in part, so look forward to those kinds of developments in 2012 as well. Hacker NotFound has been working on a new project called Guitor which uses pure Winxed code to create graphical user interfaces by calling into xlib using Parrot’s NCI. The work he’s been doing is pretty fantastic, and serves as a clear demonstration that NCI is working, and working reasonably well.
Adding new native library bindings for Parrot is a very productive and very informative way to get started hacking with Parrot, for anybody who is interested!
Parrot’s object model has remained relatively static for a while, despite the fact that its problems and limitations are well known and oft-decried. I had started to port 6model, the new Rakudo object model developed by Jonathan Worthington, over to Parrot in the summer months, but put that work on hold while I came up with a better plan. I kept hoping that development on 6model would slow down and become more stable so I could find a good jumping-off point to start from, but that never happened. Eventually I am going to need to just put my foot down and start moving code around. Expect 6model to come to Parrot in early 2012, especially if a few other people volunteer to help.
Winxed, NotFound’s system-level language for Parrot, was added as a snapshot to the Parrot repository in July, and has continued to improve by leaps and bounds since. Recently NotFound added syntax for inline functions, which can help to improve performance in many places. I’ve got a few syntax ideas I want to play with and try to contribute as well. If we can divorce Winxed a little further from IMCC and PIR syntax (especially in the area of calling conventions and flags), we will be more free to change Parrot’s internals, because Winxed’s abstracted syntax will create more of a buffer for us. In 2012 I would like to continue the trend of using PIR less and less, and using Winxed and NQP more and more for basic Parrot tasks.
Speaking of which, NQP-rx, an older variant of the amazingly useful NQP language, is showing its age and may be dead in 2012. I would love to see it replaced by the newer 6model-powered NQP, especially once we get 6model into Parrot natively. Then we will have two awesome lower-level languages to play with in an interchangeable, inter-usable way.
Late 2011 also brought us nine’s rewrite of green threads. They aren’t working on Windows yet, but they are working reasonably well on Linux and (I think) Mac, although some kinks are still being worked out. In early 2012 expect to start seeing his implementation of full hybrid threads, which is already looking awesome. As with the green threads, Windows support may come later, especially since we seem to have a relative dearth of hackers working on that platform right now. When I finally get my new laptop I’ll keep Windows around to dual-boot from, and will do my best to get concurrency working as well there as anywhere. As always, help in this endeavor would be appreciated. If you’re interested in Parrot concurrency, be it implementing the internals or using it to write cool new programs and libraries, definitely let me know.
Also in 2012 expect to start seeing some of my proposed changes to PCC get integrated into the system. Actually, you probably won’t see them; most of the changes will be transparent, in IMCC or in new optimized code generated by Winxed and NQP. We’ll definitely see a few percentage points of performance improvement across PCC calls to start, and opportunity for further improvements after that.
This little retrospective has already grown into a substantial post so I won’t write too much more. Expect other posts in this advent series to be shorter and sweeter (and hopefully, more on time).
Rakudo hacker moritz is auditioning for the role of the Good Idea Fairy, which I personally think he’s a shoo-in for. A few days ago in the IRC chatroom he had this gem:
moritz> hey, have you ever thought of starting a parrot advent calendar?
whiteknight> moritz: sort of, but I wouldn't want to steal your thunder
moritz> whiteknight: I wouldn't worry about that. There's usually half a
dozen advent calendars in the Perl community, and we don't seem
to suffer from that
...
moritz> whiteknight: don't feel obliged, 'twas just a quick idea. Maybe
next year if it's too much work this year
whiteknight> moritz: If there is one thing I like doing, it's stealing good
ideas from the Rakudo folks
So that’s what I’m going to try to do: Do a daily advent-ish sort of calendar for Parrot, as similar to the one that moritz does himself as I can manage. Instead of a normal advent calendar, which would be the first 25 days of December, I’m going to do the remaining days leading up to New Years. Since I’m already running irreparably late, I won’t be able to do 25 daily posts between now and then. All the better, really, since I am having trouble coming up with that many ideas! Give me a break, I’m new to this whole advent thing.
What I say I would like to do, and what I am actually able to get done in the coming days, are two entirely different things. Add the usual holiday stressors and time-sinks to the cleanups and organizational hassles of moving to a new house (That’s right, we got the house!) and suddenly doing a blog post every day seems like a pretty daunting order. Again, I will do my best.
Every day from now until the end of the year, starting tomorrow, I’m going to try to post at least a short blurb about something cool in the world of Parrot. I’ll try to focus on cool things we’ve done in 2011 or are planning to do come 2012, but if I can find some older things worth writing about that are standing the test of time particularly well, I’ll write about those too.
If this goes well and I don’t get all lazy and skip too many days, maybe we can turn this into some sort of regular tradition. If other Parrot hackers want to jump in and write a short guest post here on this blog, or write their own posts elsewhere, let me know. More help is more awesome!
Obviously I’m going to mention some of the projects I’ve been working on because those are the things I know the most about. I’ll try to include a bunch of cool projects from other people too. Please send me ideas if there is something you want to read about in the coming month.
Tomorrow the advent calendar officially starts. I’m going to begin by talking about the main course: the bird.
I've created a new project: Guitor, a GUI creaTOR module written in winxed using Xlib via NCI.
It's not feature-complete yet, but it has enough functionality to provide some nice examples, including a viewer for R->R functions and a drawing board.
I've tested only on Linux amd64 and i386, so the StructView used may not be appropriate on some platforms; feedback will be appreciated.
https://github.com/NotFound/Guitor
Install: winxed setup.winxed install
Run an example: winxed examples/pizarra.winxed
Enjoy!
Since I haven’t been posting often enough recently, and my schedule is so screwed up because of holidays and the like, and since I happen to have two drafts available I’m posting them both today.
A while back I was playing around with some sorting algorithms and benchmarks for Parrot. I had a quicksort hybrid implementation written in winxed that was consistently out-performing the built-in C implementation of quicksort by about 20%. I decided that I wanted to play with a few more algorithms, especially algorithms which were known to have different performance characteristics on different input types.
For GCI I created two new tasks asking for alternate sort implementations. The first was a Timsort implementation, and the second was for Smoothsort. GCI students Yuki’N and blaise each delivered a winxed implementation of their respective sort, and now I’m able to do some interesting benchmarks showing how they work on different inputs. Here are some of those results:
N = 100000
SORT_TRANSITION = 6
FORWARD-SORTED (PRESORTED) BENCHMARKS
sort with .sort BUILTIN (presorted)
9.872862s - %100.000000
Number of items out of order: 0
sort with Rosella Query (presorted)
8.623591s - %87.346416 (-%12.653584 compared to base)
Number of items out of order: 0
qsort+insertion sort (presorted)
7.812607s - %79.132142 (-%20.867858 compared to base)
Number of items out of order: 0
timsort (presorted)
0.693221s - %7.021481 (-%92.978519 compared to base)
Number of items out of order: 0
Smoothsort (presorted)
2.154514s - %21.822589 (-%78.177411 compared to base)
Number of items out of order: 0
REVERSE-SORTED BENCHMARKS
sort with .sort BUILTIN (reversed)
10.000436s - %100.000000
Number of items out of order: 0
sort with Rosella Query (reversed)
8.891811s - %88.914232 (-%11.085768 compared to base)
Number of items out of order: 0
qsort+insertion sort (reversed)
8.555817s - %85.554438 (-%14.445562 compared to base)
Number of items out of order: 0
timsort (reversed)
0.812737s - %8.127015 (-%91.872985 compared to base)
Number of items out of order: 0
Smoothsort (reversed)
10.521473s - %105.210141 (+%5.210141 compared to base)
Number of items out of order: 0
RANDOM BENCHMARKS
sort with .sort BUILTIN (random)
13.536509s - %100.000000
Number of items out of order: 0
sort with Rosella Query (random)
12.566452s - %92.833773 (-%7.166227 compared to base)
Number of items out of order: 0
qsort+insertion sort (random)
11.859363s - %87.610203 (-%12.389797 compared to base)
Number of items out of order: 0
Timsort (random)
14.461384s - %106.832449 (+%6.832449 compared to base)
Number of items out of order: 0
Smoothsort (random)
12.764498s - %94.296823 (-%5.703177 compared to base)
Number of items out of order: 0
The SORT_TRANSITION parameter above is the size of the array below which the hybrid sort switches from quicksort to insertion sort. 6 is an arbitrary value, but seems to have reasonably good results. I could spend some time to tune this value, but I haven’t.
These benchmarks show some things we already knew: the built-in Quicksort implementation from Parrot is poor across the board. The Quicksort variant that’s in Rosella is better, and my hybrid quicksort+insertion sort variant is better still. What’s interesting to see is how Timsort and Smoothsort perform on these workloads.
Timsort is designed to work well with “real-world” data which is already sorted or already partially sorted. It identifies runs in the data that are already mostly sorted and merges subsequent runs together. Timsort also has the nice feature of identifying runs which are already reverse-sorted and does a very fast reverse to get them ready for merging. We see that the Timsort blows all other challengers out of the water when the array is already sorted and already reverse-sorted. In these instances, the analysis stages of Timsort figure out that no sorting is ever necessary.
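The run-detection trick is easy to sketch. Here is a rough Python illustration of the idea (my own simplification, not the GCI implementation): scan for a maximal ascending or descending run, and flip descending runs in place with one cheap reverse pass.

```python
def find_run(a, start):
    """Find a maximal run starting at `start`; reverse it in place if descending.

    Returns the index one past the end of the run.
    """
    end = start + 1
    if end == len(a):
        return end
    if a[end] < a[end - 1]:
        # Strictly descending run: extend it, then reverse it cheaply,
        # the same way Timsort prepares reversed runs for merging.
        while end < len(a) and a[end] < a[end - 1]:
            end += 1
        a[start:end] = reversed(a[start:end])
    else:
        # Non-descending run: just extend it.
        while end < len(a) and a[end] >= a[end - 1]:
            end += 1
    return end
```

On a presorted or fully reversed array the very first run covers the whole input, which is why the analysis stage can conclude that no merging is needed at all.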
Smoothsort constructs a special type of heap from the input data, and uses basic balancing operations on the heap to find the largest value, extract it, and rebalance the heap. It works very well for the pre-sorted case, though not quite as well as Timsort, because it does need to construct this heap first and then iterate over it. Smoothsort is so quick there because the heap rebalancing operations are almost free when the array is already sorted. So Smoothsort is quick on a pre-sorted array, but we also see that it’s terrible when the input array is reverse-sorted. Timsort still does very well in this case.
When the array is completely random, the story is a little bit different. Both Timsort and Smoothsort lose to the quicksort implementations for completely random data. Timsort actually is worse than Parrot’s built-in quicksort, one of the few results we measure to be slower. Smoothsort is in the same ballpark as the Rosella quicksort, but is a few percent off of the hybrid sort.
If I had to put together a small report-card for these algorithms under all these conditions, it would look something like this:
algorithm pre-sorted reversed random
------------------------------------------------------
quicksort B B A
hybrid sort B+ B+ A+
Timsort A+ A+ C
Smoothsort A C B+
At a glance you can really see where each algorithm excels.
It’s worth noting here that all these implementations are relatively naive and unoptimized. We could say “But Algorithm X could be optimized to be even better!”, but the same can be said about all of them. I’ll be doing some of that in the coming days, but I don’t expect any radical changes.
What I would like to do in the future is provide a default sort implementation, but also have the sorting interface take some sort of optional “hint” flag that can tell the sorter about certain properties of the data, and select an algorithm specifically tuned for that workload. From the data I have seen so far, I suspect I would like to use my quicksort hybrid as the default, but be able to switch to Timsort if the user hints that the input data might already be partially sorted.
Everybody will tell you that quicksort is O(n log n) on average, and has a pathological worst case that’s O(n^2). People will also happily point out that something like Timsort has a best case of O(n). What these simple expressions ignore are all the details. The pathological worst case of quicksort requires a very specific input ordering and an absolute worst selection of the pivot element at each recursion. Even basic modifications to the algorithm, or using a hybrid approach, completely eliminate these worst cases. Without such modifications the worst case is certainly possible, but relatively unlikely.
What people also forget when talking about big-O notation are the coefficients. When I say that quicksort has average complexity of O(n log n), what I really mean is that the amount of time it takes is:
t = c * n * log(n) + f(n) + d
Where f(n) is any function that grows more slowly than n * log(n), and c and d are arbitrary coefficients. The quicksort algorithm, properly implemented and optimized, has very low coefficients. The algorithm requires very little setup (d) and performs relatively few operations per iteration (c). The reason why Parrot’s built-in quicksort performs so poorly is because the c there involves recursive PCC calls and nested runloops, so c is unnecessarily large. Just by having it all run in a single runloop we can drop c enough to beat the original implementation. The two implementations use almost exactly the same algorithm, so it’s differences in c (and, to a smaller extent, differences in d) that result in the timing improvements.
Insertion sort, and this is why I picked it to be part of my hybrid quicksort, has O(n^2) complexity, but with very low c. Below a certain threshold, the quick sort algorithm becomes dominated by recursion calls and stack management, and below that threshold the insertion sort performs better. Basically, there’s a very narrow window below which insertion sort’s O(n^2) is lower than quicksort’s O(n log n). By switching algorithms below that threshold, we can squeeze out a few extra percentage points in performance savings. I could easily have used something else like Bubblesort here for the same kind of effect.
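For illustration, here is roughly what such a hybrid looks like, sketched in Python rather than Winxed (the threshold constant mirrors the SORT_TRANSITION parameter from the benchmarks above; the details differ from my actual implementation):

```python
SORT_TRANSITION = 6  # below this size, insertion sort's low constants win

def insertion_sort(a, lo, hi):
    # O(n^2) worst case, but with a tiny constant: no recursion, no stack work.
    for i in range(lo + 1, hi + 1):
        v, j = a[i], i - 1
        while j >= lo and a[j] > v:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = v

def hybrid_sort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    # Small subarrays: hand off to insertion sort instead of recursing further.
    if hi - lo < SORT_TRANSITION:
        insertion_sort(a, lo, hi)
        return
    # Otherwise, an ordinary Hoare-style quicksort partition around a mid pivot.
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    hybrid_sort(a, lo, j)
    hybrid_sort(a, i, hi)
```

The crossover threshold is exactly the "narrow window" mentioned above: tuning it trades recursion overhead against insertion sort's quadratic growth.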
Timsort, because it does involve ahead-of-time analysis steps to detect pre-sorted runs, is always going to be at something of a disadvantage when faced with a purely-random input. Assuming its core sorting algorithm is as efficient on random data as quicksort is (it isn’t, but we can pretend), Timsort is always going to lose those benchmarks because quicksort will be just as fast during the sorting and won’t have a forward analysis phase.
Smoothsort is very interesting from a mathematical perspective, and it doesn’t do ahead-of-time analysis like Timsort does. Of course, it does need to construct that special heap, which likewise acts as a damping agent on overall performance. Smoothsort does very well on pre-sorted data, reasonably well on random data, and completely falls apart when the data is almost exactly reverse-sorted. I suspect we could do some kind of analysis there to detect the worst case and build our heaps backwards, but that analysis would add overhead to the common cases.
I’ve never been too happy with the random number generation capabilities of Parrot. Parrot essentially provides a thin wrapper around the system rand and srand capabilities, with an unimaginative interface. This is a fine system for most purposes, but sometimes you need something a little bit different, or want to substitute your own version without too much hassle.
One of the tasks I submitted to GCI was asking for an implementation of the Mersenne Twister algorithm, a good and well-known algorithm to generate pseudo-random numbers. GCI student Yuki’N, who was also a prolific contributor last year for the program, submitted a fine implementation for Rosella to use. Shortly thereafter I wrote up a quick Box-Muller implementation to get normally-distributed numbers too. Now, Rosella has a pretty decent start of a random number library.
For an example I wrote up a short histogram program to display the output. Here are the histograms for the Mersenne Twister and the Box-Muller generators:
Histogram of 500 uniformly-distributed floats:
0: ##########################
1: ####################
2: ##########################
3: ##########################
4: #############################
5: #########################
6: ###########################
7: ########################
8: ####################
9: #######################
10: ############################
11: ##############
12: ###############################
13: #########################
14: ##############
15: ##########################
16: ##########################
17: ##############
18: ##############################
19: #######################
Histogram of 500 normal-distributed floats:
0: #
1: #
2: ####
3: ######
4: ########
5: ########################
6: #######################
7: #########################################################
8: ##########################################################################
9: ###################################################################
10: ################################################################
11: ####################################################
12: #######################################################
13: #########################
14: #################
15: ########
16: ##########
17: #
18:
19: ##
These are the two distributions that I wanted to have most, but they are certainly not the only ones available.
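For reference, the Box-Muller transform behind that second histogram is only a few lines. Here is a Python sketch of the idea (not the Rosella code): it converts a pair of uniform samples into a pair of independent normally-distributed samples.

```python
import math
import random

def box_muller(uniform=random.random):
    """Return two independent standard-normal samples from two uniform samples."""
    u1 = 1.0 - uniform()  # shift away from 0.0 so log() is always defined
    u2 = uniform()
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)
```

Feed it the Mersenne Twister (or any uniform source) and you get the bell curve shown above for free.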
And random number generation is not the only feature that this library will have, either. The Rosella Query library had an implementation of the Fisher-Yates shuffle algorithm for shuffling an array. I moved that implementation to the new Random library and updated it to use the Mersenne Twister as the random number source instead of Parrot’s built-in rand opcode.
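The Fisher-Yates shuffle itself is equally compact. A Python sketch of the algorithm (again, illustrative, not the Rosella code):

```python
import random

def fisher_yates(a, rand=random.randrange):
    """Shuffle list `a` in place, uniformly over all permutations."""
    # Walk backwards, swapping each element with a random element at or
    # before its position. Every permutation is equally likely, given a
    # uniform source such as a Mersenne Twister.
    for i in range(len(a) - 1, 0, -1):
        j = rand(i + 1)  # 0 <= j <= i
        a[i], a[j] = a[j], a[i]
    return a
```

The only thing the move to the Random library changes is where `rand` comes from; the swap loop is untouched.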
I have got a handful of other small features and additions that I would like to add to this library, but considering that it is so straight-forward and algorithmic I expect I can make it stable pretty soon without much headache.
All and Everyone:
Parrot's current set of documentation is now up on github. Hopefully, this will aid in the documentation revision effort.
To view the html documents, simply navigate your favorite browser to http://parrot.github.com. To edit the documentation, simply navigate your browser to https://github.com/parrot/parrot.github.com, clone the repo, and make your edits (or provide a pull request).
Thank you.
Alvis
The first step in generating a packfile is understanding the packfile. So I've been writing a Winxed disassembler. It's pretty fully featured at this point. It's showing constants, annotations, and symbolic instructions. Despite my fears, it turned out that PCC wasn't all that difficult to deal with.
On behalf of the Parrot team, I’m proud to announce Parrot 3.10.0, also known as “Apple Pi”. Parrot is a virtual machine aimed at running all dynamic languages.
Parrot 3.10.0 is available on Parrot’s FTP site, or by following the download instructions at http://parrot.org/download. For those who would like to develop on Parrot, or help develop Parrot itself, we recommend using Git to retrieve the source code to get the latest and best Parrot code.
Parrot 3.10.0 News:
- Core
+ The mark VTABLE was added to the Select PMC
+ The Parrot::Embed Perl 5 module was removed from parrot.git and now lives
at https://github.com/parrot/parrot-embed
+ A set_random method was added to the Integer PMC, so random numbers can
be generated without needing to load math dynops
+ A new implementation of green threads was added to Parrot, in preparation
for a robust hybrid threading system. Green threads are currently
not available on Windows.
- Languages
+ Winxed
- 'multi' modifier improved
- throw "string" now emits throw instead of die
- several optimizations in generated code
- improved some error diagnostics
- Community
- Documentation
- Tests
Many thanks to all our contributors for making this possible, and our sponsors for supporting this project. Our next scheduled release is 20 December 2011.
Enjoy!
Finally! I have managed to find enough time -- in the wee hours of the morning no less -- and a stable current of electricity -- meaning the power company has finished its little pole-replacement project -- to push a new 'documentation_revisions' branch to the parrot repo. This is the branch from which anyone involved with the documentation revision effort will work. Hopefully, that is, if I've gotten everything correct. (I make no pretensions here to knowing or to understanding, well, how git works.)
This is just a quick, first blog post to ensure everything is working correctly.
Cheers!
Alvis
Trying to struggle with Select in Parrot, I accidentally discovered that its Socket has a .poll method. What a trivial, yet satisfying way to have some simple non-blocking IO. Thus, MuEvent was born.
Why MuEvent? Well, in Perl 6, Mu can do much less than Any. MuEvent, as expected, can do much less than AnyEvent, but it’s trying to keep the interface similar.
You’re welcome to read the code, and criticise it all the way. Keep in mind that I have no idea how I should properly write an event loop, so bonus points if you tell me what could have been done better. I don’t expect MuEvent to be an ultimate solution for event-driven programming in Perl 6, but I hope it will encourage people to play around. Have an appropriate amount of fun!
On behalf of the Parrot team, I'm proud to announce Parrot 3.9.0 "Archaeopteryx".
Parrot (http://parrot.org/) is a virtual machine aimed at running all dynamic languages.
Parrot 3.9.0 is available on Parrot's FTP site
(ftp://ftp.parrot.org/pub/parrot/releases/supported/3.9.0/), or by following the
download instructions at http://parrot.org/download. For those who would like
to develop on Parrot, or help develop Parrot itself, we recommend using Git to
retrieve the source code to get the latest and best Parrot code.
TL;DR: https://github.com/parrot/PACT
So after my last blog post, I started a gist to keep track of "how would I write PCT". I called it PACT, the Parrot Alternate Compiler Toolkit. I suppose I could have called it PCT2, but I really don't want to try to claim it will 100% replace PCT. PCT's very valuable to the people using it right now, but there's no small desire to add to it and I'd like to help it be better. Parrot's main audience, to my mind, is prospective compiler writers and the easier we can make their lives the better.
I haven’t had a lot of time for hacking recently, but I did have a little bit of time this weekend. I decided that Jaesop needed a little bit of love, so I went ahead and provided it.
The first thing I did this morning was to add PCRE bindings to the stage0 runtime. Now, if you build Parrot with PCRE support, you can write this kind of stuff in JavaScript:
var regexp = new RegExp("ba{3}", "");
regexp.test("baaa"); // 1
regexp.test("caaa"); // 0
regexp.test("baa"); // 0
regexp = /ba{3}/; // Same.
Support isn’t perfect yet, but it’s enough of a start to get us moving with other things. Specifically, I don’t have modifiers like g and i working yet, but once I figure out the way to tell PCRE to do what I want it shouldn’t be too hard to add.
If you don’t build Parrot with PCRE support, the RegExp object won’t be available, and I think it’s going to spit out some ugly warning messages. Considering this is just an ugly bootstrapping stage and it’s not complete yet, I don’t mind making these kinds of things optional.
I also added in the beginnings of a runtime. Now there are some basic objects like Process which you can use to interact with the environment, and FileStream which you can use for input and output. The process variable is always available as a global, and it gives access to the standard streams and command-line arguments. Here are some examples:
process.stdout.writeLine("Hello World!");
process.stdout.writeLine(process.argv[0]);
var s = process.stdin.readLine();
And so, after several weeks of development, you can finally write a simple “Hello World” program in Jaesop.
What my hacking today has shown me is that the one thing I am severely lacking is tests. I do have some tests, but I don’t have nearly enough. Specifically, I’ve learned that my test coverage of Arrays is severely inadequate. I also need to test a few other details which I found out were horribly broken when I went to play with them today.
Despite some of the setbacks, Jaesop stage 0 compiler is progressing nicely and it’s getting to the point when we can start to do some real work with it. These few runtime additions, though small, have greatly improved the situation. There are a few things I still need to do to it, besides the testing I mentioned above: I need to improve the PCRE bindings because right now they are very basic. I need to add methods to Object, Array, and String for usability. I also need to add in a mechanism like node.js’ require routine to load modules, and maybe a few other similar code management details. When that stuff is all done, and when Parrot has proper 6model support, we can start moving forward on the stage1 compiler. I’m really starting to get excited about taking that next step.
On behalf of the Parrot team, I'm proud to announce Parrot 3.8.0, also known as "Magrathea". Parrot (http://parrot.org/) is a virtual machine aimed at running all dynamic languages.
Parrot 3.8.0 is available on Parrot's FTP site (ftp://ftp.parrot.org/pub/parrot/releases/devel/3.8.0/), or by following the download instructions at http://parrot.org/download. For those who would like to develop on Parrot, or help develop Parrot itself, we recommend using Git to retrieve the source code to get the latest and best Parrot code.
In my work on Jaesop, I realized that some parts of the Rosella Harness library were a little bit more messy than I would like. I decided to take some time and get that library raised up to a better level of quality. To make some cleanups, I used the Query and FileSystem libraries for certain tasks, which turned out to be a great move, because I identified nifty new features that were needed in those libraries as well.
The first version of the Query library functionality was very straight forward. It basically provided the implementations of some higher-order functions and method semantics that allowed calls to be chained together. Here is a quick example:
var result = Rosella.Query.as_queryable([1, 2, 3, 4])
.filter(function(i) { return i % 2 == 0; })
.map(function(i) { return i * 2; })
.fold(function(s, i) { return s + i; }, 0)
.data();
That example, clearly contrived, takes an array of numbers. It filters out the odd numbers, then multiplies everything else by 2 and sums them together. It’s simple and straight-forward: The filter takes an input array and generates an output array of values which meet the requirements. The map routine takes an input array and produces an output array. The fold routine takes an input array and outputs a single integer number. If each method output its name to the console when it was invoked, we would see something like this:
filter filter filter filter
map map
fold fold
data
We do all the filtering first, then all the mapping, then all the folding. It’s very straight forward, but it’s also eager, which isn’t great when we would rather be working with a lazy object.
The new addition to Query is the Stream. A Stream is any iterable object which might prefer to be read lazily. Here’s an example that I’m playing with, using some new improvements to the FileSystem library as well:
var f = new Rosella.FileSystem.File("foo.txt");
var result = Rosella.Query.as_stream(f)
.take(5)
.filter(function(l) { return l != null && length(l) > 0; })
.map(function(l) { return "<<" + string(l) + ">>"; })
.data();
I’ve updated Rosella.FileSystem.File to be iterable. The default File iterator reads the file line by line. This is a new feature and isn’t really configurable yet. In this example, we create a Stream from the File object. We take the top 5 lines from the file, remove any empty lines, and surround the remainder in << >> brackets. The best part is that we read the file lazily. This example only reads the top 5 lines of the file, it does not read the entire text of it. That’s a big help if we have a huge file, or if we have something like a long-lived pipe that is spitting out an endless sequence of data. Another thing that is different about Streams is that they are interleaved. To see what I mean, if the methods above printed out their names when invoked, we would see this pattern:
take filter map
take filter map
take filter map
take filter map
take filter map
data
Where the first example did all the maps first and all the filters second, this example does them each one at a time for each input data item. Some of the methods which are on a normal Queryable aren’t present on Stream, and some of the Stream methods aren’t lazy. Some of them need to be eager, like .data(), .sort() or .fold().
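If you want to see why the interleaving falls out naturally from laziness, here is a Python sketch using generators, which are pull-based in the same way a Stream is (the stage names and `trace` helper are mine, for demonstration only):

```python
from itertools import islice

calls = []

def trace(name, it):
    # Record the stage name each time it yields, to expose the pull order.
    for x in it:
        calls.append(name)
        yield x

lines = iter(["alpha", "", "beta", "gamma", "", "delta", "epsilon"])

# take(5) -> filter(non-empty) -> map(wrap in << >>), each stage lazy.
pipeline = trace("map", ("<<%s>>" % l for l in
           trace("filter", (l for l in
           trace("take", islice(lines, 5)) if l != ""))))

result = list(pipeline)
# Each item flows through take -> filter -> map before the next one is
# read, and only 5 lines are ever pulled from the source; "delta" and
# "epsilon" are never touched.
```

The eager Queryable is the equivalent of materializing each stage with `list()` before starting the next one; the lazy Stream is this pull-through pipeline.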
If you want an updated harness for your project, it’s easy to get one. Even easier than copy+pasting the code from above. If you have Rosella installed, you can automatically create a harness from a template. Run this command at your terminal:
rosella_test_template harness winxed t/ > t/harness
That’s all you need, and you’ll get a spiffy full-featured harness which takes advantage of all the new features I’ve been working on. If you prefer your harness be written in NQP instead of Winxed, just change the “winxed” argument above to “nqp” and you get that.
I’m going to work on an installable harness binary so you can just use one without needing to create your own harness. I don’t have it yet, but it will not be too hard to make.
The harness is basically a huge iterator. You set up a bunch of tests organized into a list of test runs. The harness iterates over each test run, iterates over each test, gets the output, and iterates over the lines of text in that output to get results. Then it iterates over all test runs, iterates over all result objects to get the results display to show to the user. This sounds like a perfect use for Query functionality, doesn’t it? That’s exactly what I thought, anyway. I reimplemented several parts of it using Query and the new Stream object. The input is set up as a stream over a pipe, and the TAP parsing is implemented as a stream of tokens from a String.Tokenizer. Combine those changes with some refactors, fixing abstraction boundaries, and an eye towards test coverage, and the new code is much prettier than the old code.
What most affects users is that harness code can now be cleaner. Here is what a simple harness used to look like in Winxed:
function main[main]() {
var rosella = load_packfile("rosella/core.pbc");
using Rosella.initialize_rosella;
initialize_rosella("harness");
var factory = new Rosella.Harness.TestRun.Factory();
var harness = new Rosella.Harness();
var view = harness.default_view();
factory.add_test_dirs("Winxed", "t", 1:[named("recurse")]);
var testrun = factory.create();
view.add_run(testrun, 0);
harness.run(testrun, view);
view.show_results();
}
Here is what a new one looks like:
function main[main]() {
var rosella = load_packfile("rosella/core.pbc");
var (Rosella.initialize_rosella)("harness");
var harness = new Rosella.Harness();
harness.add_test_dirs("Automatic", "t", 1:[named("recurse")])
    .setup_test_run(1:[named("sort")]);
harness.run();
harness.show_results();
}
Not too bad for 6 real lines of code! The new "Automatic" test type reads the shebang line ("#! ...") from the test file to determine how to execute it. If you want to specify a particular language like "NQP" or "Winxed", you can still do that too. Notice also that we can sort files by filename, if we pass in that parameter to .setup_test_run. At the moment, sorting is only alphabetic and per-run only. We don’t shuffle files between runs.
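That shebang-sniffing idea is simple to illustrate. A Python sketch (the interpreter-to-language mapping here is my invention for the example, not Rosella’s actual table):

```python
def detect_test_language(path):
    """Guess a test's language from its "#! ..." first line, or None."""
    # Hypothetical mapping from interpreter names to test-run language labels.
    interpreters = {"winxed": "Winxed", "nqp": "NQP", "parrot": "PIR"}
    with open(path) as f:
        first = f.readline()
    if not first.startswith("#!"):
        return None
    for key, lang in interpreters.items():
        if key in first:
            return lang
    return None
```

With something like this in place, a single "Automatic" test run can mix Winxed and NQP test files freely.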
Harnesses using the older-style should all still work. I’ve tried as well as I can to keep the code backwards compatible. If you have a harness that doesn’t work anymore after updating Rosella it’s a bug and I would love to hear about it. Also, all the same capabilities are there: The ability to substitute a custom View, the ability to break files up arbitrarily into test runs, the ability to specify custom subclasses of TestRun or TestFile or other stuff like that too, if you need some custom semantics.
After these rewrites, the version of the Harness library is 3. I don’t know if anybody follows along with these per-library version numbers, but it is a decent point of reference. I don’t expect to be making any large changes to this library again for a while.
I’ve implemented a very quick and basic iteration facility for files as part of the Rosella FileSystem library. The iterator type I have so far is a basic line iterator, calling the .readline() method on the given handle until EOF.
There are two ways to use the new FileIterator class: Iterate over a Rosella File object directly, or create an IterableHandle object over an existing low-level handle.
// Iterate over a File object
var file = new Rosella.FileSystem.File("foo/bar.txt");
for (string line in file) {
...
}
// Make an Iterable Handle
var fh = new 'FileHandle';
fh.open("foo/bar.txt", "r");
var ih = new Rosella.FileSystem.IterableHandle(fh);
for (string line in ih) {
...
}
That second option is actually kind of neat, because you can use it over any Handle object: FileHandle (including standard input and pipes), StringHandle, Socket, etc.
If you really want to be tricky, you can do what I do in the Harness library and create a stream over a handle and really do a lot of cool stuff:
var fh = new 'FileHandle';
fh.open("foo/bar.txt", "r");
var ih = new Rosella.FileSystem.IterableHandle(fh);
var stream = Rosella.Query.as_stream(ih);
stream
.take(5)
.filter(function(l) { return l != null && length(l) > 0 && substr(l, 0, 1) != "#"; })
.project(function(l) { return split(";", l); })
.foreach(function(string s) { say("Look at this: " + s); })
.execute();
Again, this is a contrived example, but it should become apparent what kinds of stuff you can do with this.
I don’t have directory iterators yet. That is something I haven’t needed yet, but for which I can see some uses.
I don’t know why I didn’t think about it earlier, but now Tokenizers are iterable as well. If you have a tokenizer, you can iterate over them in two different ways:
var tokenizer = new Rosella.String.Tokenizer.Delimiter(",");
tokenizer.add_data("a,b,c,d,e,f");
for (string field in tokenizer) {
...
}
tokenizer.add_data("g,h,i,j,k,l");
for (var t in tokenizer) {
...
}
In the first, we do a shift_string operation which returns the raw string data. In the second we do a shift_pmc operation which returns the Token object. The Token contains some information like the type of token, some custom metadata, etc.
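To make the string-vs-token distinction concrete, here is a toy Python analogue (the class and method names are mine, not Rosella's): one iteration path yields raw strings, the other wraps each field in a token object that can carry metadata.

```python
class Token:
    # Minimal stand-in for Rosella's Token: just the raw data here,
    # though the real thing also carries type and custom metadata.
    def __init__(self, data):
        self.data = data

class DelimiterTokenizer:
    # Simplified sketch of a delimiter tokenizer: data is appended
    # incrementally, and fields are pulled out one at a time.
    def __init__(self, delim):
        self.delim = delim
        self.pending = []

    def add_data(self, text):
        self.pending.extend(text.split(self.delim))

    def iter_strings(self):          # like the shift_string path
        while self.pending:
            yield self.pending.pop(0)

    def iter_tokens(self):           # like the shift_pmc path
        while self.pending:
            yield Token(self.pending.pop(0))

tok = DelimiterTokenizer(",")
tok.add_data("a,b,c")
fields = list(tok.iter_strings())
tok.add_data("d,e")
tokens = [t.data for t in tok.iter_tokens()]
```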
And of course, since you can iterate over them, you can use a Stream:
var tokenizer = new Rosella.String.Tokenizer.Delimiter(",");
tokenizer.add_data("a,b,c,d,e,f");
var stream = Rosella.Query.as_stream(tokenizer);
var new_data = stream
.map(function(t) { return t.data(); })
.filter(function(s) { return s != "b" && s != "e"; })
.fold(function(a, b) { return sprintf("%s,%s", [a, b]); })
.next();
The new Harness library is making me pretty happy. I’m working on tests now, because the library has never had good test coverage and is suddenly much more testable than it ever has been. Considering Harness is a central part of my TAP testing strategy, it’s kind of embarrassing that it has never been well tested itself. I am working on test coverage in a branch, and will probably be merging that to master soon. I don’t expect to make any big changes to the library for a while after that.
The Stream class is not well fleshed-out yet and is absolutely untested besides the tests for its use in Harness. I need to finish up with a few of the features that I haven’t needed yet, and then test it all. There are a few tweaks I might want to make to the way it works, but for the most part I am pretty happy with how it turned out and how fun it is to use.
The FileIterator and Token Iterator types I mentioned are even newer and less mature than Stream is, and need some serious review. They’ve been useful tools to get me to this point, but they can definitely stand to be improved in non-trivial ways. I’ve got some big refactors of the String library planned in the future, so if anybody has any requests for features now is a good time to mention them.
In my testing of Harness, even though it’s not complete yet, I’ve already found a few changes that need to be made in the MockObject and Proxy libraries as well. I plan to take a good hard look at those things to make sure they are up to the level I expect them to be. Also, I have a few other unstable libraries floating around that need attention, and could potentially become stable if I like where they are going.
For anybody who missed it, Parrot Architect Christoph sent an email to the parrot-dev mailing list suggesting that things were not going well for the Parrot project and we need to make a few changes. Specifically, we need to become faster and more nimble by ditching the deprecation policy, being less formal about the roadmap, and being more focused on making Parrot better. Later, Jonathan “Duke” Leto sent out an email talking about how we need to have a concise and clear vision to pursue, to prevent us from getting stuck in the same kind of swamp that we’re trying to pull ourselves out of. He had some good ideas there that need to be addressed.
That’s right, go back and read it again: We’re ditching the deprecation policy. I’ll bring the champagne, you bring the pitchforks and lighter fluid. It’s time for celebration. At least, it should be.
After the email from cotto went out, things went in a direction I didn’t expect. People started getting angry, and some of that anger was directed towards Rakudo. I think it was misdirected. Rakudo isn’t the problem and never has been. The problem was the deprecation policy and some of the related decisions that have been made with it over time.
The thinking goes something like this: The deprecation policy was bad. Rakudo expected us to do what it said, and what we promised to do. Therefore, Rakudo must have been bad also. I’m oversimplifying, of course.
Rakudo has their own thing going on. They have goals, and they make long-term plans and they have developers and they have dependencies and all the other stuff that an active software project has. Parrot is a pretty damn big part of their world, and knowing what Parrot is doing or what it plans to do and at what times is important for them. If Parrot has a policy that says “we guarantee certain things cannot change faster than a certain speed, or more often than certain limited times”, they start to make plans around those guarantees and start to organize themselves in a way to take advantage of that. It’s what any project would do.
Imagine, as a Parrot developer, that the GCC developers sent out a mass email tomorrow that said the equivalent of “Oh, we’re not supporting the C89 standard anymore, we’re only going to be compiling a custom language similar to C but with some new non-standard additions, but without some of the trickier, uglier parts of the standard. Nothing you can do about it, so get to work updating your code if you want to continue using GCC”. We’d have some work to do, I’m sure, and we probably wouldn’t be too happy about it. Rakudo on the other hand knows that Parrot isn’t nearly so mature or stable as GCC, and has a lot of improvements to make. They might not always get credit for it, but they have been pretty patient with Parrot over the last few years, even when we probably didn’t deserve so much patience.
Some people have said that Rakudo has been an impediment to Parrot development, or that they are a reason why Parrot has problems and why the pace of development has been so slow. I think that’s a short-sighted sentiment. It’s not Rakudo’s fault that they expect us to mean what our official policy documents say. It’s also not their fault that we put together the deprecation policy in the first place, or that we’ve implemented it in the particular way we have over the years. In short, whatever negative feelings some Parrot devs think they have about Rakudo are just a smoke screen. The way Parrot and Rakudo interact is the symptom. There are larger cultural aspects at the root of the problem, and the deprecation policy was a large part of that.
Rakudo developers, by and large, want Parrot to develop and improve faster. I haven’t spoken to a single Rakudo developer who was unhappy to see the deprecation policy go. Most of them are ecstatic about the change. It’s hard to say that these people somehow want to sabotage us, or delay us, impede us, or whatever else. The things that Parrot needs (better performance, better encapsulation and interfaces, better implementations, better focus) are all things that are going to benefit Rakudo as well. This is what they want too. We’re always going to quibble over details, but in general they want from us what we want from ourselves. 95% of the improvements we need to make for Rakudo are going to benefit other languages as well. The other 5% can be negotiated.
Parrot and Perl6 have a pretty long and interesting history together. Unfortunately, that history hasn’t always been pretty. Parrot was originally started as the VM to run Perl6. The idea of running multiple dynamic languages in interoperation harmony, including early plans to support later versions of Perl 5, came later but eventually eclipsed the original goal in importance. You don’t have to look far, even in some of the subsystems that have been most heavily refactored in recent months, to see Perl-ish influences in the code. Sometimes those influences are far from subtle. You also don’t have to look too far to find instances of subsystems that were designed and implemented (and redesigned, and reimplemented) specifically to avoid doing what Perl6 needed.
Even in subsystems where the original goal may have been to support the needs of Perl6, many of those were designed and developed before people knew too well what Perl6 needed. There was a lot of guessing, and a lot of attempts made to become some sort of hypothetical, language-neutral platform that would somehow end up supporting Perl6 well, without ever taking the needs of Perl6 into account specifically. It’s like throwing the deck of cards in the air and hoping that they all land in an orderly stack. It’s almost unbelievable, and thoroughly disappointing, to think that the “Perl6 VM” would do so little over time to address the requirements of Perl6 and keep up with its needs as the understanding of those needs became more refined.
Around the 1.0 release Perl6 moved to a new separate repository which severely decreased bandwidth in the feedback loop. Whether this was a good move in terms of increased project autonomy, or a bad move in terms of decreased collaboration is a matter I won’t comment on here. After the two projects separated, Parrot added a deprecation policy which guaranteed that our software couldn’t be updated to reflect the maturing Perl6 project as it gained steam.
The short version goes like this: Parrot was supposed to be the VM to run the new Perl6 language. At few, if any, points in the project history did Parrot focus strongly on the needs of Perl6. Now, a decade later, people act shocked when Parrot doesn’t meet the needs of Perl6 and meet them well. This, I believe, is the root of the problem.
Look at other VMs like the JVM and the .NET CLR. The JVM was developed with a strong focus on a single programming language: Java. When the JVM became awesome at running Java, and became a great platform in its own right, other languages like Clojure, Groovy and Scala started to pop up to take advantage. This is also not to mention the ported versions of existing languages that also found a home there: Jython and JRuby are great examples of that. The .NET CLR was set up with a focus on the languages C++, C# and VisualBasic.NET. Once the CLR became great at running these, other languages started to pop up: F#, IronPython, IronRuby, and others.
Those other VMs became great because they picked some languages to focus on, did their damndest to make a great platform for those, and then were able to leverage their abilities and performance to run other languages as well. Sure, we can always make the argument that Scala and Groovy are second-class citizens on the JVM, but that doesn’t change the fact that both of those two run better on JVM, even as second-class citizens, than Perl6 runs on Parrot.
Somewhere along the line, the wrong decision was made with respect to the direction Parrot should take, and the motivations that should shape that direction. We need, in this time of introspection and reorganization, to unmake those mistakes and try to salvage as much of the wreckage as possible.
It should be obvious in hindsight where mistakes were made, and how we ended up in the unenviable situation we are in now. This isn’t to say that the people who made those decisions should have known any better. At the time there were good-sounding reasons all around for why we needed to do certain things in certain ways. Hindsight is always clearer than foresight. It’s easy to say that Rakudo is to blame because Parrot is filled with half-baked code that should have been good enough for Perl6 but never was, and then a deprecation policy that Rakudo expects to be followed. It’s easy to misattribute blame. I understand that. What we shouldn’t do is keep following that line of logic when we know it’s not true. It’s not correct and we need to get past it. Set it aside. Put it down. Walk away from it.
Look at the example of 6model. I don’t know what the motivations were behind the design and implementation of Parrot’s current object model, but I have to believe that it was intended to either support or enable support through extensibility of the Perl6 object model. It failed on both points, and eventually Rakudo needed to implement its own outside the Parrot repo. It’s an extension, which means it works just fine with Parrot’s existing systems and doesn’t cause undue interference or conflicts. 6model is far superior to what Parrot provides now, and is superior for all the languages that we’ve seriously considered in recent months: Cardinal (the Ruby port) was stalled because it needed features only 6model provided. Puffin (the Python port) needed 6model. Jaesop, my new JavaScript port, is going to require 6model because the current object model doesn’t work for it. These represent some of the most important and popular dynamic languages of the moment, and all of these would prefer 6model over the current Parrot object model by a wide margin. So ask yourself why the Rakudo folks were forced to develop 6model elsewhere, and why Parrot hasn’t been able to port it into its core yet. Ask yourself that, and see if you can come up with any reasonable answer. I can’t find one, other than “oops, we screwed up BIG”.
6model should have been developed directly in the Parrot repo. Everybody knew that our object model was garbage and that 6model was going to be a vast improvement. Maybe we didn’t know how much it would improve things, but we knew it would be an improvement. But instead we had a policy that effectively prevented that kind of experimentation and development, and a culture that claimed doing things the Perl6 way would prevent us from attracting developers from other language camps. Again, despite the fact that the developers of other languages on Parrot desperately wanted an improved object model like 6model, we basically made it impossible.
And then because of that mistake that was made, if we finally want to get the real thing moved into Parrot core where it belongs, we have to spend some significant developer effort to do it. Of course Rakudo already has 6model working well where it is, so moving 6model into Parrot core is listed as “low priority” by the Rakudo folks. Not that we can’t do it still (and we will do it), but why would they prioritize moving around something they already have? We shot ourselves in the foot, but the bullet ricocheted a few times and hit us in the other foot, the hand, and then shoulder.
There’s a sentiment in the Rakudo project that the fastest way to prototype a new feature is to do it in NQP or in Rakudo and eventually maybe backport those things to Parrot. That’s a devastating viewpoint as far as Parrot devs should be concerned, and we need to do everything we can to change that perception. It’s extremely stupid and self-defeating for us to hold on to ugly, early prototypes of Perl6 systems and not jump at the ability to upgrade them to the newer versions of the same Perl6-inspired systems where possible. This is especially true when there are no other compelling alternatives available, or even clear designs for alternatives. It’s also extremely stupid of us to make it harder for people to improve code that has frequently been referred to as “garbage”.
There is nothing in Parrot that is so well done that if we were asked by our biggest user to change it we shouldn’t take those suggestions seriously. In most cases, we should take those suggestions as command. We’re VM people and maybe sometimes we might know better (or think we know better) or at least think about things differently. We can have the final say, but we should take every suggestion or request extremely seriously. Especially when those requests come from active HLL developers, and especially when those developers are part of a project like Rakudo.
We should be much more aggressive about moving our object model to 6model. We should be very aggressive about moving other Rakudo improvements, in whole or in part, into Parrot core. Things like the changes required by the argument binder, or the new multi-dispatcher. We should also be very aggressive about having other such radical improvements prototyped directly in Parrot core, especially where we don’t have an existing version, or where our version is manifestly inferior. Parrot core is where those kinds of things belong. Of course, we need to keep an eye towards other languages and make tweaks as appropriate, but we need to pursue these opportunities when they are presented.
dukeleto sent an email to the parrot-dev list in follow-up, trying to lay out some ideas for a new direction and a new vision for Parrot. Some of his ideas are good, but some need refinement. For instance, he says that a good goal for us would be to get working JavaScript and Python compilers running on Parrot and demonstrate painless interoperability between them. I do agree that this is a great goal and could bring Parrot some much-needed attention. However, it can’t be our only goal.
Right now Parrot has one major user: Rakudo. There’s no way around it, in terms of raw number of developers and end users, they are the single biggest user by a mile. No question. For us to put together a vision or a long-term roadmap that doesn’t feature them, and feature them prominently, is a mistake. There may come a time in the future when they decide to target a different VM instead of Parrot. That time might even be soon, I don’t know and I can’t speak for them. What I do know is that so long as Rakudo is a user of Parrot, Parrot needs to do its damndest to be a good platform for Rakudo. A better vision for the future would be something like: “Make Parrot the best platform possible for Rakudo, but do so in a way that adequately supports and does not actively preclude implementations of JavaScript and Python”. Talk about having a vision with sufficient focus!
I’m also not taking a jab at any other languages. Ruby, the Lisps, PHP and whatever other languages people like can be added to the list as well. JavaScript and Python are the two dukeleto mentioned and are two that I happen to think are as important as any of them, and are good candidates to be the ones we focus on. I would love to have a Ruby compiler on Parrot, and many others as well if people want to work on them.
If we increase performance by something like 50% and add a bunch of new features and Rakudo still leaves for greener pastures, at least we are that 50% faster and have all those new features. It’s not like making Parrot better for Perl6 somehow makes it instantly worse for other languages. Sure we are going to come to some decisions where moving in one direction helps some and hurts others, but the biggest things on our roadmap right now don’t require those kinds of hard decisions to be made. There is plenty of work to be done that brings shared benefit.
We want JavaScript. I know, because I’m working on it personally. We want Python too. Right now, we have Perl6 and we should want to keep it. We should want to do it as best as we can. Talk that involves distancing the two projects, or even severing the link between them, is wrong and needs to stop. Saying things like “Well, the python people aren’t going to like a VM that is too tied to Perl” is self-defeating. We don’t have a working, complete Python compiler on Parrot and haven’t been able to put one together in a decade of Parrot development. We do, however, have a damn fine Perl6 compiler. If Puffin, a product of this summer’s GSoC, continues to develop and becomes generally usable and more mature, the conversation changes. If Jaesop, my new and extremely immature JavaScript compiler comes around, the conversation changes. But until we have those things, we do have Perl6 and that needs to be something we focus on.
There are two directions we can logically go in the future: We can do one thing great, or we can do many things well. We can focus on Perl6 and be the best damn Perl6 VM that can possibly be, or we can improve our game to support multiple dynamic languages, but not be the best with any. Both of these are fine goals, and I suspect what we want to do eventually lies somewhere in the middle. We do know what path JVM and CLR took, and where that got them. We also know what path Parrot has pursued for a decade, and where we are now because of it. I think the course of action should be very clear by now: So long as Perl6 is our biggest user, it needs to be our biggest source of motivation. Parrot is not so strong by itself that it can afford to ignore Rakudo or become more separate from it than it already is. Parrot might be that strong and mature one day, but that day isn’t today.
In direct, concrete language, this is what I propose: We need to focus as much effort as we can to be a better VM for Rakudo Perl6, including moving as much custom code from the NQP and Rakudo repos as possible into Parrot core to lower barriers and increase integration. We need to do that, trying to put priority on those parts of the code that are going to affect JavaScript and Python implementations and making the difficult decisions in those cases. When compilers for those languages become more mature and we start to run into larger discrepancies between them, we can start revisiting some decisions as necessary. Until then, Rakudo is our biggest user and beyond that they are friends and community members. We need to focus on their needs. We need to focus on making Rakudo better, and we need to focus on making Parrot better for Rakudo. Everything else will come from that, if we do our job well enough.
A few days ago I started the Jaesop project (formerly “JSOP”) to explore creating a JavaScript compiler on Parrot using bootstrapping. After only a few days of real effort I’m getting pretty darn close to having a stage 0 compiler ready for use.
The Jaesop stage 0 compiler, called js2wxst0.js translates JavaScript code to Winxed. It is not a full JavaScript compiler; instead it’s a useful subset of JavaScript which can be used for bootstrapping. Most of the syntax is supported, and the object model has acceptably faithful semantics. What I don’t have is complete support for all built-in object types and methods, or 100% complete syntax translation. Some things like the with keyword are not and will not be supported, for example. The compiler doesn’t currently handle some common bits of syntax like try/catch, switch/case, or a few other things. Many of the basics like operators, assignment, variables, functions, closures, and basic control flow (for, while, if/else) are working just fine.
Of course, if it did everything and was perfect, we wouldn’t call it “stage 0”, we would just call it “the JavaScript on Parrot Compiler”. The stage 0 compiler isn’t the end goal, it’s just a tool we’re going to be able to use to make a better compiler. I’m not looking to make something perfect here, I’m trying to put together a bootstrapping stage 0 as quickly as possible.
The stage 0 compiler architecture is very simple. The Jison parser outputs an AST, to which I’ve made only a handful of modifications from the original Cafe source. Then, the AST is transformed into WAST, a syntax tree for creating Winxed code. Finally, the WAST outputs Winxed. Most of the code here is complete and working very well. Late this week I finished the basics of the object model, then I updated the compiler to output correct code for the model, and just today I got the test suite working again with all the new semantics.
The test suite is up and running again, although it doesn’t have nearly enough tests in it to cover the work I’ve done until this point. The suite has tests written in Winxed and also tests written directly in JavaScript. The former is for testing things like the runtime, the latter is for testing parsing and proper semantics. I want to increase coverage in both portions, because I’ve been dealing with a lot of aggravating regressions here and there as I code, and I want to make sure things get better monotonically from here.
Getting the test suite to work with the real JS object model was a little bit tricky. To get an example of why, here is a test I had in the suite prior to today’s hacking:
load_bytecode("rosella/test.pbc");
Rosella.Test.test_list({
test_1 : function(t) {
t.assert.equal(0, 0);
},
test_2 : function(t) {
t.assert.expect_fail(function() {
t.assert.equal(0, 1);
});
},
test_3 : function(t) {
t.assert.is_null(null);
}
})
Basically, this was my first sanity test, to prove that I could call Rosella Test functions from JS code. Unfortunately, after I re-did the object model, this test was broken. It got broken because of a fundamental feature of JavaScript: methods are just attributes, except they can be invoked. So a call to this:
Rosella.Test.test_list(...)
After compiling to Winxed, looks like this:
var(Rosella.*"Test".*"test_list")(...)
The .* operator looks up an attribute by name. The var(...) cast pulls out the value of the attribute into a PMC register, and the parens at the end invoke it. Notice that Rosella.Test isn’t an object, it’s a namespace. So that code was broken. Also notice that JavaScript has a notion of a global scope. We haven’t explicitly declared a variable named “Rosella”, so Jaesop tries to do a global lookup:
var Rosella = __fetch_global("Rosella");
var(Rosella.*"Test".*"test_list")(...);
Also, inside the tests, the assertions are done with t.assert.equal(), etc. But that’s clearly wrong too, for all the same reasons. In short, the code was broken.
After some fixing and refactoring, I have the test situation all sorted out. Here is the same test today:
var t = new TestObject();
test_list([
function() {
t.equal(0, 0);
},
function() {
t.expect_fail(function() {
t.equal(0, 1);
});
},
function() {
t.is_null(null);
}
]);
The TestObject constructor and the test_list function are defined in the test library as globals. TestObject is basically a JavaScript-ish wrapper around Rosella.Test.Asserter with all the same methods.
The test suite is working but does need to be expanded. I have a few more things to add to the compiler and runtime as well, which will be easier to do with better test coverage. I very much intend to have a working and usable Jaesop Stage 0 to release soon. Certainly it should be available by the Parrot 3.9 release, hopefully much earlier. With that available, I want to get started on Stage 1. Stage 1 doesn’t have to happen at the same blistering pace. In fact, I think it would be beneficial for us to wait until Parrot has 6model support built in, so we can start making the “real” object model using those tools instead.
The other day I quietly bumped the Rosella Template library to stable status. I’ve been living with the API for a while and am happy with it for the most part. I’ve also gotten documentation and unit tests up to a nice level and I felt pretty comfortable that it wasn’t a buggy, incomplete piece of crap. I don’t want to claim that it’s perfect, but I’m pretty happy with it as a first attempt. There are a few internal bits that are awkward and difficult to test, but the overall public-facing form of the library is working well enough.
It’s extremely easy to write up things like unit tests and documentation when I have a templating tool that can produce files for those kinds of things semi-automatically.
I’ve talked a lot about the templating library in two previous posts, and am planning a separate post for it later this week, so I won’t go into details about it here as well. As I start putting together more tools that use it, I’ll show those off so the reader can see the new library in action.
Instead, today I want to talk about some of the new Rosella libraries I have been playing around with. Some or all of these might become a stable part of Rosella some day too, but right now they aren’t quite ready for prime time.
Rosella.Assert, formerly known as “Contract”, is a library for debugging. It provides a few interesting tools: runtime assertions, debug logging, and contracts. All of these features read a global flag to determine if they are on or off. If off, calls to the various assert, debug, and contract routines all do nothing. The calls themselves aren’t completely removed, but they do short-circuit and exit early with no side-effects. Several people, especially GSoC students, have asked for these kinds of debugging routines, so it’s about time somebody added them.
The library basically does nothing unless you turn it on, so it won’t interfere with your code at all unless you enable it. To turn on the Assert library, you do this:
var(Rosella.Assert.set_active)(true);
Without that line of code, most calls in the Assert library do nothing at all. The calls are still made, however, but they check the flag and immediately return if the library isn’t activated. Winxed does do dead-code elimination on conditionals with constant expressions. That dead-code elimination, along with a new __DEBUG__ constant added recently, means that you can make assertions disappear entirely if you want:
if (__DEBUG__)
var(Rosella.Assert.debug)("This message probably won't appear");
The Assert library provides assertions too. So you can start peppering your code with calls to the assert function:
using Rosella.Assert.assert;
assert(1 == 1);
Notice that the conditional here is evaluated before the assert function is called, so its side effects happen even when the library is disabled. However, a different form of assertion takes a predicate Sub, which won’t be evaluated at all if the library is turned off:
using Rosella.Assert.assert_func;
assert_func(function() { return 1 == 1; });
It’s not very pretty, but it does what you expect it to do. If the Assert library is enabled and the assertion condition fails, an error message and backtrace are printed. Otherwise, nothing happens. Likewise, you can make the code disappear entirely using the __DEBUG__ flag.
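The essential trick is that the predicate is a closure, so it costs nothing unless assertions are active. Here is a small Python sketch of the same pattern (the `set_active`/`assert_func` names mirror the Rosella calls above; the implementation is my own guess at the shape, not the real library):

```python
_active = False

def set_active(flag):
    # Global on/off switch, like Rosella.Assert.set_active.
    global _active
    _active = flag

def assert_func(pred):
    # The predicate closure is only evaluated when assertions are
    # enabled, so a disabled build pays almost nothing for the call.
    if not _active:
        return
    if not pred():
        raise AssertionError("assertion predicate failed")

calls = []
assert_func(lambda: calls.append("ran") or True)   # inactive: predicate never runs
set_active(True)
assert_func(lambda: calls.append("ran") or True)   # active: predicate runs once
```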
The new library also provides contracts in two flavors: Object interface contracts, and function contracts. In the first, we verify certain features of an object: Does it have the necessary list of methods? Does it have the necessary attributes? We can use the contract to verify that the object has the expected interface, or throw an assertion failure if not. In the second type, we can insert predicates into an existing function or method, to do pre- and post-call testing of values. For instance, to assert that the first argument of a call to method bar() on class Foo is never null, we can set up this assertion:
var c = new Rosella.Assert.MethodContract(class Foo);
c.arg_not_null("bar", 0);
Likewise, to verify that the method bar() never returns a null value, we can do the following:
c.return_not_null("bar", 0);
That will automatically inject predicates into the method, which will be checked every time the method is invoked, if the Assert library is enabled. If a check fails, you get an exception and a backtrace. If you turn off the library, no checking happens and method calls happen like normal with no interference or slowdown.
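The injection mechanism amounts to wrapping the method in place with pre- and post-call checks. A rough Python analogue of `arg_not_null` (the wrapping strategy here is an assumption on my part, not how Rosella actually implements it):

```python
def arg_not_null(cls, method_name, arg_index):
    # Replace the named method with a wrapper that checks the given
    # positional argument on every invocation.
    original = getattr(cls, method_name)
    def checked(self, *args, **kwargs):
        if args[arg_index] is None:
            raise AssertionError("%s.%s: argument %d is null"
                                 % (cls.__name__, method_name, arg_index))
        return original(self, *args, **kwargs)
    setattr(cls, method_name, checked)

class Foo:
    def bar(self, x):
        return x * 2

arg_not_null(Foo, "bar", 0)
ok = Foo().bar(21)          # passes the check, returns 42
try:
    Foo().bar(None)         # fails the check
    failed = False
except AssertionError:
    failed = True
```

A real implementation would also honor the global on/off flag so disabled builds skip the checks entirely.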
One last detail: the Assert library integrates with the Test library, if you have them both loaded together. If you use Assert in a test suite using Rosella Test, you can use assertions and contracts as tests directly, and things printed out with debug() are printed out as normal TAP diagnostics. It’s quite handy indeed, because if you have put the assertions and contracts into your code, you now have test code running inside your program, testing things that would otherwise be hard to get to. Setting up predicate assertions on methods in your code is a handy and useful replacement for mock objects, if mocks aren’t your kind of thing.
This library has a lot of potential to be used as a debugging and testing aide, and I expect to be using it a lot in my own work once I get some of the last details sorted out.
Rosella.Reflect is a library for doing easy, familiar reflection. Basically, it’s an abstraction over methods and tools already provided by Parrot’s built-in types, but with a nicer interface. It is the same motivation as I had for the FileSystem library, which is like a much nicer veneer over the OS PMC and a handful of other lower-level file-manipulation details provided by Parrot. Right now Reflect is an early-stage plaything, but it’s already looking nice and I have plenty more things to add to it.
With Rosella.Reflect, you can do things like this:
var f = new Foo();
var c = new Rosella.Reflect.Class(class Foo);
var b = c.get_attribute("bar");
b.set_value(f, "hello"); # Set Foo.bar = "hello"
var m = c.get_method("baz");
m.invoke(f, 1, 2, 3); # f.baz(1, 2, 3)
var x = new Bar();
m.invoke(x, 1, 2, 3); # Error, x is not a Foo
Basically, it provides type-safe object wrappers for classes, attributes, and methods. It provides routines for iterating attributes and methods, dealing with them indirectly, and performing other reflection-based tasks too.
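In JavaScript terms, the type-safe wrapper idea might look like this rough sketch (`ReflectClass` and its method names are made up for illustration, not the Rosella API):

```javascript
// Invented sketch of a reflection wrapper that type-checks the target
// instance before reading or writing an attribute, like Reflect.Class does.
class ReflectClass {
  constructor(ctor) { this.ctor = ctor; }
  getAttribute(name) {
    const ctor = this.ctor;
    return {
      setValue(obj, v) {
        if (!(obj instanceof ctor)) throw new Error("not a " + ctor.name);
        obj[name] = v;
      },
      getValue(obj) {
        if (!(obj instanceof ctor)) throw new Error("not a " + ctor.name);
        return obj[name];
      },
    };
  }
}

class Foo { constructor() { this.bar = null; } }
const c = new ReflectClass(Foo);
const attr = c.getAttribute("bar");
const f = new Foo();
attr.setValue(f, "hello");
console.log(attr.getValue(f)); // hello
```

The point of the wrapper is exactly the last line of the Winxed example above: invoking through the wrapper on an object of the wrong class is an error, not silent misbehavior.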
In the future I want to add tools for building classes at runtime, tools for exploring packfiles, namespaces, and lexicals, and doing a few other things. This library is pretty heavily influenced by some of the things Parrot user Eclesia has been doing with his Eria project. It gives a nice object-based way to interact with some things in Parrot that don’t always have the most friendly interfaces.
This library is very very young and is mostly a prototype. I am looking for more things to add to it, and hope it will become more generally useful. It’s complicated by the fact that the upcoming 6model refactors could radically change the way we do some types of reflection, so I don’t want to reinvent any wheels.
Rosella.Dumper is a replacement for the Data::Dumper library that ships with Parrot. It uses an OO interface with pluggable visitors and configurable settings. It’s very early in development but it’s already much more functional and usable than Data::Dumper. With the new interface, you can do something like this:
var dumper = new Rosella.Dumper();
string txt = dumper.dump(obj);
say(txt);
The Dumper object contains several collections of DumpHandler objects, which are responsible for dumping out particular types of object. DumpHandlers are arranged into 4 groups: type-based dumpers that dump objects of specific types, role-based dumpers which dump objects that implement a given role, miscellaneous dumpers which are given the opportunity to dump anything else, and special dumpers for things like null and anything that falls through the cracks. By mixing and matching the kinds of things you want to see dumped, you can customize behavior. By subclassing the various bits, you can change behavior and output formatting.
This library is pretty straightforward, and is already generally useful. I have a few decisions left to make about the API and some of the default settings, but it’s useful and usable now, and I’ve already employed it myself for several recent debugging tasks.
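The pluggable-handler design is easy to sketch in JavaScript (all names here are invented; the real library arranges handlers into type-based, role-based, miscellaneous, and special groups rather than one flat list):

```javascript
// Crude model of a Dumper with pluggable handlers: each handler is a
// predicate plus a renderer, and unmatched values fall back to String().
class Dumper {
  constructor() { this.handlers = []; }
  addHandler(predicate, render) { this.handlers.push({ predicate, render }); }
  dump(obj) {
    for (const h of this.handlers)
      if (h.predicate(obj)) return h.render(obj, this);
    return String(obj); // fallback for anything that falls through the cracks
  }
}

const d = new Dumper();
d.addHandler(o => o === null, () => "<null>");
d.addHandler(Array.isArray,
             (a, d2) => "[" + a.map(x => d2.dump(x)).join(", ") + "]");
console.log(d.dump([1, null, [2, 3]])); // [1, <null>, [2, 3]]
```

Customizing output is then just adding, removing, or subclassing handlers, which is the behavior the Rosella library provides over Data::Dumper's fixed logic.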
This is the newest prototype library of all. So new that as of the publishing of this blog post I haven’t pushed the code to github yet. Rosella.CommandLine is a library for working with the command line and command line arguments in particular. Basically, it’s a replacement for the GetOpt::Obj library which comes with Parrot, along with a few other features for making program entry easier. To give a short example, here’s a test program I’ve been playing with using the new library:
function main[main](var args) {
var rosella = load_packfile("rosella/core.pbc");
var(Rosella.initialize_rosella)("commandline");
var program = new Rosella.CommandLine.Program(args);
program.run(real_main);
}
function real_main(var opts, var args, var other) {
...
}
The main function initializes Rosella and loads in the CommandLine library. Then it creates a Rosella.CommandLine.Program object, to handle the rest of the details. The Program object takes the program arguments and automatically parses them out into a hash and some arrays based on some basic syntax rules. You can specify formats like you do in GetOpt::Obj if you want, or the library can parse them by default rules and just pass you the results. The run method of the Program object takes a reference to a Sub object to treat like the start of your program. It sets up a try/catch block to do automatic error catching and backtrace printing. Also, it can be used to dispatch certain arguments to different routines entirely, which is useful if you need to set up routines for printing out help or version information:
program.run(real_main, {
"help" => function() { ... },
"version" => function() { ... }
});
The argument processing is done by a new Rosella.CommandLine.Arguments class, which mimics much of the behavior of GetOpt::Obj but has a few subtle differences, partly because it’s early in the implementation and partly because I like certain syntaxes better than others. Also, if you return an integer from your real_main routine, that integer becomes the exit code of the process, which should be familiar to most C coders and their ilk. If you return no value, or if you use the exit opcode, things continue to behave as you would expect. As with all Rosella libraries, there will be plenty of opportunity for subclassing and customization, so if you need something different from the provided defaults, it will be easy to change things.
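A toy JavaScript model of the Program.run flow described above (`parseArgs`, `run`, and the option syntax are all invented for illustration, not Rosella's actual parsing rules):

```javascript
// Sketch: split argv into options and positionals, dispatch special
// handlers like "help" first, wrap the main routine in a catch block,
// and treat its integer return value as the process exit code.
function parseArgs(argv) {
  const opts = {}, positional = [];
  for (const a of argv) {
    if (a.startsWith("--")) {
      const [k, v] = a.slice(2).split("=");
      opts[k] = v === undefined ? true : v;   // bare flag vs --key=value
    } else positional.push(a);
  }
  return { opts, positional };
}

function run(argv, realMain, special = {}) {
  const { opts, positional } = parseArgs(argv);
  for (const key of Object.keys(special))
    if (opts[key]) return special[key]() | 0; // e.g. --help, --version
  try {
    return realMain(opts, positional) | 0;    // integer return = exit code
  } catch (e) {
    console.error(e.message);                 // automatic error catching
    return 1;
  }
}

const code = run(["--verbose", "in.txt"], (opts, args) => {
  console.log(opts.verbose, args[0]);
  return 0;
});
```

The structure mirrors the library's division of labor: one object parses, another drives program entry and error handling.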
I have a couple other projects in mind that I want to start playing with, some in Rosella, and some that intend to use Rosella to build bigger things. I’ll certainly share more information about whatever else I am planning in future blog posts.
I was looking at the CorellaScript project the other day, and wanted to try to tackle the same problem in a different way. This isn’t an insult against CorellaScript, but I know a little bit more today than I did at the beginning of the summer, and some of our tools have progressed further than they had when CorellaScript was designed and started. I wanted to see if we could convert to Winxed as an intermediary language, since Winxed is syntactically similar to JavaScript already in some ways, and since Winxed already handles most of the complicated parts of PIR generation.
My idea, in a nutshell, is this: We create a JavaScript to Winxed compiler in JavaScript, using Jison and Cafe. Jison is an LALR parser generator written in JavaScript, and Cafe is an old project to use Jison to compile JavaScript into JavaScript. At first, compiling JavaScript to itself doesn’t sound like such a great thing to do, but if we do some basic tree transformations and make a few tweaks to the generated code suddenly it’s producing Winxed instead of producing JavaScript. Now all we need is an object model and a runtime, and we have a basic stage 0 compiler for JavaScript on Parrot.
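To make the tree-transformation step concrete, here is a toy JavaScript code generator that walks a small hand-built AST (standing in for Jison's output) and emits Winxed-flavored text. The node shapes here are invented for illustration, not Cafe's actual AST:

```javascript
// Minimal AST-to-Winxed emitter: each node type maps to a string template.
// Since Winxed is close to JavaScript, most constructs translate directly.
function emit(node) {
  switch (node.type) {
    case "VarDecl": return `var ${node.name} = ${emit(node.init)};`;
    case "Number":  return String(node.value);
    case "String":  return `'${node.value}'`;
    case "Call":    return `${node.callee}(${node.args.map(emit).join(", ")})`;
    default: throw new Error("unhandled node: " + node.type);
  }
}

const ast = { type: "VarDecl", name: "x",
              init: { type: "Call", callee: "foo",
                      args: [{ type: "Number", value: 2 }] } };
console.log(emit(ast)); // var x = foo(2);
```

The interesting cases are the ones where the languages diverge (prototype semantics, named arguments), which is where the runtime and the JSObject wrapper come in.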
Over the weekend, when we were trapped indoors because of the hurricane, I put some of these ideas to the test. By the end of the weekend I had a new project called JSOP (JavaScript-On-Parrot; it’s a lousy name, and I need a better one). Today, the stage 0 JSOP compiler parses a decent amount of basic JavaScript and has a small test suite. JavaScript doesn’t have classes like other languages do, so I had to add some support to Rosella.Test to handle JavaScript tests. Now that I’ve done that, we can use Rosella.Test to write tests for JavaScript. Here’s an example test file that I just committed:
load_bytecode("rosella/test.pbc");
Rosella.Test.test_list({
test_1 : function(t) {
t.assert.equal(0, 0);
},
test_2 : function(t) {
t.assert.expect_fail(function() {
t.assert.equal(0, 1);
});
},
test_3 : function(t) {
t.assert.is_null(null);
}
})
That test file compiles down to the following Winxed code:
function __init_js__[anon,load,init]()
{
load_bytecode('./stage0/runtime/jsobject.pbc');
}
function __main__[main,anon](var arguments)
{
try {
load_bytecode('rosella/test.pbc');
Rosella.Test.test_list(new JavaScript.JSObject(null, null, function (t) {
t.assert.equal(0, 0); }:[named('test_1')], function (t) {
t.assert.expect_fail(function () {
t.assert.equal(0, 1); }); }:[named('test_2')], function (t) {
t.assert.is_null(null); }:[named('test_3')]));
} catch (__e__) {
say(__e__.message);
for (string bt in __e__.backtrace_strings())
say(bt);
}
}
Formatting is kind of ugly right now, but it does the job. Executing this file produces the TAP output we expect:
<jsop git:master> ./js0.sh t/stage0/01-rosella_test.t
1..3
ok 1 - test_1
ok 2 - test_2
ok 3 - test_3
# You passed all 3 tests
So that’s not a bad start, right?
The stage 0 JSOP compiler is very simple, and I hope other people will want to hack on it. I’ve borrowed, with permission, code from the Cafe project to implement the parser. Cafe comes with a JavaScript grammar for Jison already made, and some AST node logic. I added a new tree format called WAST, which is used to generate the Winxed code. I modified the Cafe AST to produce WAST, and deleted all the other code generators and logic from Cafe.
It all sounds more confusing than it is. The basic flow is like this:
JavaScript Source -> Jison AST -> WAST -> Winxed
Winxed converts it to PIR, and execution continues from there.
So what still needs to be done? Well, lots! I’ve only implemented about 25% of a stage 0 compiler, so most of the syntax in JavaScript is not supported yet. I’ve only implemented enough to get a basic test script running (functions, closures, “new”, string and integer literals, etc). Basic control flow constructs and almost all the operators are not implemented yet. I’ve also implemented a basic runtime, but I don’t have any of the built-in types like Arrays yet, or most of the nuances of the prototype chain, etc.
The ultimate goal is a bootstrapped JavaScript compiler. Once we have stage 0 being able to parse most of the JavaScript language and execute it, we need to create a stage 1 compiler written in JavaScript. It can borrow a lot of code from Stage 0 (including the Jison parser). For that, we’re going to need PCRE bindings, among other runtime improvements. When we can use Stage 0 to compile a pure JS stage 1 compiler, we self host and it’s mission accomplished. We’ve got a long way to go still, but I think this is a promising start and I’m happy with the quick rate of progress so far. I’m looking for people who are interested in helping, so please get in touch (or just create a fork on github) if you want to help build this compiler.
The hard "pencils down" date was Monday, so now seems like a good time for a blog post summarizing what I ended up completing.
I have DPDA generation and parsing working for LR(0) and SLR(1) grammars. I have the beginnings of a grammar specification DSL (a grammar, but no actions or tokenizer yet; it's in the dsl branch). I do not have support for LALR(k) grammars or general LR(k) grammars. I have not implemented generating code to parse grammars (as opposed to interpreting the DPDA in order to parse them).
“When will Perl 6 be production ready?” they ask from time to time. I know the feeling; there was a time I wanted to know too, and after a year working on Rakudo, I can truly say,
I have no freaking idea!
I’d really like to tell you, seriously. If you ask #perl6, they will start tricking you into thinking that it’s ready enough and they’re actually using it, right? Tricky bastards. But, what do you actually ask for? What is this mighty Production Ready?
I dedicated some thinking to this today. What makes something Production Ready? I can think of two possibilities.
The first one is a bit tricky to achieve when it comes to Perl 6. As we know, Perl 6 is a language. How can a language be Production Ready? Think, think. Is there another example of something which is rather a spec than an end-user product, and is either not declared as finished, or has a spec freeze date ridiculously far in the future? Right, it’s HTML5. The spec is a draft, nowhere near finished, and no implementation supports all of it. So what makes HTML5 production-ready? I don’t think it was declared ready by its creators. It’s that people didn’t bother with official opinions and started actually solving problems with it. They took the existing implementations and made use of them. Therefore, we can safely assume that by “Production Ready Perl 6” we really mean “A Perl 6 Compiler I can use to get the job done”. So what are the current compilers lacking for the majority of people?
Yes, I’m asking you. You don’t really know, do you? You didn’t even try them? It’s just that people don’t use them too often, so they’re probably crap, right? Ok, there’s some logic in that.
There is a possibility that Perl 6 is already capable of solving your problems. You should try it. But! Enough of the advertising business, I’m wondering here.
“So what is your Production Ready?”, you may ask. What do I expect from Perl 6 before it’s Production Ready for me? It’s not there yet, I’m not gonna lie. It solves my problems, it pays my bills, but it lacks this Something that will make it Purely Awesome. In my opinion, there are two major things we’re missing:
That’s it. I can live without most of the things. But what I’m really looking for is a better Perl 5. It needs CPAN, and it needs to be less slow than it is. I’m not looking for C performance; I could probably live with Perl 5 performance here.
That’s what I’m missing. And what is Your Production Ready?
I've spent the last few days cleaning up my branch: adding documentation, checking that it passes the code standard tests, and trying to compile Rakudo nom. Don't get too excited, we can't compile nom directly to bytecode. Heck, it can't compile squaak directly yet. But I wanted to make sure that all my tinkering hasn't broken the original PIR generation path.
Actually, I suppose the flight has really just begun. It's true that GSoC is nearly at its end but, ironically enough, it doesn't really feel like the end but more of a new beginning. A new debug data format is in the works and it has so much potential!
While I didn't implement everything I thought I would, there is now a basic framework for bytecode generation in the nqp_pct branch. I'm uncertain if it should be merged... While the bytecode generation doesn't fully work, it doesn't interfere with the existing usage of PCT and does have the nice feature that PAST::Compiler is now written in NQP for ease of hacking. I'll leave that up to the rest of the community to decide. The rest of this blog post is the contents of docs/pct/bytecode.pod, which I hope will be helpful if anyone wants to explore what I've been working on all summer.
The good parts
Three months ago, I was a programmer who knew how to program in JavaScript and a few other languages, but knew nothing about what goes on inside the compiler that drives a language. Now I know what makes JavaScript so dynamic and powerful: I know how code is read, how it is converted into tokens, and how the syntax tree is formed.
It's been a while since I've posted - real life has been crazy with moving and setting up and finals. That's my fault because I didn't take into account the fact that I would be moving nor my finals being right around the time GSoC would be ending. The good news is that I've moved and setup my place (hooray for having my own office) and my classes are done - I've written the last research paper and taken my last final for my masters.
YAPC::EU 2011 in Riga has just about finished, and it has been great seeing long-time friends again and making new ones. I’ve heard many people remark that we really wish there could be more weeks like these.
There are two items that stand out in my mind about this year’s conference:
1. Andrew Shitov and his crew are absolutely amazing at organizing and running a conference. This was the most flawlessly executed conference or event I think I’ve ever been to. Not only that, but Andrew and the other organizers made it look effortless, which to me is a mark of true greatness. I’m certain that in fact there was a lot of planning and effort behind it, but the entire team just looked relaxed and at ease throughout the event. I’d definitely encourage folks to attend any event that Andrew and this group organizes.
2. Riga is a stunningly beautiful place. I definitely want to return here again some day, and I’m grateful that the organizers chose this location.
Pm
e.g.
var arr = []
results in
.HLL "corella"
.include "corella_system.pir"
.sub 'main' :main
Yesterday I posted something of a rant about the poor state of our Sub and NameSpace PMCs. The problems in these two PMCs, and resulting problems caused by them in other places, are really symptoms of a single problem: That the packfile loader blindly inserts Sub PMCs into the corresponding NameSpace PMCs at runtime, and leaves the NameSpace PMC to sort out the details.
I would like to show you the three offending pieces of code that really lead us down this rabbit hole. The first snippet is from the heart of IMCC, the code that adds a namespace to a Sub during compilation:
ns_pmc = NULL;
if (ns) {
switch (ns->set) {
case 'K':
if (ns_const >= 0 && ns_const < ct->pmc.const_count)
ns_pmc = ct->pmc.constants[ns_const];
break;
case 'S':
if (ns_const >= 0 && ns_const < ct->str.const_count) {
ns_pmc = Parrot_pmc_new_constant(imcc->interp, enum_class_String);
VTABLE_set_string_native(imcc->interp, ns_pmc,
ct->str.constants[ns_const]);
}
break;
default:
break;
}
}
sub->namespace_name = ns_pmc;
In this snippet, ns is a SymReg* pointer. Basically, it’s a pointer to a parse token representing the NameSpace declaration. ns->set is the type of declaration, 'K' for a Key PMC and 'S' for a STRING literal. The two forms are written in PIR as:
.namespace "Foo"
.namespace ["Foo"]
Actually, I don’t know if the first is still valid syntax, so the 'S' part of this block might be dead code. We can dig into that later, it’s not really important now. The last line of the snippet sets the namespace_name attribute on the Sub PMC to be either a Key PMC or a String PMC containing the name of the namespace.
Notice that the Sub PMC has attributes namespace_name and namespace_stash, both of which are PMCs. The namespace_name is populated at compile time and is used by the packfile loader to create the NameSpace PMC automatically. A reference to that NameSpace PMC is stored in namespace_stash. Thereafter, namespace_name is rarely used. We can definitely talk about ripping out one of these two attributes, if we can’t make bigger improvements in a reasonable amount of time. In the long run, both will be gone.
The second snippet I want to show you is inside the packfile loader:
for (i = 0; i < self->pmc.const_count; i++)
self->pmc.constants[i] = PackFile_Constant_unpack_pmc(interp, self, &cursor);
for (i = 0; i < self->pmc.const_count; i++) {
PMC * const pmc = self->pmc.constants[i]
= VTABLE_get_pmc_keyed_int(interp, self->pmc.constants[i], 0);
if (VTABLE_isa(interp, pmc, sub_str))
Parrot_ns_store_sub(interp, pmc);
}
I’ve removed comments for clarity (No, that’s not some kind of a joke. There were comments in this snippet, and they are not helpful for understandability).
In this snippet we first loop for the number of PMC constants, unpacking and thawing each from the packfile. Then in the second loop we loop over all of them again, to see which, if any, are Subs. For any Subs, we call Parrot_ns_store_sub to read the namespace_name out of the Sub PMC, create the NameSpace if necessary, and then insert that Sub into the namespace. That brings me to my third snippet of code:
ns = get_namespace_pmc(interp, sub_pmc);
/* attach a namespace to the sub for lookups */
sub->namespace_stash = ns;
/* store a :multi sub */
if (!PMC_IS_NULL(sub->multi_signature))
store_sub_in_multi(interp, sub_pmc, ns);
/* store other subs (as long as they're not :anon) */
else if (!(PObj_get_FLAGS(sub_pmc) & SUB_FLAG_PF_ANON)
|| sub->vtable_index != -1) {
STRING * const ns_entry_name = sub->ns_entry_name;
PMC * const nsname = sub->namespace_name;
Parrot_ns_store_global(interp, ns, ns_entry_name, sub_pmc);
}
This snippet comes to us from the heart of the Parrot_ns_store_sub routine I mentioned above. This is basically a duplication of some logic found in the NameSpace PMC. My first thought was that the duplicated code paths should be merged. My second thought is that they both need to just be deleted. In this snippet, we call get_namespace_pmc to find or create the NameSpace that suits the current Sub. If the Sub is flagged :multi, we add it to a MultiSub instance in that namespace. Otherwise, so long as the Sub isn’t flagged :anon or :vtable, we add it to the NameSpace.
So those are the three bits of Parrot that are causing all these problems. These are the three bits that really encapsulate what the problem with Sub is, why it’s so bad, why Subs eat up so much memory, and why load performance on packfiles is so poor. These three, small, innocuous snippets of code are really the root of so many problems. This is all it takes.
What is really being done here? IMCC has all the compile-time information, and as it’s going along it’s collecting that information in a parse tree of SymReg structures. A SymReg representing a Sub contains a pointer to a SymReg for the owner namespace, a SymReg for the string of the Sub’s name, a SymReg for each flag, etc. When it comes time to compile the Sub down to bytecode, it creates a Sub PMC and dutifully stores all this information from the .sub SymReg into the Sub PMC. After all, all that information is together when we parse and compile it, so storing it all together in the same place for storage does seem like a reasonable thing to do. We store the Sub PMC in the constants table of the packfile and move on to the next Sub.
On startup then, since we have all these Sub PMCs and each contains all the necessary data to set themselves up, we read out all this data and start recreating the necessary details: NameSpaces, MultiSubs, etc. This all makes some good sense as well, in theory. In practice, we end up with lots of bits of data (the Subs) tasked with recreating the containers in which they are to be stored (the NameSpace, Class, and MultiSubs). This means that each Sub needs to contain enough information to possibly recreate all the possible containers where it could be stored (NameSpace, Class, and MultiSub), and that is extremely wasteful.
Here’s the big question I have: Why are we creating the NameSpace and the MultiSub PMCs at runtime? Why don’t we create them at compile time, and simply have them available and ready to use already when we load in the packfile?
Here’s an alternate chain of events to consider. When IMCC reaches a .namespace directive and creates the associated SymReg, we should also immediately create a NameSpace PMC. Then when we create .subs in PIR, we can add each to the current NameSpace, instead of storing the namespace SymReg reference in the Sub. When we go to create the bytecode, we serialize and store all the NameSpace PMCs, which will recurse and serialize/store all their component Subs. The same thing goes with MultiSubs: When we find a :multi flag, IMCC should create a MultiSub PMC immediately instead of a Sub PMC, and perform the mergers at compile time. When we serialize the NameSpace, it recursively serializes the MultiSubs, which recursively serializes the various component Subs. One other thing we are going to want to do is create Class PMCs, or some kind of compile-time stand-in. This way we are able to handle :method Subs easily, by inserting them into Classes at compile time, not runtime. There are some complications here, so I’ll discuss them in a bit.
The real beauty of this change shows up on the loader side: When we load the packfile, we loop over and thaw all the PMCs. Then….We’re done. Maybe when we load a NameSpace we need to fix it up and insert it into the NameSpace tree. Also, when we load a NameSpace with the same name as a NameSpace that already exists in the tree we need to merge them together. That’s a small and inconsequential operation, especially considering how much other code we’re going to delete or streamline.
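The difference between the two loaders can be modeled with plain JavaScript objects (a deliberately crude sketch, not Parrot's packfile format): today every constant must be type-checked and the namespace tree rebuilt at load time; under the proposal the namespace arrives whole and just gets inserted.

```javascript
// Current shape: each Sub carries its namespace name, and the loader
// scans every constant, isa-checks it, and rebuilds the containers.
const constantsNow = [
  { kind: "Sub", name: "foo", namespace_name: ["MyNS"] },
  { kind: "Sub", name: "bar", namespace_name: ["MyNS"] },
];
function loadNow(constants) {
  const tree = {};
  for (const c of constants) {         // "isa" check on every constant
    if (c.kind !== "Sub") continue;
    const ns = c.namespace_name.join("::");
    if (!tree[ns]) tree[ns] = {};      // container recreated at load time
    tree[ns][c.name] = c;
  }
  return tree;
}

// Proposed shape: the namespace was built at compile time, owns its
// subs, and serializes as one unit; loading is a single insertion.
const constantsProposed = [
  { kind: "NameSpace", name: "MyNS",
    subs: { foo: { kind: "Sub", name: "foo" },
            bar: { kind: "Sub", name: "bar" } } },
];
function loadProposed(constants) {
  const tree = {};
  for (const c of constants)
    if (c.kind === "NameSpace") tree[c.name] = c.subs; // done
  return tree;
}
```

Both produce the same namespace tree; the proposed layout just moves the container-building work from every program start to a single compile.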
Here are some problems that we are going to fix:
- At load time we currently call VTABLE_isa on every PMC constant to see if it’s a Sub. If so, we run the logic for inserting the Sub into an auto-created NameSpace, then storing it in weird ways depending on flags, and doing all that other crap. All of that can go away.
- The Sub PMC can lose namespace_stash, namespace_name, vtable_index, method_name, ns_entry_name, multi_signature, and probably comp_flags too. Some of these are going to require deprecations, and some of them are going to require us to provide workarounds in certain areas.

The issue of Classes and Methods is a little bit tricky because looming overhead are the impending 6model refactors. The :method flag on a Sub really performs two tasks: First, it tells IMCC to automatically add a parameter called "self" to the front of the list. Second, it tells the NameSpace to store the Sub, but to do so in a super-secret way that only the Class PMC can find. The first use I’ve always felt was nonsensical, especially when you consider requests to add a new :invocant flag for parameters to manually specify the name of the invocant, and the fact that its automatic behavior creates a large number of problems, especially with respect to VTABLEs. In my new conception of things, NameSpace won’t be storing methods anyway, so the second use of the flag disappears completely. I say we deprecate it soon and remove it entirely at the earliest possible convenience.
In the near-term, I think the :method flag is going to be used to create a ProtoClass or an uninitialized Class PMC at compile time. Then, when we call the newclass opcode at runtime it will look to see if we have a Class (or ProtoClass) by that name and if so will return the existing object, marking it as being “found” to prevent calling newclass again on it. That sequence of events is a little bit more messy than I would like (again, I would like to avoid the :method flag entirely and do class creation either entirely at compile time or entirely at runtime, but maintaining compatability with current syntax may lead in a more messy direction). I’ll explore some of these ideas in a later post.
I’ve created a branch locally to start exploring some of these ideas, and will push it to the repo when I have something to show. Depending on how this project lends itself to division, I may end up with many small atomic branches or one large overwhelming one. Both paths are equally plausible. I’m going to start playing around with some code, and when I find something that seems like a reasonable stopping point I’ll try to merge it if reasonable.
OTTO. Apes don't read philosophy.
WANDA. Yes they do, Otto. They just don't understand it. Now let me correct you on a couple of things, OK? Aristotle was not Belgian. The central message of Buddhism is not 'Every Man For Himself'.
OTTO. You read...
WANDA. And... the London Underground is not a political movement. Those are all mistakes, Otto. I looked them up. Now. You have just assaulted the one man who can keep you out of jail and make you rich. So, what are you going to do about it, huh? What would an intellectual do? What would Plato do?
-- A Fish Called Wanda, by John Cleese
With that, I'd like to apologize for Parrot 3.7.0, also known
as "Wanda". Parrot is a virtual machine aimed
at running all dynamic languages.
Parrot 3.7.0 is available on Parrot's FTP site, or by following the
download instructions. For those who would like
to develop on Parrot, or help develop Parrot itself, we recommend getting
the latest and best Parrot code from github.
The soft deadline has passed and the hard deadline is not far away. Soon, what is likely to be my last GSoC will be over. And it was great! These three years I've had the summer job of my dreams. I worked on projects I was passionate about, using tools I liked and with people I liked.
Well, GSoC is starting to wind down. I can't believe it's almost over. It feels like the "pencils down" date just jumped up out of nowhere. I had a lot more planned for HBDB but there are many flaws in Parrot's design that make even some of the most basic debugging tasks very difficult which I'll explain in a moment.
As a Winxed user, I haven’t made a heck of a lot of use of Parrot’s MMD features. I’ve used it in NQP, but the details are sufficiently abstracted in that language that you don’t really get the feel for what is occurring at the lower levels. Since the feature is so messy, I’ve made some effort to avoid using it at the PIR level. Let me rephrase that. I’ve made some effort to avoid PIR entirely.
As I mentioned in a previous post, I’ve been working on adding MMD support to Winxed. To really get a handle on multiple dispatch, I did what I always do: I went right to the source. I opened up the MultiSub PMC, which is the primary user-visible entry point into the multiple dispatch system. What I found there was…underwhelming. The MultiSub PMC is not descended from the Sub PMC. It’s basically an array which does basic type-checking on insert operations (push_pmc, set_pmc_keyed_int, etc.) to ensure that objects being added to the array are indeed Sub PMCs. Actually, it wasn’t even consistent: some of the insert vtables checked whether the PMC was a Sub, other insert vtables checked whether the PMC satisfied the “invokable” role. While similar, the two checks will allow different PMCs. I found several vtables that were redundant and unnecessary, and I found a few other problems as well.
Combine that with what I know about the shortcomings of the Sub PMC, and nasty code in the associated subsystems (src/sub.c, src/multidispatch.c, etc), and I think we have a major problem on our hands. In this post, which could potentially turn into a long series of posts, I’m going to talk about some of the problems with Subs and the way I plan to fix them. MultiSub is one of the pieces that is going to come along for the ride.
I’m planning to make several fixes to the systems I talk about below, although exactly what I am going to fix and how I am going to do it are still up in the air. Feedback and suggestions, as always, are appreciated. I know that what we have is bad enough to need fixing, even if I don’t currently know all the best ways to proceed.
Want me to prove to you that we have a major problem? Here is the complete, unadulterated list of attributes for the Sub PMC:
ATTR PackFile_ByteCode *seg; /* bytecode segment */
ATTR size_t start_offs; /* sub entry in ops from seg->base.data */
ATTR size_t end_offs;
ATTR INTVAL HLL_id; /* see src/hll.c XXX or per segment? */
ATTR PMC *namespace_name; /* where this Sub is in - this is either
* a String or a [Key] and describes
* the relative path in the NameSpace */
ATTR PMC *namespace_stash; /* the actual hash, HLL::namespace */
ATTR STRING *name; /* name of the sub */
ATTR STRING *method_name; /* method name of the sub */
ATTR STRING *ns_entry_name; /* ns entry name of the sub */
ATTR STRING *subid; /* The ID of the sub. */
ATTR INTVAL vtable_index; /* index in Parrot_vtable_slot_names */
ATTR PMC *multi_signature; /* list of types for MMD */
ATTR UINTVAL n_regs_used[4]; /* INSP in PBC */
ATTR PMC *lex_info; /* LexInfo PMC */
ATTR PMC *outer_sub; /* :outer for closures */
ATTR PMC *eval_pmc; /* eval container / NULL */
ATTR PMC *ctx; /* the context this sub is in */
ATTR UINTVAL comp_flags; /* compile time and additional flags */
ATTR Parrot_sub_arginfo *arg_info; /* Argument counts and flags. */
ATTR PMC *outer_ctx; /* outer context, if a closure */
Have you gone cross-eyed yet? Are you as infuriated by this as I am? If not, continue reading. If so, continue reading for the lulz.
Here’s a question for you: How does Parrot implement closures? Parrot implements closures by taking a Sub, cloning it, and setting a pointer to the parent Sub’s active CallContext in the ->outer_ctx field of the child Sub. In other words, a Closure is not its own type of thing. It’s a Sub, but with one extra field set. Closures are basically ordinary Subs except for one detail: a closure has an outer lexical scope which it can search to find values of lexical variables. Why every Sub needs to include that ability is beyond me. Closure should be either a subclass of Sub or, if we want more flexibility, a mixin.
What’s the difference between an ordinary Sub and a vtable override? Well, the vtable override has an index value set in ->vtable_index. What if we have a single Sub that we would like to use for two separate vtable slots? What if we have a single Sub, for something like set_pmc_keyed and set_pmc_keyed_str, and we want PCC to automatically coerce arguments from string to PMC to share a single implementation? The result is major fail. It simply doesn’t work. It’s a reasonable idea, but Parrot absolutely does not and can not support it. At least, not right now.
Let me ask you another question: What is the namespace_stash? And, more importantly, why does the Sub need to know where or how it’s being stored? Keeping track of its own contents is the business of the NameSpace PMC, not the job of the Sub PMC. What if we want to reference a single Sub from multiple namespaces? Or, what if we want to reference a single Sub by multiple names within a single namespace? What if we want the Sub not to be automatically stored in any namespace at all? Suddenly, namespace_name isn’t looking too smart either. If your answer to any of these questions involves cloning the Sub PMC, that answer is just wrong. Why should we have to clone a Sub just so we can store a reference to it in two separate places? It’s not the job of the data to keep track of the container; it’s the job of the container to keep track of the data.
Similarly, isn’t it the job of the MultiSub to keep track of the Subs and their corresponding signatures? I mean, what if I have this Sub:
.sub 'Foo'
.param pmc all_args :slurpy
...
.end
…And I want that Sub to be called from a MultiSub for multiple different signatures, only redirecting certain variants to specific alternatives? If the MultiSub were some sort of hash or search tree instead of a dumb array, and if it kept track of the signatures associated with each Sub instead of asking the Sub to keep track of its own signature, we would gain all that flexibility. Also, I suspect, there are performance wins to be had if we break a signature key up into a search tree or search graph and traverse it, instead of doing an in-place Manhattan sort on sig lists every time we call the MultiSub. It’s absolutely absurd that when you store a Sub in a MultiSub, the Sub tells the MultiSub how to store itself. How untenable and unmanageable is it, in the long run, to have a system where values tell the containers they are stored in how they need to be stored and organized? Very, that’s how much.
Basically, I’m saying we should change this:
push multi_sub, my_sub
To this:
multi_sub[signature] = my_sub
The user can pick the signature, and can reuse a single sub for multiple ones.
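The idea can be sketched as a toy Python model (class and method names hypothetical, not Parrot's actual API): the container owns the signature-to-sub mapping, so one sub can be stored under several signatures without cloning.

```python
class MultiSub:
    """Toy container: it owns the signature-to-sub mapping,
    instead of each Sub carrying its own signature."""
    def __init__(self):
        self._variants = {}

    def __setitem__(self, signature, sub):
        # The caller picks the signature; one sub can be stored
        # under several signatures without being cloned.
        self._variants[signature] = sub

    def __call__(self, *args):
        sig = tuple(type(a).__name__ for a in args)
        if sig not in self._variants:
            raise TypeError("no variant for signature %r" % (sig,))
        return self._variants[sig](*args)

def show(*args):
    # A slurpy sub, reusable under more than one signature.
    return "str:" + "/".join(args)

multi = MultiSub()
multi[("int", "int")] = lambda a, b: a + b
multi[("str",)] = show
multi[("str", "str")] = show   # same Sub, second signature, no clone
```

A dict lookup here stands in for the search tree or graph traversal suggested above; the point is only that signatures live in the container.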
When you compile a PIR file with IMCC, IMCC collects all the relevant information together and jams it all into a single place: the Sub. When Parrot loads in a packfile, it reads each Sub entry, uses the namespace information therein to recreate the NameSpace tree, and inserts Subs into the proper namespaces. Then, when we create a class, the Class PMC searches for the NameSpace with the same stringified name and pulls all the methods out of it. Keep in mind that namespaces aren’t supposed to hold methods at all, so the list of methods in the namespace has to be kept separate and hidden until the class (and only the Class) asks for it. At that point, since the NameSpace is itching to dump off the responsibility, it deletes its own copy of the list as soon as it is read. We insert things into the NameSpace that don’t belong there, and we ask the NameSpace to carefully ignore some things, and to store other things in a secret, hidden way. Awesome!
Similarly, when Parrot loads in a packfile and inserts Subs into the NameSpace, it’s the job of the NameSpace to automatically and invisibly insert Subs with similar names and the :multi flag set into new MultiSub containers. The Sub tells the NameSpace how the NameSpace must store the Sub, under which names, and in which locations. Then, if there’s a :multi involved, the Sub tells the MultiSub how to do its job too. The Sub sure is bossy, and even if you’re a fan of centralized control in a generalized philosophical way, you have to admit that the results here are…less than spectacular. If you set a Sub with the same name but without the flag set, the NameSpace overwrites the old one. But if the flag is set, the two are merged together into a single MultiSub. So here’s yet another question for you:
# What happens here?
my_namespace["foo"] = $P0
In this short example, assuming we don’t know where my_namespace comes from or what it previously contains, what happens? Luckily we have some easy rules to follow to figure this out:
- If $P0 is any type of PMC except a Sub, a MultiSub, or a NameSpace, it’s stored as a global, overwriting any existing global by the name "foo".
- If $P0 is a user-defined subclass of Sub, it’s treated differently in ways I don’t seem to understand. The code is there, but when I try to trace it, I weep.
- If $P0 is a Sub with the :method flag, it will be stored in a separate, secret hash of methods, to be added to the Class of the same name when the class is created. UNLESS the type in question is a built-in type with a PMCProxy metaobject instead of a Class metaobject; then the exact sequence of events is mysterious and uncertain, because built-ins can be instantiated and used before the associated PMCProxy is ever created, so there is no single way to fetch all the methods from the NameSpace at once. I think the implementation of the Parrot default PMC automatically looks in the namespace whether the PMCProxy has been instantiated or not. I don’t know the details, and I really don’t want to know.
- If $P0 has the :multi flag set, it will get merged into a MultiSub, together with a previous Sub of the same name, if any. Unless the previous entry is not also a :multi, in which case it overwrites. If there is no existing MultiSub PMC or any value of the same name, a new MultiSub PMC is automatically created for it.
- If $P0 has the :vtable flag set, it will also get stored away in a super-secret location, to be grabbed by the Class when necessary, with all the same caveats as I mentioned for the :method flag, above.
- If $P0 is a :method or a :vtable with the extra :nsentry flag set, then it is stored in the namespace anyway, in addition to being stored in a way that is fetchable by the Class or PMCProxy.
- If $P0 is a NameSpace, it’s stored in my_namespace as a child, and becomes a searchable part of the NameSpace tree in a way that does not interfere with a non-namespace object of the same name, if any.
The exact mechanism for doing this involves the creation of large numbers of unnecessary GCable PMCs, and the tears of children. This all sounds like the best, most well-thought-out, best designed and best implemented solution, doesn’t it? And there isn’t a hint of magic or confusion anywhere in sight.
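Just to make the decision table concrete, here is a rough Python sketch of it (flag names simplified; a plain dict plus a "_hidden" side table stand in for the NameSpace and its secret method/vtable hashes, and a Python list stands in for a MultiSub — this is an illustration of the rules, not Parrot's actual code):

```python
def namespace_store(ns, name, value, flags=()):
    """Toy model of the insertion rules above."""
    if "method" in flags or "vtable" in flags:
        ns.setdefault("_hidden", {})[name] = value
        if "nsentry" not in flags:
            return                      # invisible to normal lookup
    if "multi" in flags:
        existing = ns.get(name)
        if isinstance(existing, list):  # merge into the "MultiSub"
            existing.append(value)
        else:                           # non-multi entry: overwrite,
            ns[name] = [value]          # auto-creating a "MultiSub"
        return
    ns[name] = value                    # plain global: overwrite

ns = {}
namespace_store(ns, "foo", "sub1", flags=("multi",))
namespace_store(ns, "foo", "sub2", flags=("multi",))  # merged
namespace_store(ns, "bar", "meth", flags=("method",))  # hidden
namespace_store(ns, "baz", "g1")
namespace_store(ns, "baz", "g2")                       # overwrites
```

Even this stripped-down model needs a hidden side channel and type-sniffing on the stored value, which is the author's complaint in a nutshell.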
All of those things, every last bit including the problems with the Sub PMC containing too many unnecessary attributes, are all symptoms. The single underlying problem that necessitates all of this crap is that the packfile loader automatically creates NameSpaces and automatically inserts Sub PMCs into them when a packfile is loaded. When you jam a bunch of stuff into the NameSpace automatically, without consideration for where it really belongs, you’re forced to insert a bunch of logic inside the NameSpace to deal with it. Take that away, force the packfile loader to stop jamming data where it doesn’t belong, and suddenly all the crap I mentioned above goes away. Piles and piles of the foulest, most garbage-ridden code I have ever seen evaporates away into a fine mist of unicorn farts. I say good riddance.
So what’s the alternative? Well, 6model doesn’t use the NameSpace as intermediate storage for methods. When you create a class with 6model, you get individual references to the Subs you want and you insert them, by name and static reference, into the Class. Reuse the same Sub as much as you want. We can extend this idea even further too, by applying it to MultiSubs. If you want a MultiSub, create one yourself and insert the functions you want into it. For that matter we can extend the idea all the way to NameSpaces themselves. Parrot shouldn’t automatically create or populate any NameSpace PMCs. None. Not ever. The user can create and populate them manually. For all these things we can either do the creation at runtime, or we can do the creation at compile time and serialize the whole Class/MultiSub/NameSpace into the packfile as well. If we don’t want it, Parrot won’t force it upon us automagically.
HLLs like Winxed or NQP-rx or anything else can be modified to generate the necessary instantiation code in the generated PIR output, instead of relying on Parrot to do it for us. There’s a good chance that this approach could be more performance-friendly, because we would do less moving of data, less calling of methods on startup and packfile loading, and less of other unnecessary operations as well.
I think this sounds like a much better system, personally, and it’s a direction I want to start working towards. The ultimate goals are as follows:
What I want to impress upon you, the reader, with this post is that the Sub PMC is extremely poorly designed. Since it’s the lynchpin of Parrot, the thing that makes control flow work, that’s a pretty bad thing to get so horribly wrong. I don’t want to say that I have all the designs and solutions to fix it just yet, but this issue is squarely in my crosshairs right now and you can expect some movement on this issue in the near future.
Here’s a snippet of Winxed code I was running a few days ago on my machine:
function Foo(int a, int b) { ... }
function Foo(string s) { ... }
function main[main]() {
Foo(1, 2);
Foo("Hello");
}
Notice anything interesting about it? Up until now, Winxed hasn’t supported multiple dispatch. NotFound didn’t really use the feature and users haven’t been asking for it too loudly, so it never got implemented. However, now that I’m pretending to be a Winxed super haxxors, I decided to take a stab at it. As of yesterday evening, I have a working prototype and am doing some testing and tweaking to get a pull request ready.
This isn’t full-featured MMD, yet. Parrot’s MMD system allows you to dispatch based on types and inheritance, and there are wildcards. The current Winxed implementation I’m playing with only dispatches on the four primary register types. We read the parameter list and, if there are multiple functions with the same name in the same scope, we convert them to multis.
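Parrot's four primary register types are integer, number, string, and PMC. A small Python sketch of what the compiler-side logic described above might look like (the type-letter mapping and helper names are illustrative, not Winxed's actual implementation):

```python
# Winxed primitive parameter types mapped to Parrot register types
# (I = integer, N = number, S = string, P = PMC); letters illustrative.
REGISTER_TYPE = {"int": "I", "float": "N", "string": "S", "var": "P"}

def multi_signature(params):
    """Derive a register-type signature from declared parameter types."""
    return "".join(REGISTER_TYPE[p] for p in params)

def needs_promotion(functions_by_name, name):
    """Promote to MultiSub only when several functions share a name."""
    return len(functions_by_name.get(name, [])) > 1

# Two functions named Foo, as in the Winxed example above:
funcs = {"Foo": [["int", "int"], ["string"]]}
sigs = [multi_signature(p) for p in funcs["Foo"]]
```

The "only promote when there is more than one" rule matters later in the post, when type annotations on a lone function raise the question of whether it too should become a MultiSub.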
It’s a simple patch and just the first step towards getting full Multi support. The hardest part about moving forward is not the implementation (I’ll reiterate, Winxed is pretty easy to hack on), but instead picking the syntax we want to use to specify options.
NotFound has been away, and I don’t think that this will get merged in to master or pulled into the Parrot repo before the release tomorrow. If it passes code-review muster, maybe it could be in place shortly thereafter. Then, we can start on the next step: Finding a syntax with which we can specify improved type information supported by the MMD system.
As a quick exploration, we could do something like this:
function Foo(Bar.Baz baz, Bar.Fie fie) { ... }
That would be just fine for a multiple dispatch situation, but specifying types in the signature implies that we are doing some kind of type-checking, which I suspect Winxed would not want to do. If we did that, we could automatically promote every single function with type information in the signature to a MultiSub, even if there was only one of them. Right now, the patch only auto-promotes a Sub to a MultiSub if there are more than one of them with the same name. This promotion, and the dispatch mechanisms associated with MultiSub have non-negligible cost. It runs contrary to expectations to think that adding more type information for the compiler would decrease performance of the generated code.
If we keep the logic that we only promote to MultiSub when there are multiple functions with the same name, we could instead insert type checks into an ordinary Sub that has type information but is not auto-promoted. Of course, then the code generator for Sub has to keep track of storage information in the owner namespace, which gets messy quick. The laziest approach would be to not insert any type-check information at all, and allow a parameter which is declared as Bar.Baz baz to be filled by an object of any type without any indication that it does not match what is written. I suspect that’s not what anybody wants.
Perhaps we could do something like adding metadata in tags:
function Foo[multi(Bar.Baz, Bar.Fie)](var baz, var fie) { ... }
But that’s verbose and ugly, and updating the parser to support it is non-trivial. It does have the benefit that you are explicitly telling the compiler that you want it promoted to MultiSub.
Another possible syntax would be this:
multi Foo(Bar.Baz baz, Bar.Fie fie) { ... }
Here, we use the multi keyword if we want to specify type information in the parameter list, to make clear that we only do type checking in a MultiSub, but don’t do it for an ordinary function.
One more that just came to mind would be something like:
function Foo(var[Bar.Baz] baz, var[Bar.Fie] fie) { ... }
Here, it’s clear that the first parameter is still a var and can be any type, but there is the tag there that says it should be considered a specific type in a multidispatch situation.
Anyway, the easy part is done. With my patch you can do basic multi-dispatch on the four register types in Winxed without any fancy new syntax. That’s easy, it’s implemented, and it works. The next step is harder because we are going to need to add new syntax, and find one which is functional, attractive, and does not promise things it cannot deliver.
Maybe we don’t find any such syntax, and Winxed never gets an easy syntax for class-based multiple dispatch. That’s a little disappointing to think about, but we’ve come pretty far without any MMD support in Winxed, and we will go much further with the little bit provided in my patch. Maybe that takes us far enough for most uses.
This post is extremely late, but it’s still worth mentioning. A few weeks ago I did all the necessary work to get the Rosella FileSystem library listed as stable. “Stable”, as always, doesn’t mean the library is perfect. Instead, it means that the library has a stable design, a generally stable interface, and it’s usable right now by anybody who is interested to try it out.
Parrot offers some tools for working with files and directories. Mostly, these tools involve use of the OS PMC, a dynamically-loaded PMC type that is not built in to Parrot core. For simple IO operations on files Parrot does provide the FileHandle PMC type, but for things like working with the organization of files and directories you need to use OS. Also, the OS PMC doesn’t quite do everything either. There is also a dynamically-loaded ops library that provides a stat wrapper op, in addition to some other convenience ops for working with FileHandles (if the methods on FileHandle to do the exact same things are inexplicably not what you needed).
What the Rosella FileSystem library does is provide nicer, friendlier wrappers around all of these things. The library itself is pretty heavily inspired by System.IO from the .NET standard library, and some items from Python’s os module too.
Files and directories are all instances of Rosella.FileSystem.Entry. There are two subclasses: Rosella.FileSystem.File and Rosella.FileSystem.Directory. You can create instances of these objects using simple constructors with path strings. Here are some Winxed code examples:
var file = new Rosella.FileSystem.File("foo.txt");
var dir = new Rosella.FileSystem.Directory("bar");
Or, if you want to create a file in a directory, you can do something like this:
var dir = new Rosella.FileSystem.Directory("bar");
var file = new Rosella.FileSystem.File("foo.txt", dir);
All Entry objects have some common methods:
entry.exists() # 1 if it exists. 0 otherwise
entry.delete() # Delete (non-recursive delete for Directories)
entry.rename("baz") # Rename it
string n = entry.short_name() # The short-name of the entry (no path)
Directories add in a few other features:
dir.delete_recursive() # Delete with all contents
dir.create() # Create it, if it doesn't exist
var files = dir.get_files() # Get a list of all File objects in it
var dirs = dir.get_subdirectories() # Get a list of all Directories in it
var entries = dir.get_entries() # Get all entries, File and Directory
var entry = dir["foo.txt"] # Get the entry by name. File or Directory
var entry = dir["baz"] # Same
dir.walk(visitor) # Walk contents, using a Visitor
dir.walk_func(func) # Walk contents, using a Sub on each File
exists dir["foo"] # Determine if "foo" exists
delete dir["foo"] # Delete "foo", if it exists
Files likewise add in some features of their own:
var fh = file.open_read() # Get FileHandle, opened for reading
var fh = file.open_write() # Get FileHandle, opened for writing
var fh = file.open_append() # Get FileHandle, open for write/append
string t = file.read_all_text() # Read all text into a single string
var t = file.read_all_lines() # Read all text, as an array of lines
file.write_all_text(txt) # Write all text to the file (delete existing contents)
file.write_all_lines(t) # Write an array of string lines to file
file.append_text(t) # Append some text
file.copy(dest) # Copy a file to dest
Basically, it’s a very easy object-oriented interface to common file system operations. It’s nothing fancy, and there are some features missing which people might expect, but it’s very useful and, I think, very usable. Also, it’s not a very thick library so while there is some performance overhead it isn’t too much considering the added convenience.
On http://perlcabal.org/syn/, Synopsis 26 is the only one without the HTML page. That’s of course due to the lack of a Pod (Pod6) to HTML converter. Today there has been a breakthrough in this embarrassing situation :)
My Pod parser integrated into Rakudo is capable of parsing S26 completely, so this morning I wrote a Pod::To::HTML module for it. Parsing and generating HTML output from S26 takes about 4 minutes on my machine, but the resulting document is not that bad. You can still see some NYI features, like the lack of formatting codes and correct interpretation of semantic blocks. The first is part of my GSoC work; the second is not, but I’ll probably do it anyway, just for the sake of having prettier S26 :)
My HTML-fu is a bit Less Than Awesome, so if anyone knows how to make it any better (and it won’t be cheating as Pod::To::HTML is not a part of my GSoC work), I’m willing to hand out commit bits to anyone interested.
Maybe it’s finally time to read this one. Have the appropriate amount of fun.
I'm far behind on my project, unfortunately, and, upon dukeleto's urging, I've written up a new timeline that I think I will be able to finish before Summer of Code ends.
First, I'll summarize what I have done:
And what I haven't done:
*tap, tap, tap* Is this thing on? It is? Drat, I had hoped it wasn't and I'd have something to blame for the long silence.
Due to the issues I'm about to describe, my old schedule is a little off. The important schedule note is that the official "pencils down" date is August 15th, aka next week. Current plan is to power through as much as I can in the next few days. At this point, I doubt that my branch will be merged into master before the end of GSoC, but I do intend to keep working on it.
The core of my problem is in this line from my last blog post:
On a personal note, the house we were buying has fallen through. I’ll spare the details (the full telling of which involves substantial use of curse words and unfounded allegations of illegal sexual deviancy). Time I was planning to use hacking on Parrot from the comfort of my own house is now going to be spent frantically searching the neighborhood for a new house to buy.
In a previous post I talked about some of the work I was doing with a new :tag syntax. The :tag syntax, and its underlying mechanisms is intended to replace :load and :init flags (and maybe, eventually, others like :main and :postcomp also) with something much more usable and flexible. A big benefit here is that I can use the new PackfileView PMC type to look up Sub PMCs by tag from running PIR code, and execute them directly from PIR, without needing the current mechanism of nested runloops to do it. As I have shown in previous posts, using nested runloops brings a non-trivial performance penalty that we should try to avoid.
In Parrot semantics, the :init subroutines are supposed to execute before the :main Sub executes. This is important because the :init subroutines are typically used to set up things used by the main program, such as class definitions and global data stores. However, the only way to currently guarantee that the :init subs all execute before the :main sub is to execute them separately, each in their own runloop. Basically, we have C code to execute all the :init Subs in a loop, and then we find and execute the :main Sub. If we want to consolidate and do everything from inside a single runloop, one super-main routine needs to call both the :init and :main subs together. This means we need to jump into some kind of standard PIR entryway routine earlier in the startup process. I talked about this kind of system in a post several months ago, and how such a system would look.
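The "super-main" idea above can be sketched in a few lines of Python (function and tag names hypothetical; this models the control flow, not Parrot's actual entry code): one entry routine runs every :init sub and then :main, all from a single loop, instead of C driving each :init sub in its own nested runloop.

```python
def run_program(tagged_subs, main_sub, args):
    """Toy model: one entry routine replaces N nested runloops.
    'tagged_subs' maps a tag name to the subs carrying that tag."""
    for init_sub in tagged_subs.get("init", ()):
        init_sub()                  # class definitions, globals, ...
    return main_sub(args)           # then hand control to user :main

log = []
subs = {"init": [lambda: log.append("setup_classes"),
                 lambda: log.append("setup_globals")]}

def user_main(argv):
    log.append("main")
    return len(argv)

result = run_program(subs, user_main, ["a.pir"])
```

The ordering guarantee (:init before :main) is preserved, but everything executes in one pass rather than one runloop per sub.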
My hypothesis is this: If we had a different frontend program that jumped into PIR code as early as possible and did as much processing there as could be done, there are startup performance gains to be had. Embedding API calls, because they need to set up things like error-handling mechanisms and other call-in details, have overhead. Trying to do lots of work through the embedding API was never the intended use of it. Instead, the embedding API tries to expose the tools necessary to jump in to PBC execution, which is where the real power and performance of Parrot is made available. Minimizing the number of embedding API calls that we need to execute prior to user program execution is a good thing. Identical operations can be done from PIR without the call-in and call-out overhead of the embedding API. Also, minimizing the execution of Parrot code in separate runloops, such as all those pesky :init routines that need to execute before user :main does is also a performance win. Of course, this is only a win for programs which use the Parrot frontend or maybe other embedding applications (such as pbc_to_exe fakecutables) which borrow similar ideas.
In the whiteknight/frontend_parrot2 branch on Github, I’ve been working towards exactly this kind of situation. I’ve created a new frontend which attempts to bootstrap into a PIR entry-way program as early as possible. This PIR entry program, which I’ve been calling prt0.pir tries to do as much processing of command-line arguments as possible, including loading of PBC files and compilation of PIR and PASM input files, and other details. In the process I am trying to minimize the amount of C code in frontend/parrot2/main.c and hopefully bring some performance improvements along for the ride. I haven’t done any benchmarking yet because I don’t have all the details in place yet. One thing that I am currently missing is linking prt0.pbc into the parrot executable. Instead, I am currently loading it in at runtime from a separate file, which brings unnecessary overhead. My hacking goals for tonight and tomorrow are to get this issue resolved and start with some serious benchmarking to see if my hypothesis plays out.
Over the weekend I made the switchover in my branch so that the parrot executable is built from the new frontend/parrot2/main.c file instead of the old frontend/parrot/main.c. Miniparrot, the bootstrapping step which is used to compile the config hash, still is built from the old file and is now used to also compile prt0.pir. I see this as being a perfectly acceptable build process, and a great additional use for miniparrot. Once I made this switch and a few small tweaks, I was pleased to see that the build completed, the test suite ran, and most tests even passed. Of the tests that fail, the majority of them seem to be tests related to backtraces, where a new “__PARROT_MAIN_ENTRY__” function, the :main function in prt0.pir, is now appended at the top of all backtraces. One final piece of functionality is to find a way to remove that entry, so backtraces continue to look the way they always have looked.
One big change that I did have to make for this new setup is in argument processing. Previously, a :main routine was expected to take a single parameter: an array of command-line strings. For the new frontend, it’s much faster to separate out arguments in C using fast pointer arithmetic, and do processing on lists which have already been sorted in PIR. The new __PARROT_MAIN_ENTRY__ routine takes two parameters. The first is a string array of “system arguments”, things that affect the behavior of Parrot but which are not passed to user code. The second is the set of arguments which are passed to the user code. Here’s how the parrot commandline looks:
./parrot <sys_args> my_file.pir <user_args>
In the C frontend, we break the arguments up into 4 basic categories: 1. Arguments which affect interpreter creation, and therefore need to be parsed out before the interpreter is created. 2. Arguments which are processed in C code, but are processed after the interpreter is created. 3. Arguments which are for system-related stuff, but which can be processed from the PIR entry. 4. Arguments which are supposed to be passed to user code
My goal in this branch is to move as many arguments from category 2 to category 3 as I can. Anything that works can certainly stay where it is, and some of these changes can be made after the initial branch is merged.
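A minimal Python sketch of the kind of split described above (option names are placeholders, not actual parrot flags, and real option parsing also has to handle options that take a separate value word, which this sketch ignores):

```python
def split_args(argv):
    """Toy split: everything before the first non-option token counts
    as a system argument; the program file and everything after it
    belong to user code."""
    for i, arg in enumerate(argv):
        if not arg.startswith("-"):
            return argv[:i], argv[i:]
    return argv, []

# ./parrot <sys_args> my_file.pir <user_args>
sys_args, user_args = split_args(["-t", "--opt=1", "my_file.pir", "-x"])
```

In the branch, the two resulting lists would be handed to the PIR entry routine as its two parameters.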
To get this branch into mergable shape, which probably won’t happen before 3.7 considering the magnitude of the changes involved, I have a few tasks to finish up with: I need to fix the remaining test failures, fix loading of prt0.pbc into parrot, do some benchmarking to see if it is indeed an improvement (or if it can be made better with some tweaks), and then update the pbc_to_exe tool to use a similar mechanism. Once all that is done, and we’ve tested the hell out of it, we can talk about merging this branch to master.
Once merged, the :init flag will no longer be semantically different from the new :tag("init") syntax, when parrot is used from the commandline executable or from a pbc_to_exe fakecutable. That’s a very important step in the deprecation of the former, and is going to enable us to clean up a pretty big chunk of dirty code in IMCC and the packfile subsystem.
With the recent addition of the as_string method to the UnManagedStruct and Ptr PMCs (see my last post) and the get_pointer vtable in ByteBuffer, it is now easier to pass strings to and get strings from NCI (the Parrot Native Call Interface).
To pass a string to an NCI 'p' parameter you just need to create a ByteBuffer and set the string into it, perhaps after trans_encoding it, and add the zero terminator required in most usages by pushing a 0 value. ByteBuffer takes care of memory management.
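The same pattern can be illustrated with Python's ctypes, purely as an analogy (this is not Parrot code): encode the string, append the terminating zero byte, and wrap it in a mutable buffer a C `char *` parameter could point at, with the wrapper object owning the memory.

```python
import ctypes

def to_c_buffer(text, encoding="utf-8"):
    """Encode the string, append the zero terminator, and wrap it in
    a mutable buffer that a C 'char *' parameter can point at."""
    raw = text.encode(encoding) + b"\x00"
    return ctypes.create_string_buffer(raw, len(raw))

buf = to_c_buffer("hello")
```

Like the ByteBuffer, the ctypes buffer keeps the bytes alive for as long as the wrapper object exists, so the C side never sees a dangling pointer.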
A long time ago in a galaxy far, far away...
It is a period of binary war. Rebel hackers, striking from a hidden base in New Jersey have won their first victory against the evil IMCC Empire.
NotFound is the author of Winxed, the lowish-level Parrot language that I use the most. I find it’s much closer to the underlying machine than NQP-rx is, and when you’re a core developer developing core components, that kind of closeness and predictability is a big plus. I won’t get into a big discussion here about the relative merits of Winxed and NQP, or any other languages, I’m only pointing out that when I need to write code for execution on Parrot, I prefer to use Winxed for most cases.
Because I use it so often, I end up generating lots of feedback for NotFound. To his credit, he has a pretty strong vision of what the language should be, and where my feedback conflicts with his goals, he does things his own way. That’s a good thing. Clarity and consistency of vision is an important thing for any project, programming languages especially.
Two days ago I suggested to him that a more concise syntax for creating anonymous subroutines and closures would be a big benefit. At least, it would be a nice benefit for me because I end up writing lots of them as part of Rosella, and I end up using lots and lots of them in the examples and tests for Rosella.
He expressed some concerns that the new syntax, in addition to the old syntax, would be a confusing duplication, and the meaning of a shorter syntax would not be obvious or particularly readable for new users of the language. And, considering that Winxed itself is a new language, almost all users of it will be new users. Here’s the current syntax for creating an anonymous subroutine in winxed:
var f = function() { say("hello!"); };
f();
After a surprisingly short and easy hacking session last night, I added a proof-of-concept new syntax to the parser that looks like this:
var f = -> { say("hello!"); };
f();
var g = -> (x) { say(x); };
g("hello!");
With the new syntax, if we don’t have any parameters we can leave out the parentheses entirely. Basically, the new syntax allows us to replace the ‘function’ keyword with the operator ‘->’, and optionally omit the parentheses. It’s an improvement, but not super-duper. Because Winxed allows hash literals to be defined with curly brackets, we can’t have a syntax of just brackets, like what NQP can do. In NQP-rx:
my &f := { pir::say("Hello!"); };
&f();
NQP-rx can get away with just the brackets to indicate a code literal, because it doesn’t conflict with a syntax for hash literals. Winxed does have the hash literal syntax, so it cannot have unambiguous simple closures like this. It’s not really much of a loss: the two extra characters to type, “->”, really aren’t much overhead, and it’s still less than writing ‘function’.
To see the benefit in readability of the new syntax, consider something like an example from the Rosella Query library documentation:
var data = [1, 2, 3, 4, 5, 6, 7, 8, 9];
using Rosella.Query.as_queryable;
int sum = as_queryable(data)
.map(function(i) { return i * i; })
.filter(function(j) { return j % 2; })
.fold(function(s, i) { return s + i; })
.data();
say(sum); # 165
We can rewrite that middle part as:
int sum = as_queryable(data)
.map(->(i) { return i * i; })
.filter(->(j) { return j % 2; })
.fold(->(s, i) { return s + i; })
.data();
I think this is a lot cleaner and more readable. It isn’t as thin as it could be; we’re still typing a few extra characters for situations where we really just want to have a simple expression wrapped in a closure. With a little bit more work tonight, I thinned down the syntax even further, for cases where the closure is going to be a single expression:
var f = ->(i) i * i;
Here, the sequence after the “->” and the parameter list can be a single expression instead of a block, and the result of that expression is implicitly returned when the closure is executed. For reference, the same thing in NQP-rx would be:
my &f := -> $i { $i * $i };
So we’re doing pretty good, for these simple cases.
With this newest version of the syntax, the Query example can be reduced to:
int sum = as_queryable(data)
.map(->(i) i * i)
.filter(->(j) j % 2)
.fold(->(s, i) s + i)
.data();
This is just hypothetical, of course. I have written this patch and tested it lightly. It does have some issues, especially with built-in functions and a few other situations where the expression parser gets confused. Also, whether the patch works or not, there’s no expectation that NotFound would accept it into Winxed master. I like it, and I would make immediate use of it, but I can get along just fine without it too.
I’m going to keep playing with my patch and see what else I can do with it. I need to fix some of the brokenness that I mentioned, and play around with the implementation a little bit. It’s easy to play with since the Winxed source is so hackable. I’m also looking at creating a new syntax for easily looking up functions in a namespace. Right now in Winxed, if we wanted to call a function in a different namespace we would have to do:
using Foo.bar;
bar();
For comparison, NQP-rx would allow us to write this in one line:
Foo::bar();
I would like something similar for Winxed, something that would allow us to call a function from another namespace directly with an implicit lookup, without needing a separate, explicit lookup instruction. I don’t know what this syntax would be. I was thinking we could reuse the using syntax like this:
(using Foo.bar)();
But that seems kind of crappy. Maybe we could use something like this:
*Foo.bar();
Winxed does draw inspiration from C++, and the * is a “lookup” in some sense of the word. Again, this is a situation where even if I do write up a patch, it would be up to NotFound to accept it or not. I may be better off waiting for him to make a decision and then I can spend the effort to try and implement it. It’s not super-high priority, but it would help to make some ugly code much more readable, and I’m all about readability.
The last refactor of the NCI subsystem got rid of the 't' type used to pass C strings. This gives us more flexibility, but doesn't solve all problems.
Take MySQL, for example: we can specify the character set used for the connection with the database, and sometimes we can't use the current locale. We may want to read, without losses, a table that contains Unicode characters outside the range compatible with latin-1, while using a latin-1 locale.
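The underlying problem is easy to demonstrate outside Parrot. In Python, for instance, a string with characters beyond the latin-1 repertoire simply cannot round-trip through a latin-1 encoding; the sample strings here are just illustrations:

```python
ok = "naïve"  # ï (U+00EF) is inside the latin-1 range
assert ok.encode("latin-1").decode("latin-1") == ok

bad = "Δ table"  # Δ (U+0394) is outside latin-1
try:
    bad.encode("latin-1")
    lossless = True
except UnicodeEncodeError:
    lossless = False
print(lossless)  # False
```

This is why the connection character set sometimes has to be wider than whatever the current locale happens to be.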
Just a few minutes ago, with a total count of 28 files changed, 1627 insertions and 41 deletions, the podparser branch in which I’ve been doing my GSoC work has been merged into nom, the current development branch of Rakudo and the soon-to-be master. So, what do we get?
=begin pod
Some documentation
=end pod
say $=POD[0].content[0].content;
Some documentation
That’s not very useful per se, so how about this one:
=begin pod
=head1 NAME
A basic pod document
=head2 Running Perl programs
To run a Perl program from the Unix command line:
perl progname.pl
=head2 Things on my desk
There are the following things on my desk right now:
=item A cup of tea
=item A couple of pens
=item A stereo
=item A couple of books
=item A laptop, obviously
=end pod
DOC INIT {
use Pod::To::Text;
say pod2text($=POD);
exit;
}
say "I'm just a simple program";
Now what’s the DOC INIT thing? Let’s see. How about we run the above program:
$ perl6 foo.pl
I'm just a simple program
No surprises. Let’s introduce the --doc switch then:
$ perl6 --doc foo.pl
NAME
A basic pod document
Running Perl programs
To run a Perl program from the Unix command line:
perl progname.pl
Things on my desk
There are the following things on my desk right now:
* A cup of tea
* A couple of pens
* A stereo
* A couple of books
* A laptop, obviously
That’s right. The DOC blocks are executed only when the --doc command line option has been given. At the moment you have to write them yourself, but maybe even in the next few days there will be a default DOC INIT block doing What You Mean all by itself. There we go, a perldoc-alike built-in :)
WHY
That’s probably the biggest killer feature in the whole project. Although it’s not yet fully implemented in Rakudo (surprise segfaults here and there; don’t worry, they’re not permanent :)), it looks pretty much like this:
#= our zebra class
class Zebra {
...
}
say Zebra.WHY # prints: "our zebra class"
Yes yes, documentation inspection at runtime. See the potential?
# what was that sub again?
&sort.WHY.say # get the documentation for the sort() builtin
That opens a way for lots of awesome userspace tools too.
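The .WHY feature is similar in spirit to Python’s docstrings, which also attach documentation to the object itself and make it inspectable at runtime. The sort wrapper below is just a stand-in for illustration, not Rakudo’s builtin:

```python
def sort(items):
    """Sort a list and return a new sorted copy."""
    return sorted(items)

# runtime documentation inspection, analogous to &sort.WHY.say
print(sort.__doc__)  # Sort a list and return a new sorted copy.
```

Tools like help() and pydoc are built entirely on this kind of runtime-attached documentation, which hints at the "userspace tools" possible here.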
So, what’s still not quite there?
What can I say? Pull it, compile it, play with it, report bugs and have the appropriate amount of fun :)
This past week was admittedly dull. I've hit a bit of a road block, and this time it's blocking the way forward, backward, left, and right.
A few days ago I wrote about a new library I have been working on called Rosella Template. Template is a text templating library similar in concept to Liquid, and drawing significant inspiration from the likes of PHP, ASP.NET webforms, and other tools that allow templating textual data. I won’t go into all the details of the library, I talked about it a good deal in my previous post and will talk about it much more when I get closer to declaring it “stable”.
The library is fun to work on all by itself, but it is enabling me to make some very cool tools. Also, new packfile features in Parrot are allowing me to make even more tools. Rosella now has a small selection of utility programs in addition to its collection of libraries, and the number is growing. Today I’m going to talk about some of the fun new things that I am working on. All of them are still marked “unstable”, but as I play with them more, document them, and get some feedback they could move up to “stable” status.
mk_winxed_header.winxed is a utility program for creating forward-declaration include files for Winxed. If you’re doing a lot of hacking with Winxed, this tool is a must. It uses the new packfile features in Parrot to open up a compiled .pbc file, extract records of all the Sub, NameSpace and Class definitions in it and write out a .winxed include file. The usage would be something like this:
winxed mk_winxed_header.winxed foo.pbc > foo.winxed
And, depending on the contents of that .pbc file, the output will look something like this:
namespace Foo {
class Bar;
class Baz;
extern function foobar;
extern function barbaz;
...
}
In your winxed programs, if you are using foo.pbc, you can do this, to make the “type not found at compile time” warnings go away:
$include "foo.winxed"
function main[main]() {
var bar = new Foo.Bar();
var x = Foo.foobar();
}
This routine only uses the parrot functions and currently doesn’t make much use of any Rosella functionality, because I originally wrote it up as a general demonstration of the new Parrot features.
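The idea generalizes beyond Winxed. Here is a minimal Python sketch of the same approach, where a loaded module stands in for the compiled .pbc: walk the module’s symbols and emit forward declarations for classes and functions. All names here are hypothetical:

```python
import types

def make_header(mod):
    """Emit Winxed-style forward declarations for a module's classes and functions."""
    lines = ["namespace %s {" % mod.__name__]
    for name, obj in sorted(vars(mod).items()):
        if isinstance(obj, type):
            lines.append("    class %s;" % name)
        elif isinstance(obj, types.FunctionType):
            lines.append("    extern function %s;" % name)
    lines.append("}")
    return "\n".join(lines)

# build a throwaway module to demonstrate
foo = types.ModuleType("Foo")
foo.Bar = type("Bar", (), {})
foo.foobar = lambda: None
print(make_header(foo))
```

The real tool reads Sub, NameSpace and Class records out of the packfile rather than a live module, but the shape of the output is the same.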
test_template.winxed is a simple tool for creating test-related files from templates. Rosella ships with a few pre-made templates, which are used by this script. test_template.winxed can be used to create either a test file or a working harness file with nothing but a few commandline options. If you want a test harness, you can do something like this:
winxed test_template.winxed harness nqp > t/harness
And like magic you have a basic, working test harness, written in NQP. You can use the argument “winxed” instead to get a harness written in winxed. The two are functionally equivalent.
If you have written and compiled a new class, you can generate tests for it very easily:
winxed test_template.winxed test winxed foo.pbc Foo.Bar > t/foo/Bar.t
The first argument says that we’re making a test, not a harness. The second argument says it should be written in Winxed (can also be “nqp”). The third argument is the name of the .pbc file to read, and the final argument is the fully-qualified name of the class to look up. What you get as the output is a file very similar to this:
// Automatically generated test for Class Foo.Bar
class Test_Foo_Bar
{
function test_sanity()
{
self.assert.is_true(1);
}
function test_new() {
var obj = new Foo.Bar();
self.assert.not_null(obj);
self.assert.instance_of(obj, class Foo.Bar);
}
function method_A() {
self.status.verify("Test Foo.Bar.method_A()");
var obj = new Foo.Bar();
var arg_0 = null;
var arg_1 = null;
var arg_2 = null;
var result = obj.method_A(arg_0, arg_1, arg_2);
}
...
}
function main[main]()
{
load_bytecode("rosella/test.pbc");
load_bytecode("foo.pbc");
using Rosella.Test.test;
test(class Test_Foo_Bar);
}
I’ve only shown one tested method above for brevity, but it will automatically generate a stub test method for every method in your class. Also, notice that it reads the number of expected parameters from each Sub and automatically creates variables for each. There are some kinks to work out here, but this can certainly save a lot of typing!
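A hypothetical Python sketch shows the same stub-generation trick: introspect a class, read each method’s arity, and emit a test method with one placeholder argument per parameter. The class and method names are made up for illustration:

```python
import inspect

def gen_test_stub(cls):
    """Generate a skeleton test class with one stub per public method."""
    lines = ["class Test_%s:" % cls.__name__]
    for name, fn in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_"):
            continue
        # one placeholder argument per declared parameter, self excluded
        params = [p for p in inspect.signature(fn).parameters if p != "self"]
        args = ", ".join("None" for _ in params)
        lines.append("    def test_%s(self):" % name)
        lines.append("        obj = %s()" % cls.__name__)
        lines.append("        obj.%s(%s)" % (name, args))
    return "\n".join(lines)

class Bar:
    def method_A(self, a, b, c):
        pass

print(gen_test_stub(Bar))
```

The Parrot version does the equivalent by reading parameter counts from each Sub in the packfile, since there is no live class to introspect at generation time.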
If you only need to generate a harness, or only need to generate a single test file, then test_template.winxed is the utility for you. If you are starting from scratch and need to create an entire test suite, you want something with a bit more capability. In those cases, use test_all_lib.winxed. test_all_lib.winxed reads in from a .pbc file again, and outputs a test suite for it. It outputs a test file for every single namespace with visible Subs, and every single class it can find. Every one of the files will look similar to the above listing. I won’t try to copy+paste the entire output of the script, but here is a quick example of me running it on the Rosella MockObject library:
<rosella git:master> winxed --nowarn test_all_lib.winxed rosella/mockobject.pbc motest
START NameSpace 'Rosella.MockObject'
Built Subs test file motest/Rosella.MockObject.t
Built Class test file motest/Rosella.MockObject/Controller.t
Built Class test file motest/Rosella.MockObject/Expectation.t
Built Class test file motest/Rosella.MockObject/Factory.t
END NameSpace 'Rosella.MockObject'
START NameSpace 'Rosella.MockObject.Controller'
Built Class test file motest/Rosella.MockObject.Controller/Ordered.t
END NameSpace 'Rosella.MockObject.Controller'
START NameSpace 'Rosella.MockObject.Expectation'
Built Class test file motest/Rosella.MockObject.Expectation/Get.t
Built Class test file motest/Rosella.MockObject.Expectation/Set.t
Built Class test file motest/Rosella.MockObject.Expectation/Method.t
Built Class test file motest/Rosella.MockObject.Expectation/Invoke.t
Built Class test file motest/Rosella.MockObject.Expectation/Will.t
Built Class test file motest/Rosella.MockObject.Expectation/With.t
END NameSpace 'Rosella.MockObject.Expectation'
START NameSpace 'Rosella.MockObject.Expectation.Will'
Built Class test file motest/Rosella.MockObject.Expectation.Will/Return.t
Built Class test file motest/Rosella.MockObject.Expectation.Will/Throw.t
Built Class test file motest/Rosella.MockObject.Expectation.Will/Do.t
END NameSpace 'Rosella.MockObject.Expectation.Will'
START NameSpace 'Rosella.MockObject.Expectation.With'
Built Class test file motest/Rosella.MockObject.Expectation.With/Args.t
Built Class test file motest/Rosella.MockObject.Expectation.With/Any.t
Built Class test file motest/Rosella.MockObject.Expectation.With/None.t
END NameSpace 'Rosella.MockObject.Expectation.With'
For those of you who aren’t keeping count, this utility generated seventeen separate, complete, runnable test files. When I think back to all the time I could have saved having test file skeletons automatically generated instead of having to write them all out by hand…
At the moment the test_all_lib.winxed tool is very early in development and doesn’t output NQP files yet. I need to update all the necessary templates to support NQP, and add in a new argument to the commandline for this. Notice also that Winxed and NQP are not the only two options for this or the other utilities I’ve discussed here. Writing up templates for other languages is usually a pretty quick and easy thing to do.
It’s official, we’re buying a house. Last night I got the last of the down payment together in the form of a certified check. It’s extremely difficult seeing several thousand dollars of hard-fought money being taken out of the savings account in a single fell swoop. Difficult, but at the same time extremely rewarding and reassuring.
Settlement is happening on Thursday afternoon. From now until then my free time is allocated for packing, planning and cooking. We’re cooking meals that can be easily packaged, frozen, and reheated later. These are not only for us, for days when our kitchen sundries are all trapped in boxes or when we have to make some necessary kitchen improvements (we bought a house without a dishwasher!), but also for my in-laws. My wife’s mother is having surgery early next month, and we want to have some pre-made food available to help make the recovery period easier. Anything I can cook and freeze today will be a huge help in the days ahead.
Of course, the victim here is the free time I would normally spend on other stuff: Coding, blogging, chatting on #parrot and searching for funny pictures of cats on the internet. All of these things will take something of a back seat for the next few days, and my availability in general will be down. This includes the Parrot Developer Summit meeting we’re holding online this coming weekend. We haven’t scheduled a definitive time for the meeting yet, but I have to make the conservative assumption that I won’t have 2-3 hours at any point during the weekend to sit focused in front of a computer. Maybe we move so much stuff during the mornings that we are exhausted and need to vegetate in the evenings. Maybe, more likely, the hard work will start as soon as the boxes are all moved in to the apartment but not yet unpacked. I’ll do my best to attend the meeting at least in part, but I make no promises.
Jim Keenan sent out an email to the parrot-dev list asking for people to start thinking and discussing the kinds of things we want to talk about at the summit. In a general sense, the summit is the time when we lay out our development roadmap for the next few months. The development roadmap is the list of things that we are committed to deliver, and have assigned people to work on. It is not, emphatically so, a list of things we would like to have, or a list of things that would be really cool to have. The roadmap is only for the things that we need, are capable of delivering in that timespan, and for which we are committed to deliver. By necessity, the roadmap has tended to be a small list of things, but our track record in delivering them has been pretty good since the new system started. Much better than the old system where the roadmap was little more than hopes and dust, thrown into the wind.
So, since I might not be at PDS in person, I want to lay out my ideas for these questions in blog form. I’ll talk about the things I think Parrot can and should deliver in the next three months, and which projects I would be willing to be assigned to.
GSoC is going very well, and several of our students are on schedule and are generally kicking some code butt. They do need our continued support, however, to ensure that they continue onward and upward and that we get good results at the end of the summer. This isn’t the kind of specific goal that will end up on our roadmap, and it’s not the kind of thing we can really assign to a person. What I want to see is that the students continue to get excellent support from the developer community and that they are given the best opportunities to succeed by the end of the summer.
If you look at the benchmarks comparing the Rakudo nom and master branches, what you see might be a little surprising. The “nom” branch, which is the migration of Rakudo to 6model (among other changes) is significantly faster than master. There are a few reasons for this, the most compelling of which is that 6model allows native-typed attributes. Parrot’s current Object implementation boxes all attributes into PMCs, which is extremely costly and wasteful. 6model also brings in some other efficiencies too, and enables certain optimizations that we just don’t have available in Parrot today.
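Python offers a rough analogy for why native-typed attributes matter: a list of boxed integers carries per-object overhead that a packed native array avoids. The exact byte counts vary by platform, so only the inequality is meaningful here:

```python
import sys
from array import array

boxed = list(range(1000))         # every element is a full Python object
native = array("q", range(1000))  # packed 64-bit integers, no per-element boxing

boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(i) for i in boxed)
native_bytes = sys.getsizeof(native)
print(native_bytes < boxed_bytes)  # True
```

Boxing also costs time, not just space: every read of a boxed attribute is a pointer chase plus type dispatch, where a native-typed attribute is a direct load.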
We want 6model. I want it. And, most importantly, I’m extremely eager to do the necessary work to bring it into Parrot. Given what we know today about the problems with our current object model and the extreme improvements 6model offers, everybody else should be chomping at the bit to make it happen as well. Getting 6model moved into Parrot, and doing it as quickly as possible, should be a big priority, and I would love to see it added to the roadmap. I would love to spearhead the charge, and I think there are a few other developers who would love to help it as well.
In fact, I’m probably going to assemble a team and work on this project whether it’s on the roadmap or not. Plan accordingly.
Thinking about profiling was on our roadmap leading up to the 3.6 release, and we did the bare minimum of that. Christoph, who probably knows the most about it, was busy with Lorito and I, who knows nearly nothing, wasn’t really able to do much in the way of productive advancement. Strictly-speaking, the road map goal was to think about profiling, figure out what was wrong with it and come up with a way forward. Actually implementing some of those things should probably be on the roadmap for this quarter.
6model is going to bring some performance improvements throughout, at least for languages and libraries which aren’t currently using 6model. Rakudo for instance probably won’t see much of a bump, except in streamlining a few corner cases. What will help to benefit Rakudo is adding in some improved profiling tools, so they can find and eliminate bottlenecks. I think we really need to work hard on this issue. I can’t say I’m as eager to work on profiling as I am to work on 6model, but I am willing to do it if hands are needed and if suitable direction is provided. I won’t be designing the new tools, but I am willing to help implement them, since they are so important.
There’s a branch floating around to improve tunings for the GC. PCC is ripe for optimizations and improvements. Several pending deprecations could bring improved code and better performance. PMC- and Object-related code could be significantly streamlined and improved, both before the migration to 6model and afterwards. IMCC could stand to see some optimizations and removal of more cruft. Besides these big areas, there is potential for a lot of streamlining in various hotspots, spread throughout the codebase. I think it’s reasonable, and in fact it’s becoming necessary, that we start to focus on optimizations. I would like to see Parrot 3.9 be at least 10% faster on an assortment of benchmarks on a variety of machines, than 3.6. With a concerted effort and dedicated developer resources, I think we can make it happen. We do need to pick the platforms that we want to target, and the benchmarks we want to track, but that could be done in a week or less. We could write up several benchmarks in NQP and Winxed, both of which would be significantly easier than writing up benchmarks in PIR. Plus, we would be able to get a more comprehensive measurement, including the speed of Parrot execution and the quality of generated code in those widely-used tools.
I am definitely willing to spend some significant amount of time on general-purpose optimization, and the more developers we can devote to this, the better. I would love to see something like this end up on our roadmap, and I think our users would love to see it as well.
This is an ancillary request. I would really like to see some serious thought put in to various bits of the Parrot community infrastructure.
Our Trac installation doesn’t have anything like a captcha or other limiting tools to prevent spam, so we’ve been preventing arbitrary users from creating tickets for bug reports and feature requests. That’s a horrible situation and needs to change. Since it appears to me that there aren’t many good alternatives, that Trac development appears a little anemic, and since upgrading these kinds of things to get new features tends to be a royal pain, I would be perfectly happy to move off Trac if an alternate idea were presented. I’m not pushing this agenda, simply stating that I wouldn’t shed any tears if we left trac behind. Also, Git integration with Trac is apparently broken now. I don’t know how much of a hassle that will be to fix.
Other things could use a look too, even if only to verify that they are what we need: The buildbot infrastructure, smolder, the mailing-lists, the various chat bots that help with our IRC chatroom, the parrot.org website, and the various steps in the monthly release procedure and the things that are required by it. It’s good for us to take inventory, and make sure that we are getting the most utility for the developer hours that we spend to set up and maintain these things.
This isn’t a road map goal per se, but it is something that I think the Parrot community should take the time to consider in the coming months.
So those are five things that I think we need to focus on between now and 3.9: GSoC, 6model, Profiling, Optimization and infrastructure. Those are just my ideas, and I know other developers are going to have other things as well. Of roadmap items that I am interested in working on, 6model definitely tops the list followed by optimization and then profiling.
If I cannot attend PDS, please consider this list of suggestions of things that we should put on the roadmap, and this is me volunteering to work on any and all of them.
My coding has been put on hold this week as I move across town and deal with some shenanigans from the landlord. The good news is that my wife and I should be able to move into our new house on Tuesday or Wednesday - the bad news is that I am currently internet-less until then. I've been working on updating the tutorial which doesn't require any internet access, just time. By this time next week I hope to complete the tutorial, have some more examples, and begin trying to get some examples in NQP or even on Rakudo.
The release quotation contest is over. We received four submissions, each of which was correct. In order of receipt, they were from:
Each of the four submissions appears to have met the do-not-use-the-Internet condition for researching the answer. In fact, each of the four appear to have gotten the answer the old-fashioned way:
They remembered it!
I’ve talked on a few occasions about some of the ugliness in IMCC and the packfiles subsystem. Yesterday I merged in my newest cleanup effort, a branch called whiteknight/pbc_pbc. That branch removed some dead code and added some new details to support some ongoing deprecations. It added, among other things, a new variant of the load_bytecode op that has semantics much closer to what we want them to be in the future. Here are some examples:
load_bytecode "foo.pbc" # old style
$P0 = load_bytecode "foo.pbc" # new style
The first variant is magical and does a lot of work. It can take a filename argument which is either a .pbc, a .pir, or a .pasm file. It does some really ugly extension detection, and will compile the file if necessary. It loads it directly otherwise. Once it compiles or loads the file, it uses the horrible and deprecated do_sub_pragmas function to loop over all subs and execute the ones marked :load. Finally, it adds the name of the file (without the file extension) to a cache so we don’t try to load the same file again.
The second variant is much simpler. It only takes .pbc files, it loads them, and it returns the PackfileView for it. Then the user can do anything she wants with respect to finding and executing :load or :init functions or anything else. This version does not currently have a cache to prevent multiple loads of the same file, but we’re working to come up with a good way to add it. The new opcode will have cache behavior, one way or another, by the end of next week.
Today I started yet another branch, whiteknight/imcc_tag to start taking this idea to the logical next level: User-defined function pragmas. Here’s an example of a code file (“test.pir”) that a user could create:
.sub 'Foo' :tag("load", "init")
say "Foo!"
.end
.sub 'Bar' :tag("init", "something-else")
say 'Bar!'
.end
Notice the new :tag syntax, instead of the old :load and :init flags. Here is some code that makes use of it:
$P0 = load_bytecode "test.pbc"
$P1 = $P0.'subs_by_flag'("something-else")
$P2 = $P1[0]
$P2() # "Bar!"
It should be pretty clear: This is much more flexible and usable than the current system. Even better than that, this code works today in my branch. The changes to IMCC syntax happened yesterday and were much easier than I expected. I filled in most of the structural details this morning, and those also went pretty quickly. I tracked down a few bugs and did some quick ad hoc testing before pushing my changes for the world to see.
The way I implemented tags was through a list of index pairs. The first index in the pair is an index into the PMC constants table. The second is an index into the STRING constants table. Any PMC can therefore be mapped to any string in the table, and automatic deduplication of string constants means the lookup for all Subs with a given tag is very fast: a tight loop over an array of integers. For most cases, I suspect it’s faster than the old-style flag lookups, although I haven’t done any benchmarks yet. Avoiding a loop over all pmc constants and calling VTABLE_isa on each to see if it’s a "sub" or not and then checking the Sub flags should produce big savings. Of course, load_bytecode operations are relatively uncommon so it won’t add up to anything too substantial for most programs in terms of saved wall-clock time.
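A toy model of that index-pair scheme, with made-up table contents, shows why the lookup is just a tight loop over integers:

```python
# hypothetical constant tables from a loaded packfile
string_consts = ["load", "init", "something-else"]
pmc_consts = ["Foo", "Bar"]  # stand-ins for Sub PMCs

# (pmc_index, string_index) pairs:
# Foo is tagged load+init, Bar is tagged init+something-else
tag_pairs = [(0, 0), (0, 1), (1, 1), (1, 2)]

def subs_by_tag(tag):
    """Return every PMC constant tagged with the given string."""
    idx = string_consts.index(tag)  # dedup means one string, one index
    return [pmc_consts[p] for (p, s) in tag_pairs if s == idx]

print(subs_by_tag("something-else"))  # ['Bar']
```

Because string constants are deduplicated, the tag only has to be resolved to an integer once, and the scan itself never touches a PMC or a string comparison.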
I’ve got a lot of work to do in this branch still. I need to update the various packfile PMCs to be able to read and write these new tags, and I need to test the crap out of it. I don’t know when I’ll be ready to merge, but I’m hoping to have it in before 3.7 so I can get in the deprecation notice for :load and :init as early as possible.
Eventually, I would like to be able to replace most of the built-in Sub pragma flags with custom strings. This will lead to a huge cleanup of ugly bit-twiddling code, some speedups (again, they will be modest), and lots of cool new flexibility.
In personal news, we’re finally buying a new house, and are making settlement this thursday. This week will probably be taken up by packing, moving, paperwork, and other pleasantries of the process. Real hard-core hacking probably won’t happen much for the next two weeks at least.
We're halfway through the summer, now. I have not made nearly as much progress as I had previously hoped. Partly this is due to the difficulty of the problem, and partly it is because I have not spent enough time on it. I think I'm currently at a point where I'm only a few mostly simple transformations away from producing a Deterministic Push-Down Automaton which will be capable of parsing an LR(0) grammar. Whether implementing those last few mostly simple transformations will show this belief to be correct or not, I won't know until I do so.
Here are my next few goals:
I've been doing a lot less work than I had planned to. It's mostly my own fault, but such is life.
Basically, I ran out of money. Uni ended late, GSoC started early and the GSoC midterm was late this year. I had to borrow some and get a temporary full-time job for a couple of weeks. Now I have a part-time job (2 days a week), and I have the rest of the time free for GSoC work.
But not all is lost! Here's some puffins to make us all feel better:
Parrot 3.6 is out the door this week, thanks to release manager Jim Keenan. It was, like so many releases before it, a mostly uneventful release. As always, the myriad steps in the release manager guide, including the steps which are poorly written, omitted, or misleading, caused the usual headaches. I won’t harp on those failings here.
Any stable release is a big deal, and the X.6 release each year is no different. It provides us with a nice median between the X.0 release and the next one in January. In this case, it’s the mid point between 3.0 and 4.0. Now is a really great time to ask ourselves two questions: What have we accomplished since 3.0, and what do we hope to accomplish by 4.0?
In terms of the Parrot core repository, lots has happened in the past six months. Here’s a quick rundown of some of the biggest changes:
All this is not to mention the development that has occurred in related projects: NQP got 6model to help spur on the next leg of Rakudo development, Rosella has been growing like a weed and now has several users, Winxed is changing and improving rapidly, several other ecosystem projects have been springing up and growing, etc. We also got a huge bump from the GCI program, and are knee-deep in a very productive GSoC program now too. It’s been a pretty good six months, all things considered.
So where are we heading? Well, I have a heck of a lot of stuff I want to do personally, but not all of it is expected to get completed by 4.0. Also, other people are working on projects with the same caveat. Here’s what I reasonably expect to happen by 4.0:
There are also a few wildcards: Threading and concurrency have been something of a hot topic recently, and I think the pressure is on for us to provide something to support it. Exactly what we will provide is up in the air, but I have to imagine we provide something by 4.0, or are well on our way towards it. JIT, depending on how much of Lorito we can get merged, could make an appearance as well. We could be working on a JIT compiler to compile M0 bytecode, even if most of the system doesn’t use M0 yet. The twin factors of 6model and Lorito will, I’m sure, dramatically change the landscape in terms of PMC types and the object metamodel. Expect to see at least some major refactoring, rewriting, and improving of existing PMC types. Expect to see a lot of deprecation tickets get submitted in the coming months.
Performance is a bit of a tricky beast. I expect 6model to bring a nice boost for code that is OO-heavy and could benefit from native-typed attributes. I am also convinced, and am prepared to prove, that some of the PCC refactors I’ve been talking about lately will bring some performance improvements of their own. JIT is, I think, the big mountain on the horizon, so anything we do until then is probably peanuts in comparison. However, we can make real progress now, if we aren’t expecting to move the world.
In addition to these core changes, I think we are going to see a lot of work happen throughout the ecosystem. We’re seeing something of a bump with relation to numerical computing projects, and we’re also seeing several new tools for compiler-building being exercised. We have Python and JavaScript compilers in the works as part of GSoC, a perennial interest in a Ruby compiler, growing interest in a PHP compiler, a new (and growing) R compiler, and several other projects popping up in various places. Do not be surprised in the least if, by the 4.0 release, one or more of these high level languages is good enough for most general-purpose programming tasks on Parrot. Also, and this goes without saying, expect to see Rakudo continue progressing at a phenomenal rate.
The next few months are going to be busy and hopefully exciting as well. I think we’re going to get in some very important new features and changes that we’ve been wanting for a long time.
We are proud to announce Parrot 3.6.0, also known as "Pájaros del Caribe". Parrot is a virtual machine aimed at running all dynamic languages.
Parrot 3.6.0 is available on Parrot's FTP site, or by following the download instructions. For those who want to hack on Parrot or languages that run on top of Parrot, we recommend our organization page on GitHub, or you can go directly to the official Parrot Git repo on Github.
This may seem like an unusual blog post: why not a post to parrot-dev? Well, I've struggled the past few weeks as a newbie, largely with language syntax. But this problem is different: this is an interesting problem, one that I think shows that language/compiler design is more than just mastering basic language syntax and getting something to run approximately.
For readers unfamiliar with squaak, I think it is a teaching tool, not a finished high-level language (HLL) that would likely be used to get some job done.
Just a small re-cap from my last post: not every possible function signature that you might want to call through NCI comes built into Parrot. When you try to invoke a function that does not have a generated NCI thunk, you get a run-time error. GMP had a number of functions that were not covered by the built-in NCI thunks, so I installed libffi to get around this problem. Jay++ and dukeleto++ have both started projects that will use NCI to some extent and are running into this problem as well. So I decided to tackle how to get around this problem without requiring libffi.
Today you'll be making a delivery to...
Okay, seriously: besides Futurama being the greatest show ever, I actually do have good news. Last week was a very successful week. I was able to implement several commands within just a few days. They're still a bit "rough around the edges", but they work nonetheless.
Based on conversations yesterday with whiteknight, dukeleto, Notfound, bubaflub, sorear, cotto_work, soh_cah_toa, and others, I've decided that my initial design will use Resizable*Array for everything (even vectors of length 1). And for now, I'll support Integer, Float, and String. For reasons relating to the R language, I'll want my own logical (Boolean) using integers, but I'm not going there, yet.
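This design mirrors R's own semantics, where even a "scalar" is a vector of length 1 and logical values are stored as integers. A minimal sketch of the idea (hypothetical Python stand-in for the Resizable*Array-backed design, not the actual project code):

```python
# Hypothetical sketch: every value is a vector, even "scalars",
# matching the Resizable*Array-for-everything design described above.
class RVector:
    def __init__(self, *items):
        self.items = list(items)   # stands in for ResizableIntegerArray etc.

    def __len__(self):
        return len(self.items)

    def __add__(self, other):
        # R-style element-wise addition, recycling the shorter vector.
        n = max(len(self), len(other))
        return RVector(*(self.items[i % len(self)] + other.items[i % len(other)]
                         for i in range(n)))

x = RVector(2)         # an R "scalar" is just a vector of length 1
y = RVector(1, 2, 3)
print(len(x))          # -> 1
print((x + y).items)   # -> [3, 4, 5]
```

Storing everything as a vector keeps operations uniform: the recycling rule above never needs a special case for scalars.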
Winxed 1.0.0, the first version with a version number, is out.
Now that Winxed is bundled with Parrot and the new Parrot supported release is about to launch, it's time to have a way to check the version used and get information about it.
Unless a serious bug appears, 1.0.0 will be bundled with the next Parrot stable release. Any bugfix releases will be numbered 1.0.x.
In the git repository the tag for this release is RELEASE_1_0_0. Future releases will follow the same scheme.
The new command line option --version gives the current version number.
NQR stands for 'Not Quite R'
GSL stands for the GNU Scientific Library, and I hope to provide at least some low-level bindings for Parrot.
See https://github.com/NQRCore for more information.
Nom and nqp now have a new regular expression engine (currently known as “QRegex”) that I’ve implemented over the past week.
As progress continued on the new “nom” branch of Rakudo since my last post, it became increasingly evident that regular expression support would be the next major blocker. I think we were all expecting that nom would initially use the same regular expression engine that nqp (and nqp-rx) have traditionally used. However, as I started working on this, it began to look as though the effort and frustration involved would be almost as large as what a cleaner implementation would need up front, and would leave a quite messy result.
So, last week I started on designing and implementing a new engine. Today I’m happy to report that nom is now using the new QRegex engine for its pattern matching, and that making a new engine was undoubtedly a far better choice than trying to patch in the old one in an ugly manner.
So far only nom’s runtime is using the new regex engine; the nqp and rakudo parsers are still using the older (slow) one, so I don’t have a good estimate of the speed improvement yet. The new engine still needs protoregexes and a couple of other features before it can be used in the compilers, and I hope to complete that work in the next couple of days. Then we’ll have a good idea about the relative speed of the new engine.
I’m expecting QRegex to be substantially faster than the old one, for a variety of reasons. First, it should make far fewer method calls than the old version, and method calls in Parrot can definitely be slow. As an example I did some profiling of the old engine a couple of weeks ago, and the “!mark_fail” method accounted for something like 60% or more of the overall method calls needed to perform the parse.
QRegex does its backtracking and other core operations more directly, without any method calls for backtracking. So I expect that this one change will reduce the number of method calls involved in parsing by almost a factor of three. Other common operations have also shed the method-call overhead of the previous engine.
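To illustrate the difference (a hedged sketch, not QRegex itself): backtracking can be just an explicit mark stack manipulated inline, rather than a `!mark_fail`-style method call at every backtrack point. Here is a toy matcher for the pattern `a*b` in that inline style:

```python
# Illustrative sketch: backtracking via an inline mark stack,
# with no method-call overhead per backtrack step.
def match_a_star_b(s):
    """Match the pattern a*b at the start of s; return end offset or -1."""
    marks = []          # stack of saved positions to resume from on failure
    pos = 0
    # Greedily consume 'a's, pushing a backtrack mark before each one.
    while pos < len(s) and s[pos] == "a":
        marks.append(pos)   # inline "push mark" -- no method call needed
        pos += 1
    while True:
        if pos < len(s) and s[pos] == "b":
            return pos + 1
        if not marks:
            return -1
        pos = marks.pop()   # inline "fail": backtrack to the previous mark

print(match_a_star_b("aaab"))  # -> 4
print(match_a_star_b("aaac"))  # -> -1
```

When the fail path is a stack pop instead of a dispatched method, the per-step cost drops to a few machine operations, which is where the projected factor-of-three reduction comes from.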
The new engine also uses a fixed-width encoding format internally, which means that we no longer pay a performance penalty for matching on Unicode UTF-8 strings. This will also enable us to eventually use the engine to do matching on bytes and graphemes as well as codepoints.
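The win from a fixed-width internal format is O(1) positional indexing. A small sketch of the idea (assuming nothing about QRegex's actual representation): decode the variable-width UTF-8 bytes once, then index freely.

```python
# Sketch of the fixed-width idea: decode UTF-8 once into one-slot-per-
# character form, so the matcher can jump to any position in O(1)
# instead of re-scanning variable-width bytes.
utf8 = "naïve".encode("utf-8")
print(len(utf8))                         # -> 6 bytes: 'ï' takes two bytes

codepoints = list(utf8.decode("utf-8"))  # fixed-width view: one slot per char
print(len(codepoints))                   # -> 5 characters
print(codepoints[2])                     # -> 'ï', reached without scanning
```

The same trick generalizes to the bytes/graphemes/codepoints levels mentioned above: pick the unit once, build a fixed-width view of it, and the matcher's indexing logic never changes.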
I also found quite a few places where I could drastically reduce the number of GCables being created. In some cases the old engine would end up creating multiple GCables for static constants; the new engine avoids this. A couple of new opcodes will enable QRegex to do substring comparisons without having to create new STRING GCables, which should also be a dramatic improvement.
I’ve already prototyped some code (not yet committed) that will integrate a parallel-NFA and longest-token-matching (LTM) into QRegex, so we’ll see even more speed improvement.
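The core of longest-token matching, stripped to its essence, is that every alternative is tried at the current position and the longest successful match wins, rather than the first one. A hedged toy version (hypothetical names, linear scan standing in for the parallel NFA):

```python
# Toy illustration of longest-token matching (LTM): keep the alternative
# that matches the longest prefix, not the first one that succeeds.
def longest_token(s, tokens):
    """Return the token from `tokens` matching the longest prefix of s."""
    best = None
    for tok in tokens:
        if s.startswith(tok) and (best is None or len(tok) > len(best)):
            best = tok
    return best

# First-match semantics would stop at "mod"; LTM correctly picks "module".
print(longest_token("module Foo;", ["my", "mod", "module"]))  # -> 'module'
```

A real NFA-based implementation advances all alternatives in lockstep over the input, one state set per character, so the longest match falls out of a single pass instead of one scan per alternative.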
And did I mention the new engine is implemented in NQP instead of PIR? (Although it definitely has a lot of PIR influence in the code generation, simply by virtue of what it currently has to do to generate running code.)
Ultimately I’m expecting the improvements already put into QRegex to make it at least two to three times faster than its predecessor, and once the NFA and LTM improvements are in it ought to be even faster than that. And I’ve already noted new places ripe for optimization… but I’m going to wait for some new profiles before doing too much there.
Another key feature of the new engine is that the core component is now an NQP role instead of a class. This means that it’s fairly trivial for any HLL to make use of the engine and have it produce match objects that are “native” to the HLL’s type system, instead of having to be wrapped. The wrapping of match objects in the old version of Rakudo was always a source of bugs and problems that we can now avoid. Credit goes to Jonathan Worthington for 6model, which enables QRegex to do this, and indeed the ability to implement the engine using roles was what ultimately convinced me to go this route.
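As a rough analogy for the role-based design (Python mixins standing in for NQP roles, with invented class names): the matching behaviour lives in a composable unit, and each HLL folds it into its own native match type, so no foreign object ever needs wrapping.

```python
# Hedged analogy: a mixin plays the part of the QRegex core role, and an
# HLL-native match class composes it directly instead of wrapping a
# foreign match object. All names here are hypothetical.
class RegexEngineRole:
    """Stands in for the engine's core role: matching logic, no fixed class."""
    def match_literal(self, text, literal):
        self.matched = text.startswith(literal)
        return self

class HLLMatch(RegexEngineRole):
    """An HLL-native match type that composes the engine role directly."""
    def Bool(self):                       # HLL-specific API, no wrapper layer
        return self.matched

m = HLLMatch().match_literal("parrot", "parr")
print(m.Bool())                    # -> True
print(isinstance(m, HLLMatch))     # -> True: native to the HLL, not wrapped
```

Because the result is an instance of the HLL's own type, there is no wrapper object whose state can drift out of sync with the underlying match, which is exactly the class of bugs the old wrapping approach suffered from.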
While I’ve been working on regexes, Moritz Lenz, Will Coleda, Tadeusz Sośnierz, Solomon Foster, and others have continued to add features to enable nom to pass more of the spectest suite. As of this writing nom is at 244 test files and 7,047 tests… and that’s before we re-enable those tests that needed regex support. The addition of regexes to nom should unblock even more tests and features.
Some of the features added to nom since my previous post on July 2:
* Regexes
* Smart matching of lists, and other list/hash methods and functions
* Fixes to BEGIN handling and lexicals
* Implementation of nextsame, callsame, nextwith, callwith
* More introspection features
* Methods for object creation (.new, .bless, .BUILD, etc.)
* ‘is rw’ and return value type checking traits on routines
* Auto-generation of proto subs
* Junctions
* Backtraces
We’ve also done some detailed planning for releases that will transition Rakudo and Rakudo Star from the old compiler to the new one; I’ll be writing those plans up in another post in the next day or two.
Pm