Editor's Note: this Apocalypse is out of date and remains here for historic
reasons. See Synopsis 02 for the latest information.
Here's Apocalypse 2, meant to be read in conjunction with Chapter 2
of the Camel Book. The basic assumption is that if Chapter 2 talks
about something that I don't discuss here, it doesn't change in Perl 6.
(Of course, it could always just be an oversight. One might say that
people who oversee things have a gift of oversight.)
Before I go further, I would like to thank all the victims, er,
participants in the RFC process. (I beg special forgiveness from
those whose brains I haven't been able to get inside well enough to
incorporate their ideas.) I would also like to particularly thank
Damian Conway, who will recognize many of his systematic ideas here,
including some that have been less than improved by my meddling.
Here are the RFCs covered:
    RFC   PSA   Title
    ---   ---   -----
  Textual
    005   cdr   Multiline Comments for Perl
    102   dcr   Inline Comments for Perl
  Types
    161   adb   Everything in Perl Becomes an Object
    038   bdb   Standardise Handling of Abnormal Numbers Like Infinities and NaNs
    043   bcb   Integrate BigInts (and BigRats) Support Tightly With the Basic Scalars
    192   ddr   Undef Values ne Value
    212   rrb   Make Length(@array) Work
    218   bcc   C<my Dog $spot> Is Just an Assertion
  Variables
    071   aaa   Legacy Perl $pkg'var Should Die
    009   bfr   Highlander Variable Types
    133   bcr   Alternate Syntax for Variable Names
    134   bcc   Alternative Array and Hash Slicing
    196   bcb   More Direct Syntax for Hashes
    201   bcr   Hash Slicing
  Strings
    105   aaa   Remove "In string @ must be \@" Fatal Error
    111   aaa   Here Docs Terminators (Was Whitespace and Here Docs)
    162   abb   Heredoc Contents
    139   cfr   Allow Calling Any Function With a Syntax Like s///
    222   abb   Interpolation of Object Method Calls
    226   acr   Selective Interpolation in Single Quotish Context
    237   adc   Hashes Should Interpolate in Double-Quoted Strings
    251   acr   Interpolation of Class Method Calls
    252   abb   Interpolation of Subroutines
    327   dbr   C<\v> for Vertical Tab
    328   bcr   Single Quotes Don't Interpolate \' and \\
  Files
    034   aaa   Angle Brackets Should Not Be Used for File Globbing
    051   ccr   Angle Brackets Should Accept Filenames and Lists
  Lists
    175   rrb   Add C<list> Keyword to Force List Context (like C<scalar>)
  Retracted
    010   rr    Filehandles Should Use C<*> as a Type Prefix If Typeglobs Are Eliminated
    103   rr    Fix C<$pkg::$var> Precedence Issues With Parsing of C<::>
    109   rr    Less Line Noise - Let's Get Rid of @%
    245   rr    Add New C<empty> Keyword to DWIM for Clearing Values
    263   rr    Add Null() Keyword and Fundamental Data Type
Perl 6 programs are notionally written in Unicode, and assume Unicode
semantics by default even when they happen to be processing other
character sets behind the scenes. Note that when we say that Perl is
written in Unicode, we're speaking of an abstract character set, not
any particular encoding. (The typical program will likely be written
in UTF-8 in the West, and in some 16-bit character set in the East.)
I admit to being prejudiced on this one -- I was unduly influenced
at a tender age by the rationale for the design of Ada, which made
a good case, I thought, for leaving multiline comments out of the
language.
But even if I weren't blindly prejudiced, I suspect I'd look at the
psychology of the thing, and notice that much of the time, even in languages
that have multiline comments, people nevertheless tend to use them like this:
/*
* Natter, natter, natter.
* Gromish, gromish, gromish.
*/
The counterargument to that is, of course, that people don't always do
that in C, so why should they have to do it in Perl? And if there were
no other way to do multiline comments in Perl, they'd have a stronger
case. But there already is another way, albeit one rejected by this RFC
as ``a workaround.''
But it seems to me that, rather than adding another kind of comment or
trying to make something that looks like code behave like a comment,
the solution is simply to fix whatever is wrong with POD so
that its use for commenting can no longer be considered a workaround.
Actual design of POD can be put off till Apocalypse 26, but we can
speculate at this point that the rules for switching back and forth
between POD and Perl are suboptimal for use in comments. If so, then
it's likely that in Perl 6 we'll have a rule like this: If a =begin
MUMBLE transitions from Perl to POD mode then the corresponding
=end MUMBLE should transition back (without a =cut directive).
Note that we haven't defined our MUMBLEs yet, but they can be set
up to let our program have any sort of programmatic access to the
data that we desire. For instance, it is likely that comments of
this kind could be tied in with some sort of literate (or at least,
semiliterate) programming framework.
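For comparison, here is a minimal sketch of the existing Perl 5 workaround, in which a POD block stands in for a multiline comment. This is plain Perl 5, not Perl 6 syntax. The variable $total is purely illustrative. Note that the block still has to finish with =cut, which is exactly the directive the rule speculated about above would make unnecessary.

```perl
use strict;
use warnings;

my $total = 1;

=begin comment

Natter, natter, natter.
$total = 99;    # never runs: the parser skips everything up to =cut

=end comment

=cut

$total += 1;
print "total is $total\n";    # prints "total is 2"
```

Everything between =begin comment and =cut is invisible to the compiler but remains available to any tool that reads the POD.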
I have never much liked inline comments -- as commonly practiced they
tend to obfuscate the code as much as they clarify it. That being
said, ``All is fair if you predeclare.'' So there should be nothing
preventing someone from writing a lexer regex that handles them,
provided we make the lexer sufficiently mutable. Which we will.
(As it happens, the character sequence ``/*'' will be unlikely to
occur in standard Perl 6. Which I guess means it is likely to
occur in nonstandard Perl 6. :-)
A pragma declaring nonstandard commenting would also allow people to use
/* */ for multiline comments, if they like. (But I still think
it'd be better to use POD directives for that, just to keep the text
accessible to the program.)
The basic change here is that, rather than just supporting scalars,
arrays and hashes, Perl 6 supports opaque objects as a fourth
fundamental data type. (You might think of them as pseudo-hashes
done right.) While a class can access its object attributes any way it
likes, all external access to opaque objects occurs through methods,
even for attributes. (This guarantees that attribute inheritance
works correctly.)
While Perl 6 still defaults to typeless scalars, Perl will be able
to give you more performance and safety as you give it more type
information to work with. The basic assumption is that homogeneous
data structures will be in arrays and hashes, so you can declare the
type of the scalars held in an array or hash. Heterogeneous structures
can still be put into typeless arrays and hashes, but in general Perl 6
will encourage you to use classes for such data, much as C encourages
you to use structs rather than arrays for such data.
One thing we'll be mentioning before we discuss it in detail is
the notion of ``properties.'' (In Perl 5, we called these ``attributes,''
but we're reserving that term for actual object attributes these days,
so we'll call these things ``properties.'') Variables and values can
have additional data associated with them that is ``out of band'' with
respect to the ordinary typology of the variable or value. For now,
just think of properties as a way of adding ad hoc attributes to a
class that doesn't support them. You could also think of it as a
form of class derivation at the granularity of the individual object, without
having to declare a complete new class.
This is essentially a philosophical RFC that is rather short on detail.
Nonetheless, I agree with the premise that all Perl objects should
act like objects if you choose to treat them that way. If you
choose not to treat them as objects, then Perl will try to go along
with that, too. (You may use hash subscripting and slicing syntax
to call attribute accessors, for instance, even if the attributes
themselves are not stored in a hash.) Just because Perl 6 is more
object-oriented internally, does not mean you'll be forced to
think in object-oriented terms when you don't want to. (By and large.
There will be a few places where OO-think is more required in Perl
6 than in Perl 5. Filehandles are more object-oriented in Perl 6,
for instance, and the special variables that used to be magically
associated with the currently selected output handle are better
specified by association with a specific filehandle.)
This is likely to slow down numeric processing in some locations. Perhaps
it could be turned off when desirable. We need to be careful not to
invent something that is guaranteed to run slower than IEEE floating
point. We should also try to avoid defining a type system that makes
translation of numeric types to Java or C# types problematic.
That being said, standard semantics are a good thing, and should be
the default behavior.
This RFC suggests that a pragma enables the feature, but I think it
should probably be tied to the run-time type system, which means it's
driven more by how the data is created than by where it happens to
be stored or processed. I don't see how we can make it a
pragma, except perhaps to influence the meaning of ``int'' and ``num''
in actual declarations further on in the lexical scope:
use bigint;
my int $i;
might really mean
my bigint $i;
or maybe just
my int $i is bigint;
since representation specifications might just be considered part
of the ``fine print.'' But the whole subject of lexically scoped
variable properties specifying the nature of the objects they contain
is a bit problematic. A variable is a sort of mini-interface, a
contract if you will, between the program and the object in question.
Properties that merely influence how the program sees the object
are not a problem -- when you declare a variable to be constant,
you're promising not to modify the object through that variable,
rather than saying something intrinsically true about the object.
(Not that there aren't objects that are intrinsically constant.)
Other property declarations might need to have some say in how
constructors are called in order to guarantee consistency between the
variable's view of the object, and the nature of the object itself.
In the worst case we could try to enforce consistency at run time,
but that's apt to be slow. If every assignment of a Dog object to a
Mammal variable has to check to see whether Dog is a Mammal, then
the assignment is going to be a dog.
So we'll have to revisit this when we're defining the relationship between
variable declarations and constructors. In any event, if we don't make
Perl's numeric types automatically promote to big representations, we
should at least make it easy to specify it when you want that to happen.
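To make the trade-off concrete, here is a small Perl 5 sketch that asks for a big representation explicitly through the core Math::BigInt module. It illustrates the "easy to specify" fallback only, not the automatic promotion discussed above, and the variable names are made up for the example. The comment about native arithmetic assumes a typical build whose floating point is IEEE double.

```perl
use strict;
use warnings;
use Math::BigInt;

# Native arithmetic: 2**64 no longer fits in an integer, so it is computed
# in floating point. On a typical IEEE double build the "+ 1" is lost.
my $native_lost = (2**64 + 1 == 2**64) ? 1 : 0;

# Explicit big representation: every digit is kept.
my $big = Math::BigInt->new(2)->bpow(64)->badd(1);

print "native lost the +1: $native_lost\n";
print "big value: ", $big->bstr, "\n";    # 18446744073709551617
```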
I've rejected this one, because I think something that's undefined should
be considered just that, undefined. I think the standard semantics are useful for catching many kinds of errors.
That being said, it'll hopefully be easy to modify the standard
operators within a particular scope, so I don't think we need to think
that our way to think is the only way to think, I think.
Here's an oddity, an RFC that the author retracted, but that I accept,
more or less. I think length(@array) should be equivalent to
@array.length(), so if there's a length method available, it
should be called.
The question is whether there should be a length method at all,
for strings or arrays. It almost makes more sense for arrays than it
does for strings these days, because when you talk about the length
of a string, you need to know whether you're talking about
byte length or character length. So we may split up the traditional
length function into two, in which case we might end up with:
$foo.chars
$foo.bytes
@foo.elems
Or some such. Whatever the method names we choose, differentiating
them would be more powerful in supplying context. For instance,
one could envision calling @foo.bytes to return the byte length of
all the strings. That wouldn't fly if we overloaded the method name.
Even chars($foo) might not be sufficiently precise, since, depending on
how you're processing Unicode, you might want to know how long the string
is in actual characters, not counting combining characters that don't
take extra space. But that's a topic for later.
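The distinctions above can be seen in Perl 5 today, where length counts characters and the other counts have to be spelled out by hand. The following is only a sketch in Perl 5 (the method names $foo.chars, $foo.bytes and @foo.elems are not used, since they are still speculative); it assumes UTF-8 as the encoding whose bytes we care about.

```perl
use strict;
use warnings;
use Encode qw(encode);
use List::Util qw(sum);

my $foo = "na\x{EF}ve";      # "naive" spelled with U+00EF (i with diaeresis)
my @foo = ($foo, "abc");

my $chars = length $foo;                         # characters: 5
my $bytes = length encode('UTF-8', $foo);        # bytes once encoded: 6
my $elems = scalar @foo;                         # array elements: 2

# The "byte length of all the strings" reading of @foo.bytes: 6 + 3 = 9
my $all_bytes = sum map { length encode('UTF-8', $_) } @foo;

# Combining characters: "e" plus U+0301 is two characters to length,
# but a single grapheme when matched with \X.
my $accented  = "e\x{301}";
my $graphemes = () = $accented =~ /\X/g;

print "$chars $bytes $elems $all_bytes $graphemes\n";    # 5 6 2 9 1
```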
I expect that a declaration of the form:
my Dog $spot;
is merely an assertion that you will not use $spot inconsistently with
it being a Dog. (But I mean something different by ``assertion'' than this
RFC does.) This assertion may or may not be tested at every assignment
to $spot, depending on pragmatic context. This bare declaration does
not call a constructor; however, there may be forms of declaration that
do. This may be necessary so that the variable and the object can pass
properties back and forth, and in general, make sure they're consistent
with each other. For example, you might declare an array with
a multidimensional shape, and this shape property needs to be visible
to the constructor, if we don't want to have to specify it redundantly.
On the other hand, we might be able to get assignment sufficiently
overloaded to accomplish the same goal, so I'm deferring judgment
on that. All I'm deciding here is that a bare declaration without
arguments as above does not invoke a constructor, but merely tells
the compiler something.
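Perl 5 already parses this form of declaration, provided the package exists at compile time, and there too it does not call a constructor. The sketch below is Perl 5 and shows only that precedent; the Dog class and its name attribute are invented for the example.

```perl
use strict;
use warnings;

package Dog;
sub new { my ($class) = @_; return bless { name => 'Spot' }, $class }

package main;

my Dog $spot;                          # typed declaration only; no constructor runs
my $before = defined $spot ? 1 : 0;    # 0: nothing has been constructed

$spot = Dog->new;                      # the object exists only once we construct it
my $after = ref $spot;                 # "Dog"

print "before: $before, after: $after\n";
```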
Built-in object types will be in all uppercase: INTEGER, NUMBER,
STRING, REF, SCALAR, ARRAY, HASH,
REGEX and CODE.
Corresponding to at least some of these, there will also be lowercase
intrinsic types, such as int, num, str and ref. Use of
the lowercase typename implies you aren't intending to do anything fancy
OO-wise with the values, or store any run-time properties, and thus
Perl should feel free to store them compactly. (As a limiting case,
objects of type bit can be stored in one bit.) This distinction
corresponds roughly to the boxed/unboxed distinction of other computer
languages, but it is likely that Perl 6 will attempt to erase the
distinction for you to the extent possible. So, for instance, an int
may still be used in a string context, and Perl will convert it for
you, but it won't cache it, so the next time you use it as a string,
it will have to convert again.
The declared type of an array or hash specifies the type of each
element, not the type of an array or hash as a whole. This is
justified by the notion that an array or hash is really just a strange
kind of function that (typically) takes a subscript as an argument
and returns a value of a particular type. If you wish to associate
a type with the array or hash as a whole, that involves setting a
tie property. If you find yourself wishing to declare different
types on different elements, it probably means that you should either
be using a class for the whole heterogeneous thing, or at least declare
the type of array or hash that will be a base class of all the objects
it will contain.
Of course, untyped arrays and hashes will be just as acceptable as
they are currently. But a language can only run so fast when you
force it to defer all type checking and method lookup till run time.
The intent is to make use of type information where it's useful, and
not require it where it's not. Besides performance and safety, one other place
where type information is useful is in writing interfaces to other
languages. It is postulated that Perl 6 will provide enough optional
type declaration syntax that it will be unnecessary to write XS-style
glue in most cases.
I agree. I was unduly influenced by Ada syntax here, and it was
a mistake. And although we're adding a properties feature into Perl 6
that is much like Ada's attribute feature, we won't make the mistake
of reintroducing a syntax that drives highlighting editors nuts.
We'll try to make different mistakes this time.
I basically agree with the problem this RFC is trying to solve, but I
disagree with the proposed solution. The basic problem is that,
while the idiomatic association of $foo[$bar] with @foo rather
than $foo worked fine in Perl 4,
when we added recursive data structures to Perl 5, it started getting
in the way notationally, so that initial funny character was trying to
do too much in both introducing the ``root'' of the reference, as well as
the context to apply to the final subscript. This necessitated odd
looking constructions like:
$foo->[1][2][3]
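A short Perl 5 sketch of the notational problem: the same leading $ is used both for an element of the array @foo and for a path through the unrelated scalar $foo, and only the arrow tells them apart. The data is made up for illustration.

```perl
use strict;
use warnings;

my @foo = (10, 20, 30);                      # an array named foo
my $foo = [ 'a', [ 'b', [ 'c', 'd' ] ] ];    # an unrelated scalar holding a reference

my $from_array = $foo[1];            # $ sigil, but this reads @foo: 20
my $from_ref   = $foo->[1][1][0];    # the arrow is what selects the scalar $foo: "c"

print "$from_array $from_ref\n";     # prints "20 c"
```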
This RFC proposes to solve the dilemma by unifying scalar variables
with arrays and hashes at the name level. But I think people like
to think of $foo, @foo and %foo as separate variables, so
I don't want to break that. Plus, the RFC doesn't unify &foo,
while it's perfectly possible to have a reference to a function as
well as a reference to the more ordinary data structures.
So rather than unifying the names, I believe all we have to do is
unify the treatment of variables with respect to references. That is,
all variables may be thought of as references, not just scalars.
And in that case, subscripts always dereference the reference implicit
in the array or hash named on the left.
This has two major implications, however. It means that Perl
programmers must learn to write @foo[1] where they used to write
$foo[1]. I think most Perl 5 people will be able to get used to
this, since many of them found the current syntax a bit weird in the
first place.
The second implication is that slicing needs a new notation, because
subscripts no longer have their scalar/list context controlled by the
initial funny character. Instead, the context of the subscript will
need to be controlled by some combination of:
- Context of the entire term.
- Appearance of known list operators in the subscript, such as comma or range.
- Explicit syntax casting the inside of the subscript to list or scalar context.
- Explicit declaration of default behavior.
One thing that probably shouldn't enter into it is the run-time type
of the array object, because context really needs to be calculated
at compile time if at all possible.
In any event, it's likely that some people will want subscripts to default
to scalars, and other people will want them to default to lists. There are
good arguments for either default, depending on whether you think more like
an APL programmer or a mere mortal.
There are other larger implications. If composite variables are
thought of as scalar references, then the names @foo and %foo are
really scalar variables unless explicitly dereferenced. That means
that when you mention them in a scalar context, you get the equivalent
of Perl 5's \@foo and \%foo. This simplifies the prototyping system
greatly, in that an operator like push no longer needs to specify
some kind of special reference context for its first argument -- it
can merely specify a scalar context, and that's good enough to
assume the reference generation on its first argument. (Of course,
the function signature can always be more specific if it wants to.
More about that in future installments.)
There are also implications for the assignment operator, in that it has
to be possible to assign array references to array variables without accidentally
invoking list context and copying the list instead of the reference to
the list. We could invent another assignment operator to distinguish the
two cases, but at the moment it looks as though bare variables and slices
will behave as lvalues just as they do in Perl 5, while lists in parentheses
will change to a binding of the right-hand arguments more closely
resembling the way Perl 6 will bind formal arguments to actual arguments
for function calls. That is to say,
@foo = (1,2,3);
will supply an unbounded list context to the right side, but
(@foo, @bar) = (@bar, @foo)
will supply a context to the right side that requests two scalar
values that are array references. This will be the default for
unmarked variables in an lvalue list, but there will be an easy
way to mark formal array and hash parameters to slurp the rest of
the arguments with list context, as they do by default in Perl 5.
(Alternately, we might end up leaving the ordinary list assignment
operator with Perl 5 semantics, and define a new assignment operator
such as := that does signatured assignment. I can argue that one
both ways.)
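For contrast, this is what the Perl 5 semantics do with the same statement: the right side is flattened, and the first array on the left slurps everything. The second half of the sketch swaps references held in scalars, which is roughly the effect the proposed binding would give for the arrays themselves. It is Perl 5 throughout, with illustrative data.

```perl
use strict;
use warnings;

my @foo = (1, 2);
my @bar = (3, 4, 5);

# Perl 5: (@bar, @foo) flattens to five values and @foo takes them all.
(@foo, @bar) = (@bar, @foo);
my $foo_now = "@foo";          # "3 4 5 1 2"
my $bar_len = scalar @bar;     # 0

# Exchanging references swaps the two lists without copying any elements.
my ($x, $y) = ([1, 2], [3, 4, 5]);
($x, $y) = ($y, $x);
my $x_now = "@$x";             # "3 4 5"
my $y_now = "@$y";             # "1 2"

print "$foo_now | $bar_len | $x_now | $y_now\n";
```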
Just as arrays and hashes are explicitly dereferenced via subscripting
(or implicitly dereferenced in list context), so too functions are
merely named but not called by &foo, and explicitly dereferenced
with parentheses (or by use as a bare name without the ampersand
(or both)). The Perl 5 meanings of the ampersand are no longer in
effect, in that ampersand will no longer imply that signature matching
is suppressed -- there will be a different mechanism for that. And since
&foo without parens doesn't do a call, it is no longer possible to
use that syntax to automatically pass the @_ array -- you'll have to
do that explicitly now with foo(@_).
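The Perl 5 behavior being retired looks like this. The sketch is Perl 5, and the sub names show, implicit and explicit are invented for the example.

```perl
use strict;
use warnings;

sub show { return join ',', @_ }

sub implicit { return &show }       # Perl 5: no parens, so the current @_ is passed along
sub explicit { return show(@_) }    # the explicit spelling

my $code_ref = \&show;              # naming the sub without calling it

my $via_implicit = implicit(1, 2);  # "1,2"
my $via_explicit = explicit(1, 2);  # "1,2"
my $via_ref      = $code_ref->(3);  # "3"

print "$via_implicit $via_explicit $via_ref\n";
```

Only the explicit form says what it passes, which is why it is the one that survives.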
Scalar variables are special, in that they may hold either references
or actual ``native'' values, and there is no special dereference
syntax as there is for other types. Perl 6 will attempt to hide the distinction
as much as possible. That is, if $foo contains a native integer,
calling the $foo.bar method will call a method on the built-in type.
But if $foo contains a reference to some other object, it will call the
method on that object. This is consistent with the way we think about
overloading in Perl 5, so you shouldn't find this behavior surprising.
It may take special syntax to get at any methods of the reference
variable itself in this case, but it's OK if special cases are special.
This RFC has a valid point, but in fact we're going to do just the
opposite of what it suggests. That is, we'll consider the funny
characters to be part of the name, and use the subscripts for context.
This works out better, because there's only one funny character, but
many possible forms of dereferencing.
We're definitely killing Perl 5's slice syntax, at least as far
as relying on the initial character to determine the context of the
subscript. There are many ways we could reintroduce a slicing syntax,
some of which are mentioned in this RFC, but we'll defer the decision
on that till Apocalypse 9 on Data Structures, since the interesting
parts of designing slice syntax will be driven by the need to slice
multidimensional arrays.
For now we'll just say that arrays can have subscript signatures much
like functions have parameter signatures. Ordinary one-dimensional
arrays (and hashes) can then support some kind of simple slicing syntax
that can be extended for more complicated arrays, while allowing
multidimensional arrays to distinguish between simple slicing and
complicated mappings of lists and functions onto subscripts in a
manner more conducive to numerical programming.
On the subject of hash slices returning pairs rather than values, we could
distinguish this with special slice syntax, or we could establish the
notion of a hashlist context that tells the slice to return pairs rather
than just values. (We may not need a special slice syntax for that if
it's possible to typecast back and forth between pair lists and ordinary
lists.)
This RFC makes three proposals, which we'll consider separately.
Proposal 1 is ``that a hash in scalar context evaluate to the number
of keys in the hash.'' (You can find that out now, but only by using
the keys() function in scalar context.) Proposal 1 is OK if we change
``scalar context'' to ``numeric context,'' since in scalar
context a hash will produce a reference to the hash, which just
happens to numify to the number of entries.
We must also realize that some implementations of hash might have
to go through and count all the entries to return the actual number.
Fortunately, in boolean context, it suffices to find a single entry
to determine whether the hash contains anything. However, on hashes
that don't keep track of the number of entries, finding even one entry
might reset any active iterator on the hash, since some implementations
of hash (in particular, the ones that don't keep track of the
number of entries) may only supply a single iterator.
Proposal 2 is ``that the iterator in a hash be reset through an
explicit call to the reset() function.'' That's fine, with the
proviso that it won't be a function, but rather a method
on the HASH class.
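Both the counting and the iterator reset discussed in Proposals 1 and 2 can be observed in Perl 5, where keys does both jobs at once. The sketch below is Perl 5 and does not use the proposed reset method; the hash contents are illustrative.

```perl
use strict;
use warnings;

my %h = (a => 1, b => 2, c => 3);

my ($first) = each %h;    # start iterating and take the first key
my $count   = keys %h;    # scalar context: the number of keys, and the iterator is reset
my ($again) = each %h;    # so iteration starts over from the same key

my $nonempty = %h ? 1 : 0;    # boolean context only asks whether anything is there

print "count: $count, restarted: ", ($first eq $again ? "yes" : "no"), "\n";
```

Because counting and resetting are tangled together here, code that only wants the count can disturb a loop elsewhere that is walking the same hash with each.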
Proposal 3 is really about sort recognizing pairs and doing the
right thing. Defaulting to sorting on $^a[0] cmp $^b[0] is
likely to be reasonable, and that's where a pair's key would be found.
However, it's probable that the correct solution is simply to provide a
default string method for anonymous lists that happens to produce
a decent key to sort on when cmp requests a string representation
of either of its arguments. The sort itself should probably just
concentrate on memoizing the returned strings so they don't have to
be recalculated.
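In Perl 5 terms, the default comparison suggested above corresponds to sorting two-element array references on their first element. This is a sketch with Perl 5's $a and $b rather than the $^a and $^b placeholders, and the pairs are made up.

```perl
use strict;
use warnings;

my @pairs = ( [ pear => 3 ], [ apple => 1 ], [ fig => 2 ] );

# Compare on element 0, which is where a pair's key would be found.
my @by_key = sort { $a->[0] cmp $b->[0] } @pairs;

my $order = join ' ', map { "$_->[0]=$_->[1]" } @by_key;
print "$order\n";    # prints "apple=1 fig=2 pear=3"
```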
This RFC proposes to use % as a marker for special hash slicing
in the subscript. Unfortunately, the % funny character will not
be available for this use, since all hash refs will start with %.
Concise list comprehensions will require some other syntax within
the subscript, which will hopefully generalize to arrays as well.
Various special punctuation variables are gone in Perl 6,
including all the deprecated ones. (Non-deprecated variables will be
replaced by some kind of similar functionality that is likely to be
invoked through some kind of method call on the appropriate object.
If there is no appropriate object, then a named global variable
might provide similar functionality.)
Freeing up the various bracketing characters allows us to use them
for other purposes, such as interpolation of expressions:
"$(expr)" # interpolate a scalar expression
"@(expr)" # interpolate a list expression
$#foo is gone. If you want the final subscript of an array, and [-1]
isn't good enough, use @foo.end instead.
Other special variables (such as the regex variables) will change from
dynamic scoping to lexical scoping. It is likely that even $_
and @_ will be lexically scoped in Perl 6.
In Perl 5, lexical scopes are unnamed and unnameable. In Perl 6,
the current lexical scope will have a name that is visible within the
lexical scope as the pseudo class MY, so that such a scope can, if
it so chooses, delegate management of its lexical scope to some other
module at compile time. In normal terms, that means that when you use
a module, you can let it import things lexically as well as packagely.
Typeglobs are gone. Instead, you can get at a variable object through
the symbol table hashes that are structured much like Perl 5's. The
variable object for $MyPackage::foo is stored in:
%MyPackage::{'$foo'}
Note that the funny character is part of the name. There is no longer
any structure in Perl that associates everything with the name ``foo''.
Perl's special global names are stored in a special package named
``*'' because they're logically in every scope that does not hide them.
So the unambiguous name of the standard input filehandle is $*STDIN,
but a package may just refer to $STDIN, and it will default to
$*STDIN if no package or lexical variable of that name has been
declared.
Some of these special variables may actually be cloned for each lexical
scope or each thread, so just because a name is in the special global
symbol table doesn't mean it always behaves as a global across all
modules. In particular, changes to the symbol table that affect how
the parser works must be lexically scoped. Just because I install
a special rule for my cool new hyperquoting construct doesn't mean
everyone else should have to put up with it. In the limiting case,
just because I install a Python parser, it shouldn't force other modules
into a maze of twisty little whitespace, all alike.
Another way to look at it is that all names in the ``*'' package are
automatically exported to every package and/or outer lexical scope.
Underscores will be allowed between any two digits within a number.
Fine.
Fine.
I think I like option (e) the best: remove whitespace equivalent to the terminator.
By default, if it has to dwim, it should dwim assuming that hard tabs
are 8 spaces wide. This should not generally pose a problem, since
most of the time the tabbing will be consistent throughout anyway, and
no dwimming will be necessary. This puts the onus on people using
nonstandard tabs to make sure they're consistent so that Perl doesn't
have to guess.
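A rough Perl 5 emulation of option (e), assuming the indentation is consistent so that no guessing about tabs is needed. The helper strip_indent is hypothetical, and the terminator is quoted with its leading spaces because an ordinary Perl 5 here-doc requires the terminator line to match exactly.

```perl
use strict;
use warnings;

# Hypothetical helper: remove one copy of $indent from the start of every line.
sub strip_indent {
    my ($text, $indent) = @_;
    $text =~ s/^\Q$indent\E//mg;
    return $text;
}

my $indent = ' ' x 4;    # the whitespace in front of the terminator
my $body = strip_indent(<<"    END", $indent);
    first line
      second line, indented two more
    END

print $body;
```

Only whitespace equal to the terminator's indentation is removed, so the extra two spaces on the second line survive.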
Any additional mangling can easily be accomplished by a user-defined operator.
Creative quoting will be allowed with lexical mutation, but we can't
parse foo(bar) two different ways simultaneously, and I'm unwilling to
prevent people from using parens as quote characters. I don't see how we
can reasonably have new quote operators without explicit declaration.
And if the utility of a quote-like operator is sufficient, there should
be little relative burden in requiring such a declaration.
The form of such a declaration is left to the reader as an exercise
in function property definition. We may revisit the question later
in this series. It's also possible that a quote operator such as
qx// could have a corresponding function name like quote:qx
that could be invoked as a function.
I've been hankering for methods to interpolate for a long time,
so I'm in favor of this RFC. And it'll become doubly important as
we move toward encouraging people to use accessor methods to refer to
object attributes outside the class itself.
I have one ``but,'' however. Since we'll switch to using . instead
of ->, I think for sanity's sake we may have to require the parentheses,
or ``$file.$ext'' is going to give people fits. Not to mention
``$file.ext''.
This proposal has much going for it, but there are also difficulties,
and I've come close to rejecting it outright simply because the
single-quoting policy of Perl 5 has been successful. And I
think the proposal in this RFC for \I...\E is ugly.
(And I'd like to kill \E anyway, and use bracketed scopings.)
However, I think there is a major ``can't get there from here'' that we
could solve by treating interpolation into single quotes as something
hard, not something easy. The basic problem is that it's too easy
to run into a \$ or \@ (or a \I for that matter) that wants
to be taken literally. I think we could allow the interpolation of
arbitrary expressions into single-quoted strings, but only if we limit
it to an unlikely sequence where three or more characters are necessary
for recognition. The most efficient mental model would seem to
be the idea of embedding one kind of quote in another, so I think this:
\q{stuff}
will embed single-quoted stuff, while this:
\qq{stuff}
will embed double-quoted stuff. A variable
could then be interpolated into a single-quoted string by saying:
\qq{$foo}
I agree with this RFC in principle, but we can't define the default
hash stringifier in terms of variables that are going away in Perl 6,
so the RFC's proposal of using $" is right out.
All objects should have a method by which they produce readable
output. How this may be overridden by user preference is open to debate.
Certainly, dynamic scoping has its problems. But lexical override
of an object's preferences is also problematic. Individual object properties
appear to give a decent way out of this. More on that below.
On printf formats, I don't see any way to dwim that %d isn't an
array, so we'll just have to put formats into single quotes in general.
Those format strings that also interpolate variables will be able to use
the new \qq{$var} feature.
Note for those who are thinking we should just stick with
Perl 5 interpolation rules: We have to allow % to introduce
interpolation now because individual hash values are no longer named
with $foo{$bar}, but rather %foo{$bar}. So we might as well
allow interpolation of complete hashes.
Class method calls are relatively rare (except for constructors, which
will be rarely interpolated). So rather than scanning for identifiers
that might introduce a class, I think we should just depend on expression
interpolation instead:
"There are $(Dog.numdogs) dogs."
I think subroutines should interpolate, provided they're introduced
with the funny character. (On the other hand, how hard is $(sunset
$date) or @(sunset $date)? On the gripping hand, I like the
consistency of & with $, @ and %.)
I think the parens are required, since in Perl 6,
scalar &sub will just return a reference, and require parens if you
really want to deref the sub ref. (It's true that a subroutine can be called
without parens when used as a list operator, but you can't interpolate
those without a funny character.)
For those worried about the use of & for signature checking suppression, we
should point out that & will no longer be the way to suppress
signature checking in Perl 6, so it doesn't matter.
I think the opportunity cost of not reserving \v for future use is
too high to justify the small utility of retaining compatibility with
a feature virtually nobody uses anymore. For instance, I almost used
\v and \V for switching into and out of verbatim (single-quote)
mode, until I decided to unify that with quoting syntax and use
\qq{} and \q{} instead.
I think hyperquotes will be possible with a declaration of your quoting
rules, so we're not going to change the basic single-quote rules (except
for supporting \q).
I'd like to get rid of the gratuitously ugly \E as an end-of-scope
marker. Instead, if any sequence such as \L, \U or \Q
wishes to impose a scope, then it must use curlies around that scope:
\L{stuff}, \U{stuff} or
\Q{stuff}.
Any literal curlies contained in stuff must be backslashed.
(Curlies as syntax (such as for subscripts) should nest correctly.)
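The curly-scoped forms can be modeled with a short recursive scanner.
This is a Python sketch under stated assumptions, not the actual parser:
the function names are hypothetical, only \L and \U are handled, and
unescaped curlies used as syntax inside a scope (the nesting case
mentioned above) are not supported.

```python
def expand_case_scopes(text):
    """Expand \\L{...} and \\U{...}; \\{ and \\} stand for literal curlies."""
    result, _ = _expand(text, 0, False)
    return result


def _expand(text, pos, inside_scope):
    out = []
    while pos < len(text):
        ch = text[pos]
        nxt = text[pos + 1] if pos + 1 < len(text) else ""
        if ch == "\\" and nxt in ("{", "}"):
            # Backslashed curly: emit it literally.
            out.append(nxt)
            pos += 2
        elif ch == "\\" and nxt in ("L", "U") and text[pos + 2:pos + 3] == "{":
            # Open a new scope and apply the case change to its contents.
            inner, pos = _expand(text, pos + 3, True)
            out.append(inner.lower() if nxt == "L" else inner.upper())
        elif ch == "}" and inside_scope:
            # Closing curly ends the current scope; no \E is needed.
            return "".join(out), pos + 1
        else:
            out.append(ch)
            pos += 1
    return "".join(out), pos


print(expand_case_scopes(r"\U{hello} \L{WORLD}"))
```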
There will be no barewords in Perl 6. Any bare name that is a
declared package name will be interpreted as a class object that
happens to stringify to the package name. All other bare names
will be interpreted as subroutine or method calls. For nonstrict
applications, undefined subroutines will autodefine themselves to
return their own name. Note that in ${name} and friends, the name
is considered autoquoted, not a bareword.
Use of brackets to disambiguate
"${foo[bar]}"
from
"${foo}[bar]"
will no longer be supported. Instead, the expression parser will always
grab as much as it can, and you can make it quit at a particular point
by interpolating a null string, specified by \Q:
"$foo\Q[bar]"
Special tokens will turn into either POD directives or lexically
scoped OO methods under the MY pseudo-package:
Old            New
---            ---
__LINE__       MY.line
__FILE__       MY.file
__PACKAGE__    MY.package
__END__        =begin END (or remove)
__DATA__       =begin DATA
I think heredocs will require quotes around any identifier, and we
need to be sure to support << qq(END) style quotes. Space is now
allowed before the (required) quoted token. Note that custom quoting
is now possible, so if you define a fancy qh operator for your
fancy hyperquoting algorithm, then you could say <<qh(END).
It is still the case that you can say <<"" to grab everything
up to the next blank line. However, Perl 6 will consider any line
containing only spaces, tabs, etc., to be blank, not just the ones that
immediately terminate with newline.
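The difference in what counts as a blank terminator can be shown with a
small Python model. The function read_until_blank and its
whitespace_counts flag are hypothetical names; the flag switches between
the stricter rule (only an empty line terminates) and the proposed one
(any whitespace-only line terminates).

```python
def read_until_blank(lines, whitespace_counts=True):
    """Collect lines up to, but not including, the first blank line."""
    body = []
    for line in lines:
        text = line.rstrip("\n")
        if whitespace_counts:
            is_blank = text.strip() == ""   # spaces and tabs count as blank
        else:
            is_blank = text == ""           # only a bare newline counts
        if is_blank:
            break
        body.append(text)
    return body


lines = ["one\n", "two\n", "  \t\n", "three\n", "\n", "four\n"]
print(read_until_blank(lines))
print(read_until_blank(lines, whitespace_counts=False))
```

Under the proposed rule the invisible trailing whitespace on the third
line ends the text, rather than silently extending it.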
In Perl 5, a lot of contextual processing was done at run-time,
and even then, a given function could only discover whether it was in
void, scalar or list context. In Perl 6, we will extend the notion of
context to be more amenable to both compile-time and run-time analysis.
In particular, a function or method can know (theoretically even at
compile time) when it is being called in:
Void context
Scalar context
    Boolean context
    Integer context
    Numeric context
    String context
    Object context
List context
    Flattening list context (true list context).
    Non-flattening list context (list of scalars/objects)
    Lazy list context (list of closures)
    Hash list context (list of pairs)
(This list isn't necessarily exhaustive.)
Each of these contexts (except maybe void) corresponds to a way in
which you might declare the parameters of a function (or the left side
of a list assignment) to supply context to the actual argument list
(or right side of a list assignment). By default, parameters will
supply object context, meaning individual parameters expect to be
aliases to the actual parameters, and even arrays and hashes don't do
list context unless you explicitly declare them to. These aren't cast
in stone yet (or even Jello), but here are some ideas for possible
parameter declarations corresponding to those contexts:
Scalar context
    Boolean context                 bit $arg
    Integer context                 int $arg
    Numeric context                 num $arg
    String context                  str $arg
    Object context                  $scalar, %hash, Dog @canines, &foo
List context
    Flattening list context         *@args
    Non-flattening list context     $@args
    Lazy list context               &@args
    Hash list context               *%args
(I also expect unary * to force flattening of arrays in rvalue contexts.
This is how we defeat the type signature in Perl 6, instead of relying on the
initial ampersand. So instead of Perl 5's &push(@list), you could just
say push *@list, and it wouldn't matter what push's parameter signature
said.)
It's also possible to define properties to modify formal arguments,
though that can get clunky pretty quickly, and I'd like to have
a concise syntax for the common cases, such as the last parameter
slurping a list in the customary fashion. So the signature for the
built-in push could be
sub push (@array, *@pushees);
Actually, the signature might just be (*@pushees), if push is really
a method in the ARRAY class, and the object is passed implicitly:
class ARRAY;
sub .push (*@pushees);
sub .pop (;int $numtopop);
sub .splice (int $offset, int $len, *@repl);
But we're getting ahead of ourselves.
By the way, all function and method parameters (other than the object
itself) will be considered read-only unless declared with the rw
property. (List assignments will default the other way.) This will
prevent a great deal of the wasted motion current Perl implementations
have to go through to make sure all function arguments are valid
lvalues, when most of them are in fact never modified.
Hmm, we're still getting ahead of ourselves. Back to contexts.
References are no longer considered to be ``always true'' in Perl 6.
Any type can overload its bit() casting operator, and any type that
hasn't got a bit() of its own inherits one from somewhere else, if
only from class UNIVERSAL. The built-in bit methods have the expected
boolean semantics for built-in types, so arrays are still true if
they have something in them, strings are true if they aren't "" or "0",
etc.
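The inheritance of a default boolean cast can be sketched in Python,
where __bool__ plays roughly the role described for bit(). The class
names here (Universal, Str, Array, Handle) are illustrative stand-ins,
and the "true by default" fallback in Universal is an assumption, since
the text does not say what the inherited default returns.

```python
class Universal:
    """Root class: supplies a fallback bit() for types that lack one."""

    def bit(self):
        return True  # assumed default; the text leaves this unspecified

    def __bool__(self):
        # Boolean context simply asks the object for its bit() value.
        return self.bit()


class Str(Universal):
    def __init__(self, text):
        self.text = text

    def bit(self):
        return self.text not in ("", "0")


class Array(Universal):
    def __init__(self, *items):
        self.items = list(items)

    def bit(self):
        return len(self.items) > 0


class Handle(Universal):
    """A type with no bit() of its own; it inherits Universal's."""


for value in (Str(""), Str("0"), Str("0.0"), Array(), Array(0), Handle()):
    print(type(value).__name__, bool(value))
```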
Another RFC rescued from the compost pile. In Perl 6, type names will
identify casting functions in general. (A casting function merely forces
context -- it's a no-op unless the actual context is different.) In Perl 6,
a list used in a scalar context will automatically turn itself into a
reference to the list rather than returning the last element. (A subscript
of [-1] can always be used to get the last element explicitly, if that's
actually desired. But that's a rarity, in practice.) So it works out
that the explicit list composer:
[1,2,3]
is syntactic sugar for something like:
scalar(list(1,2,3));
Depending on whether we continue to make a big deal of the list/array
distinction, that might actually be spelled:
scalar(array(1,2,3));
Other casts might be words like hash (supplying a pairlist context)
and objlist (supplying a scalar context to a list of expressions).
Maybe even the optional sub keyword could be considered a cast on
a following block that might not otherwise be considered a closure in
context. Perhaps sub is really spelled lazy. In which case, we
might even have a lazylist context to supply a lazy context to a list
of expressions.
And of course, you could use standard casts like int(), num(),
and str(), when you want to be explicit about such contexts at
compile time. (Perl 5 already has these contexts, but only at run
time.) Note also that, due to the relationship between unary functions
and methods, $foo.int, $foo.num, and $foo.str will be just
a different way to write the same casts.
Lest you worry that your code is going to be full of casts, I should
point out that you won't need to use these casts terribly often
because each of these contexts will typically be implied by the
signature of the function or method you're calling. (And Perl will
still be autoconverting for you whenever it makes sense.) More on
that in Apocalypse 6, Subroutines. If not sooner.
So, while boolean context might be explicitly specified by writing:
if (bit $foo)
or
if ($foo.bit)
you'd usually just write it as in Perl 5:
if ($foo)
Based on some of what we've said, you can see that we'll have
the ability to define various kinds of lazily generated lists.
The specific design of these operators is left for subsequent
Apocalypses, however. I will make one observation here, that I think
some of the proposals for how array subscripts are generated should
be generalized to work outside of subscripts as well. This may place
some constraints on the general use of the : character in places
where an operator is expected, for instance.
As mentioned above, we'll be having several different kinds of list
context. In particular, there will be a hash list context that assumes
you're feeding it pairs, and if you don't feed it pairs, it will
assume the value you feed it is a key, and supply a default value.
There will likely be ways to get hashes to default to interesting
values such as 0 or 1.
In order to do this, the => operator has to at least mark its
left operand as a key. More likely, it actually constructs a pair object
in Perl 6. And the { foo => $bar } list composer will be required to
use => (or be in a hashlist context), or it will instead be interpreted
as a closure without a sub. (You can always use an explicit sub or
hash to cast the brackets to the proper interpretation.)
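A hash list context of this kind can be modeled in a few lines of
Python. Pair and hash_from_list are hypothetical names, and the default
value of 1 is an assumption picked from the "0 or 1" possibilities
mentioned above.

```python
from typing import Any, NamedTuple


class Pair(NamedTuple):
    """Stand-in for the pair object that => might construct."""
    key: Any
    value: Any


def hash_from_list(items, default=1):
    """Build a dict from a list that mixes pairs and bare keys."""
    result = {}
    for item in items:
        if isinstance(item, Pair):
            result[item.key] = item.value   # an explicit pair is used as is
        else:
            result[item] = default          # a bare value becomes a key
    return result


print(hash_from_list([Pair("red", 255), "green", "blue"]))
print(hash_from_list(["x", "y"], default=0))
```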
I've noticed how many programs use qw() all over the place (much
more frequently than the input operator, for instance), and I've
always thought qw() was kind of ugly, so I'd like to replace it with
something prettier. Since the input operator is using up a pair
of perfectly good bracketing characters for little syntactic gain,
we're going to steal those and make them into a qw-like list composer.
In ordinary list context, the following would be identical:
@list = < foo $bar %baz blurch($x) >;
@list = qw/ foo $bar %baz blurch($x) /; # same as this
@list = ('foo', '$bar', '%baz', 'blurch($x)'); # same as this
But in hashlist context, it might be equivalent to this:
%list = < foo $bar %baz blurch($x) >;
%list = (foo => 1, '$bar' => 1, '%baz' => 1, blurch => $x); # same as this
Basically, file handles are just objects that can be used as iterators,
and don't belong in this chapter anymore.
Indeed, they won't be. In fact, angle brackets won't be used for input
at all, I suspect. See below. Er, above.
There is likely to be no need for an explicit input operator in Perl 6,
and I want the angles for something else. I/O handles are a subclass
of iterators, and I think general iterator variables will serve the purpose
formerly served by the input operator, particularly since they can be
made to do the right Thing in context. For instance, to read from
standard input, it will suffice to say
while ($STDIN) { ... }
and the iterator will know it should assign to $_, because it's in a Boolean
context.
I read this RFC more as requesting a generic way to initialize an iterator
according to the type of the iterator. The trick in this case is to
prevent the re-evaluation of the spec every time -- you don't want
to reopen the file every time you read a line from it, for instance. There
will be standard ways to suppress evaluation in Perl 6, both from the
standpoint of the caller and the callee. In any case, the model is that
an anonymous subroutine is passed in, and called only when appropriate.
So an iterator syntax might prototype its argument to be an anonymous sub,
or the user might explicitly pass an anonymous sub, or both. In any event,
the sub keyword will be optional in Perl 6, so things like:
while (file {LIST}) { ... }
can be made to defer evaluation of LIST to the appropriate moment (or moments,
if LIST is in turn generating itself on the fly). For appropriate parameter
declarations I suppose even the brackets could be scrapped.
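The deferred-evaluation model can be sketched in Python, with a
zero-argument callable standing in for the anonymous sub. LazyIterator
and open_lines are hypothetical names; the point is only that the spec
is evaluated once, on first use, and not on every read.

```python
class LazyIterator:
    """Iterator that evaluates its spec once, the first time it is read."""

    def __init__(self, spec):
        self._spec = spec      # callable; deliberately not called yet
        self._source = None

    def __iter__(self):
        return self

    def __next__(self):
        if self._source is None:
            # First read: evaluate the spec exactly once.
            self._source = iter(self._spec())
        return next(self._source)


calls = []


def open_lines():
    calls.append("opened")     # record each evaluation of the spec
    return ["first", "second", "third"]


handle = LazyIterator(open_lines)
print(calls)                   # nothing evaluated yet
for line in handle:
    print(line)
print(calls)                   # evaluated a single time
```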
Variables and values of various types have various kinds of data
attributes that are naturally associated with them by virtue of their type.
You know a dog comes equipped with a wag, hopefully attached to a tail.
That's just part of doghood.
Many times, however, you want the equivalent of a Post-It(r) note, so
you can temporarily attach bits of arbitrary information to some unsuspecting
appliance that (though it wasn't designed for it) is nevertheless the
right place to put the note. Similarly, variables and values in Perl 6
allow you to attach arbitrary pieces of information known as ``properties.''
In essence, any object in Perl can have an associated hash containing
these properties, which are named by the hash key.
Some of these properties are known at compile time, and don't actually
need to be stored with the object in question, but can actually be stored instead
in the symbol table entry for the variable in question. (Perl still makes it
appear as though these values are attached to the object.) Compile-time
properties can therefore be attached to variables of any type.
Run-time properties really are associated with the object in question,
which implies some amount of overhead. For that reason, intrinsic data
types like int and num may or may not allow run-time properties.
In cases where it is allowed, the intrinsic type must generally be
promoted to its corresponding object type (or wrapped in an object
that delegates back to the original intrinsic for the actual value).
But you really don't want to promote an array of a million bits to an
array of a million objects just because you had the hankering to put a
sticky note on one of those bits, so in those cases it's likely to be
disallowed, or the bit is likely to be cloned instead of referenced,
or some such thing.
Properties may also be attached to subroutines.
In general, you don't set or clear properties directly -- instead you
call an accessor method to do it for you. If there is no method of
that name, Perl will assume there was one that just sets or clears a
property with the same name as the method. However, using accessor
methods to set or clear properties allows us to define synthetic
properties. For instance, there might be a real constant property
that you could attach to a variable. Certain variables (such as
those in a function prototype) might have constant set by default.
In that case, setting a synthetic property such as rw might clear
the underlying constant property.
A property may be attached to the foregoing expression by means of the ``is''
keyword. Here's a compile-time property set on a variable:
my int $pi is constant = 3;
Here's a run-time property set on a return value:
return 0 is true;
Whether a property is applied to a variable at compile time or a value
at run-time depends on whether it's in lvalue or rvalue context.
(Variable declarations are always in lvalue context even when you
don't assign anything to them.)
The ``is'' works just like the ``.'' of a method call, except that
the return value is the object on the left, not the return value of
the method, which is discarded.
As it happens, the ``is'' is optional in cases where an operator is already
expected. So you might see things like:
my int $pi constant = 3;
return 0 true;
In this case, the methods are actually being parsed as postfix
operators. (However, we may make it a stricture that you may omit the
is only for predeclared property methods.)
Since these actually are method calls, you can pass arguments in addition
to the object in question:
my int @table is dim(366,24,60);
Our examples above are assuming an argument of (1):
my int $pi is constant(1) = 3;
return 0 is true(1);
Since the ``is'' is optional in the common cases, you can stack
multiple properties without repeating the ``is.''
my int $pi is shared locked constant optimize($optlevel) = 3;
(Note that these methods are called on the $pi variable at compile
time, so it behooves you to make sure everything you call is defined.
For instance, $optlevel needs to be known at compile-time.)
Here is a list of property ideas stolen from Damian. (I guess that
makes it intellectual property theft.) Some of the names have been
changed to protect the (CS) innocent.
# Subroutine attributes...
sub name is rw { ... } # was lvalue
my sub rank is same { ... } # was memoized
$snum = sub is optimize(1) { ... }; # "is" required here
# Variable attributes...
our $age is constant = 21; # was const
my %stats is private;
my int @table is dim(366,24,60);
$arrayref = [1..1000000] is computed Purpose('demo of anon var attrs');
sub choose_rand (@list is lazy) { return $list[rand @list] }
# &@list notation is likely
$self = $class.bless( {name=>$name, age=>$age} is Initialized );
# Reference attributes...
$circular = \$head is weak;
# Literal attributes...
$name = "Damian" is Note("test data only");
$iohandle = open $filename is dis(qw/para crlf uni/) or die;
$default = 42 is Meaning(<<OfLife);
The Answer
OfLife
package Pet is interface;
class Dog inherits('Canine') { ... }
print $data{key is NoteToSelf('gotta get a better name for this key')};
(I don't agree with using properties for all of these things, but it's
pretty amazing how far into the ground you can drive it.)
Property names should start with an identifier letter (which includes
Unicode letters and ideographs). The parsing of the arguments (if any)
is controlled by the signature of the method in question. Property
method calls without a ``.'' always modify their underlying property.
If called as an ordinary method (with a ``.''), the property value
is returned without being modified. That value could then be modified by a run-time property. For instance, $pi.constant
would return 1 rather than the value of $pi, so we get:
return $pi.constant is false; # "1 but false" (not possible in Perl 5)
On the other hand, if you omit the dot, something else happens:
return $pi constant is false; # 3 but false (and 3 is now very constant)
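The two forms can be compared with a small Python model. This is a
simplification under stated assumptions: Value, is_ and prop are
hypothetical names, variables and values are not distinguished, and no
compile-time versus run-time split is modeled. Here is_ mirrors the
dotless form (set the property, hand back the object on the left) and
prop mirrors the dotted form (hand back the property value).

```python
class Value:
    """A value carrying a hash of arbitrary properties."""

    def __init__(self, value):
        self.value = value
        self.props = {}


def is_(obj, name, arg=1):
    """Model of `obj is name(arg)`: set the property, return obj itself."""
    obj.props[name] = arg
    return obj


def prop(obj, name):
    """Model of `obj.name`: return the property without changing it."""
    return obj.props.get(name)   # None for a missing property is an assumption


pi = is_(Value(3), "constant")

# Dotted form: fetch the property value (1), then tag that new value.
dotted = is_(Value(prop(pi, "constant")), "false")
print(dotted.value, dotted.props)

# Dotless form: set the property again and keep working on pi itself.
dotless = is_(is_(pi, "constant"), "false")
print(dotless.value, dotless.props)
```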
Here are some more munged Damian examples:
if (&name.rw) { ... }
$age++ unless $age.constant;
$elements = return reduce $^ * $^, *@table.dim;
last if ${self}.Initialized;
print "$arrayref.Purpose() is not $default.Meaning()\n";
print %{$self.X}; # print hash referred to by X attribute of $self
print %{$self}.X; # print X attribute of hash referred to by $self
print %$self.X; # print X attribute of hash referred to by $self
As with the dotless form, if there is no actual method corresponding to
the property, Perl pretends there's a rudimentary one returning the actual
property.
Since these methods return the properties (except when overridden by
dotless syntax), you can temporize a property just as you can any method,
provided the method itself allows writing:
temp $self.X = 0;
Note that
temp $self is X = 0;
would assign 0 to $self instead. (Whether it actually makes sense to set the
compile-time X property at run time on the $self variable is anybody's guess.)
Note that by virtue of their syntax, properties cannot be set by
interpolation into a string. So, happily:
print "My $variable is foobar\n";
does not attempt to set the foobar property on $variable.
The ``is'' keyword binds with the same precedence as ``.'', even when it's
not actually there.
Note that when you say $foo.bar, you get $foo's compile-time
property if there is one (which is known at compile time, duh).
Otherwise it's an ordinary method call on the value (which looks for
a run-time property only if a method can't be found, so it shouldn't
impact ordinary method call overhead).
To get to the properties directly without going through the method interface,
use the special btw method, which returns a hash ref to the properties
hash.
$foo.btw{constant}
Note that synthetic properties won't show up there!
None of the property names in this Apocalypse should be taken as final.
We will decide on actual property names as we proceed through the series.
Well, that's it for Apocalypse 2. Doubtless there are some things I
should have decided here that I didn't yet, but at least we're making
progress. Well, at least we're moving in some direction or other.
Now it's time for us to dance the Apocalypso, in honor of Jon Orwant and
his new wife.