Kawa (like other Free Software projects) has no lack of tasks and projects to work on. Here are some ideas.
The ones marked (GSoC) are probably most suitable for a Google Summer of Code project, in being a reasonable size, self-contained, and not depending on other tasks.
(GSoC)
Kawa has convenient syntax to allocate and initialize objects, but it gets messier it you want to initialize multiple objects that reference each other. Likewise for a single object “tree” which contains links to the root. In this example, we will looks at two vectors, but the feature is more useful for tree structures. Assume:
(define-constant list1 [1 2 list2]) (define-constant list2 ['a 'b list1])
The compiler translates this to:
(define-constant list1 (let ((t (object[] length: 3))) ;; allocate native Java array (set! (t 0) 1) (set! (t 1) 2) (set! (t 2) list2) (FVector:makeConstant t))) (define-constant list2 (let ((t (object[] length: 3))) ;; allocate native Java array (set! (t 0) 'a) (set! (t 1) 'b) (set! (t 2) list1) (FVector:makeConstant t)))
The problem is that list2
has not been created when
we evaluate the initializing expression for list
.
We can solve the problem by re-writing:
(define-private tmp1 (object[] length: 3)) (define-constant list1 (FVector:makeConstant tmp1) (define-private tmp2 (object[] length: 3)) (define-constant list2 (FVector:makeConstant tmp2) (set! (tmp1 0) 1) (set! (tmp1 1) 2) (set! (tmp1 2) list2) (set! (tmp2 0) 1) (set! (tmp2 1) 2) (set! (tmp2 2) list1)
The complication is that the code for re-writing vector and object constructors
is spread out (depending on the result type), and not where we
deal with initializing the variables.
One solution is to introduce an inlineable helper function $build$
defined as:
(define ($build$ raw-value create init) (let ((result (create raw-value)) (init raw-value result) result))
Then we can re-write the above code to:
(define-constant list1 ($build$ (object[] length: 3) (lambda (raw) (FVector:makeConstant raw)) (lambda (raw result) ($init-raw-array$ raw 1 2 list2)))) (define-constant list2 ($build$ (object[] length: 3) (lambda (raw) (FVector:makeConstant raw)) (lambda (raw result) ($init-raw-array$ raw 'a 'b list1))))
Note that the call to $build$
, as well as the generated lambda
expressions, are all easily inlineable.
Now assume if at the top-level body
if there is a sequence of define-constant
definitions initialized with calls to $build$
.
Now it is relatively easy to move all the init
calls after all
alloc
and create
expressions.
The $init-raw-array$
calls are expanded after the code has been re-ordered.
The project includes both implementing the above framework, as well as updating type-specific (and default) object creation to use the framework. It would also be good to have compiler warnings if accessing an uninitialized object.
(GSoC)
When developing and testing it is useful for the REPL to support hot-swapping (replacing functions on-the-fly) and debugging. The main goal being able to smoothly reload changed modules (files or functions), and have other modules not break. Debugging (such as setting breakpoints) would not be a priority for this project, but could be a follow-on project. Skills: Should be experienced with Java, and interested in learning about JVM TI and similar low-level parts of the platform. Difficulty: Challenging, but you can study how Java-9’s new jshell uses the JVM TI.
(GSoC - this is related to the previous item)
Kawa does a lot of optimizations and inlining. This conflicts with being able to “reload” a module into an already-running interactive environment.
We could add an option to load a module in “reloadable” mode.
Kawa already patches an old function object (a ModuleMethod
)
so existing references to the function get automatically updated.
However, there are problems if the “signature” of the function
changes - for example if the return type (declared or inferred)
becomes more general. In those cases the best thing is to
re-compile any code that depends on the modified function.
Reloading a module that defines a class is even trickier, at least if there are existing instances that should work as the updated class. We can handle the special case where only method bodies change: In reloadable mode, each method body is compiled to a separate function, the actual body indirects to the function. We must also recognize when compiling a new version of the same class, which requires a textual comparison between the old and new versions, or a structural comparison between the old class and the new code.
When it comes to top-level variables, an issue is when to re-evaluate the initializing expression. It is reasonable to do so if and only if the expression is modified, which again requires a textual comparison.
(GSoC)
The traditional way to access native (C/C++) functions is using JNI, but it’s very awkward. JNA and JNR are much easier to use. This project would design and implement an easy-to-use Kawa wrapper for for JNR. You should study existing JNR wrappers, such as that for JRuby. Difficulty: Medium. Need to study existing wrappers and "foreign function interfaces" (in multiple languages) and design one suitable for Kawa. Some Scheme (Kawa) experience would be helpful.
(GSoC)
Kawa supports units (such as cm^2
for square centimeters)
and quantities (such as 4cm^2
).
We would like to integrate these into the type system, both
for performance and compile-time type checking.
For syntax we can use a pseudo-parameterized type quantity
. For example:
(define a1 ::quantity[cm^2] 4cm^2)
(* 2.0 a1) ;; ⇒ 8cm^2
(+ 2.0 a1) ;; compile-time error
The run-time type of the variable a1
should be
a primitive double
, without object allocation.
Of course when a1
is converted to an object, we
create a Quantity
, not a Double
.
We can build on Kawa’s existing framework for
non-standard primitive types such as character
and ulong
.
Skills: Need good Java experience, and somewhat familiar with the
Java Virtual Machine.
You will need to become comfortable reading javap
output.
Difficulty: Modest.
The Kawa compiler currently uses reflection to determine properties
(such as exported function definitions) from referenced classes.
It would be better to read class files.
This should not be too difficult, since the gnu.bytecode
library
abstracts over class information read by reflection or class reading.
(GSoC - maybe)
We’d like a command for compiling a list of Java and Scheme
source files that may have mutual dependencies. A good way
to do this is to hook into javac
, which is quite extensible
and pluggable.
One could do something like:
Read the “header" of each Kawa source file, to determine the name of the generated main class.
Enter these class names into the javac tables as “uncompleted” classes.
Start compiling the Java files. When this requires the members of the Kawa classes, switch to the Kawa files. From javac, treat these as pre-compiled .class files. I.e. we treat the Kawa compiler as a black box that produces Symbols in the same way as reading class files. At this point we should only need the initial “scan” phase on Kawa.
If necessary, finish compiling remaining Kawa files.
This approach may not immediately provide as robust mixed-language support as is ideal, but it is more amenable to incremental improvement than a standalone stub-generator.
This project is good if you know or want to learn how javac
works.
Java 7 supports MethodHandles which are meant to provide better performance (ultimately) for dynamic languages. See JSR 292 and the Da Vinci Machine Project. Kawa makes limited use of MethodHandles, and no use of invokedynamic. There is more to be done. For example, we can start by optimizing arithmetic when the types are unknown at compile-time. They could make implementing generic functions (multimethods) more efficient. At some point we want to compile lambdas in the same way as Java 8 does. This can potentially be more efficient than Kawa’s current mechanism.
Remi Forax’s vmboiler is a small library on top of ASM that generates optimistically typed bytecodes. It could be useful for ideas.
(GSoC)
Kawa has some limited support for parameterized types, but it’s not used much. Improve type inferencing. Support definition of parameterized classes. Better use of parameterized types for sequence class. Support wildcards. (It might be better to have wild-carding be associated with declarations, as in Scala or proposed for Java, rather than uses.) See also http://openjdk.java.net/jeps/8043488.
(GSoC)
Kawa doesn’t have true function types: Parameter and result types are only handled for “known” functions. The general case with optional and keyword parameter is complicated, but simple fixed-arity procedure types would be very useful.
The following syntax is suggested:
procedure[(T1
..Tn
)Tr
]
T1
through T1
are types of the parameters, and
Tr
is the type of the result.
For example: procedure[(vector int) string]
.
We call this a typed-procedure type (in contrast to plain procedure
).
If a value has a typed-procedure type then its run-time representation
is a just a MethodHandle
. If such a procedure is called,
the generated bytecode is to just call its invokeExact
method.
The argument expressions are converted (and type-checked) the same
way as if we were calling a statically-known procedure.
Note that passing an int
argument
of to procedure[(vector int) string]
value does not
require allocating an object to “box” the int
;
we can pass a plain int
as-is.
Thus using typed-procedure types can lead to major speed-up.
For example the lib-test.scm
should become much faster.
Converting a known procedure to a typed-procedure type is usually
just a matter of creating a MethodHandle
that references the
method implementing the procedure. Some glue code may be needed
if the types aren’t identical, or if the procedure is a closure.
Converting a type-procedure value p
to generic value (such
as untyped procedure
or object
) can be though of as
wrapping it in a lambda
:
((lambda (arg1::vector arg2::int)::string (p arg1 arg2))
Coercing a generic value or an untyped procedure to a typed-procedure would need to generate a method whose signature matches the typed-procedure type, and in the body of the method use a generic apply.
Coercing from one typed-procedure type to a different typed-procedure type is a combination of the above techniques (as if converting first to object and then to the target type), though some optimizations are worth doing.
Adding varargs support can be done later.
We need a fall-back mechanism for platforms (such as Android)
that don’t support MethodHandle
s. The easiest is to
just treat a typed-procedure type as plain procedure
at run-time,
though we still want the compile-time type-checking,
Currently being worked on.
Add support for full continuations, which is the major feature missing for Kawa to qualify as a “true Scheme”. One way to implement continuations is to add a add that converts the abstract syntax tree to continuation-passing-style, and then expand the existing full-tail-call support to manage a stack. There are other ways to solve the problem. This may benefit from Faster tailcalls.
The TreeList class is a data structure for “flattened” trees. It is used for XML-style nodes, for multiple values, and for the full-tail-call API. The basic concept is fine, but it could do with some re-thinking to make make random-access indexing fast. Also, support for updating is insufficient. (This needs someone into designing and hacking on low-level data-structures, along with lots of profiling and testing.)
C# recently added asynch
and await
keywords
for asynchronous programming. Kawa’s recently improved support for lazy programming
seems like a good framework for equivalent functionality:
Instead of an asynch
method that returns a Task<T>
,
the Kawa programmer would write a function that returns a lazy[T]
.
This involves some design work, and modifying the compiler to
rewrite the function body as needed.
This is related to full continuations, as the re-writing is similar.
Currently being worked on.
Improvements to the read-eval-print console. In addition to a traditional Swing console, it would be useful to support using a web browser as a remote terminal, possibly using web-sockets. (This allows “printing” HTML-expressions, which can be a useful way to learn and experiment with web technologies.) See here for an article on the existing Swing REPL, along with some to-do items. Being able to hide and show different parts of the output might be nice. Being able to link from error messages to source might be nice. Better handling of redefinitions is discussed here in the context of JavaXF Script; this is a general REPL issue, mostly independent of the GUI for it.
An interesting possibility is to use the IPython framework. There are existing ports for Scala: either IScala or Scala Notebook.
(GSoC, for some subset)
It would be nice to update the XQuery (Qexo) support to some subset of XQuery 3.0.
Kawa supports a small subset of the Common Lisp language, but it supports a much larger subset of core Common Lisp concepts and data structures, some designed with Common Lisp functionality in mind. Examples include packages, arrays, expanded function declarations, type specifications, and format. A lot could be done to improve the Common Lisp support with modest effort. Some Common Lisp features could also be useful for Scheme: Documentation strings (or markup) as Java annotations, better MOP-like introspection, and generic methods a la defmethod (i.e. with multiple definition statements, possibly in separate files, as opposed to the current make-procedure) all come to mind. Being able to run some existing Common Lisp code bases with at most modest changes should be the goal. One such package to start with might be an existing test framework, perhaps FivaAM. Full Common Lisp compatibility is nice, but let’s walk before we can run.
(GSoC, for some subset)
A lot of work is needed to make JEmacs useful. One could try to import a useful package and see what works and what fails. Or one may look at basic editing primitives. Enhancements may be needed to core Emacs Lisp language primitives (enhancing Common Lisp support may help), or to the display engine.
Emacs now supports lexical bindings - we should do the same.
There is some Kawa support for Eclipse (Schemeway), and possibly other IDEs (NetBeans, IntelliJ). But many improvements are desirable. REPL improvements may be a component of this.
Kawa-Scheme support for the NetBeans IDE would be useful. One could perhaps build on the Clojure plugin.
Kawa-Scheme support for the Eclipse IDE would be useful. Probably makes sense to enhance SchemeWay. It may also make sense to build on the Dynamic Languages Toolkit, possibly making use of Schemeide, though DLTk seems more oriented towards interpreted non-JVM-based languages.
Hop is an interesting design for integrating server-side and client-side programming using a Scheme dialect. These ideas seem like they would port quite well to Kawa.
(GSoC)
Support localization by extending the SRFI_109 syntax, in the manner of (and compatible with) GNU gettext. I.e. optionally specify a localization key (to use as an index in the translation database); if there is no key specified, default to using the literal parts of the string.
Implement a “bind” mechanism similar to that of JavaFX Script. The idea is that when you initialize a variable or field, instead of initializing it to a fixed value, you bind it to an expression depending on other variables. We install “listeners” on those variables, so when those variables change, we update the bound variable. This feature is useful in many applications, but the initial focus could be GUI programming and perhaps web programming.
(GSoC. Possibly a bit light for a full Summer project, but can be extended or combined with other projects.)
Exact decimal arithmetic is a variation of exact rational arithmetic,
but may be more user-friendly. In particular, printing using
decimals is generally nicer than fractions.
It is also sometimes useful to specify an explicit scale,
so we can distinguish 0.2 from 0.20.
We can use the Java BigDecimal
class, but run into problems
with division - for example (/ 1.0 3.0)
.
We should implement a subclass of RatNum
that generalizes
BigDecimal
to also handle repeating decimals.
We need a lexical syntax for repeating decimals.
Possible ideas: 0._81_
or 0.#81
.
If a Scheme number literal is specified as exact and has either
a decimal point or an exponent (for example #e1.25
), then it should
read as an exact decimal, not a fraction.
(GSoC)
Kawa currently implements “optimistic” typing: The compiler only
complains if an expression has no values in common with the target type
- for example, if assigning a string
expression to an integer
variable.
It would be interesting to experiment with a
--strict-typing
option (which would never be the default):
Strict typing would only allow “widening” conversions - i.e.
that the expression type be a subtype of the target type.
For example it would complain if assigning a number
to an integer
unless you used an explicit cast.
To make this easier to work with we’d make use
of the dynamic
type, similar to
what
C#
does: Any expression can be converted
to or from dynamic
without the compiler complaining.
Similarly, if x
is dynamic
then x:name
is allowed by the compiler regardless of name
, with all checking
being deferred to run-time. If a variable is declared without a type,
it should default to dynamic
. The dynamic
type
is represented in the VM as object
but with an annotation
(like we do with character
).
The type-checker might need some changes to better distinguish implicit conversions from explicit casts.