summaryrefslogtreecommitdiff
path: root/docs
diff options
context:
space:
mode:
Diffstat (limited to 'docs')
-rw-r--r--docs/embedding.rst161
-rw-r--r--docs/language.rst451
-rw-r--r--docs/security.rst115
3 files changed, 727 insertions, 0 deletions
diff --git a/docs/embedding.rst b/docs/embedding.rst
new file mode 100644
index 0000000..0bb9dcc
--- /dev/null
+++ b/docs/embedding.rst
@@ -0,0 +1,161 @@
+##################
+Embedding Actinide
+##################
+
+Actinide is designed to be embedded into larger Python programs. It's possible
+to call into Actinide, either by providing code to be evaluated, or by
+obtaining builtin functions and procedures from Actinide and invoking them.
+
+The ``Session`` class is the basic building block of an Actinide integration.
+Creating a session creates a number of resources associated with Actinide
+evaluation: a symbol table for interning symbols, and an initial top-level
+environment to evaluate code in, pre-populated with the Actinide standard
+library.
+
+Executing Actinide programs in a session consists of two steps: reading the
+program in from a string or an input port, and evaluating the resulting forms.
+The following example illustrates a simple infinite loop:
+
+.. code-block:: python
+
+ import actinide
+
+ session = actinide.Session()
+ program = session.read('''
+ (begin
+ ; define the factorial function
+ (define (factorial n)
+ (fact n 1))
+
+ ; define a tail-recursive factorial function
+ (define (fact n a)
+ (if (= n 1)
+ a
+ (fact (- n 1) (* n a))))
+
+ ; call them both
+ (factorial 100))
+ ''')
+
+ # Compute the factorial of 100
+ result = session.eval(program)
+
+As a shorthand for this common sequence of operations, the Session exposes a
+``run`` method:
+
+.. code-block:: python
+
+ print(*session.run('(factorial 5)')) # prints "120"
+
+Callers can inject variables, including new builtin functions, into the initial
+environment using the ``bind``, ``bind_void``, ``bind_fn``, and
+``bind_builtin`` methods of the session.
+
+To bind a simple value, or to manually bind a wrapped builtin, call
+``session.bind``:
+
+.. code-block:: python
+
+ session.bind('var', 5)
+ print(*session.run('var')) # prints "5"
+
+To bind a function whose return value should be ignored, call ``bind_void``.
+This will automatically determine the name to bind the function to:
+
+.. code-block:: python
+
+ session.bind_void(print)
+ session.run('(print "Hello, world!")') # prints "Hello, world!" using Python's print fn
+
+To bind a function returning one value (most functions), call ``bind_fn``. This
+will automatically determine the name to bind to:
+
+.. code-block:: python
+
+ def example():
+ return 5
+
+ session.bind_fn(example)
+ print(*session.run('(example)')) # prints "5"
+
+Finally, to bind a function returning a tuple of results, call
+``bind_builtin``. This will automatically determine the name to bind to:
+
+.. code-block:: python
+
+ def pair():
+ return 1, 2
+
+ session.bind_builtin(pair)
+ print(*session.run('(pair)')) # prints "1 2"
+
+Actinide functions can return zero, one, or multiple values. As a result, the
+``result`` returned by ``session.eval`` is a tuple, with one value per result.
+
+Actinide can bind Python functions, as well as bound and unbound methods, and
+nearly any other kind of callable. Under the hood, Actinide uses a thin adapter
+layer to map Python return values to Actinide value lists. The ``bind_void``
+helper ultimately calls that module's ``wrap_void`` to wrap the function, and
+``bind_fn`` calls ``wrap_fn``. (Tuple-returning functions do not need to be
+wrapped.) If you prefer to manually bind functions using ``bind``, they must be
+wrapped appropriately. An equivalent set of methods, ``macro_bind``,
+``macro_bind_void``, ``macro_bind_fn``, and ``macro_bind_builtin`` bind values
+to entries in the top-level macro table, instead of the top-level environment,
+and allow extension of the language's syntax.
+
+Finally, Actinide can bind specially-crafted Python modules. If a module
+contains a top-level symbol named ``An`` (for the informal chemical symbol for
+the actinide series), it can be passed to the session's ``bind_module`` method.
+The symbol must be bound to an instance of the ``Registry`` class from the
+``actinide.builtin`` module:
+
+.. code-block:: python
+
+ from actinide.builtin import Registry
+ An = Registry()
+
+ five = An.bind('five', 5)
+
+ @An.void
+ def python_print(*args):
+ print(*args)
+
+ @An.fn
+ def bitwise_and(a, b):
+ return a & b
+
+ @An.builtin
+ def two_values():
+ return 1, "Two"
+
+ # @An.macro_bind, @An.macro_void, @An.macro_fn, and @An.macro_builtin follow
+ # the same pattern.
+
+Going the other direction, values can be extracted from bindings in the session
+using the ``get`` method:
+
+.. code-block:: python
+
+ session.run('(define x 8)')
+ print(session.get('x')) # prints "8"
+
+If the extracted value is a built-in function or an Actinide procedure, it can
+be invoked like a Python function. However, much like ``eval`` and ``run``,
+Actinide functions returne a tuple of results rather than a single value:
+
+.. code-block:: python
+
+ session.run('''
+ (begin
+ ; Set a variable
+ (define x 5)
+
+ ; Define a function that reads the variable
+ (define (get-x) x))
+ ''')
+
+ get_x = session.get('get-x')
+ print(*get_x()) # prints "5"
+
+This two-way binding mechanism makes it straightforward to define interfaces
+between Actinide and the target domain.
diff --git a/docs/language.rst b/docs/language.rst
new file mode 100644
index 0000000..0296ec1
--- /dev/null
+++ b/docs/language.rst
@@ -0,0 +1,451 @@
+#################################
+The Actinide Programming Language
+#################################
+
+.. highlight:: scheme
+
+*****
+Forms
+*****
+
+* The basic unit of Actinide syntax.
+
+* The basic unit of Actinide evaluation.
+
+* A program is evaluated by *reducing* forms to produce a final value, applying
+ side effects during the reduction.
+
+* What does a form look like?
+
+ * Simple forms: literals (and intro to types)
+
+ * Integers: an optional leading - sign for negative numbers, followed by a
+ sequence of digits and underscores. Represent base-ten values with no
+ fractional part. Examples: ``10``, ``-2_049``.
+
+ * Decimals: an optional leading - sign for negative numbers, followed
+ by a sequence of digits and underscores, followed by a dot, followed
+ by a sequence of digits and underscores. Represent base-ten values
+ which may have a factional part. Examples: ``10.0``, ``-2_049.501_2``.
+
+ * Strings: any sequence of characters other than ``"`` or ``\``,
+ enclosed in matching ``"`` marks. The sequences ``\"`` and ``\\`` are
+ "escape" sequences, representing a ``"`` or ``\`` character
+ respectively. Examples: ``"Hello, world."``, ``"😡💩🚀"``, ``"Quoth
+ the raven, \"Four Oh Four.\""``.
+
+ * Booleans: the special values meaning "true" and "false." Preferred
+ for boolean logic operations. They are ``#t`` for true, and ``#f``
+ for false.
+
+ * Simple forms: symbols
+
+ * Sequences of letters, numbers, and symbols not surrounded by quotes,
+ and which are not themselves integers, decimals, or booleans.
+
+ * Represent variables and constants in the program's source.
+
+ * Evaluation of variables is covered later in this document.
+
+ * Examples: ``hello``, ``+``, ``my-🚀``.
+
+ * Special mention: comments
+
+ * Cannot appear *inside* a string.
+
+ * Starts with a ``;``, continues to the end of the line.
+
+ * Dropped from the program entirely.
+
+ * Examples: ``; this function is extremely spicy.``
+
+* Compound forms: conses and dotted pairs
+
+ * An opening ``(``, followed by a head subform, followed by spaces,
+ followed by a ``.``, followed by a tail subform, followed by a closing
+ ``)``.
+
+ * Represents a *cons* with the given head and tail.
+
+ * A dotted pair which appears as the head of a dotted pair does not require
+ an additional dot.
+
+ * Examples: ``(1 . 2)``, ``(1 2 . 3)``, ``(('ll . 'lr) . ('rl . 'rr))``
+
+* Compound forms: lists
+
+ * An opening ``(``, followed by a sequence of "subforms" separated by
+ spaces, followed by a closing ``)``.
+
+ * Subforms may be any form, including another list.
+
+ * Represent function application as well as special
+
+* Compound forms: quotes, quasiquotes, and unqoute forms.
+
+ * The forms ``'``, `````, ``,``, and ``,@`` represent "quote" forms.
+
+ * These are discussed in detail in the Macros section, later on.
+
+**********
+Evaluation
+**********
+
+* When an Actinide program is run, its top-level form is evaluated.
+
+* Evaluation of a form *reduces* the form to a simpler form - ideally, into a
+ list of atomic values.
+
+* Every kind of form can be evaluated.
+
+
+******************
+Simple Expressions
+******************
+
+* Evaluation of literals:
+
+ * Integers, decimals, strings, booleans evaluate to themselves. The number
+ ``1`` or the string ``"Hello, world"`` cannot be reduced any further.
+
+* Evaluation of symbols:
+
+ * The symbol is replaced by a value from the current *environment*.
+
+ * This is analogous to variable expansion in other languages.
+
+ * How do environments get their values? Defines and function application,
+ covered below.
+
+ * Certain symbols are *built in* and are automatically given values in the
+ top-level environment before the program starts.
+
+********************
+Function Application
+********************
+
+* Evaluation of lists:
+
+ * A list that starts with the symbols ``begin``, ``values``, ``if``,
+ ``lambda``, ``define``, ``define-macro``, or ``quote`` is a *special
+ form*, which are covered separately.
+
+ * Any other list represents a function application.
+
+* The subforms representing the list elements are evaluated first, left to
+ right.
+
+* If the first subform evaluates to a procedure
+
+* During application:
+
+ * A new *child* environment is created, with the names of the arguments
+ bound to the values from the function application expression. The
+ environment captured by the procedure is the *parent* of this new
+ environment: any name not found in the *child* environment will be looked
+ up in the parent environment instead.
+
+ * The body of the function is run as a program, in the newly-created
+ environment.
+
+ * The result of the last form in the function is the result of the function
+ application.
+
+*************
+Special Forms
+*************
+
+Lists that begin with one of the following symbols are evaluated specially.
+
+* ``begin``: A ``begin`` form evaluates a sequence of subforms, reducing to the
+ result of the last subform in the sequence. Example:
+
+ ::
+
+ (begin
+ ; define a function...
+ (define (f) 1)
+ ; ...and call it
+ (f))
+
+ The forms whose results are discarded are still evaluated for their side
+ effects.
+
+* ``values``: A ``values`` form evaluates a sequence of subforms, then reduces
+ to those values in the context of the containing form. This allows functions
+ to return multiple values. Example:
+
+ ::
+
+ (begin
+ (define (two x) (values x x))
+ (= (two 53)))
+
+ The ``two`` function returns two values, which are placed in the argument
+ positions for the ``=`` function. This program reduces to ``#t`` if run,
+ and defines ``two`` as a side effect.
+
+* ``if``: An ``if`` form must include a ``cond`` subform producing exactly one
+ value, and either one or two consequent subforms (named ``true`` and
+ ``false`` subforms in this document).
+
+ * The ``if`` form first evaluates the ``cond`` subform.
+
+ * If it evaluates to a true value (``#t``, a non-zero integer, a non-zero
+ decimal, a non-empty string, or a non-nil ``cons``), then the ``if``
+ form evaluates the ``true`` subform.
+
+ * If the ``cond`` subform evaluates to a false value (any other value),
+ then the ``if`` form evaluates the ``false`` subform.
+
+ * If the ``if`` form does not have a ``false`` subform, the ``if`` form
+ evaluates to ``nil`` when the ``cond`` subform evaluates to a false
+ value.
+
+ * Examples: ``(if #t 1)`` (always equal to ``1``), ``(if some-var "okay"
+ "failure")``.
+
+* ``lambda``: A ``lambda`` form defines a procedure, and evaluates to a
+ procedure value which can be used to apply the newly-defined procedure.
+
+ * Must include a ``formals`` subform, which is generally a list of argument
+ names (as symbols). If the formals subform is a bare symbol, or a dotted
+ pair whose tail is a symbol, the function has variable arity, and all
+ arguments not assigned to a name from the formals list are collected into
+ a list and bound to that symbol.
+
+ * May include a sequence of body subforms, which are evaluated in order (as
+ if by ``begin``) whenever the function is applied.
+
+ * Functions capture the environment in effect when they are defined.
+ Symbols within the function body can refer to names defined in the
+ surrounding lexical context.
+
+ * Function bodies are evaluated in a new environment for each application,
+ with the symbols representing the arguments bound to the corresponding
+ values in the function application form.
+
+ * Examples:
+
+ ::
+
+ (lambda () 1)
+
+ This defines a constant function (which takes no arguments) whose
+ evaluation is always 1.
+
+ ::
+
+ (begin
+ (define x 5)
+ (lambda () x))
+
+ This defines a constant function whose evaluation is always the value of
+ ``x`` in the top-level environment (initially 5).
+
+ ::
+
+ (lambda (a b) (+ a b))
+
+ This defines a binary function (which takes two arguments) whose
+ evaluation is the sum of those arguments. This is a simple replacement
+ for the ``+`` function itself, but it illustrates the idea that functions
+ can include other functions.
+
+ ::
+
+ (lambda (a . b) b)
+
+ This defines a function which takes one or more arguments, whose
+ evaluation is the list of arguments other than the first.
+
+* ``define``: A ``define`` form sets the value of a new binding in the current
+ environment. This has two forms:
+
+ * ``(define symbol value)``: evaluates the ``value`` subform, and binds the
+ result to ``symbol`` in the current environment. Example:
+
+ ::
+
+ (begin
+ ; Bind x to a value
+ (define x 5)
+ ; Expands x in the same environment
+ x)
+
+ This program evaluates to ``5``.
+
+ * ``(define (name formals...) body...)``: defines a function and binds it
+ to ``name`` in the current environemnt.
+
+ This is expanded to an equivalent ``lambda`` form, within a ``define``
+ form binding the resulting procedure to ``name``. For example:
+
+ ::
+
+ (define (f a b) (+ a b))
+
+ is equivalent to
+
+ ::
+
+ (define f
+ (lambda (a b) (+ a b)))
+
+* ``define-macro``: This has the same syntaxes as the ``define`` form, but it
+ binds values to a special "macro table" which is used to transform code prior
+ to evaluation. Macros are described later in this document.
+
+* ``quote``: A ``quote`` form must have exactly one form in argument position.
+ It evaluates to exactly the argument form, without evaluating it. For example:
+
+ ::
+
+ (quote (+ 1 2))
+
+ evaluates to the list ``(+ 1 2)``. Quote forms are the easiest way to obtain
+ unevaluated symbols as values, and are an integral part of the Actinide macro
+ system.
+
+*******************
+Loops and Recursion
+*******************
+
+* To loop, a function must recurse. Actinide has no looping primitives other
+ than function application.
+
+* Actinide guarantees that functions that recurse in tail position, either
+ directly or indirectly, can recurse indefinitely.
+
+* What is tail position?
+
+ * Function bodies: the final form of the function is in tail position with
+ respect to the function.
+
+ * ``begin`` forms: the final subform is in tail position with respect to
+ the ``begin`` form.
+
+ * ``if`` forms: the ``true`` subform is in tail position with respect to
+ the ``if`` form if the ``cond`` subform reduces to a true value. The
+ ``false`` subform is in tail position with respect to the ``if`` form if
+ the ``cond`` subform reduces to a false value.
+
+ * If a form is in tail position with respect to its containing form, it is
+ in tail position with respect to *that* form's containing form, and so
+ on, out to the nearest ``lambda`` body or to the top level of the program.
+
+* Example:
+
+ * A simple, non-tail recursive factorial:
+
+ ::
+
+ (define (factorial n)
+ (if (= n 1)
+ 1
+ (* n (factorial (- n 1)))))
+
+ The ``factorial`` function *is not* called in tail position with respect
+ to the body of the ``factorial`` function: After reducing that function
+ application, the reduction of the outer ``factorial`` application still
+ needs to apply the ``*`` function to the result.
+
+ Attempting to evaluate ``(factorial 1000)`` fails due to limits on call
+ depth: ``maximum recursion depth exceeded while calling a Python object``
+
+ ::
+
+ (define (fact n a)
+ (if (= n 1)
+ a
+ (fact (- n 1) (* n a))))
+
+ The ``fact`` function *is* called in tail position with respect to the
+ body of ``fact``. Specifically, it is in tail position with respect to
+ the ``if`` form whenever ``n`` is not equal to ``1``, and the ``if`` form
+ is in tail position with respect to the body of the ``fact`` function.
+
+ Evaluating ``(fact 1000 1)`` correctly computes the factorial of ``1000``
+ on any machine with enough memory to store the result.
+
+******
+Macros
+******
+
+* Before Actinide evaluates a program, it *expands* a program.
+
+* Expansion replaces macros (defined by ``define-macro``, as above).
+
+* A *macro* is an Actinide procedure, as with ``lambda``, which accepts forms
+ as arguments and reduces to a new form.
+
+* Macros can be used to define new syntax.
+
+* Macro expansion is recursive: the result of expanding a macro is expanded
+ again, which allows macros to produce macro forms.
+
+* Example: The ``let-one`` macro defines a single local variable, with a known
+ value, and evaluates a body form in a temporary environment with that
+ variable bound.
+
+ ::
+
+ (define-macro (let-one binding body)
+ (begin
+ (define name (head binding))
+ (define val (head (tail binding)))
+ `((lambda (,name) ,body) ,val))))
+
+ To use this macro, apply it as if it were a function:
+
+ ::
+
+ (let-one (x 1) x)
+
+ The macro procedure accepts the forms ``(x 1)`` and ``x``, unevaluated, as
+ arguments, and substitutes them into a *quasiquoted* form, which is used as a
+ template. The three *unquoted* parts (``,name``, ``,body``, and ``val``) are
+ replaced by evaluating the symbols in the context of the macro procedure, and
+ expand to the relevant parts of the input forms.
+
+ The returned form is approximately
+
+ ::
+
+ ((lambda (x) x) 1)
+
+ and evaluates as such.
+
+ This program evaluates to 1, but *does not* bind ``x`` in the top-level
+ environment.
+
+* Actinide macros are *not hygienic*. A quoted symbol in the macro body will be
+ evaluated in the location where the macro is expanded, with full access to
+ the environment at that location. Similarly, symbols defined in the macro
+ will be fully visible to code running in the environment where the macro is
+ expanded.
+
+* Macros often use quote notation to build the returned form. Quote notation is
+ ultimately a sequence of ``quote`` forms. However, Actinide supports
+ *quasiquote* notation to simplify the creation of nested quoted forms
+ containing unquoted parts.
+
+ * A quasiquote form begins with `````. If the form contains no unquoted
+ parts, this will quasiquote each subform, terminating by quoting each
+ symbol or literal form and constructing a new list with the resulting
+ quoted forms. ```(a b)`` expands to ``('a 'b)``.
+
+ * Within a quasiquote form, an *unquote* form prevents the following form
+ from being quoted. An unquote form begins with ``,``, followed by a
+ single form (often, but not always, a single symbol). ```(a ,b c)``
+ expands to ``('a b 'c)``.
+
+ * Within a quasiquote form, an *unquote-splicing* form prevents the
+ following form from being quoted. An unquote-splicing form begins with
+ ``,@``, followed by a single form, which must evaluate to a list. The
+ elements of that list are grafted into the resulting form. Given
+ ``(define x (list 1 2))``, the form ```(a ,@x b)`` expands to ``('a 1 2
+ 'b)``.
+
+* Macros defined inside of a function body are not visible to the top-level
+ expander.
diff --git a/docs/security.rst b/docs/security.rst
new file mode 100644
index 0000000..4a42a2d
--- /dev/null
+++ b/docs/security.rst
@@ -0,0 +1,115 @@
+########
+Security
+########
+
+In the README, I made this strong claim:
+
+ It provides minimal safety features, but the restricted set of builtins
+ ensures that Actinide programs cannot gain access to the outside context of
+ the program. The worst they can do is waste CPU time, fill up RAM, and
+ drain your battery.
+
+This document expands on the underlying design choices used to support that goal.
+
+**************
+Specific Goals
+**************
+
+Out of the box, an Actinide ``Session`` must not:
+
+* Read information that is not part of its top-level environment or macro
+ table, which was not passed to it as an argument to a method or constructor
+ by the enclosing program.
+
+* Write information to any location other than its top-level environment, its
+ macro table, or the return value of a method or constructor called by the
+ enclosing program.
+
+* Cause system calls that perform IO of any kind.
+
+These goals imply that any externally-visible side effects of Actinide
+evaluation are entirely the responsibility of the enclosing program. A "pure"
+Actinide environment can only convert electricity into waste heat.
+
+*****
+Types
+*****
+
+Actinide has a carefully-selected list of built-in types, none of which expose
+primitive operations that can gain filesystem access or gain access to
+information outside of the Actinide session without without outside help.
+
+Atomic values other than symbols use the least-capable Python representations
+possible:
+
+* Python ``int`` objects, used to represent Actinide integers.
+
+* Python ``decimal.Decimal`` objects, used to represent Actinide decimals.
+
+* Python ``str`` objects, used to represent Actinide strings.
+
+* Python ``bool`` objects, used to represent Actinide booleans.
+
+* The Python ``None`` constant, used to represent the empty list.
+
+The remaining built-in types are represented using classes:
+
+* Instances of the ``actinide.types.Symbol`` class, used to represent Actinide
+ symbols. This class is a wrapper around a ``str`` and provides only exactly
+ the operations needed to act like a sybmol: access to the underlying string,
+ equality, hashing, and useful Python ``__str__`` and ``__repr__``
+ implementations for debugging.
+
+* Instances of the ``actinide.types.Cons`` class, representing Lisp conses and
+ list cells. This class is a ``namedtuple`` with two fields and no additional
+ methods. (This implies that conses are immutable, unlike most lisps, but this
+ is a semantic consideration and not a security consideration.)
+
+* Instances of the ``actinide.ports.Port`` class, which wraps an arbitrary
+ ``file`` object to restrict the operations available. The only built-in
+ mechanism for creating a ``Port`` creates one which wraps a string; it is not
+ possible to open port to a file, shared memory segment, pipe, process,
+ socket, or to any other OS resource using only built-in functions and types.
+
+* Instances of the ``actinide.types.Procedure`` class, which is considerably
+ more complex than the other types. This class handles the mechanics of
+ compiling and executing procedures, and allows calling procedure bodies from
+ arbitrary Python programs.
+
+ However, ``Procedure`` delegates all of its actual operations back to the
+ Actinide runtime, and has effectively the same capabilities as a top-level
+ Session.
+
+Finally, built-in functions are represented using Python's own function and
+method handle types. The only operations exposed on these are to call them as
+Actinide functions, or to extract a string representation including their names.
+
+*********
+Functions
+*********
+
+Actinide's built-in functions have been selected and implemented for simplicity
+of analysis. Actinide avoids exposing complex operations, and _particularly_
+avoids exposing any of Python's metaprogramming tools.
+
+Actinide - intentionally - exposes considerably fewer functions than a full
+Scheme implementation. Many Scheme functions are inappropriate for Actinide's
+use cases, but furthermore, keeping the function set small makes it much easier
+to analyze the security properties of the built-in functions.
+
+No built-in function unwraps any type other than as follows:
+
+* ``(string x)`` and ``(display x)``, where ``x`` is a Symbol, unwrap the
+ string contained in the symbol object.
+
+* ``(head c)``, ``(tail c)``, and ``(uncons c)``, where ``c`` is a Cons,
+ unwraps the head, tail, or both the head and the tail of a Cons.
+
+******
+Errors
+******
+
+Actinide does not support the full Scheme exception-handling system. Instead,
+operations which produce Python exceptions cause the computation to abort at
+that point. The exception passes up through the implementation untouched,
+allowing the enclosing program to determine how to handle it.