diff options
| author | Owen Jacobson <owen@grimoire.ca> | 2017-11-18 21:15:06 -0500 |
|---|---|---|
| committer | Owen Jacobson <owen@grimoire.ca> | 2017-11-18 21:15:06 -0500 |
| commit | 3cd4656282bdc779a2000a8d23c12b647a51796b (patch) | |
| tree | 70cfe92d657012602024f7a30576ac45ca41481d /docs | |
| parent | a7446365153e3a9bfe28d9b7129555756bdcfe3d (diff) | |
Sphinx setup
Diffstat (limited to 'docs')
| -rw-r--r-- | docs/embedding.rst | 161 | ||||
| -rw-r--r-- | docs/language.rst | 451 | ||||
| -rw-r--r-- | docs/security.rst | 115 |
3 files changed, 727 insertions, 0 deletions
diff --git a/docs/embedding.rst b/docs/embedding.rst new file mode 100644 index 0000000..0bb9dcc --- /dev/null +++ b/docs/embedding.rst @@ -0,0 +1,161 @@ +################## +Embedding Actinide +################## + +Actinide is designed to be embedded into larger Python programs. It's possible +to call into Actinide, either by providing code to be evaluated, or by +obtaining builtin functions and procedures from Actinide and invoking them. + +The ``Session`` class is the basic building block of an Actinide integration. +Creating a session creates a number of resources associated with Actinide +evaluation: a symbol table for interning symbols, and an initial top-level +environment to evaluate code in, pre-populated with the Actinide standard +library. + +Executing Actinide programs in a session consists of two steps: reading the +program in from a string or an input port, and evaluating the resulting forms. +The following example illustrates a simple infinite loop: + +.. code-block:: python + + import actinide + + session = actinide.Session() + program = session.read(''' + (begin + ; define the factorial function + (define (factorial n) + (fact n 1)) + + ; define a tail-recursive factorial function + (define (fact n a) + (if (= n 1) + a + (fact (- n 1) (* n a)))) + + ; call them both + (factorial 100)) + ''') + + # Compute the factorial of 100 + result = session.eval(program) + +As a shorthand for this common sequence of operations, the Session exposes a +``run`` method: + +.. code-block:: python + + print(*session.run('(factorial 5)')) # prints "120" + +Callers can inject variables, including new builtin functions, into the initial +environment using the ``bind``, ``bind_void``, ``bind_fn``, and +``bind_builtin`` methods of the session. + +To bind a simple value, or to manually bind a wrapped builtin, call +``session.bind``: + +.. code-block:: python + + session.bind('var', 5) + print(*session.run('var')) # prints "5" + +To bind a function whose return value should be ignored, call ``bind_void``. +This will automatically determine the name to bind the function to: + +.. code-block:: python + + session.bind_void(print) + session.run('(print "Hello, world!")') # prints "Hello, world!" using Python's print fn + +To bind a function returning one value (most functions), call ``bind_fn``. This +will automatically determine the name to bind to: + +.. code-block:: python + + def example(): + return 5 + + session.bind_fn(example) + print(*session.run('(example)')) # prints "5" + +Finally, to bind a function returning a tuple of results, call +``bind_builtin``. This will automatically determine the name to bind to: + +.. code-block:: python + + def pair(): + return 1, 2 + + session.bind_builtin(pair) + print(*session.run('(pair)')) # prints "1 2" + +Actinide functions can return zero, one, or multiple values. As a result, the +``result`` returned by ``session.eval`` is a tuple, with one value per result. + +Actinide can bind Python functions, as well as bound and unbound methods, and +nearly any other kind of callable. Under the hood, Actinide uses a thin adapter +layer to map Python return values to Actinide value lists. The ``bind_void`` +helper ultimately calls that module's ``wrap_void`` to wrap the function, and +``bind_fn`` calls ``wrap_fn``. (Tuple-returning functions do not need to be +wrapped.) If you prefer to manually bind functions using ``bind``, they must be +wrapped appropriately. An equivalent set of methods, ``macro_bind``, +``macro_bind_void``, ``macro_bind_fn``, and ``macro_bind_builtin`` bind values +to entries in the top-level macro table, instead of the top-level environment, +and allow extension of the language's syntax. + +Finally, Actinide can bind specially-crafted Python modules. If a module +contains a top-level symbol named ``An`` (for the informal chemical symbol for +the actinide series), it can be passed to the session's ``bind_module`` method. +The symbol must be bound to an instance of the ``Registry`` class from the +``actinide.builtin`` module: + +.. code-block:: python + + from actinide.builtin import Registry + An = Registry() + + five = An.bind('five', 5) + + @An.void + def python_print(*args): + print(*args) + + @An.fn + def bitwise_and(a, b): + return a & b + + @An.builtin + def two_values(): + return 1, "Two" + + # @An.macro_bind, @An.macro_void, @An.macro_fn, and @An.macro_builtin follow + # the same pattern. + +Going the other direction, values can be extracted from bindings in the session +using the ``get`` method: + +.. code-block:: python + + session.run('(define x 8)') + print(session.get('x')) # prints "8" + +If the extracted value is a built-in function or an Actinide procedure, it can +be invoked like a Python function. However, much like ``eval`` and ``run``, +Actinide functions returne a tuple of results rather than a single value: + +.. code-block:: python + + session.run(''' + (begin + ; Set a variable + (define x 5) + + ; Define a function that reads the variable + (define (get-x) x)) + ''') + + get_x = session.get('get-x') + print(*get_x()) # prints "5" + +This two-way binding mechanism makes it straightforward to define interfaces +between Actinide and the target domain. diff --git a/docs/language.rst b/docs/language.rst new file mode 100644 index 0000000..0296ec1 --- /dev/null +++ b/docs/language.rst @@ -0,0 +1,451 @@ +################################# +The Actinide Programming Language +################################# + +.. highlight:: scheme + +***** +Forms +***** + +* The basic unit of Actinide syntax. + +* The basic unit of Actinide evaluation. + +* A program is evaluated by *reducing* forms to produce a final value, applying + side effects during the reduction. + +* What does a form look like? + + * Simple forms: literals (and intro to types) + + * Integers: an optional leading - sign for negative numbers, followed by a + sequence of digits and underscores. Represent base-ten values with no + fractional part. Examples: ``10``, ``-2_049``. + + * Decimals: an optional leading - sign for negative numbers, followed + by a sequence of digits and underscores, followed by a dot, followed + by a sequence of digits and underscores. Represent base-ten values + which may have a factional part. Examples: ``10.0``, ``-2_049.501_2``. + + * Strings: any sequence of characters other than ``"`` or ``\``, + enclosed in matching ``"`` marks. The sequences ``\"`` and ``\\`` are + "escape" sequences, representing a ``"`` or ``\`` character + respectively. Examples: ``"Hello, world."``, ``"😡💩🚀"``, ``"Quoth + the raven, \"Four Oh Four.\""``. + + * Booleans: the special values meaning "true" and "false." Preferred + for boolean logic operations. They are ``#t`` for true, and ``#f`` + for false. + + * Simple forms: symbols + + * Sequences of letters, numbers, and symbols not surrounded by quotes, + and which are not themselves integers, decimals, or booleans. + + * Represent variables and constants in the program's source. + + * Evaluation of variables is covered later in this document. + + * Examples: ``hello``, ``+``, ``my-🚀``. + + * Special mention: comments + + * Cannot appear *inside* a string. + + * Starts with a ``;``, continues to the end of the line. + + * Dropped from the program entirely. + + * Examples: ``; this function is extremely spicy.`` + +* Compound forms: conses and dotted pairs + + * An opening ``(``, followed by a head subform, followed by spaces, + followed by a ``.``, followed by a tail subform, followed by a closing + ``)``. + + * Represents a *cons* with the given head and tail. + + * A dotted pair which appears as the head of a dotted pair does not require + an additional dot. + + * Examples: ``(1 . 2)``, ``(1 2 . 3)``, ``(('ll . 'lr) . ('rl . 'rr))`` + +* Compound forms: lists + + * An opening ``(``, followed by a sequence of "subforms" separated by + spaces, followed by a closing ``)``. + + * Subforms may be any form, including another list. + + * Represent function application as well as special + +* Compound forms: quotes, quasiquotes, and unqoute forms. + + * The forms ``'``, `````, ``,``, and ``,@`` represent "quote" forms. + + * These are discussed in detail in the Macros section, later on. + +********** +Evaluation +********** + +* When an Actinide program is run, its top-level form is evaluated. + +* Evaluation of a form *reduces* the form to a simpler form - ideally, into a + list of atomic values. + +* Every kind of form can be evaluated. + + +****************** +Simple Expressions +****************** + +* Evaluation of literals: + + * Integers, decimals, strings, booleans evaluate to themselves. The number + ``1`` or the string ``"Hello, world"`` cannot be reduced any further. + +* Evaluation of symbols: + + * The symbol is replaced by a value from the current *environment*. + + * This is analogous to variable expansion in other languages. + + * How do environments get their values? Defines and function application, + covered below. + + * Certain symbols are *built in* and are automatically given values in the + top-level environment before the program starts. + +******************** +Function Application +******************** + +* Evaluation of lists: + + * A list that starts with the symbols ``begin``, ``values``, ``if``, + ``lambda``, ``define``, ``define-macro``, or ``quote`` is a *special + form*, which are covered separately. + + * Any other list represents a function application. + +* The subforms representing the list elements are evaluated first, left to + right. + +* If the first subform evaluates to a procedure + +* During application: + + * A new *child* environment is created, with the names of the arguments + bound to the values from the function application expression. The + environment captured by the procedure is the *parent* of this new + environment: any name not found in the *child* environment will be looked + up in the parent environment instead. + + * The body of the function is run as a program, in the newly-created + environment. + + * The result of the last form in the function is the result of the function + application. + +************* +Special Forms +************* + +Lists that begin with one of the following symbols are evaluated specially. + +* ``begin``: A ``begin`` form evaluates a sequence of subforms, reducing to the + result of the last subform in the sequence. Example: + + :: + + (begin + ; define a function... + (define (f) 1) + ; ...and call it + (f)) + + The forms whose results are discarded are still evaluated for their side + effects. + +* ``values``: A ``values`` form evaluates a sequence of subforms, then reduces + to those values in the context of the containing form. This allows functions + to return multiple values. Example: + + :: + + (begin + (define (two x) (values x x)) + (= (two 53))) + + The ``two`` function returns two values, which are placed in the argument + positions for the ``=`` function. This program reduces to ``#t`` if run, + and defines ``two`` as a side effect. + +* ``if``: An ``if`` form must include a ``cond`` subform producing exactly one + value, and either one or two consequent subforms (named ``true`` and + ``false`` subforms in this document). + + * The ``if`` form first evaluates the ``cond`` subform. + + * If it evaluates to a true value (``#t``, a non-zero integer, a non-zero + decimal, a non-empty string, or a non-nil ``cons``), then the ``if`` + form evaluates the ``true`` subform. + + * If the ``cond`` subform evaluates to a false value (any other value), + then the ``if`` form evaluates the ``false`` subform. + + * If the ``if`` form does not have a ``false`` subform, the ``if`` form + evaluates to ``nil`` when the ``cond`` subform evaluates to a false + value. + + * Examples: ``(if #t 1)`` (always equal to ``1``), ``(if some-var "okay" + "failure")``. + +* ``lambda``: A ``lambda`` form defines a procedure, and evaluates to a + procedure value which can be used to apply the newly-defined procedure. + + * Must include a ``formals`` subform, which is generally a list of argument + names (as symbols). If the formals subform is a bare symbol, or a dotted + pair whose tail is a symbol, the function has variable arity, and all + arguments not assigned to a name from the formals list are collected into + a list and bound to that symbol. + + * May include a sequence of body subforms, which are evaluated in order (as + if by ``begin``) whenever the function is applied. + + * Functions capture the environment in effect when they are defined. + Symbols within the function body can refer to names defined in the + surrounding lexical context. + + * Function bodies are evaluated in a new environment for each application, + with the symbols representing the arguments bound to the corresponding + values in the function application form. + + * Examples: + + :: + + (lambda () 1) + + This defines a constant function (which takes no arguments) whose + evaluation is always 1. + + :: + + (begin + (define x 5) + (lambda () x)) + + This defines a constant function whose evaluation is always the value of + ``x`` in the top-level environment (initially 5). + + :: + + (lambda (a b) (+ a b)) + + This defines a binary function (which takes two arguments) whose + evaluation is the sum of those arguments. This is a simple replacement + for the ``+`` function itself, but it illustrates the idea that functions + can include other functions. + + :: + + (lambda (a . b) b) + + This defines a function which takes one or more arguments, whose + evaluation is the list of arguments other than the first. + +* ``define``: A ``define`` form sets the value of a new binding in the current + environment. This has two forms: + + * ``(define symbol value)``: evaluates the ``value`` subform, and binds the + result to ``symbol`` in the current environment. Example: + + :: + + (begin + ; Bind x to a value + (define x 5) + ; Expands x in the same environment + x) + + This program evaluates to ``5``. + + * ``(define (name formals...) body...)``: defines a function and binds it + to ``name`` in the current environemnt. + + This is expanded to an equivalent ``lambda`` form, within a ``define`` + form binding the resulting procedure to ``name``. For example: + + :: + + (define (f a b) (+ a b)) + + is equivalent to + + :: + + (define f + (lambda (a b) (+ a b))) + +* ``define-macro``: This has the same syntaxes as the ``define`` form, but it + binds values to a special "macro table" which is used to transform code prior + to evaluation. Macros are described later in this document. + +* ``quote``: A ``quote`` form must have exactly one form in argument position. + It evaluates to exactly the argument form, without evaluating it. For example: + + :: + + (quote (+ 1 2)) + + evaluates to the list ``(+ 1 2)``. Quote forms are the easiest way to obtain + unevaluated symbols as values, and are an integral part of the Actinide macro + system. + +******************* +Loops and Recursion +******************* + +* To loop, a function must recurse. Actinide has no looping primitives other + than function application. + +* Actinide guarantees that functions that recurse in tail position, either + directly or indirectly, can recurse indefinitely. + +* What is tail position? + + * Function bodies: the final form of the function is in tail position with + respect to the function. + + * ``begin`` forms: the final subform is in tail position with respect to + the ``begin`` form. + + * ``if`` forms: the ``true`` subform is in tail position with respect to + the ``if`` form if the ``cond`` subform reduces to a true value. The + ``false`` subform is in tail position with respect to the ``if`` form if + the ``cond`` subform reduces to a false value. + + * If a form is in tail position with respect to its containing form, it is + in tail position with respect to *that* form's containing form, and so + on, out to the nearest ``lambda`` body or to the top level of the program. + +* Example: + + * A simple, non-tail recursive factorial: + + :: + + (define (factorial n) + (if (= n 1) + 1 + (* n (factorial (- n 1))))) + + The ``factorial`` function *is not* called in tail position with respect + to the body of the ``factorial`` function: After reducing that function + application, the reduction of the outer ``factorial`` application still + needs to apply the ``*`` function to the result. + + Attempting to evaluate ``(factorial 1000)`` fails due to limits on call + depth: ``maximum recursion depth exceeded while calling a Python object`` + + :: + + (define (fact n a) + (if (= n 1) + a + (fact (- n 1) (* n a)))) + + The ``fact`` function *is* called in tail position with respect to the + body of ``fact``. Specifically, it is in tail position with respect to + the ``if`` form whenever ``n`` is not equal to ``1``, and the ``if`` form + is in tail position with respect to the body of the ``fact`` function. + + Evaluating ``(fact 1000 1)`` correctly computes the factorial of ``1000`` + on any machine with enough memory to store the result. + +****** +Macros +****** + +* Before Actinide evaluates a program, it *expands* a program. + +* Expansion replaces macros (defined by ``define-macro``, as above). + +* A *macro* is an Actinide procedure, as with ``lambda``, which accepts forms + as arguments and reduces to a new form. + +* Macros can be used to define new syntax. + +* Macro expansion is recursive: the result of expanding a macro is expanded + again, which allows macros to produce macro forms. + +* Example: The ``let-one`` macro defines a single local variable, with a known + value, and evaluates a body form in a temporary environment with that + variable bound. + + :: + + (define-macro (let-one binding body) + (begin + (define name (head binding)) + (define val (head (tail binding))) + `((lambda (,name) ,body) ,val)))) + + To use this macro, apply it as if it were a function: + + :: + + (let-one (x 1) x) + + The macro procedure accepts the forms ``(x 1)`` and ``x``, unevaluated, as + arguments, and substitutes them into a *quasiquoted* form, which is used as a + template. The three *unquoted* parts (``,name``, ``,body``, and ``val``) are + replaced by evaluating the symbols in the context of the macro procedure, and + expand to the relevant parts of the input forms. + + The returned form is approximately + + :: + + ((lambda (x) x) 1) + + and evaluates as such. + + This program evaluates to 1, but *does not* bind ``x`` in the top-level + environment. + +* Actinide macros are *not hygienic*. A quoted symbol in the macro body will be + evaluated in the location where the macro is expanded, with full access to + the environment at that location. Similarly, symbols defined in the macro + will be fully visible to code running in the environment where the macro is + expanded. + +* Macros often use quote notation to build the returned form. Quote notation is + ultimately a sequence of ``quote`` forms. However, Actinide supports + *quasiquote* notation to simplify the creation of nested quoted forms + containing unquoted parts. + + * A quasiquote form begins with `````. If the form contains no unquoted + parts, this will quasiquote each subform, terminating by quoting each + symbol or literal form and constructing a new list with the resulting + quoted forms. ```(a b)`` expands to ``('a 'b)``. + + * Within a quasiquote form, an *unquote* form prevents the following form + from being quoted. An unquote form begins with ``,``, followed by a + single form (often, but not always, a single symbol). ```(a ,b c)`` + expands to ``('a b 'c)``. + + * Within a quasiquote form, an *unquote-splicing* form prevents the + following form from being quoted. An unquote-splicing form begins with + ``,@``, followed by a single form, which must evaluate to a list. The + elements of that list are grafted into the resulting form. Given + ``(define x (list 1 2))``, the form ```(a ,@x b)`` expands to ``('a 1 2 + 'b)``. + +* Macros defined inside of a function body are not visible to the top-level + expander. diff --git a/docs/security.rst b/docs/security.rst new file mode 100644 index 0000000..4a42a2d --- /dev/null +++ b/docs/security.rst @@ -0,0 +1,115 @@ +######## +Security +######## + +In the README, I made this strong claim: + + It provides minimal safety features, but the restricted set of builtins + ensures that Actinide programs cannot gain access to the outside context of + the program. The worst they can do is waste CPU time, fill up RAM, and + drain your battery. + +This document expands on the underlying design choices used to support that goal. + +************** +Specific Goals +************** + +Out of the box, an Actinide ``Session`` must not: + +* Read information that is not part of its top-level environment or macro + table, which was not passed to it as an argument to a method or constructor + by the enclosing program. + +* Write information to any location other than its top-level environment, its + macro table, or the return value of a method or constructor called by the + enclosing program. + +* Cause system calls that perform IO of any kind. + +These goals imply that any externally-visible side effects of Actinide +evaluation are entirely the responsibility of the enclosing program. A "pure" +Actinide environment can only convert electricity into waste heat. + +***** +Types +***** + +Actinide has a carefully-selected list of built-in types, none of which expose +primitive operations that can gain filesystem access or gain access to +information outside of the Actinide session without without outside help. + +Atomic values other than symbols use the least-capable Python representations +possible: + +* Python ``int`` objects, used to represent Actinide integers. + +* Python ``decimal.Decimal`` objects, used to represent Actinide decimals. + +* Python ``str`` objects, used to represent Actinide strings. + +* Python ``bool`` objects, used to represent Actinide booleans. + +* The Python ``None`` constant, used to represent the empty list. + +The remaining built-in types are represented using classes: + +* Instances of the ``actinide.types.Symbol`` class, used to represent Actinide + symbols. This class is a wrapper around a ``str`` and provides only exactly + the operations needed to act like a sybmol: access to the underlying string, + equality, hashing, and useful Python ``__str__`` and ``__repr__`` + implementations for debugging. + +* Instances of the ``actinide.types.Cons`` class, representing Lisp conses and + list cells. This class is a ``namedtuple`` with two fields and no additional + methods. (This implies that conses are immutable, unlike most lisps, but this + is a semantic consideration and not a security consideration.) + +* Instances of the ``actinide.ports.Port`` class, which wraps an arbitrary + ``file`` object to restrict the operations available. The only built-in + mechanism for creating a ``Port`` creates one which wraps a string; it is not + possible to open port to a file, shared memory segment, pipe, process, + socket, or to any other OS resource using only built-in functions and types. + +* Instances of the ``actinide.types.Procedure`` class, which is considerably + more complex than the other types. This class handles the mechanics of + compiling and executing procedures, and allows calling procedure bodies from + arbitrary Python programs. + + However, ``Procedure`` delegates all of its actual operations back to the + Actinide runtime, and has effectively the same capabilities as a top-level + Session. + +Finally, built-in functions are represented using Python's own function and +method handle types. The only operations exposed on these are to call them as +Actinide functions, or to extract a string representation including their names. + +********* +Functions +********* + +Actinide's built-in functions have been selected and implemented for simplicity +of analysis. Actinide avoids exposing complex operations, and _particularly_ +avoids exposing any of Python's metaprogramming tools. + +Actinide - intentionally - exposes considerably fewer functions than a full +Scheme implementation. Many Scheme functions are inappropriate for Actinide's +use cases, but furthermore, keeping the function set small makes it much easier +to analyze the security properties of the built-in functions. + +No built-in function unwraps any type other than as follows: + +* ``(string x)`` and ``(display x)``, where ``x`` is a Symbol, unwrap the + string contained in the symbol object. + +* ``(head c)``, ``(tail c)``, and ``(uncons c)``, where ``c`` is a Cons, + unwraps the head, tail, or both the head and the tail of a Cons. + +****** +Errors +****** + +Actinide does not support the full Scheme exception-handling system. Instead, +operations which produce Python exceptions cause the computation to abort at +that point. The exception passes up through the implementation untouched, +allowing the enclosing program to determine how to handle it. |
