author     Owen Jacobson <owen@grimoire.ca>    2017-11-10 01:25:30 -0500
committer  Owen Jacobson <owen@grimoire.ca>    2017-11-11 01:16:01 -0500
commit     e4fb8604aa2fc572a3aeeace1c32de7339d346b5 (patch)
tree       9f58493ab73ada22943bf009b3a910c3236dca8d /tests/tokens.py
parent     f33c395f833567b665d14fe0c577799605e8091e (diff)
Testing fixes.
* Add a top-level test that roundtrips sequences of tokens. (This found a
  real bug. Thanks, Hypothesis!)
* Remove type conversion from the tokenizer. It turns out that conversion
  made the tokenizer harder to test, because it was doing too many things;
  the tokenizer now _only_ divides the input port into tokens, without
  parsing or converting them. This simplifies the code, and makes testing
  considerably easier.
* Fix some bugs in string literal parsing (again: thanks, Hypothesis!)
* Stop treating empty strings as if they were EOFs. (Thanks, Hypothesis!)
* Document the test cases, and the case-by-case strategy, better. This also
  involved prying apart some tests that covered multiple cases.
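The roundtrip test itself is not part of this diff; as a rough sketch of the
property it checks, using the strategies added below (hypothetical: `tokenize`
and its import path stand in for this repository's actual tokenizer entry
point, which is not shown here):

    from hypothesis import given

    from tests.tokens import spaced_token_sequences
    from tokenizer import tokenize  # hypothetical import path

    # Joining each (separator, token) pair back into source text and then
    # tokenizing it should recover exactly the original token sequence.
    @given(spaced_token_sequences())
    def test_tokens_roundtrip(pairs):
        source = ''.join(sep + tok for sep, tok in pairs)
        assert list(tokenize(source)) == [tok for _, tok in pairs]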
Diffstat (limited to 'tests/tokens.py')
-rw-r--r--  tests/tokens.py  |  90
1 file changed, 90 insertions, 0 deletions
diff --git a/tests/tokens.py b/tests/tokens.py
new file mode 100644
index 0000000..0027fb2
--- /dev/null
+++ b/tests/tokens.py
@@ -0,0 +1,90 @@
+from hypothesis.strategies import just, one_of, characters, text, lists, tuples
+
+# Generators for token families
+
+# Generates the `(` token.
+def open_parens():
+    return just('(')
+
+# Generates the `)` token.
+def close_parens():
+    return just(')')
+
+# Generates characters that are legal, unescaped, inside of a string.
+def string_bare_characters():
+    return characters(blacklist_characters='\\"')
+
+# Generates legal string escape sequences.
+def string_escaped_characters():
+    return one_of(just('"'), just('\\')).map(lambda c: '\\' + c)
+
+# Generates single-character string representations, including escapes.
+def string_characters():
+    return one_of(string_bare_characters(), string_escaped_characters())
+
+# Generates arbitrary string bodies (strings without their leading or
+# trailing quotes).
+def string_body():
+    return text(string_characters())
+
+# Generates legal strings.
+def strings():
+    return tuples(just('"'), string_body(), just('"')).map(lambda t: ''.join(t))
+
+# Generates characters which are legal within a symbol.
+def symbol_characters():
+    return characters(blacklist_characters=' \t\n();"')
+
+# Generates legal symbols.
+def symbols():
+    return text(symbol_characters(), min_size=1)
+
+# Generates single whitespace characters.
+def whitespace_characters():
+    return one_of(just('\n'), just(' '), just('\t'))
+
+# Generates a single token.
+def tokens():
+    return one_of(symbols(), strings(), open_parens(), close_parens())
+
+# Generates at least one character of whitespace.
+def whitespace():
+    return text(whitespace_characters(), min_size=1)
+
+# Generates characters which can legally appear inside of a comment (anything
+# but a newline).
+def comment_characters():
+    return characters(blacklist_characters='\n')
+
+# Generates a (possibly-empty) comment, terminated with a trailing newline.
+def comments():
+    return tuples(just(';'), text(comment_characters()), just('\n')).map(lambda t: ''.join(t))
+
+# Generates sequences which can be inserted between arbitrary pairs of tokens
+# without changing their meaning.
+def intertokens():
+    return one_of(comments(), whitespace())
+
+# Generates a pair whose second element is a token, such that joining the
+# pair's elements with the empty string produces a string that tokenizes
+# back to exactly that token.
+def spaced_tokens():
+    def spaced(strategy):
+        return tuples(intertokens(), strategy)
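+    # Why two flavours: a symbol that directly follows another symbol would
+    # fuse with it into one longer symbol, so symbols always get a non-empty
+    # separator. Strings and parens are self-delimiting, so an empty
+    # separator is also legal in front of them.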
+    def unspaced(strategy):
+        return tuples(one_of(just(''), intertokens()), strategy)
+    def spaced_symbols():
+        return spaced(symbols())
+    def spaced_strings():
+        return unspaced(strings())
+    def spaced_open_parens():
+        return unspaced(open_parens())
+    def spaced_close_parens():
+        return unspaced(close_parens())
+
+    return one_of(spaced_symbols(), spaced_strings(), spaced_open_parens(), spaced_close_parens())
+
+# Generates a list of pairs, as produced by spaced_tokens().
+def spaced_token_sequences():
+    return lists(spaced_tokens())
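For a quick look at what these strategies generate, they can be sampled
interactively with Hypothesis's real SearchStrategy.example() (the drawn
values shown in the comment are only illustrative and vary from run to run;
example() is meant for exploration, not for use inside tests):

    from tests.tokens import strings, spaced_tokens

    # Draw one arbitrary value from each strategy for inspection,
    # e.g. '"a\\""' from strings(), or ('; c\n', 'foo') from spaced_tokens().
    print(strings().example())
    print(spaced_tokens().example())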