Background
Members of the PL lab at Tufts regularly implement embedded domain-specific languages (eDSLs) in Haskell:
- Karl Cronburg: ANTLR-like parsing eDSL
- Matt Ahrens: cyber-physical systems eDSL
- Jeanne-Marie Musca: multi-union data types (extensible records) eDSL
The problem
Embedded DSLs are difficult to design, implement, and maintain over time. eDSL creators want to:
- Preserve — the correctness of the design D in the implementation I
- Extend — both D and I to support new features
- Adhere — exclusively to D in I
- Recognize — source spans in I as features in D
Definitions
- Design D — a set of features, f ∈ D
- Implementation I — a set of source code spans, (s, e) ∈ I (start to end, in characters)
- impl(f, (s,e)) — span (s, e) implements feature f
- annot(s,e) — span (s, e) is a supporting annotation
- hidden(s,e) — the representation of (s, e) is hidden (inaccessible)
Proposed solution
Provide eDSL creators with a methodology that makes explicit where costs originate, relative to both the design D and the implementation I.
Running example: Type-Level Naturals (TLN)
Mark (a hypothetical language designer) wants to create an eDSL in Haskell with the ability to annotate Haskell types with natural numbers. He writes down the desired features:
- Human-readable natural number weights
- Attach weights to arbitrary expressions
- Sum weights across function application
He implements TLN using GHC's DataKinds and TypeFamilies metaprogramming facilities:
data Nat = Z | S Nat
type Weighted (w :: Nat) a = a
type family
(w1 :: Nat) :+: (w2 :: Nat) :: Nat
type instance Z :+: m = m
type instance (S n) :+: m = S (n :+: m)
($$) :: Weighted w1 (a -> b)
-> Weighted w2 a
-> Weighted (w1 :+: w2) b
($$) f a = f a
The TLN map
With the implementation in hand, Mark hand-draws a Markedly map depicting how meta-information flows through the code — connecting each element of I (Nat, Weighted, w :: Nat, $$, :+:) to the features in D it implements.
The map reveals which language elements are load-bearing for each design goal, and which are supporting infrastructure.
PEAR metrics
Markedly defines four cost metrics for a (D, I) pair:
- P (Preserve) — how many features in D are compromised in I?
∃f ∈ D | ∀(s,e) ∈ I, ¬impl(f, (s,e)) - E (Extend) — how many spans in I have inaccessible intermediate representations?
∃(s,e) ∈ I | hidden(s,e) - A (Adhere) — how many spans in I are merely supporting annotations for a feature in D?
∃(s,e) ∈ I, f ∈ D | impl(f,(s,e)) && annot(s,e) - R (Recognize) — is this source span in I present in D?
∃(s,e) ∈ I | ∀f ∈ D, ¬impl(f,(s,e))
TLN PEAR costs
Applying the metrics to the TLN implementation yields:
Element P E A R
Nat 1 · · ·
Weighted · · 1 ·
w :: Nat · · · 1
$$ · · 1 ·
:+: · · · 1
──────────────────────────────
cost(D,I) = 1 · 2 4
A weak or vague design D incurs high A and R costs. Prototypes of I sacrifice P in favor of E.
Takeaways
- Markedly fits into an iterative design-and-implementation workflow
- The map makes the cost of each design decision traceable and auditable
- A weak or vague D incurs high Adhere and Recognize costs
- Prototype implementations sacrifice Preserve in favor of Extend
Future work
- Markedly user study
- Source span highlighting IDE support
- Git integration
- Extensible (multi-union) data types
User study design
Programmers with functional programming experience will implement eDSLs over several months. Half will be given Markedly maps and asked to recompute PEAR costs after substantial progress.
Hypothesis: Markedly-enabled programmers will implement more design-preserving, extensible, adherent, and recognizable eDSLs.