Markedly: Mapping eDSL Implementation Costs

Matthew P. Ahrens, Karl Cronburg, and Jeanne-Marie Musca — META 2017. — Back to overview — PDF slides — Poster version

Background

Members of the PL lab at Tufts regularly implement embedded domain-specific languages (eDSLs) in Haskell:

Karl Cronburg: ANTLR-like parsing eDSL
Matt Ahrens: cyber-physical systems eDSL
Jeanne-Marie Musca: multi-union data types (extensible records) eDSL

The problem

Embedded DSLs are difficult to design, implement, and maintain over time. eDSL creators want to:

Preserve — the correctness of the design D in the implementation I
Extend — both D and I to support new features
Adhere — exclusively to D in I
Recognize — source spans in I as features in D

Definitions

Design D — a set of features, f ∈ D
Implementation I — a set of source code spans, (s, e) ∈ I (start to end, in characters)
impl(f, (s,e)) — span (s, e) implements feature f
annot(s,e) — span (s, e) is a supporting annotation
hidden(s,e) — the representation of (s, e) is hidden (inaccessible)

Proposed solution

Provide eDSL creators with a methodology that makes explicit where costs originate, relative to both the design D and the implementation I.

Running example: Type-Level Naturals (TLN)

Mark (a hypothetical language designer) wants to create an eDSL in Haskell with the ability to annotate Haskell types with natural numbers. He writes down the desired features:

Human-readable natural number weights
Attach weights to arbitrary expressions
Sum weights across function application

He implements TLN using GHC's DataKinds and TypeFamilies metaprogramming facilities:

data Nat = Z | S Nat
type Weighted (w :: Nat) a = a
type family
  (w1 :: Nat) :+: (w2 :: Nat) :: Nat
type instance Z :+: m = m
type instance (S n) :+: m = S (n :+: m)
($$) :: Weighted w1 (a -> b)
     -> Weighted w2 a
     -> Weighted (w1 :+: w2) b
($$) f a = f a

The TLN map

With the implementation in hand, Mark hand-draws a Markedly map depicting how meta-information flows through the code — connecting each element of I (Nat, Weighted, w :: Nat, $$, :+:) to the features in D it implements.

The map reveals which language elements are load-bearing for each design goal, and which are supporting infrastructure.

PEAR metrics

Markedly defines four cost metrics for a (D, I) pair:

P (Preserve) — how many features in D are compromised in I?
∃f ∈ D | ∀(s,e) ∈ I, ¬impl(f, (s,e))
E (Extend) — how many spans in I have inaccessible intermediate representations?
∃(s,e) ∈ I | hidden(s,e)
A (Adhere) — how many spans in I are merely supporting annotations for a feature in D?
∃(s,e) ∈ I, f ∈ D | impl(f,(s,e)) && annot(s,e)
R (Recognize) — is this source span in I present in D?
∃(s,e) ∈ I | ∀f ∈ D, ¬impl(f,(s,e))

TLN PEAR costs

Applying the metrics to the TLN implementation yields:

Element       P   E   A   R
Nat           1   ·   ·   ·
Weighted      ·   ·   1   ·
w :: Nat      ·   ·   ·   1
$$            ·   ·   1   ·
:+:           ·   ·   ·   1
──────────────────────────────
cost(D,I)  =  1   ·   2   4

A weak or vague design D incurs high A and R costs. Prototypes of I sacrifice P in favor of E.

Takeaways

Markedly fits into an iterative design-and-implementation workflow
The map makes the cost of each design decision traceable and auditable
A weak or vague D incurs high Adhere and Recognize costs
Prototype implementations sacrifice Preserve in favor of Extend

Future work

Markedly user study
Source span highlighting IDE support
Git integration
Extensible (multi-union) data types

User study design

Programmers with functional programming experience will implement eDSLs over several months. Half will be given Markedly maps and asked to recompute PEAR costs after substantial progress.

Hypothesis: Markedly-enabled programmers will implement more design-preserving, extensible, adherent, and recognizable eDSLs.