Cellular:

A proposal for better spreadsheets

James Geddes

May 2016

The West Coast Mainline débâcle

Photo: Aaron Roberts https://www.flickr.com/photos/aaronsrailwayphots/ (CC BY-NC 2.0)
Photo: Aaron Roberts https://www.flickr.com/photos/aaronsrailwayphots/ (CC BY-NC 2.0)

It's not just government


Carmen M. Reinhart, Kenneth S. Rogoff (2010). "Growth in a Time of Debt". American Economic Review 100 (2): 573–78.


“George Osborne’s favourite ‘godfathers of austerity’ economists admit to making error in research” — the Mirror

“The Rogoff-Reinhart data scandal reminds us economists aren’t gods” — The Guardian

“The Excel Depression” — Paul Krugman

And some limitations are fundamental

Source: DECC, Performance and Impact of the Feed-In Tariff Scheme: Review of Evidence
Source: DECC, Performance and Impact of the Feed-In Tariff Scheme: Review of Evidence
Cartoon by Randall Munroe, http://xkcd.com/1667/ (cc by-nc 2.5)
Cartoon by Randall Munroe, http://xkcd.com/1667/ (cc by-nc 2.5)

Background to spreadsheet modelling

The challenge of doing better

Cellular

Uncertainty

Background to spreadsheet modelling

VisiCalc: the first ‘killer app’

Screenshot of VisiCalc on the Apple II. Photo: apple2history.org
Screenshot of VisiCalc on the Apple II. Photo: apple2history.org

Excel has been dominant for 20 years

A history of spreadsheets. Source: Google ngrams.
A history of spreadsheets. Source: Google ngrams.

The challenge of doing better

Reinhart and Rogoff’s spreadsheet. Source: Thomas Herndon, Michael Ash, and Robert Pollin via Mike Konczal
Reinhart and Rogoff’s spreadsheet. Source: Thomas Herndon, Michael Ash, and Robert Pollin via Mike Konczal

How do we make good software anyway?

The art of programming is the art of organising complexity [...] — Edsger W. Dijkstra (1930–2002)

Controlling complexity is the essence of computer programming. — Brian Kernighan (1942–)

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. — C. A. R. Hoare (1934–)

Modularity and abstraction

Complex Modular Abstract
Decompose the system into separate, weakly-coupled parts, with well-defined interfaces between them.

“Abstraction principle: Each significant piece of functionality in a program should be implemented in just one place in the source code. Where similar functions are carried out by distinct pieces of code, it is generally beneficial to combine them into one by abstracting out the varying parts.”

— Benjamin C. Pierce, Types and Programming Languages

Spreadsheets kind of support modularity

  • Worksheets are often used to separate concerns

  • Columns are often used to group elements of a repeated calculation

  • But: it's hard to define the "interfaces"

    =IFERROR(INDEX(INDIRECT($C10 & ".Outputs[" & this.Year & "]"),
               MATCH(G$5, INDIRECT($C10 & ".Outputs[Vector]"), 0)), 0)

Spreadsheets do not support abstraction

  • Operations are repeated

  • Large formulae are repeated

    =IFERROR(INDEX(INDIRECT($C10 & ".Outputs[" & this.Year & "]"),
                   MATCH(G$5, INDIRECT($C10 & ".Outputs[Vector]"), 0)), 0)
  • No structured data, higher-order functions, ...

So why do people use spreadsheets?

  1. Spreadsheets are incremental

    • Start small, gradually get better
    • Even the basic stuff is useful (eg, making tables)
    • No big jumps in learning
  2. Spreadsheets are immediate

    • Changes are immediately reflected in the output
    • Faster learning, quicker debugging
  3. ... and, perhaps, because abstraction is not easy

The plan

  1. Observe that spreadsheets are already a lot like a simplistic programming language

  2. Write this language explicitly, so that spreadsheets are written as programs

    • whilst keeping the benefit/cost tradeoff positive!
  3. Generalise the language to include more powerful features

    • without losing the ease of use!

Cellular

Every spreadsheet is a term graph

   |       A         |      B     |
===+=================+============+
 1 | Salary          |      21000 |
 2 | Allowance       |      11000 |
 3 | Taxable income  |  = B1 - B2 |
 4 | Basic rate      |        0.2 |
 5 | Tax payable     |  = B3 * B4 |

Very simple term graph: only values and built-in functions. No user-defined types, no lambda abstractions, no first-class functions, no iteration or recursion.

Term graphs are programs

Topological sort:
Then write out each step:
B1 := 21000
B2 := 11000
B3 := B1 - B2
B4 := 0.20
B5 := B3 * B4

Poor choice of surface syntax for users: too many definitions. But perhaps useful as a target language which can then be used to generate any particular spreadsheet (Excel, Numbers, ...)

A possible language hierarchy

cell Full language, supporting abstraction and modularity. Ideally, a superset of nocell.
nocell A language with the same expressive power as a spreadsheet (and no more) but a more convenient surface syntax.
grid An intermediate representation of a spreadsheet, from which a number of backends could produce specific formats.

(No probabilistic elements yet.)

What's the surface syntax of cell/nocell?

  • Perhaps a conventional function syntax?

    times(0.20, minus(21000, 11000))
  • But spreadsheet modellers tend to think in terms of a "data-flow" model: Here's some data; here's the operation; here's the result; here's an operation on that ...

  • cf. the R package ‘magrittr’, which allows one to write, eg,

    data %>% filter() %>% summarise()
  • => Can we retain this style of interactive / incremental programming?

Alternative forms of surface syntax

Why not just write the spreadsheet as if you were writing a spreadsheet?

Postfix Spreadsheet-y
21000      ; Salary
11000      ; Personal allowance
-          ; = Taxable income
0.20       ; Basic rate
*          ; = Tax payable 
21000      ; Salary
11000 -    ; less, Personal allowance
=          ; Taxable income
 0.20 *    ; Basic rate
=          ; Tax payable
  • No names (‘;’ means ‘comment’)

  • Order parallels order of construction of spreadsheet

  • Lends itself to building “collections of data” in an interactive manner (see later)

  • But postfix languages haven't caught on. (Why not?)
  • Reminiscent of a printing calculator

  • ‘=’ here means “don’t do anything, but let me label this cell in the comments”

  • Other ideas for usable syntax:
    • ‘=’ means “function application” (like Excel?!)
    • ‘(...)’ means “... is in infix notation”

Open questions

  1. Does cell “compile” to nocell?

    • Unwrap loops, expand functions, ...
  2. Or is it a “nocell-constructing” language?

    • “Spreadsheets as values”

    • It would be ideal if, every time the user hits ‘return’, the spreadsheet-so-far is generated, and that spreadsheet is the same as the one obtained by compiling/running the program so far.

Layout: the other half of the challenge

Summary

cell nocell grid
Semantics Functional, first-class functions, rich set of data types, user-defined data types, Purely functional, primitive types only (numeric, string, boolean, perhaps vector), built-in functions only, names. Like nocell, but every expression assigned to a cell. Names changed to named ranges.
Surface syntax ?
100
200
300
sum
A1 := 100
A2 := 200
A3 := 300
A4 := SUM(A1, A2, A3)
Other Structures annotated with information to inform layout. Cells annotated with information to determine cell style.

Adding uncertainty to models

PPL = Probabilistic Programming Language
PPL = Probabilistic Programming Language
  • Use an existing language for stochastic simulation and inference, restricting class of models to those expressible in nocell

  • Add stochastic data with a few new VBA functions (eg, =NORMAL(0,1))

  • Output of PPL modelled as Monte Carlo runs on separate worksheet, with named ranges, eg, =MEDIAN(param1)

Challenges

  • Must make at least some tasks easier, eg,

    • Formatting tables (always a pain), "LaTeX for spreadsheets";

    • Make probabilistic calculations possible.

  • Must extend the "natural" language of spreadsheets

  • Must allow incremental transition from concrete to abstract

  • Must be interactive

    • eg, a "REPL" that generates a spreadsheet each time

Other proposals

  • User-centred functions (SPJ)

  • Gencel (templates for Excel)

  • ModelSheet Authoring (specific kinds of models)

  • Tabular

  • Improv

  • Scenarios (FW)

All of these use the spreadsheet as the GUI

UNUSED SLIDES

Other kinds of models

  • Maximisation

  • Dynamic simulation

  • Agent-based simulation

  • Equilibrium-finding

  • “Full” Bayesian inference