In 2020, our digital world and the software we use to create it are a towering structure, built upon countless layers of abstraction and building blocks — just think about all the translations and interactions that occur from loading a webpage. Whilst abstraction is undoubtedly a great thing, it only works if we’re building on solid ground; if the lower levels are stable and fast. What does that mean in practice? It means low-level, compiled languages, which can be heavily optimised and leveraged to make the most of computer hardware. One of the giants in this area was Frances Allen, who recently passed away in early August. Described by IBM as “a pioneer in compiler organization and optimization algorithms,” she made numerous significant contributions to the field.
Early Days
Trained as a maths teacher, Allen worked at a high school in New York for two years after graduating. She went back to complete a Masters in mathematics and was recruited by IBM Research on campus. Though planning to only stay long enough to pay off her debt and quickly return to teaching, she found herself staying with IBM for the rest of her career, even becoming the first female IBM fellow in 1989.
Allen’s first role at IBM was teaching internal engineers and scientists how to use FORTRAN — apparently a difficult sell to people who at the time were used to programming in assembly, which did just fine thank you very much. In an interview, Allen talks about the resistance from scientists who thought it wasn’t possible for a compiled language to produce code that was good enough.
After teaching, Allen began working on the compiler for a 100 kW “supercomputer” called Stretch. With 2048 kB of memory, the goal of Stretch was to be 100 times faster than any other system available at the time. Though this ultimately failed (to the dismay of a few clients, one finding Stretch took 18 hours to produce their 24 hour weather forecast), it caught the attention of the NSA.
Because of this, IBM designed a coprocessor addon, Harvest, specifically for codebreaking at the NSA. Harvest ended up being larger than Stretch itself, and Allen spent a year leading a team inside the NSA, working on highly classified projects. The team didn’t find out many things about what they were working on until they were leaked to the press (it was spying on the Soviet Union — no prizes for guessing).
An engineering feat, Harvest used a unique streaming architecture for code-breaking: information loaded onto IBM Tractor tape was continuously fed into memory, processed and output in real time, with perfectly synchronised IO and operations. Harvest could process 3 million characters a second and was estimated by the NSA to be 50-200 times faster than anything else commercially available. The project was extremely successful and was used for 14 years after installation, an impressive feat given the pace of technological advancement at the time.
Speed is of the Essence
The success of the project was in large part due to Allen’s work on the optimisations performed by its compiler. Compiler optimisations are magic. Some of us think of compilers as simple “source code in, machine code out” boxes, but much of their complexity lies in the entirely automatic suite of optimisations and intermediate steps they use to ensure your code runs as swiftly as possible. Of course, this was important for the limited hardware at the time, but the techniques that Allen helped develop are present in modern compilers everywhere. The New York Times quotes Graydon Hoare (the creator of Rust and one of today’s most famed compiler experts) as saying that Allen’s work is in “every app, every website, every video game or communication system, every government or bank computer, every onboard computer in a car or aircraft”.
So what do compiler optimisations actually look like? Allen wrote many influential papers on the subject, but “A catalogue of optimizing transformations” which she co-authored with John Cocke in 1972 was particularly seminal. It aims to “systematize the potpourri of optimizing transformations that a compiler can make to a program”. It has been said that compilers that implement just the eight main techniques from this paper can achieve 80% of best-case performance. Here are some of the most basic ideas:
- Procedure integration: replacing calls to subprocedures with inline code where possible, avoiding saving/restoring registers
- Loop unrolling: flattening loops by writing out statements explicitly, avoiding unnecessary comparison conditions
- CSE (common subexpression elimination): eliminating redundant computations which calculate values already available
- Code Motion: moving subexpressions out of loops where it is safe to do so
- Peephole optimisation: replacing known instruction combinations with more efficient variants
Some of these might seem obvious to us now, but formalising and standardising these ideas at the time had great impact.
Parallelism
Allen’s last major project for IBM was PTRAN, the Parallel Translator. This was a system for automatic parallelism, a special type of compiler optimisation. The aim was to take programs that weren’t written with parallelism in mind and translate them for execution on parallel architectures. This concept of taking sequentially written code and automatically extracting features from it to be run in parallel led to the extensive use of dependency graphs, now a standard representation in compilers. One of the recurring themes throughout Allen’s career was her ability to take highly technical problems and abstract them into maths — often graphs and sets — and solve them precisely. On this project Allen led a team of young engineers, churning out industry leading papers and compilers for parallelism for 15 years.
IBM Academy and Beyond
In 1995 Allen became president of the IBM Academy, an internal steering group of IBM’s very top technical minds. She was able to use the position to advocate in two areas: mentoring and women in technology. In interviews, she frequently talked about how she didn’t have a mentor, and how important it is for people starting out in tech. Her visibility as an expert in the field inspired others — at its peak in the 70s/80s, half of the IBM experimental compiler group were women. Her advocacy for women in tech never ceased, even as she described a drop in participation after the early days of computing:
Later, as computing emerged as a specialized field, employers began to require engineering credentials, which traditionally attracted few women. But the pendulum is swinging back as women enter the field from other areas such as medical informatics, user interfaces and computers in education.
In 2006 Allen received the Turing Award (considered the Nobel Prize for computing) — the first woman to do so.
So the next time you fire up gcc, write anything in a language above assembly, or even use any software at all, remember that Frances Allen’s ideas are invisibly at play.
No comments:
Post a Comment