Lazy Lists Based on Generator and Iterator

If you know Haskell, you are familiar with the following expressions

1
2
3

repeat 1 -- => [1, 1, 1, 1, 1,...]  
cycle "abc" -- => "abcabcabc..."  
[1, 3..] -- => [1, 3, 5, 7, ...]

Several of the above expressions produce infinite lists. For those of you who are used to mainstream programming voices may be confused as to how the concept of infinity can be expressed inside a finite amount of memory. The main reason is that Haskell is a language that uses an inert evaluation strategy by default, and the part that is not used is just an expression inside the memory and does not really do the computation.

If we just look at the above expressions, many of you may say that it doesn’t feel like there’s anything magical about it, and it doesn’t seem to do anything. Let’s look at the following code again.

The fibonacci series in Haskell.

`1`	`fibonacci = 1 : 1 : zipWith (+) fibonacci (tail fibonacci)`

Here fibonacci itself is an inert structure, so in the calculation, it will first count the two 1s in front of the list, to get 1 : 1..., and then how to express the fib(n) = fib(n - 1) + fib(n - 2) property of fibonacci? We can notice that n - 1 and n - 2 are exactly one place apart in the series, so n can be seen as the result of the sum of the wrong places in the series.

Let’s look at a sieve method for finding prime numbers. If you are not familiar with the sieve method, you can click wiki to see how the algorithm works. The following code is a simple implementation of Haskell.

1
2
3

primes = 2 : filter isPrime [3, 5..]  
  where
    isPrime x = all (\p -> x `mod` p > 0) (takeWhile (\p -> p * p <= x) primes)

So, Why Lazy?

Inert lists make the code and structure more flexible for certain list operations of indefinite length. To use the primes list as an example, in a traditional C or Java implementation, we would have to declare a maximum length or a maximum range of values, such as prime numbers up to 10000. If the later calculation is to use more than this range, we will have to re-call the generation function to regenerate a longer list. The problem here is: first, to actively call this factory function, and second, to manually maintain a cache list if we want to reuse the already computed data, which inevitably increases the complexity of the code. Another possible scenario is that we pre-generate a very long list and only use a drop of data from the head of the list in the later calculations, which is also a great waste.

The use of inert lists increases the expressiveness of our programming, allowing us to focus more on the properties of the data structure itself, rather than wasting time on how to manage the stack. Because the inert value feature ensures that we only compute a value when we need it, we can automatically minimize our computation and save resources.

For example, we can read and write files with lazy byteString, which itself does not load the whole file into our memory, but reads it on demand. Sometimes we can read a large file and filter out only the first few dozens of data we need, but we have to put hundreds of M or even G files into memory.

Here is also an article from more than 10 years ago How to Speed Up Lo-Dash ×100? Introducing Lazy Evaluation, if you are interested, you can click on it.

Implementing Lazy Lists in JavaScript

Is there an inert structure in JavaScript? Let’s look at the following example.

let fetchSomething = fetch('/some/thing');  
if (condition) {  
  fetchSomething = fetch('/some/thing/condition');
}
fetchSomething.then(() => {  
  // TODO
});

The fetch method itself is executed immediately, but if the condition is met, the fetch method here will be executed twice. This is not the behavior we expect, and the usual way to make this fetch action execute when we need it, instead of starting it when it is declared, is to change it to look like the following.

let fetchSomething = () => fetch('/some/thing');  
if (condition) {  
  fetchSomething = () = fetch('/some/thing/condition');
}
fetchSomething.then(() => {  
  // TODO
});

Inspired by this, we can roughly implement the following structure.

class List<T> {  
  head: T | () => T
  tail: List<T> | () => List<T>

  constructor(head: T, tail: () => List<T>) {
    this.head = () => head;
    this.tail = tail;
  }
}

List<T> is essentially a single-linked table, and the tail passed into the constructor is a factory function that builds new List nodes. Only when we access a node, we evaluate its head, and when we access its next node, we evaluate tail, otherwise both head and tail are just expressions to be evaluated.

This seems to have solved my problem, but this structure has a lot of unnecessary extra overhead when converting to and from normal Array.

So is there a more natural structure in JavaScript that would save us from having to construct such a complex object, simplifying the code while making it more generalizable?

Getting Started with Iterable

A new feature of ES6 gave me the answer I was looking for, Iteration Protocols. If the MDN description is too long, you can just look at the equivalent type declaration below.

interface Iterable<T> {  
  [Symbol.iterator](): Iterator<T>;
}

interface Iterator<T> {  
  next(): IteratorResult<T>;
}

interface IteratorResult<T> {  
  done: Boolean;
  value?: T;
}

interface IterableIterator<T> {  
  [Symbol.iterator](): Iterator<T>;
  next(): IteratorResult<T>;
}

All objects that implement an Iterable interface can be used by means of an object such as for... ...of..., ... .itor and Array.from. The iteration ends when done is true in the return value of the next method. And only when we access the next method will we move to the next iteration, the ideal Lazy structure.

At this point we look at how our fibonacci should be written.

class Fibonacci implements IterableIterator<number> {  
  private prev = 1;
  private next = 1;

  public next() {
    let current = this.prev;
    this.prev = this.next;
    this.next = current + this.prev;
    return {
      done: false,
      value: current
    }
  }

  [Symbol.iterator]() {
    return this;
  }
}

const fib = new Fibonacci();  
fib.next() // => { done: false, value: 1 }  
fib.next() // => { done: false, value: 1 }  
fib.next() // => { done: false, value: 2 }  
// etc

At this point, we can already express an inert infinite series. But the above code is too cumbersome, so it’s a good thing that ES6 also provides us with Generator, which makes it easy to write IterableItorator, and in a sense, Generator is syntactic sugar for the above code.

Using Generator, the above code can be simplified to look like the following.

export function* fibonacci() {  
  let prev = 1;
  let next = 1;

  while (true) {
    yield prev;
    const temp = prev;
    prev = next;
    next = temp + prev;
  }
}

const fib = fibonacci();  
// etc

If you don’t know the syntax of Generator, you can read this article A Simple Guide to Understanding Javascript (ES6) Generators first.

Define Infinite List

Moving on from the above code, the following code implements the repeat, cycle, iterate, range and other methods from the beginning of the article.

export function* repeat<T>(item: T) {  
  while (true) {
    yield item;
  }
}

export function* cycle<T>(items: Iterable<T>) {  
  while (true) {
    yield* [...items];
  }
}

export function* iterate<T>(fn: (value: T) => T, initial: T) {  
  let val = initial;
  while (true) {
    yield val;
    val = fn(val);
  }
}

export function* range(start: number, end = Infinity, step = 1) {  
  while (start <= end) {
    yield start;
    start += step;
  }
}

As you can see, the code is very intuitive and easy to understand.

Define Operator

Once we have the list, we need to operate on top of it. The following code implements the map/filter/take/takeWhile methods respectively.

export function* map<T, U>(fn: (value: T) => U, items: Iterable<T>) {  
  for (let item of items) {
    yield fn(item);
  }
}

export function* filter<T>(  
  predicate: (value: T) => boolean,
  items: Iterable<T>
) {
  for (let item of items) {
    if (predicate(item)) {
      yield item;
    }
  }
}

export function* take<T>(n: number, items: Iterable<T>) {  
  let i = 0;
  if (n < 1) return;

  for (let item of items) {
    yield item;
    i++;
    if (i >= n) {
      return;
    }
  }
}

function* takeWhile<T>(  
  predicate: (value: T) => boolean,
  items: Iterable<T>
) {
  for (let item of items) {
    if (predicate(item)) {
      yield item;
    } else {
      return;
    }
  }
}

The above code is relatively simple. The more difficult part is to implement the zip method, that is, how to merge two lists into one?

The difficulty is that if you receive an Iterable object, you don’t necessarily need to implement the next method, such as Array, String, etc. Also, not all Iterable objects can be accessed by index. In addition, if we want to turn Array.from into an array first and then operate on the array, we will encounter a situation where the Iterable object we pass in is infinite, like fibonacci above, in which case we can’t use Array.from.

One of my ideas is to elevate an Iterable object to an IterableItorator object, and then iterate through it one by one using the next method.

How? Fortunately, Generator gives us a yield* operator that allows us to easily define a lift method.

1
2
3

export function* lift<T>(items: Iterable<T>): IterableIterator<T> {  
  yield* items;
}

With this lift method, it is easy to write zip methods and zipWith methods.

export function* zip<T, G>(  
  seqA: Iterable<T>,
  seqB: Iterable<G>
): IterableIterator<[T, G]> {
  const itorA = lift(seqA);
  const itorB = lift(seqB);
  let valA = itorA.next();
  let valB = itorB.next();
  while (!valA.done || !valB.done) {
    yield [valA.value, valB.value];
    valA = itorA.next();
    valB = itorB.next();
  }
}

export function* zipWith<T, G, R>(  
  fn: (a: T, b: G) => R,
  seqA: Iterable<T>,
  seqB: Iterable<G>
): IterableIterator<R> {
  const itorA = lift(seqA);
  const itorB = lift(seqB);
  let valA = itorA.next();
  let valB = itorB.next();
  while (!valA.done || !valB.done) {
    yield fn(valA.value, valB.value);
    valA = itorA.next();
    valB = itorB.next();
  }
}

More methods can be found at the bottom of my repo, so I won’t list them all here.

Summary

Generator and Iterator are very powerful language-level capabilities that ES6 brings to the table, and its own evaluation can be seen as inert.

When co from TJ first came out around 13 years ago, the code was amazingly short and concise. However, in our use, one is limited by browser compatibility, and the other is limited by our use scenario, I think we have not developed its features far enough. Combining IO, network, Generator and Iterator can do a lot more for us.

On the contrary, the abuse of inert structures will cache a lot of thunk during the execution of the program and increase the overhead in memory.

GitHub lazyList。
Slide Lazy List With Generator and Iterator

Reference https://tech.youzan.com/lazy-list-with-generator-and-iterator/

Table of Contents