## A.1. Single-core Computing

The goal of this module is to provide you with basic knowledge about sequential computing (i.e., running a program on a single core).

There is a lot of complexity under the hood, which belongs in Computer Architecture and Operating Systems courses/textbooks. Instead, we take a high-level approach, with a focus on performance.

Go through the tabs below in sequence…

#### Learning Objectives

• Understand the concepts of work and of compute speed
• Be familiar with Flop as a measure of work and with Flop/sec as a measure of compute speed
• Understand the simple relationship between program execution time, work, and compute speed

### Measures of Work and Compute Speed

In these pedagogic modules we rarely consider programs that are interactive, i.e., that react based on real-time user input via the keyboard or the mouse. A text editor would fall in this category. Instead, we almost always consider programs that have some amount of computation, or work, to perform and then terminate. An example would be a program that mines a cryptocurrency.

The simplest model of performance when executing a non-interactive program on a core of a computer is to assume that the computer delivers constant compute speed, which is measured by the quantity of work performed per time unit. For instance, a program with 100 units of work would run in 50 seconds on a core with a speed of 2 units of work per second. This last number is called the program’s execution time.

Generalizing the above example, for a given amount of work to perform there is a linear relationship between the program’s execution time and the speed of the core on which it is executed:

\begin{align} \text{execution time} & = \frac{\text{work}}{\text{compute speed}}\;. \end{align}
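This relationship is trivial to encode; below is a minimal sketch in Python (the function name is ours, not part of the module):

```python
# A minimal sketch of the formula above.

def execution_time(work, compute_speed):
    """Execution time = work / compute speed (same work units in both)."""
    return work / compute_speed

# The example from the text: 100 units of work on a core that
# delivers 2 units of work per second.
print(execution_time(100, 2))  # 50.0
```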

There are many options for choosing an appropriate way to quantify work. One possibility is to use a measure that is specific to what the program does. For instance, if the program renders movie frames, a good measure of work would be the number of frames to render. One would then want to measure a core’s speed in terms of the number of frames that can be rendered per second (assuming all frames require the same amount of computation).

Another possibility is to use a more generic measure, for instance, the number of instructions. The work of a program would then be measured by its number of instructions (e.g., the number of hardware instructions the program performs) and the speed of a core would be in number of instructions per second. This approach is known to have problems, as instructions are not all equal, especially across different families of processors. Therefore, a processor that delivers fewer instructions per second than another could actually be preferable for running some programs.

### Flop and Flop/sec

It turns out that defining a truly universal unit of work is not possible. Nevertheless, in these pedagogic modules, unless specified otherwise, we use a simple measure of work: the number of floating-point operations, or Flop, that the program performs. We thus measure the speed of a core in Flop/sec, which is commonly used in the field of high-performance scientific computing.

Like any single measure of work, the Flop count is imperfect (e.g., programs do non-floating-point computations, floating-point operations are not all the same). Fortunately, all the concepts we learn in these pedagogic modules are agnostic to the way in which we measure work. And so we just pick Flop counts to be consistent throughout.

Say a program that performs 100 Tflop (“100 TeraFlop”) is executed on a core with speed 35 Gflop/sec (“35 GigaFlop per second”). The program’s execution time would then be:

\begin{align} \text{execution time} & = \frac{100 \times 10^{12}\; \text{Flop}}{35 \times 10^{9}\; \text{Flop/sec}}\\ & \simeq 2,857.14\; \text{sec} \end{align}

If a program that performs 12 Gflop runs in 5 seconds on a core, then the speed of this core in Mflop/sec (“MegaFlop per second”) is:

\begin{align} \text{speed} & = \frac{12 \times 10^{9}\; \text{Flop}}{5 \; \text{sec}} \times \frac{1}{10^{6}}\\ & = 2,400 \; \text{Mflop/sec} \end{align}

Make sure you know your units:

• K(ilo): $10^3$
• M(ega): $10^6$
• G(iga): $10^9$
• T(era): $10^{12}$
• P(eta): $10^{15}$
• E(xa): $10^{18}$
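As a quick sanity check on these prefixes, here is a small helper (the names are ours, purely illustrative) that reproduces the first worked example above:

```python
# Metric prefixes from the list above.
PREFIX = {"K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12, "P": 1e15, "E": 1e18}

def flops(amount, prefix):
    """E.g. flops(100, "T") -> 100 Tflop expressed in Flop."""
    return amount * PREFIX[prefix]

# 100 Tflop executed on a 35 Gflop/sec core:
print(round(flops(100, "T") / flops(35, "G"), 2))  # 2857.14 sec
```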

#### Practice Questions

[A.1.p1.1] You have to run a program that performs 4000 Gflop, and your core computes at speed 30 Tflop/sec. How long will the program run for in seconds?

$\frac{4 \;\text{Tflop}}{30\; \text{Tflop/sec}} \simeq 0.13\; \text{sec}$

[A.1.p1.2] A program just ran in 0.32 sec on a core with speed 2 Tflop/sec, how many Gflop does the program perform?

$2000\; \text{Gflop/sec} \times 0.32\; \text{sec} = 640 \;\text{Gflop}$

#### Questions

[A.1.q1.1] You have to run a program that performs 2000 Tflop, and your core computes at speed 450 Gflop/sec. How long will the program run for in minutes?

[A.1.q1.2] A program that performs 3000 Gflop just ran in 1.5 minutes on a core. What is the core speed in Tflop/sec?

[A.1.q1.3] On a given core, a program just ran in 14 seconds. By what factor should the core speed be increased if you want the program to run in 10 seconds?

#### Suggested activities

[Benchmarking #1]: Find on-line resources that provide benchmark results for currently available cores. What is the fastest currently available core in terms of floating point benchmarked performance? Are there other kinds of benchmarks?

[Benchmarking #2]: Determine the compute speed of a core of your computer by searching for, downloading, and running a freely available floating point benchmark program.

[Programming]: Implement a program that performs a large, but known, number of floating point operations and use it to perform your own benchmarking of your machine. Implement this program in different languages and observe whether the Gflop/sec measurement varies across languages.

#### Learning Objectives

• Understand the concept of time sharing
• Understand how time sharing impacts program execution times

### Time Sharing

As you know, you can execute multiple programs at once on your computer (e.g., your Web browser and a text editor). This is called multi-programming, something that Operating Systems have known how to do since the 1960’s.

Consider multiple (non-interactive) programs that run on a single core. In this case, the Operating System makes it so that the programs alternate on the core: a program runs for a while, then another, then another, and so on until we cycle back and repeat. This is called time sharing. The process of going from one running program to another is called context switching. How this is done is one of the core topics of all Operating System textbooks.

In these pedagogic modules we take a very high-level, ideal view: When running n programs at the same time on one core, each of them proceeds at 1/n-th of the core’s compute speed.

For instance, if at time 0 I start two programs, each with work 5 Gflop, on a single core with speed 1 Gflop/sec, both programs would complete at time 10. Say now that the second program has work of 10 Gflop. Then the first program will complete at time 10, and the second program at time 15. During the first 10 seconds, both programs proceed at speed 0.5 Gflop/sec. At time 10 the first program thus completes (because it has performed all its work), at which point the second program proceeds at the full 1 Gflop/sec speed. Since the second program still has 5 Gflop of work left, it runs for another 5 seconds and completes at time 15.
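This ideal model is easy to simulate. The sketch below is our own (not eduWRENCH code): it repeatedly finds the program that finishes next, advances time, and debits every running program's remaining work at the shared speed.

```python
def completion_times(works, speed):
    """Ideal time sharing: each of the n running programs proceeds
    at speed/n. Returns each program's completion time, in input order."""
    remaining = dict(enumerate(works))   # program index -> work left
    done = {}
    now = 0.0
    while remaining:
        n = len(remaining)
        i = min(remaining, key=remaining.get)   # finishes next
        dt = remaining[i] / (speed / n)         # time until that completion
        now += dt
        for j in remaining:
            remaining[j] -= dt * speed / n
        done[i] = now
        del remaining[i]
    return [done[i] for i in range(len(works))]

# The example from the text: 5 Gflop and 10 Gflop on a 1 Gflop/sec core.
print(completion_times([5, 10], 1))  # [10.0, 15.0]
```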

In practice, the programs in the above examples would complete later! This is because in real systems the context switching overhead is non-zero and because the programs would most likely compete for memory and caches (see Computer Architecture and Operating Systems textbooks for all details). But the above ideal model will be sufficient for our purposes.

In fact, we will almost always avoid time sharing altogether by never running two programs on a single core. However, the reasoning needed to compute how time sharing would impact program execution times is important and applicable to other learning objectives (e.g., networking). This is why we include below some questions on this topic.

#### Practice Questions

[A.1.p2.1] At time 0 on a core with speed 2 Tflop/sec you start two programs. Program A’s work is 200 Gflop, and program B’s work is 220 Gflop. At what times do the programs complete?

In a first phase, both programs proceed at speed 1 Tflop/sec. Program A completes first at time:

$T_{A} = \frac{0.2 \text{Tflop}}{1\; \text{Tflop/sec}} = 0.2\; \text{sec}$

At that point, program $B$ still has 20 Gflop left to compute, but it now proceeds at speed 2 Tflop/sec. Therefore, it completes at time:

$T_{B} = T_{A} + \frac{0.02 \text{Tflop}}{2\; \text{Tflop/sec}} = 0.21 \; \text{sec}$

[A.1.p2.2] Two programs, each with work 800 Tflop, were started at the same time on a core and both completed simultaneously after one hour. What is the core’s speed in Gflop/sec?

$\text{speed} = 2 \times \frac{800000\; \text{Gflop}}{ 3600\; \text{sec}} \simeq 444.44 \;\text{Gflop/sec}$

#### Questions

[A.1.q2.1] Two programs are started at the same time on a core. These programs both have work 1 Tflop, and both complete after 220 seconds. What was the core speed in Gflop/sec?

[A.1.q2.2] Three programs, A, B, and C, were started at the same time on a core with speed 600 Gflop/sec. After 10 seconds, A and C complete. Then, 2 seconds later, program B completes. What is the work (in Gflop) of each of the three programs?

[A.1.q2.3] A program, A, with work 4 Tflop is started on a core of speed 500 Gflop/sec. 5 seconds later, another program, B, is started. Both programs finish at the same time. What is the work of B in Tflop?

#### Learning Objectives

• Understand the concept of memory (RAM)
• Understand how the amount of available memory limits program executions

### Memory

When a program executes on a computer, it uses some of the computer’s memory (a.k.a., Random Access Memory or RAM) to store its “address space”. The address space consists of all the bytes of content that the program needs to execute (the “code”, “data”, “stack”, “heap”, “page table”, etc.). You can learn about details of the structure of address spaces in Operating Systems textbooks. In these pedagogic modules, we simply consider that a program needs some number of bytes of RAM to execute.

When not enough RAM is available (e.g., the program’s address space is too big), the Operating System keeps part of the address space on disk and shuffles content between RAM and disk as necessary. Albeit fascinating, this comes with a significant performance hit. Our main focus in these pedagogic modules is performance, and we will only consider executing a program if its entire address space fits in RAM.

For instance, consider a single-core computer with 4 GB of RAM. We have three programs A, B, and C, with address spaces of 1 GB, 2 GB, and 3 GB, respectively. In this case, we cannot run all three programs at the same time, because 6 GB is larger than the available RAM. But we can run programs A and C together, as together they require 4 GB of RAM. Note that in practice such a “tight” fit may not work because the entire RAM is not available for user processes. For instance, some RAM is used to store the Operating System’s Kernel (see an OS textbook).
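One can check such fits programmatically. Below is an illustrative helper (our own, not eduWRENCH code) that enumerates which groups of programs fit in RAM together:

```python
from itertools import combinations

def fitting_subsets(address_spaces_gb, ram_gb):
    """All non-empty subsets of programs whose address spaces fit in RAM."""
    names = sorted(address_spaces_gb)
    fits = []
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            if sum(address_spaces_gb[p] for p in subset) <= ram_gb:
                fits.append(subset)
    return fits

# The example from the text: 4 GB of RAM; A needs 1 GB, B 2 GB, C 3 GB.
print(fitting_subsets({"A": 1, "B": 2, "C": 3}, 4))
```

A and C together (4 GB) fit exactly, while B and C (5 GB) do not; as noted above, such a tight fit may fail in practice because the OS kernel also occupies RAM.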

If you have been paying attention you may wonder why we are even talking about running programs at the same time since in the previous tab we said we would almost never do it! We will find out in the Multicore Computing module!

#### Suggested activities

[How much RAM on your computer] Find out how much RAM you have on your computer in total, and how much RAM is available for running new programs on your computer right now.

[How much RAM for your browser] Find out how much RAM your Web browser is using right now on your computer (this could be a substantial amount). In these pedagogic modules we often assume that programs use a fixed amount of memory that is allocated throughout the program’s execution. But in fact, programs’ memory needs typically evolve throughout their execution. To see this in action, in your browser open a few tabs to navigate to content-heavy Web sites (e.g., news sites, social media sites). How much more memory does your browser use now?

#### Learning Objectives

• Understand the concept of IO
• Understand the impact of IO operations on computing
• Understand the basics of optimizing computation around IO operations

### Basic Concepts

A computer typically does not run a program start to finish in a vacuum. Programs often need to consume Input and produce Output, which is done by executing IO operations. A couple of very common IO operations are reading from disk and writing to disk. As the disk is much slower than the CPU, even small disk reads or writes can represent a large (from the CPU’s perspective) chunk of time during which the CPU is sitting idle.

When it comes to IO operations, not all programs are created equal. Some programs will require more IO time than others. In fact, programs are typically categorized as IO- or CPU-intensive. If a program spends more time performing IO operations than CPU operations, it is said to be IO-intensive. If the situation is reversed, the program is said to be CPU-intensive. For instance, a program that reads a large jpeg image from disk, reduces the brightness of every pixel (to make the image darker), and writes the modified image to disk is IO-intensive on most standard computers (a lot of data to read/write from/to disk, and very quick computation on this data - in this case perhaps just a simple subtraction). By contrast, a program that instead of reducing the brightness of the image applies an oil painting filter to it will most likely be CPU-intensive (applying an oil painting filter entails many, many more computations than a simple subtraction).

As mentioned above, reading from and writing to the disk are slow operations compared to the CPU. Typically, there is a difference between read and write speeds as well. Reading is typically significantly faster than writing. Furthermore, different kinds of disks have different speeds as well. The table below shows advertised read and write speeds for two mass-market SATA disks, a Hard Disk Drive (HDD) and a Solid State Drive (SSD), at the time this content is being written:

| Disk | Read bandwidth | Write bandwidth |
|------|----------------|-----------------|
| WD HDD (10EZEX) | 160 MB/sec | 143 MB/sec |
| Samsung 860 EVO (SSD) | 550 MB/sec | 520 MB/sec |

The read and write speeds are often referred to as bandwidths. The unit above is MB/sec (MegaByte per second), which is also written as MBps.

Determining the exact bandwidth that disk reads and writes will experience during program execution is actually difficult (due to the complexity of the underlying hardware and software, and due to how the data is stored and accessed on the disk). In this module, we will always assume that disk bandwidths are constant.
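Under this constant-bandwidth assumption, IO time is simply data volume divided by bandwidth. A quick sketch, using the advertised read bandwidths from the table above:

```python
def io_time(volume_mb, bandwidth_mbps):
    """Seconds to read or write volume_mb at a constant bandwidth."""
    return volume_mb / bandwidth_mbps

# Reading a 1000 MB file:
print(io_time(1000, 160))  # HDD read at 160 MB/sec: 6.25 sec
print(io_time(1000, 550))  # SSD read at 550 MB/sec: ~1.82 sec
```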

### A Program with Computation and IO

Let us consider a program that performs a task in three phases. First, it reads data from disk. Second, it performs some computation on that data to create new data. And third, it writes the new data back to disk. This could be one of the image processing programs mentioned in the previous section as examples. If this program is invoked to process 2 images, i.e., so that it performs 2 tasks, then its execution timeline is as depicted below:

Figure 1: Example execution timeline.

As can be seen in the figure, at any given time either the CPU is idle (while IO operations are ongoing) or the disk is idle (while computation is ongoing). In the above figure, reading an image from disk takes 1 second, writing an image to disk takes 1 second, and processing an image takes 2 seconds. (We can thus infer that the two images have the same size, and that the disk has identical read and write bandwidths). We can compute the CPU Utilization as follows:

\begin{align} \text{CPU Utilization} & = \frac{T_{Compute}}{T_{Compute} + T_{Idle}} \\ & = \frac{4}{4 + 4} \\ & = 0.5 \end{align}

This means that the CPU is idle for half of the execution of the program. This program is perfectly balanced, i.e., it is neither CPU-intensive nor IO-intensive.
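The utilization computation above is straightforward to encode; a minimal sketch (our own helper name):

```python
def cpu_utilization(t_compute, t_idle):
    """Fraction of the execution during which the CPU is busy."""
    return t_compute / (t_compute + t_idle)

# Figure 1: two tasks, each with 2 sec of compute and 2 sec of IO
# during which the CPU sits idle.
print(cpu_utilization(4, 4))  # 0.5
```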

### Overlapping computation and IO

The execution in the previous section can be improved. This is because the CPU and the disk are two different hardware components, and they can work at the same time. As a result, while the CPU is processing the 1st image, the 2nd image could be read from disk! The CPU can then start processing the 2nd image right away after it finishes processing the 1st image. The 1st image can be written to disk at the same time. This execution is depicted below:

Figure 2: Example execution timeline with overlap of IO and computation.

The total execution time has dropped by 2 seconds and the CPU utilization is increased:

\begin{align} \text{CPU Utilization} & = \frac{T_{Compute}}{T_{Compute} + T_{Idle}} \\ & = \frac{4}{4 + 2} \\ & = 0.66 \end{align}

If there were additional, similar images to process, the CPU utilization would keep increasing, as the CPU would be idle only at the very beginning and at the very end of the execution.

The above is an ideal situation because IO time for an image is exactly equal to compute time. More precisely, save for the first and last task, while the program computes task $i$ there is enough time to write the output of task $i-1$ and to read the input of task $i+1$.

If the above condition does not hold, then there is necessarily CPU idle time. For instance, if the time to read an image is instead 2 seconds (for instance because the program reads larger images but writes back down-scaled images) and the program must process 3 images, then the execution would be as follows:

Figure 3: Example execution timeline with overlap of IO and computation.

As expected, the program first reads 2 images, and then alternates write and read operations while CPU computation is going on. But in this case, the CPU experiences idle time because images cannot be read from disk fast enough. So although overlapping IO and computation almost always reduces program execution time, the benefit can vary based on IO and computation volumes (and especially if these volumes vary from task to task!).
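This schedule (read the first two inputs, then alternate writes and reads on the single disk) can be simulated for identical tasks. The sketch below is our own model of it, assuming one CPU, one disk, and enough RAM; it reproduces the 2-image overlapped execution of Figure 2.

```python
def overlapped_time(n, read, compute, write):
    """Makespan of n identical tasks with IO/compute overlap, the disk
    performing R0, R1, W0, R2, W1, R3, ..., W(n-2), W(n-1) in order."""
    if n == 1:
        return read + compute + write
    ops = [("R", 0), ("R", 1)]
    for i in range(n - 2):
        ops += [("W", i), ("R", i + 2)]
    ops += [("W", n - 2), ("W", n - 1)]

    read_done = [0.0] * n   # when each task's input is in RAM
    disk = 0.0              # time at which the disk becomes free
    cpu = 0.0               # time at which the CPU becomes free
    for kind, i in ops:
        if kind == "R":
            disk += read
            read_done[i] = disk
        else:
            # Task i computes once its input is read and the CPU is free;
            # its write must then wait for both the compute and the disk.
            cpu = max(cpu, read_done[i]) + compute
            disk = max(disk, cpu) + write
    return disk

# Figure 2's setting: 2 tasks, 1 sec read, 2 sec compute, 1 sec write.
print(overlapped_time(2, 1, 2, 1))  # 6.0
```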

### Practical concerns

In practice, one can implement a program that overlaps IO and computation. This can be done by using non-blocking IO operations and/or threads. These are techniques that are often taught in Operating Systems courses. The overlap may not be completely “free”, as reading/writing data from disk can still require the CPU to perform some computation. Therefore, there can be time-sharing of the CPU between the IO operations and the computation, and the computation is slowed down a little bit by the IO operations (something we did not show in the figures above). This said, there are ways for IO operations to use almost no CPU cycles. One such technique, which relies on specific but commonplace hardware, is called Direct Memory Access (DMA). See an Operating Systems course for more details.
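As an illustration (ours, not eduWRENCH code), here is a Python sketch that overlaps IO and computation with a reader thread feeding chunks to the main thread through a bounded queue; in CPython, file reads release the GIL, so they genuinely overlap with the computation:

```python
import threading
import queue

def read_chunks(path, chunk_size, out_queue):
    """Producer: stream the file chunk by chunk (the IO side)."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            out_queue.put(chunk)
    out_queue.put(None)   # sentinel: no more data

def process_file(path, chunk_size=1 << 20):
    """Consumer: compute on each chunk while the next one is being read."""
    q = queue.Queue(maxsize=2)   # bounded queue limits RAM pressure
    reader = threading.Thread(target=read_chunks, args=(path, chunk_size, q))
    reader.start()
    total = 0
    while (chunk := q.get()) is not None:
        total += sum(chunk)      # stand-in for the real computation
    reader.join()
    return total
```

Note that the bounded queue caps how many chunks are in RAM at once, which connects to the RAM-pressure concern discussed next.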

Another practical concern is RAM pressure. When going from the example execution in Figure 1 to that in Figure 2, the peak amount of RAM needed by the program increases because at some point more than one input image is held in RAM. Since tasks can have significant memory requirements, RAM constraints can prevent some overlap of IO and computation.

#### Simulating IO

So that you can gain hands-on experience with the above concepts, use the simulation app below.

Initially, you can create a series of identical tasks that have a certain input and output. Run the simulation to see the progression of tasks and host utilization without allowing IO to overlap with computation. Once you have observed this, try selecting the checkbox to allow overlap. With IO overlap there should be an improvement in execution time and host utilization. You can view this in the output graphs that are generated. You can also try varying the input/output and computation amounts to create IO-intensive or CPU-intensive tasks. Understanding which tasks will benefit from increased R/W or computation speeds will assist you in answering the questions to come.

(Open simulator here)


#### Practice Questions

[A.1.p4.1] Say you have 10 tasks to execute, where each task reads in 200 MB of input, computes 2500 Gflop, and writes out 100 MB of output. These 10 tasks are to be executed on the platform shown in the simulation app above. What is the total execution time when I/O and computation can be overlapped? Use the simulation app to check your answer. What is the core utilization percentage?

Reading 200 MB takes 2 seconds, computing 2500 Gflop takes 25 seconds, and writing 100 MB takes 1 second. The compute time is much larger than the I/O time. So the total execution time will be $2 + 10\times 25 + 1 = 253$ seconds, which is confirmed by the simulation.

The core is idle for only 3 seconds (at the very beginning and the very end of the execution), and so the core utilization is 250/253 = 98.8%.

[A.1.p4.2] A program reads 2GB of input from disk, performs a 6 Tflop computation on this input, and then writes 1GB of output to disk. It is executed on a computer that has a CPU that computes at speed 500 Gflop/sec and has a HDD with R/W bandwidth of 200 MB/sec. Is the program execution IO-intensive or CPU-intensive? If you could upgrade either the CPU or the HDD, which upgrade would you choose?

The execution time breakdown is as follows:

• Read input: 2000 MB / 200 MB/sec = 10 sec
• Compute: 6000 Gflop / 500 Gflop/sec = 12 sec
• Write output: 1000 MB / 200 MB/sec = 5 sec

The program’s execution is therefore IO-intensive, and so one should upgrade the HDD.

[A.1.p4.3] You are working at a company that runs instances of the same task repeatedly. On the currently available hardware, the time to process a task instance is as follows:

• Read input: 2 sec
• CPU computation: 3 sec
• Write output: 2 sec

A program was designed to overlap IO and computation when executing multiple task instances in sequence. As in Figure 3, the program first reads the input for the first 2 tasks, and then alternates between writing the output for task $i$ and reading the input for task $i+2$, until at the end it writes the output of the last two tasks one after the other. The computation on the CPU for a task is started as soon as its input has been read from disk.

What is the total execution time when processing 4 consecutive task instances? You can use the simulation app above to check your answer!

What is the core utilization?

Here is a depiction of the execution:

The execution time is 18 seconds. (This result can be generalized for $n$ tasks by identifying the repeating pattern: $2 + 3 + (n-1) \times (3 + 1) + 1 = 4n + 2$.)

We can double-check the result in simulation by setting the number of tasks to 4, the amount of input data to 200 MB, the amount of output data to 200 MB, and the task work to 300 Gflop.

The CPU is utilized for 12 seconds. Therefore the CPU utilization is 12/18 = 66.6%.

[A.1.p4.4] In the same setting as in the previous question, it is decided to purchase a SSD to replace the HDD currently being used. The SSD has twice the bandwidth of the HDD. What is now the CPU utilization when processing 4 consecutive task instances?

Here is a depiction of the execution:

The execution time is 14 seconds. (This result can be generalized for $n$ tasks easily: $3n + 2$.)

The CPU is utilized for 12 seconds. Therefore the CPU utilization is 12/14 = 85.7%.

By making the IO faster, input for tasks is always ready for the CPU to process. As the number of tasks increases, the CPU utilization tends to 100%.

#### Questions

[A.1.q4.1] Consider a series of 10 identical tasks. With the hardware we have available, each task requires 1 second to read data from disk, 2 seconds for computation, and 0.5 seconds to write the output back to the disk.

• What is the lowest possible execution time if we are not able to perform IO during computation?

• What is the lowest possible execution time when overlap of computation and IO is possible?

[A.1.q4.2] A task requires 50 MB of input data to be loaded from disk before computation and writes 50 MB of data to disk once computation has been completed. The computation performs 500 Gflop. Instances of this task are executed continuously in sequence throughout the day, in a way that overlaps IO and computation. The computer on which this is done has a disk with R/W bandwidth 200 MB/sec and a CPU with compute speed 1.5 Tflop/sec. We wish to increase the number of task instances we can execute per day. Should we upgrade the processor? Or should we upgrade the disk?

[A.1.q4.3] A task requires 100 MB of input data to be loaded from disk, performs 1 Tflop of computation, and writes some output back to disk. A batch of fifty instances of this task is to be run on a computer with a processor capable of 250 Gflop/sec and a disk with R/W bandwidths of 100 MB/sec. IO and computation are overlapped. How large can the task output be so that the CPU is still 100% utilized? (ignoring the initial input read and final output write, during which the CPU is necessarily idle).

#### Suggested activities

[Programming #1]: Implement a program that first reads in all the bytes in a file into an array of bytes in RAM. The program then computes and prints out the number of bytes that are equal to 0. The program should also print the time spent reading the file into RAM and the time to compute the result. Determine experimentally whether this program is compute- or I/O-intensive. You will need to run the program on a sufficiently large file so that you can obtain meaningful results.

[Programming #2]: Modify your program so that it determines which byte value is the most frequent in the file. Is this modified program more or less I/O-intensive? By how much?

#### Learning Objectives

• Be able to apply the concepts learned in this module to reason about and optimize the performance of a scenario that includes computation activities, I/O activities, and RAM constraints

### Production Scenario

You are working for a company that uses a single-core computer to run computational tasks as part of day-to-day business. The specifications of the computer and tasks are as follows:

##### Machine
• 1 1-Core CPU that computes at 50 Gflop/sec
• 12 GB of RAM
• 1 HDD with 1TB capacity and 200 MB/sec R/W bandwidth

Two tasks need to be run back-to-back throughout the day. Each task proceeds in three phases: (i) it reads its input file fully; (ii) it computes; and (iii) it writes its output file fully. Each task has a memory footprint that must be allocated to it throughout its execution (i.e., from the time it begins reading its input file until it finishes writing its output file).

##### Task A
• 500 Gflop
• Requires 12 GB of RAM
• Input File (Read from disk before computation can start): 4 GB
• Output File (Written to disk after computation has completed): 2 GB

##### Task B
• 800 Gflop
• Requires 12 GB of RAM
• Input File (Read from disk before computation can start): 2 GB
• Output File (Written to disk after computation has completed): 4 GB

### Phase #1: Hardware upgrades for the current implementation

In the current implementation of the software that executes the two tasks, the first task must run to completion (i.e., its output file must be written to disk) before the next task can start executing (i.e., start reading its input file from disk).

Your manager has tasked you with decreasing the total execution time for executing the two tasks. Ignoring things like hardware wear-and-tear and reliability, you have some decisions to make as far as allocating funds to upgrade the computer used to run the tasks. Here are possible upgrades:

##### CPU Upgrades
1. Keep current 50 Gflop/sec CPU for $0
2. Upgrade CPU to 100 Gflop/sec for $100
3. Upgrade CPU to 200 Gflop/sec for $250

##### RAM Upgrades
1. Keep current 12 GB RAM for $0
2. Upgrade to 16 GB RAM for $50
3. Upgrade to 32 GB RAM for $100

##### Disk Upgrades
1. Keep current 200 MB/sec HDD for $0
2. Upgrade to SSD with 400 MB/sec R/W for $100
3. Upgrade to SSD with 500 MB/sec R/W for $250

Your manager has allocated $250 to spend on upgrades. Leftover money is encouraged if spending it will not decrease execution time further.

#### Questions

[A.1.q5.1] What is the execution time of this 2-task application on that computer?

[A.1.q5.2] Is the execution I/O-intensive or CPU-intensive?

[A.1.q5.3] If you were given only $100 to spend on upgrades, which upgrade in the list above would you purchase?

[A.1.q5.4] What is the optimal way to spend the $250 on upgrades to decrease execution time, and what is the corresponding execution time?

[A.1.q5.5] Will the execution then be I/O-intensive or CPU-intensive?

### Phase #2: Hardware upgrades for a better implementation

Before you purchase the upgrades you determined in question A.1.q5.4 above, your manager informs you that a software engineer in the company has just rewritten the software so that the execution can be more efficient: when the first task starts performing its computation, the second task can then start reading its input file from disk. However, only one task can compute at a time, and only one I/O operation can be performed at a time. So, for instance, if at time $t$ the first task begins computing, which will last 100 seconds, and the second task begins reading its input file, which will last 200 seconds, then the first task will only start writing its output file at time $t+200$ (i.e., it has to wait for the disk to be idle). The above is only feasible if there is sufficient RAM to accommodate both tasks.

In this more complex, but possibly more efficient, implementation, there is now a choice of which task to execute first.

#### Questions

[A.1.q5.6] If you execute A before B, what is the best way to spend the $250 on upgrades?

[A.1.q5.7] If you execute B before A, what is the best way to spend the $250 on upgrades?

[A.1.q5.8] Considering the best of these two options, compare and contrast with your answers to A.1.q5.4 in terms of money spent and execution time achieved. Is the more complex implementation worthwhile?


eduWRENCH is funded by the National Science Foundation (NSF) under grant numbers 1923539 and 1923621.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.