Scope (computer science)

In computer programming, the scope of a name binding (an association of a name to an entity, such as a variable) is the part of a program where the name binding is valid, that is, where the name can be used to refer to the entity. In other parts of the program the name may refer to a different entity (it may have a different binding), or to nothing at all (it may be unbound). The scope of a name binding is also known as the visibility of an entity, particularly in older or more technical literature; this is from the perspective of the referenced entity, not the referencing name.

The term “scope” is also used to refer to the set of all name bindings that are valid within a part of a program or at a given point in a program, which is more correctly referred to as context or environment.[a]

Strictly speaking[b] and in practice for most programming languages, “part of a program” refers to a portion of source code (area of text), and is known as lexical scope. In some languages, however, “part of a program” refers to a portion of run time (time period during execution), and is known as dynamic scope. Both of these terms are somewhat misleading—they misuse technical terms, as discussed in the definition—but the distinction itself is accurate and precise, and these are the standard respective terms. Lexical scope is the main focus of this article, with dynamic scope understood by contrast with lexical scope.

In most cases, name resolution based on lexical scope is relatively straightforward to use and to implement: in use, one can read backwards in the source code to determine to which entity a name refers, and in implementation, one can maintain a list of names and contexts when compiling or interpreting a program. Difficulties arise in name masking, forward declarations, and hoisting, while considerably subtler ones arise with non-local variables, particularly in closures.

Levels of scope

Scope can vary from as little as a single expression to as much as the entire program, with many possible gradations in between. The simplest scope rule is global scope—all entities are visible throughout the entire program. The most basic modular scope rule is two-level scope, with a global scope anywhere in the program, and local scope within a function. More sophisticated modular programming allows a separate module scope, where names are visible within the module (private to the module) but not visible outside it. Within a function, some languages, such as C, allow block scope to restrict scope to a subset of a function; others, notably functional languages, allow expression scope, to restrict scope to a single expression. Other scopes include file scope (notably in C) which behaves similarly to module scope, and block scope outside of functions (notably in Perl).

A subtle issue is exactly when a scope begins and ends. In some languages, such as C, a name’s scope begins at its declaration, and thus different names declared within a given block can have different scopes. This requires declaring functions before use, though not necessarily defining them, and requires forward declaration in some cases, notably for mutual recursion. In other languages, such as JavaScript or Python, a name’s scope begins at the start of the relevant block (such as the start of a function), regardless of where it is defined, and all names within a given block have the same scope; in JavaScript this is known as variable hoisting. However, the point at which the name is bound to a value varies, and the behavior of in-context names that have no defined value differs: in Python, use of such a name yields a runtime error, while in JavaScript, names declared with var (but not names declared with let or const) are usable throughout the function because they are bound to the value undefined.
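The Python behavior described above can be illustrated with a minimal sketch. The function names here are hypothetical and chosen only for this example; the point is that an assignment anywhere in a function body makes the name local to the whole function, so an earlier read fails at run time rather than falling back to the global binding.

```python
x = "global"

def read_global():
    # No assignment to x in this function, so x resolves to the module-level name.
    return x

def read_before_assignment():
    # The assignment further down makes x local to the *entire* function body,
    # so the read below refers to the local x, which is not yet bound.
    try:
        value = x
    except UnboundLocalError:
        value = "unbound"
    x = "local"
    return value

print(read_global())             # uses the global binding
print(read_before_assignment())  # the local x was in scope but not yet bound
```

UnboundLocalError is a subclass of NameError, which is the runtime error the paragraph refers to.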

Expression scope

When the scope of a name binding is a single expression, it is known as expression scope. Expression scope is available in many languages, especially functional languages, which offer a feature called let-expressions that allows a declaration’s scope to be a single expression. This is convenient if, for example, an intermediate value is needed for a computation. For example, in Standard ML, if f() returns 12, then let val x = f() in x * x end is an expression that evaluates to 144, using a temporary variable named x to avoid calling f() twice. Some languages with block scope approximate this functionality by offering syntax for a block to be embedded into an expression; for example, the aforementioned Standard ML expression could be written in Perl as do { my $x = f(); $x * $x }, or in GNU C as ({ int x = f(); x * x; }).

In Python, auxiliary variables in generator expressions and list comprehensions (in Python 3) have expression scope.
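As a small sketch of this (the function and variable names are illustrative assumptions), the auxiliary variable of a Python 3 list comprehension is not visible in the enclosing function once the comprehension has been evaluated:

```python
def comprehension_demo():
    # 'aux' is bound only inside the comprehension.
    squares = [aux * aux for aux in range(4)]
    try:
        aux  # not bound in the enclosing function
        leaked = True
    except NameError:
        leaked = False
    return squares, leaked

squares, leaked = comprehension_demo()
print(squares)  # [0, 1, 4, 9]
print(leaked)   # False
```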

In C, variable names in a function prototype have expression scope, known in this context as function prototype scope. As the variable names in the prototype are not referred to (they may be different in the actual definition; they are just dummies), these are often omitted, though they may be used for generating documentation, for instance.

Block scope

When the scope of a name binding is a block, it is known as block scope. Block scope is available in many, but not all, block-structured programming languages. This began with ALGOL 60, where “[e]very declaration … is valid only for that block”,[6] and today is particularly associated with languages in the Pascal and C families and traditions. Most often this block is contained within a function, thus restricting the scope to a part of a function, but in some cases, such as Perl, the block may not be within a function.

unsigned int sum_of_squares(const unsigned int N) {
  unsigned int ret = 0;
  for (unsigned int n = 1; n <= N; n++) {
    const unsigned int n_squared = n * n;
    ret += n_squared;
  }
  return ret;
}

A representative example of the use of block scope is the C code shown here, where two variables are scoped to the loop: the loop variable n, which is initialized once and incremented on each iteration of the loop, and the auxiliary variable n_squared, which is initialized at each iteration. The purpose is to avoid adding variables to the function scope that are only relevant to a particular block; for example, this prevents errors where the generic loop variable i has accidentally already been set to another value. In this example the expression n * n would generally not be assigned to an auxiliary variable, and the body of the loop would simply be written ret += n * n, but in more complicated examples auxiliary variables are useful.

Blocks are primarily used for control flow, such as with if, while, and for loops, and in these cases block scope means the scope of a variable depends on the structure of a function’s flow of execution. However, languages with block scope typically also allow the use of “naked” blocks, whose sole purpose is to allow fine-grained control of variable scope. For example, an auxiliary variable may be defined in a block, then used (say, added to a variable with function scope) and discarded when the block ends, or a while loop might be enclosed in a block that initializes variables used inside the loop that should only be initialized once.

A subtlety of several programming languages, such as Algol 68 and C (demonstrated in this example and standardized since C99), is that block-scope variables can be declared not only within the body of the block, but also within the control statement, if any. This is analogous to function parameters, which are declared in the function declaration (before the block of the function body starts), and in scope for the whole function body. This is primarily used in for loops, which have an initialization statement separate from the loop condition, unlike while loops, and is a common idiom.

Block scope can be used for shadowing. In this example, inside the block the auxiliary variable could also have been called n, shadowing the parameter name, but this is considered poor style due to the potential for errors. Furthermore, some descendants of C, such as Java and C#, despite having support for block scope (in that a local variable can be made to go out of context before the end of a function), do not allow one local variable to hide another. In such languages, the attempted declaration of the second n would result in a compile-time error, and one of the n variables would have to be renamed.

If a block is used to set the value of a variable, block scope requires that the variable be declared outside of the block. This complicates the use of conditional statements with single assignment. For example, in Python, which does not use block scope, one may initialize a variable as such:

if c:
    a = "foo"
else:
    a = ""

where a is accessible after the if statement.

In Perl, which has block scope, this instead requires declaring the variable prior to the block:

my $a;
if ($c) {
    $a = 'foo';
} else {
    $a = '';
}

Often this is instead rewritten using multiple assignment, initializing the variable to a default value. In Python (where it is not necessary) this would be:

a = ""
if c:
    a = "foo"

while in Perl this would be:

my $a = '';
if ($c) {
    $a = 'foo';
}

In case of a single variable assignment, an alternative is to use the ternary operator to avoid a block, but this is not in general possible for multiple variable assignments, and is difficult to read for complex logic.
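For the single-assignment case, a minimal Python sketch of the ternary alternative looks like the following. The function name label and the parameter c are assumptions made for this illustration; Python spells the ternary operator as a conditional expression.

```python
def label(c):
    # One assignment, no if/else block: the conditional expression
    # selects the value directly.
    a = "foo" if c else ""
    return a

print(label(True))   # foo
print(label(False))  # prints an empty line
```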

This is a more significant issue in C, notably for string assignment, as string initialization can automatically allocate memory, while string assignment to an already initialized variable requires allocating memory, a string copy, and checking that these are successful.

{
  my $counter = 0;
  sub increment_counter {
    return ++$counter;
  }
}

Some languages allow the concept of block scope to be applied, to varying extents, outside of a function. For example, in the Perl snippet above, $counter is a variable name with block scope (due to the use of the my keyword), while increment_counter is a function name with global scope. Each call to increment_counter will increase the value of $counter by one, and return the new value. Code outside of this block can call increment_counter, but cannot otherwise obtain or alter the value of $counter. This idiom allows one to define closures in Perl.

Function scope

When the scope of a name binding is a function, it is known as function scope. Function scope is available in most programming languages which offer a way to create a local variable in a function or subroutine: a variable whose scope ends (that goes out of context) when the function returns. In most cases the lifetime of the variable is the duration of the function call (it is an automatic variable, created when the function starts or the variable is declared, and destroyed when the function returns), while the scope of the variable is within the function, though the meaning of “within” depends on whether scope is lexical or dynamic. However, some languages, such as C, also provide for static local variables, where the lifetime of the variable is the entire lifetime of the program, but the variable is only in context when inside the function. In the case of static local variables, the variable is created when the program initializes, and destroyed only when the program terminates, as with a static global variable, but is only in context within a function, like an automatic local variable.

Importantly, in lexical scope a variable with function scope has scope only within the lexical context of the function: it goes out of context when another function is called within the function, and comes back into context when the function returns—called functions have no access to the local variables of calling functions, and local variables are only in context within the body of the function in which they are declared. By contrast, in dynamic scope, the scope extends to the execution context of the function: local variables stay in context when another function is called, only going out of context when the defining function ends, and thus local variables are in context of the function in which they are defined and all called functions. In languages with lexical scope and nested functions, local variables are in context for nested functions, since these are within the same lexical context, but not for other functions that are not lexically nested. A local variable of an enclosing function is known as a non-local variable for the nested function. Function scope is also applicable to anonymous functions.
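The lexical case can be sketched in Python, which is lexically scoped. All names below (helper, caller, nested, secret) are hypothetical and exist only for this illustration: a separately defined function cannot see the caller's local variable, while a lexically nested function can.

```python
def helper():
    # 'secret' is local to caller(); it is not in context here,
    # even though helper() is called from inside caller().
    try:
        return secret
    except NameError:
        return None

def caller():
    secret = 42

    def nested():
        # Lexically nested, so the enclosing local is in context
        # (a non-local variable from nested()'s point of view).
        return secret

    return helper(), nested()

print(caller())  # (None, 42)
```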

def square(n):
    return n * n

def sum_of_squares(n):
    total = 0 
    i = 0
    while i <= n:
        total += square(i)
        i += 1
    return total

For example, in the snippet of Python code above, two functions are defined: square and sum_of_squares. square computes the square of a number; sum_of_squares computes the sum of all squares up to a number. (For example, square(4) is 4² = 16, and sum_of_squares(4) is 0² + 1² + 2² + 3² + 4² = 30.)

Each of these functions has a variable named n that represents the argument to the function. These two n variables are completely separate and unrelated, despite having the same name, because they are lexically scoped local variables with function scope: each one’s scope is its own, lexically separate function, and thus they do not overlap. Therefore, sum_of_squares can call square without its own n being altered. Similarly, sum_of_squares has variables named total and i; these variables, because of their limited scope, will not interfere with any variables named total or i that might belong to any other function. In other words, there is no risk of a name collision between these names and any unrelated names, even if they are identical.

No name masking is occurring: only one variable named n is in context at any given time, as the scopes do not overlap. By contrast, were a similar fragment to be written in a language with dynamic scope, the n in the calling function would remain in context in the called function—the scopes would overlap—and would be masked (“shadowed”) by the new n in the called function.

Function scope is significantly more complicated if functions are first-class objects and can be created locally to a function and then returned. In this case any variables in the nested function that are not local to it (unbound variables in the function definition, that resolve to variables in an enclosing context) create a closure, as not only the function itself, but also its context (of variables) must be returned, and then potentially called in a different context. This requires significantly more support from the compiler, and can complicate program analysis.
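A minimal Python sketch of such a closure follows; make_counter and increment are illustrative names. The returned function carries the binding of count with it, so the variable outlives the call that created it, and each call to make_counter produces an independent context.

```python
def make_counter():
    count = 0  # local to make_counter, captured by the nested function

    def increment():
        nonlocal count  # rebind the enclosing function's variable
        count += 1
        return count

    return increment

counter = make_counter()
print(counter())  # 1
print(counter())  # 2

other = make_counter()  # a separate closure with its own count
print(other())    # 1
```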

File scope

When the scope of a name binding is a file, it is known as file scope. File scope is largely particular to C (and C++), where the scope of variables and functions declared at the top level of a file (not within any function) is the entire file, or rather, for C, from the declaration until the end of the source file, or more precisely translation unit (internal linking). This can be seen as a form of module scope, where modules are identified with files, and in more modern languages is replaced by an explicit module scope. Due to the presence of include statements, which add variables and functions to the internal context and may themselves call further include statements, it can be difficult to determine what is in context in the body of a file.

In the C code snippet above, the function name sum_of_squares has file scope.

Module scope

When the scope of a name binding is a module, it is known as module scope. Module scope is available in modular programming languages where modules (which may span various files) are the basic unit of a complex program, as they allow information hiding and exposing a limited interface. Module scope was pioneered in the Modula family of languages, and Python (which was influenced by Modula) is a representative contemporary example.
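To keep the illustration in a single file, the sketch below builds a module object in memory with types.ModuleType and exec instead of importing a separate file; the module name demo_module and its contents are assumptions for this example. Names defined in the module live in the module's own namespace and do not appear in the importing code's namespace. Note that in Python the leading underscore marks a name as private by convention only; it remains reachable through the module object.

```python
import types

source = '''
_hidden = 10          # module-level name, private by convention

def visible():
    return _hidden + 1   # resolved in the module's own namespace
'''

demo = types.ModuleType("demo_module")
exec(source, demo.__dict__)  # run the source with the module's namespace as globals

print(demo.visible())          # 11
print("_hidden" in globals())  # False: the name did not leak into this namespace
print(demo._hidden)            # 10: still reachable via the module object
```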

In some object-oriented programming languages that lack direct support for modules, such as C++, a similar structure is instead provided by the class hierarchy, where classes are the basic unit of the program, and a class can have private methods. This is properly understood in the context of dynamic dispatch rather than name resolution and scope, though they often play analogous roles. In some cases both these facilities are available, such as in Python, which has both modules and classes, and code organization (as a module-level function or a conventionally private method) is a choice of the programmer.

Global scope

When the scope of a name binding is an entire program, it is known as global scope. Variable names with global scope, called global variables, are frequently considered bad practice, at least in some languages, due to the possibility of name collisions and unintentional masking, together with poor modularity, and function scope or block scope are considered preferable. However, global scope is typically used (depending on the language) for various other sorts of names, such as names of functions, names of classes and names of other data types. In these cases mechanisms such as namespaces are used to avoid collisions.