Lazy Evaluation
Overview
- Lazy Evaluation is a programming technique or strategy that calculates expressions only when the values are truly needed.
- Scala supports lazy evaluation through
lazy val
,view
, and lazy collections likeLazyList
.
🥨 Layman’s Overview
Imagine you’re preparing a big meal for your friends. Your menu has many dishes, but you’re not sure if everyone will eat every dish. If you prepare all the dishes in advance, there might be a lot of food left uneaten, leading to waste. This is similar to performing all calculations in programming in advance, even if they may not be needed in the end.
Here, the concept of "lazy" evaluation is like changing your strategy, deciding to start preparing a dish only when someone requests it. This way, you won’t waste time and resources preparing food that won’t be eaten. You only start cooking a dish when your friend says, "I want that dish."
Pros and Cons of Lazy Evaluation
Performance Improvement
Using the lazy
keyword can improve performance because it allows delaying the execution of resource-intensive or computationally intensive operations until they are actually needed. The main benefits of lazy initialization include:
-
Reduction of unnecessary computations: If an expensive operation or object creation is never used, using
lazy
can avoid these unnecessary computations or resource allocations. This is especially useful when initializing objects with high overhead. -
Faster program startup: If a program has many resource-intensive or computationally intensive initialization operations, the program’s startup time may be long. By marking these operations as
lazy
, you can avoid them during program startup, speeding up the process. These operations are only executed when they are actually needed. -
Avoiding circular dependency issues: In some cases, the initialization of two or more objects depends on each other, which can lead to circular dependency issues. Using
lazy
initialization can help resolve such problems because the actual initialization is deferred until all related objects are accessed. -
Handling potentially infinite structures: Here, the word "potentially" is crucial. These structures are designed to be potentially infinite, but in reality, at any given time, we only process a part of them. For example, an infinite sequence that theoretically extends indefinitely, but we only calculate specific parts when needed. This is especially relevant when dealing with concepts like streams or iterators. These structures allow us to write programs in a seemingly infinitely running manner, but in reality, we only perform calculations when necessary.
Drawbacks
In a multi-threaded environment, using lazy
initialization can introduce thread safety issues and synchronization overhead for the following reasons:
-
Thread safety issues: In a multi-threaded environment, if multiple threads simultaneously attempt to access the same
lazy
initialized variable, there can be a race condition. Since alazy
variable is initialized only on its first access, if two threads almost simultaneously access this variable, they might both try to initialize it. This can result in the variable being initialized multiple times or the initialization process being interrupted, leading to data inconsistency or program errors. -
Synchronization overhead: To avoid thread safety issues, Scala’s
lazy
initialization implementation usually includes some form of synchronization mechanism, such as locks. This means that every time alazy
variable is accessed, there might be lock acquisition and release operations involved. In a high-concurrency environment, this can result in significant performance overhead, as multiple threads might be blocked while waiting to acquire locks. -
Complex initialization logic: If the initialization logic of a
lazy
variable itself is complex and time-consuming, and involves access to other shared resources, it can increase the risk of deadlocks. Handling complexlazy
initialization in a multi-threaded environment requires special care to avoid deadlocks and other concurrency issues.
Lazy Variables (Lazy Val
)
- Use
lazy val
to define variables with lazy initialization. - The value of the variable is computed on the first access and then cached for subsequent use.
- It provides performance improvement in scenarios where variables may not be used
View
Definition
View
is a feature of Scala collections that provides a way to delay transformation operations on collections (such asmap
,filter
,flatMap
, etc.).- When these operations are applied to a
View
, they are not executed immediately; instead, they are stored as a sequence of operations and are executed only when elements are accessed.
Use Cases
Using view
is more efficient in scenarios where large amounts of data need to be processed, and only a portion of the results is required. It allows for the delayed execution of data transformation and filtering operations until the data is actually needed. This can significantly reduce unnecessary computations, especially when dealing with large datasets. Without view
, even if only a small portion of the results is needed, the entire dataset would be computed and stored, leading to resource wastage.
i.e., it is used to optimize chained operations on collections, avoiding the creation of multiple intermediate collections, reducing memory usage, and improving performance.
Steps
-
Saving transformation operations: When transformation operations like
map
,filter
,flatMap
, etc., are applied to aView
, these operations are not executed immediately. Instead, they are saved as a sequence of operations. During this stage, the original collection remains unchanged, and no new collections are created. -
Delayed execution: These saved operations are executed only when elements of the
View
are actually accessed. For example, when you attempt to access or iterate over the results of theView
, these operations are applied to the original collection in the order they were saved, and the corresponding elements are computed. -
Efficiency optimization: This delayed execution allows
View
to avoid immediate and unnecessary computation of the entire collection, especially in scenarios where only a portion of the results is required. It helps optimize performance, especially when multiple transformation operations are involved, as it avoids creating multiple intermediate collections.
Example of Batch Data Processing and Conditional Filtering
Design a scenario to demonstrate the advantages of view
, focusing on its lazy evaluation feature. Consider the following scenario:
Suppose you are processing a list containing a large amount of data. Your task is to perform a series of transformations and filtering operations on this data, but you only need a small portion of the results, e.g., finding the first 10 elements that satisfy a specific condition.
In this scenario, we can compare the performance difference between using view
and not using view
:
Using view
:
val numbers = (1 to 1000000).toList
val view = numbers.view.map(_ * 2).filter(_ > 5)
// Assume we only need the first 10 elements that satisfy the condition
val resultWithView = view.take(10).toList
In this example, view
delays the execution of map
and filter
operations on the entire list. When take(10)
is called, only the first 10 elements that satisfy the condition are actually computed. This means that map
and filter
operations are not executed on the entire list,
saving a significant amount of unnecessary computation.
Without view
:
val numbers = (1 to 1000000).toList
val processedNumbers = numbers.map(_ * 2).filter(_ > 5)
// Similarly, we only need the first 10 elements that satisfy the condition
val resultWithoutView = processedNumbers.take(10)
In this example, map
and filter
operations are immediately executed on the entire list, creating a new processed list. Despite needing only the first 10 elements, the entire list is computed and stored, leading to performance overhead, especially with large datasets.
Code
In actual execution, using view
is approximately 12 times faster:
package week2
object ViewVsNoView extends App {
import scala.collection.View
def time[R](block: => R): Long = {
val start = System.nanoTime()
block
System.nanoTime() - start
}
val numbers = (1 to 1000000).toList
// With view
val timeWithView = time {
val view = numbers.view.map(_ * 2).filter(_ > 5)
val resultWithView = view.take(10).toList
}
// Without view
val timeWithoutView = time {
val processedNumbers = numbers.map(_ * 2).filter(_ > 5)
val resultWithoutView = processedNumbers.take(10)
}
println(s"Time with View: $timeWithView nanoseconds")
println(s"Time without View: $timeWithoutView nanoseconds")
}
// Time with View: 6729800 nanoseconds
// Time without View : 85503600 nanoseconds
LazyList
Definition
LazyList
is a lazy collection class in Scala that represents a potentially infinite list where elements are computed on-demand.- Each element is computed only when first accessed, and the computed result is cached for subsequent use.
Use Cases
- Used for creating potentially infinite sequences or efficiently processing large collections.
Example of Filtering Prime Numbers
Using LazyList
and the Sieve of Eratosthenes algorithm to generate a sequence of prime numbers.
LazyList.from(1).filter(isPrime)(1)
This expression does not compute the entire sequence and then take the second element. Due to the lazy nature of the sequence, this expression works efficiently:
Comparison
If you were to use a Range
or a regular list, the expression would first compute and generate a list of all elements within the range that satisfy the isPrime
condition. Subsequently, when using .apply(1)
or a similar indexing method, it would retrieve the second element from this precomputed list.
In this non-lazy evaluation scenario, the computation efficiency is lower, especially when only a few elements from the sequence are needed. Even if only one or a few elements are required, the entire sequence would still be fully computed and stored. This is why lazy evaluation (such as using LazyList
or similar lazy structures) is often more efficient when dealing with large datasets or potentially infinite sequences.
Code
In actual execution, this example with Lazy List is approximately 12 times faster:
object LazyListvsRange extends App {
def isPrime(n: Int): Boolean = {
if (n <= 1) return false
if (n <= 3) return true
if (n % 2 == 0 || n % 3 == 0) return false
var i = 5
while (i * i <= n) {
if (n % i == 0 || n % (i + 2) == 0) return false
i += 6
}
true
}
val startTimeLazyList = System.nanoTime()
val resultLazyList = LazyList.from(1).filter(isPrime)(1)
val endTimeLazyList = System.nanoTime()
val durationLazyList = endTimeLazyList - startTimeLazyList
println(s"Result with LazyList: $resultLazyList, Time taken: $durationLazyList nanoseconds")
val startTimeRange = System.nanoTime()
val resultRange = (1 to 10000000).filter(isPrime)(1)
val endTimeRange = System.nanoTime()
val durationRange = endTimeRange - startTimeRange
println(s"Result with Range: $resultRange, Time taken: $durationRange nanoseconds")
}
LazyList vs View
-
Timing of Execution:
View
saves a sequence of transformation operations that are applied to the original collection only when elements are accessed.LazyList
computes each element and caches the result upon the first access, suitable for potentially infinite sequences.
-
Caching:
View
typically has lower caching – it does not cache computed results, recalculating each time it is accessed.LazyList
may have higher caching – it caches the computed result for each element.
-
Use Cases:
View
is suitable for optimizing chained transformation operations on finite collections.LazyList
is suitable for generating and operating on potentially infinite sequences or scenarios where caching results is needed.
In summary, both View
and LazyList
are used for lazy evaluation, but they differ in caching behavior, timing of execution, and optimal use cases.
Pitfalls
Is def
considered lazy in Scala?
In Scala, the def
keyword is used to define functions or methods and is not directly considered lazy evaluation. However, functions or methods defined using def
do have a form of delayed computation because they are executed only when called. While this shares some similarity with lazy evaluation, it’s not the same concept.
-
On-demand computation: Functions or methods defined using
def
are executed only when called, similar to lazy evaluation, as computation is deferred until needed. -
Recomputation on each call: Unlike
lazy val
, functions or methods defined usingdef
are recomputed every time they are called.lazy val
, on the other hand, computes its value once upon first access and then caches it for subsequent use.
i.e.,
- In Scala,
def
is primarily used for defining functions and methods, and while it has a feature of on-demand computation, it’s not a typical representation of lazy evaluation. - However, functions or methods defined with
def
share some similarity with lazy evaluation because they defer computation until they are actually called. - The main difference is that
def
functions or methods are recomputed every time they are called, whereaslazy val
computes once and caches the result for reuse.
Is val x: LazyList
already lazy, or do you need lazy val x: LazyList
?
In Scala, when you declare a variable of type LazyList
, the LazyList
itself is already lazy, regardless of whether you use the lazy val
keyword. The laziness of a LazyList
is determined by its internal implementation, where each element is computed only upon first access. This means that even if you use a regular val
to define a LazyList
, it still maintains the lazy evaluation behavior.