Second, how can you process really large collections efficiently? Ideally, to speed up the processing, you want to leverage multicore architectures. However, writing parallel code is hard and error-prone.
The Streams API addresses both these issues. It introduces a new abstraction called Stream that lets you process data in a declarative way. Furthermore, streams can leverage multicore architectures without you having to deal with low-level constructs such as threads, locks, condition variables, and volatile fields.
For example, say you need to filter a list of invoices to find those related to a specific customer, sort them by invoice amount, and then extract their IDs. Using the Streams API, you can express this simply with the following query:
List
You’ll see how this code works in more detail later in this chapter.
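As a rough illustration of the query described above (filter by customer, sort by amount, extract the IDs), here is a minimal sketch. The Invoice class, its getters, and the customer names are assumptions made up for this example and may differ from the chapter's own listing.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class InvoiceQuerySketch {

    // Hypothetical domain class; field and getter names are assumptions.
    static final class Invoice {
        private final int id;
        private final String customer;
        private final double amount;

        Invoice(int id, String customer, double amount) {
            this.id = id;
            this.customer = customer;
            this.amount = amount;
        }

        int getId() { return id; }
        String getCustomer() { return customer; }
        double getAmount() { return amount; }
    }

    // Filter by customer, sort by amount (ascending), then extract the IDs.
    static List<Integer> idsForCustomer(List<Invoice> invoices, String customer) {
        return invoices.stream()
                .filter(inv -> inv.getCustomer().equals(customer))
                .sorted(Comparator.comparingDouble(Invoice::getAmount))
                .map(Invoice::getId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Invoice> invoices = Arrays.asList(
                new Invoice(1, "Acme", 2_500.0),
                new Invoice(2, "Globex", 900.0),
                new Invoice(3, "Acme", 1_200.0));

        System.out.println(idsForCustomer(invoices, "Acme")); // prints [3, 1]
    }
}
```

Each step of the pipeline reads like the sentence that describes it, which is the declarative style the Streams API aims for.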
What Is a Stream?
So what is a stream? Informally, you can think of it as a “fancy iterator” that supports database-like operations. Technically, it’s a sequence of elements from a source that supports aggregate operations. Here’s a breakdown of the more formal definition:
Sequence of elements
A stream provides an interface to a sequenced set of values of a specific element type. However, streams don’t actually store elements; they’re computed on demand.
Source
Streams consume from a data-providing source such as collections, arrays, or I/O resources.
Aggregate operations
Streams support database-like operations and common operations from functional programming languages, such as filter, map, reduce, findFirst, allMatch, sorted, and so on.
Furthermore, stream operations have two additional fundamental characteristics that differentiate them from collections:
Pipelining
Many stream operations return a stream themselves. This allows operations to be chained to form a larger pipeline. This style enables certain optimizations such as laziness, short-circuiting, and loop fusion.
Internal iteration
In contrast to collections, which are iterated explicitly (external iteration), stream operations do the iteration behind the scenes for you.
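To make the contrast between external and internal iteration concrete, the following sketch computes the same result both ways. The class name, method names, and sample words are illustrative assumptions, not taken from the chapter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class IterationSketch {

    // External iteration: the caller drives the loop and manages the result list.
    static List<String> upperCaseLongWordsExternal(List<String> words) {
        List<String> result = new ArrayList<>();
        for (String word : words) {
            if (word.length() > 3) {
                result.add(word.toUpperCase());
            }
        }
        return result;
    }

    // Internal iteration: the pipeline states what to do; the stream does the looping.
    static List<String> upperCaseLongWordsInternal(List<String> words) {
        return words.stream()
                .filter(word -> word.length() > 3)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "filter", "reduce", "sum");

        System.out.println(upperCaseLongWordsExternal(words)); // prints [FILTER, REDUCE]
        System.out.println(upperCaseLongWordsInternal(words)); // prints [FILTER, REDUCE]
    }
}
```

Because the stream version never exposes the loop, the library is free to decide how the elements are traversed.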
Stream Operations
The java.util.stream.Stream interface defines many operations, which can be grouped into two categories:
Operations such as filter, sorted, and map, which can be connected together to form a pipeline
Operations such as collect, findFirst, and allMatch, which terminate the pipeline and return a result
Stream operations that can be connected are called intermediate operations, and operations that terminate a pipeline are called terminal operations.
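One practical consequence of the two categories is that chainable operations do no work until an operation that terminates the pipeline is invoked. The sketch below makes this visible with a counter. The class name, method name, and counter parameter are assumptions introduced only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazinessSketch {

    // Builds a pipeline that keeps even numbers and records, in the supplied
    // counter, how many elements the filter has examined.
    static Stream<Integer> buildPipeline(List<Integer> numbers, AtomicInteger examined) {
        return numbers.stream()
                .filter(n -> {
                    examined.incrementAndGet();
                    return n % 2 == 0;
                });
    }

    public static void main(String[] args) {
        AtomicInteger examined = new AtomicInteger();
        Stream<Integer> evens = buildPipeline(Arrays.asList(1, 2, 3, 4, 5, 6), examined);

        // The pipeline has only been described so far; the filter has not run.
        System.out.println(examined.get()); // prints 0

        // findFirst terminates the pipeline and stops at the first match.
        System.out.println(evens.findFirst().get()); // prints 2
        System.out.println(examined.get());          // prints 2 (only 1 and 2 were examined)
    }
}
```

The counter stays at zero until findFirst runs, and it stops at two because findFirst short-circuits as soon as a matching element is found in this sequential stream.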
Let’s take a tour of some of the operations available on streams. Refer to the java.util.stream.Stream interface for the complete list.
Filtering
There are several operations that can be used to filter elements from a stream:
filter
Takes a Predicate object as an argument and returns a stream including all elements that match the predicate
distinct
Returns a stream with unique elements (according to the implementation of equals for a stream element)
limit
Returns a stream that is no longer than a certain size
skip
Returns a stream with the first n elements discarded
List
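The four filtering operations can be combined in a single pipeline. The following is a minimal sketch using plain integers rather than invoices; the class name, method name, and sample data are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FilteringSketch {

    static List<Integer> selectEvens(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0) // keep even numbers only
                .distinct()              // drop duplicates (uses equals)
                .skip(1)                 // discard the first remaining element
                .limit(2)                // keep at most two elements
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 2, 4, 6, 6, 8, 9);

        // filter   -> 2, 2, 4, 6, 6, 8
        // distinct -> 2, 4, 6, 8
        // skip(1)  -> 4, 6, 8
        // limit(2) -> 4, 6
        System.out.println(selectEvens(numbers)); // prints [4, 6]
    }
}
```

The order of the operations matters: moving limit(2) before distinct() in this example would change the result.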
Matching
A common data processing pattern is determining whether some elements match a given property. You can use the anyMatch, allMatch, and noneMatch operations to help you do this. They all take a predicate as an argument and return a boolean as the result. For example, you can use allMatch to check that all elements in a stream of invoices have a value higher than 1,000:
boolean expensive = invoices.stream()
    .allMatch(inv -> inv.getAmount() > 1_000);
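To compare the three matching operations side by side, here is a small sketch that works on a list of amounts instead of invoice objects. The class name, method names, and sample amounts are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

class MatchingSketch {

    // True if every amount is above the threshold.
    static boolean allAbove(List<Double> amounts, double threshold) {
        return amounts.stream().allMatch(a -> a > threshold);
    }

    // True if at least one amount is above the threshold.
    static boolean anyAbove(List<Double> amounts, double threshold) {
        return amounts.stream().anyMatch(a -> a > threshold);
    }

    // True if no amount is above the threshold.
    static boolean noneAbove(List<Double> amounts, double threshold) {
        return amounts.stream().noneMatch(a -> a > threshold);
    }

    public static void main(String[] args) {
        List<Double> amounts = Arrays.asList(1_500.0, 2_000.0, 800.0);

        System.out.println(allAbove(amounts, 1_000));  // prints false (800.0 is not above 1,000)
        System.out.println(anyAbove(amounts, 1_000));  // prints true
        System.out.println(noneAbove(amounts, 5_000)); // prints true
    }
}
```

All three operations short-circuit, so they can stop examining elements as soon as the answer is known.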
Finding