Streams in Java 8: In-Depth Tutorial with Examples

Overview

The addition of the Stream API is one of the major new features in Java 8. This in-depth tutorial is an introduction to the many operations supported by streams, with a focus on simple, practical examples.

To understand this material, you need to have a basic, working knowledge of Java 8 (lambda expressions, Optional, method references).

Introduction

First of all, Java 8 Streams should not be confused with Java I/O streams (e.g. FileInputStream); these have very little to do with each other.

Simply put, streams are wrappers around a data source, allowing us to operate with that data source and making bulk processing convenient and fast.

A stream does not store data and, in that sense, is not a data structure. It also never modifies the underlying data source.

This new functionality – java.util.stream – supports functional-style operations on streams of elements, such as map-reduce transformations on collections.

Let’s now dive into a few simple examples of stream creation and usage – before getting into terminology and core concepts.

Stream Creation

Let’s first obtain a stream from an existing array:

private static Employee[] arrayOfEmps = {
new Employee(1, "Jeff Bezos", 100000.0),
new Employee(2, "Bill Gates", 200000.0),
new Employee(3, "Mark Zuckerberg", 300000.0)
};

Stream.of(arrayOfEmps);

We can also obtain a stream from an existing list:

private static List<Employee> empList = Arrays.asList(arrayOfEmps);
empList.stream();

Note that Java 8 added a new stream() method to the Collection interface.

And we can create a stream from individual objects using Stream.of():

Stream.of(arrayOfEmps[0], arrayOfEmps[1], arrayOfEmps[2]);

Or simply using Stream.builder():

Stream.Builder<Employee> empStreamBuilder = Stream.builder();

empStreamBuilder.accept(arrayOfEmps[0]);
empStreamBuilder.accept(arrayOfEmps[1]);
empStreamBuilder.accept(arrayOfEmps[2]);

Stream<Employee> empStream = empStreamBuilder.build();

There are also other ways to obtain a stream, some of which we will see in sections below.
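For instance, Arrays.stream() gives us a stream over an existing array, and Stream.empty() produces an empty stream; here is a quick sketch using the arrayOfEmps defined above:

Stream<Employee> fromArray = Arrays.stream(arrayOfEmps);
Stream<Employee> emptyStream = Stream.empty();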

Stream Operations

Let’s now see some common usages and operations we can perform on and with the help of the new stream support in the language.

forEach

forEach() is the simplest and most common operation; it loops over the stream elements, calling the supplied function on each element.

The method is so common that it has been introduced directly in Iterable, Map, etc.:

@Test
public void whenIncrementSalaryForEachEmployee_thenApplyNewSalary() {
    empList.stream().forEach(e -> e.salaryIncrement(10.0));

    assertThat(empList, contains(
      hasProperty("salary", equalTo(110000.0)),
      hasProperty("salary", equalTo(220000.0)),
      hasProperty("salary", equalTo(330000.0))
    ));
}

This will effectively call salaryIncrement() on each element in empList.

forEach() is a terminal operation, which means that, after the operation is performed, the stream pipeline is considered consumed, and can no longer be used. We’ll talk more about terminal operations in the next section.

map

map() produces a new stream after applying a function to each element of the original stream. The new stream could be of a different type.

The following example converts a stream of Integers into a stream of Employees:

@Test
public void whenMapIdToEmployees_thenGetEmployeeStream() {
    Integer[] empIds = { 1, 2, 3 };

    List<Employee> employees = Stream.of(empIds)
      .map(employeeRepository::findById)
      .collect(Collectors.toList());

    assertEquals(employees.size(), empIds.length);
}

Here, we obtain an Integer stream of employee ids from an array. Each Integer is passed to the function employeeRepository::findById() – which returns the corresponding Employee object; this effectively forms an Employee stream.

collect

We saw how collect() works in the previous example; it’s one of the common ways to get data out of the stream once we are done with all the processing:

@Test
public void whenCollectStreamToList_thenGetList() {
    List<Employee> employees = empList.stream().collect(Collectors.toList());

    assertEquals(empList, employees);
}

collect() performs mutable fold operations (repackaging elements to some data structures and applying some additional logic, concatenating them, etc.) on data elements held in the Stream instance.

The strategy for this operation is provided via the Collector interface implementation. In the example above, we used the toList collector to collect all Stream elements into a List instance.

filter

Next, let’s have a look at filter(); this produces a new stream that contains elements of the original stream that pass a given test (specified by a Predicate).

Let’s have a look at how that works:

@Test
public void whenFilterEmployees_thenGetFilteredStream() {
    Integer[] empIds = { 1, 2, 3, 4 };

    List<Employee> employees = Stream.of(empIds)
      .map(employeeRepository::findById)
      .filter(e -> e != null)
      .filter(e -> e.getSalary() > 200000)
      .collect(Collectors.toList());

    assertEquals(Arrays.asList(arrayOfEmps[2]), employees);
}

In the example above, we first filter out null references for invalid employee ids and then again apply a filter to only keep employees with salaries over a certain threshold.

findFirst

findFirst() returns an Optional for the first entry in the stream; the Optional can, of course, be empty:

@Test
public void whenFindFirst_thenGetFirstEmployeeInStream() {
    Integer[] empIds = { 1, 2, 3, 4 };

    Employee employee = Stream.of(empIds)
      .map(employeeRepository::findById)
      .filter(e -> e != null)
      .filter(e -> e.getSalary() > 100000)
      .findFirst()
      .orElse(null);

    assertEquals(employee.getSalary(), new Double(200000));
}

Here, the first employee with a salary greater than 100000 is returned. If no such employee exists, then null is returned.

toArray

We saw how we used collect() to get data out of the stream. If we need to get an array out of the stream, we can simply use toArray():

@Test
public void whenStreamToArray_thenGetArray() {
    Employee[] employees = empList.stream().toArray(Employee[]::new);

    assertThat(empList.toArray(), equalTo(employees));
}

The syntax Employee[]::new creates an empty array of Employee – which is then filled with elements from the stream.

flatMap

A stream can hold complex data structures like Stream<List<String>>. In cases like this, flatMap() helps us to flatten the data structure to simplify further operations:

@Test
public void whenFlatMapEmployeeNames_thenGetNameStream() {
    List<List<String>> namesNested = Arrays.asList(
      Arrays.asList("Jeff", "Bezos"),
      Arrays.asList("Bill", "Gates"),
      Arrays.asList("Mark", "Zuckerberg"));

    List<String> namesFlatStream = namesNested.stream()
      .flatMap(Collection::stream)
      .collect(Collectors.toList());

    assertEquals(namesFlatStream.size(), namesNested.size() * 2);
}

Notice how we were able to convert the Stream<List<String>> to a simpler Stream<String> – using the flatMap() API.

peek

We saw forEach() earlier in this section, which is a terminal operation. However, sometimes we need to perform multiple operations on each element of the stream before any terminal operation is applied.

peek() can be useful in situations like this. Simply put, it performs the specified operation on each element of the stream and returns a new stream which can be used further. peek() is an intermediate operation:

@Test
public void whenIncrementSalaryUsingPeek_thenApplyNewSalary() {
    Employee[] arrayOfEmps = {
        new Employee(1, "Jeff Bezos", 100000.0),
        new Employee(2, "Bill Gates", 200000.0),
        new Employee(3, "Mark Zuckerberg", 300000.0)
    };

    List<Employee> empList = Arrays.asList(arrayOfEmps);

    empList.stream()
      .peek(e -> e.salaryIncrement(10.0))
      .peek(System.out::println)
      .collect(Collectors.toList());

    assertThat(empList, contains(
      hasProperty("salary", equalTo(110000.0)),
      hasProperty("salary", equalTo(220000.0)),
      hasProperty("salary", equalTo(330000.0))
    ));
}

Here, the first peek() is used to increment the salary of each employee. The second peek() is used to print the employees. Finally, collect() is used as the terminal operation.

Method Types and Pipelines

As we’ve been discussing, stream operations are divided into intermediate and terminal operations.

Intermediate operations such as filter() return a new stream on which further processing can be done. Terminal operations, such as forEach(), mark the stream as consumed, after which point it can no longer be used further.

A stream pipeline consists of a stream source, followed by zero or more intermediate operations, and a terminal operation.

Here’s a sample stream pipeline, where empList is the source, filter() is the intermediate operation and count is the terminal operation:

@Test
public void whenStreamCount_thenGetElementCount() {
    Long empCount = empList.stream()
      .filter(e -> e.getSalary() > 200000)
      .count();

    assertEquals(empCount, new Long(1));
}

Some operations are deemed short-circuiting operations. Short-circuiting operations allow computations on infinite streams to complete in finite time:

@Test
public void whenLimitInfiniteStream_thenGetFiniteElements() {
    Stream<Integer> infiniteStream = Stream.iterate(2, i -> i * 2);

    List<Integer> collect = infiniteStream
      .skip(3)
      .limit(5)
      .collect(Collectors.toList());

    assertEquals(collect, Arrays.asList(16, 32, 64, 128, 256));
}

Here, we use the short-circuiting operations skip() to skip the first 3 elements, and limit() to limit to 5 elements from the infinite stream generated using iterate().

We’ll talk more about infinite streams later on.

Lazy Evaluation

One of the most important characteristics of streams is that they allow for significant optimizations through lazy evaluations.

Computation on the source data is only performed when the terminal operation is initiated, and source elements are consumed only as needed.

All intermediate operations are lazy, so they’re not executed until a result is actually needed.

For example, consider the findFirst() example we saw earlier. How many times is the map() operation performed here? 4 times, since the input array contains 4 elements?

@Test
public void whenFindFirst_thenGetFirstEmployeeInStream() {
    Integer[] empIds = { 1, 2, 3, 4 };

    Employee employee = Stream.of(empIds)
      .map(employeeRepository::findById)
      .filter(e -> e != null)
      .filter(e -> e.getSalary() > 100000)
      .findFirst()
      .orElse(null);

    assertEquals(employee.getSalary(), new Double(200000));
}

Stream performs the map and two filter operations, one element at a time.

It first performs all the operations on id 1. Since the salary of id 1 is not greater than 100000, the processing moves on to the next element.

Id 2 satisfies both of the filter predicates and hence the stream evaluates the terminal operation findFirst() and returns the result.

No operations are performed on id 3 and 4.

Processing streams lazily allows avoiding examining all the data when that’s not necessary. This behavior becomes even more important when the input stream is infinite and not just very large.

Comparison Based Stream Operations

sorted

Let’s start with the sorted() operation – this sorts the stream elements based on the comparator we pass into it.

For example, we can sort Employees based on their names:

@Test
public void whenSortStream_thenGetSortedStream() {
    List<Employee> employees = empList.stream()
      .sorted((e1, e2) -> e1.getName().compareTo(e2.getName()))
      .collect(Collectors.toList());

    assertEquals(employees.get(0).getName(), "Bill Gates");
    assertEquals(employees.get(1).getName(), "Jeff Bezos");
    assertEquals(employees.get(2).getName(), "Mark Zuckerberg");
}

Note that short-circuiting will not be applied for sorted().

This means that, in the example above, even if we had used findFirst() after sorted(), all the elements would still be sorted before findFirst() is applied. This happens because the operation cannot know what the first element is until the entire stream is sorted.
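To illustrate, here is a small sketch (reusing empList from above) where findFirst() follows sorted(); the whole stream still gets sorted before the first element is returned:

Employee firstAlphabetically = empList.stream()
  .sorted(Comparator.comparing(Employee::getName))
  .findFirst()
  .orElseThrow(NoSuchElementException::new);

assertEquals(firstAlphabetically.getName(), "Bill Gates");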

min and max

As the name suggests, min() and max() return the minimum and maximum element in the stream respectively, based on a comparator. They return an Optional since a result may or may not exist (due to, say, filtering):

@Test
public void whenFindMin_thenGetMinElementFromStream() {
    Employee firstEmp = empList.stream()
      .min((e1, e2) -> e1.getId() - e2.getId())
      .orElseThrow(NoSuchElementException::new);

    assertEquals(firstEmp.getId(), new Integer(1));
}

We can also avoid defining the comparison logic by using Comparator.comparing():

@Test
public void whenFindMax_thenGetMaxElementFromStream() {
    Employee maxSalEmp = empList.stream()
      .max(Comparator.comparing(Employee::getSalary))
      .orElseThrow(NoSuchElementException::new);

    assertEquals(maxSalEmp.getSalary(), new Double(300000.0));
}

distinct

distinct() does not take any argument and returns the distinct elements in the stream, eliminating duplicates. It uses the equals() method of the elements to decide whether two elements are equal or not:

@Test
public void whenApplyDistinct_thenRemoveDuplicatesFromStream() {
    List<Integer> intList = Arrays.asList(2, 5, 3, 2, 4, 3);
    List<Integer> distinctIntList = intList.stream().distinct().collect(Collectors.toList());

    assertEquals(distinctIntList, Arrays.asList(2, 5, 3, 4));
}

allMatch, anyMatch, and noneMatch

These operations all take a predicate and return a boolean. Short-circuiting is applied and processing is stopped as soon as the answer is determined:

@Test
public void whenApplyMatch_thenReturnBoolean() {
    List<Integer> intList = Arrays.asList(2, 4, 5, 6, 8);

    boolean allEven = intList.stream().allMatch(i -> i % 2 == 0);
    boolean oneEven = intList.stream().anyMatch(i -> i % 2 == 0);
    boolean noneMultipleOfThree = intList.stream().noneMatch(i -> i % 3 == 0);

    assertEquals(allEven, false);
    assertEquals(oneEven, true);
    assertEquals(noneMultipleOfThree, false);
}

allMatch() checks if the predicate is true for all the elements in the stream. Here, it returns false as soon as it encounters 5, which is not divisible by 2.

anyMatch() checks if the predicate is true for any one element in the stream. Here, again short-circuiting is applied and true is returned immediately after the first element.

noneMatch() checks if there are no elements matching the predicate. Here, it simply returns false as soon as it encounters 6, which is divisible by 3.

Stream Specializations

From what we discussed so far, Stream is a stream of object references. However, there are also IntStream, LongStream, and DoubleStream – which are primitive specializations for int, long and double respectively. These are quite convenient when dealing with a lot of numerical primitives.

These specialized streams do not extend Stream but extend BaseStream on top of which Stream is also built.

As a consequence, not all operations supported by Stream are present in these stream implementations. For example, the standard min() and max() take a comparator, whereas the specialized streams do not.

Creation

The most common way of creating an IntStream is to call mapToInt() on an existing stream:

@Test
public void whenFindMaxOnIntStream_thenGetMaxInteger() {
    Integer latestEmpId = empList.stream()
      .mapToInt(Employee::getId)
      .max()
      .orElseThrow(NoSuchElementException::new);

    assertEquals(latestEmpId, new Integer(3));
}

Here, we start with a Stream and get an IntStream by supplying the Employee::getId to mapToInt. Finally, we call max() which returns the highest integer.

We can also use IntStream.of() for creating the IntStream:

IntStream.of(1, 2, 3);

or IntStream.range():

IntStream.range(10, 20)

which creates an IntStream of numbers 10 to 19.
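If we want the upper bound to be included as well, IntStream.rangeClosed() can be used instead:

IntStream.rangeClosed(10, 20);

which creates an IntStream of numbers 10 to 20.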

One important distinction to note before we move on to the next topic:

Stream.of(1, 2, 3)

This returns a Stream and not IntStream.

Similarly, using map() instead of mapToInt() returns a Stream and not an IntStream:

empList.stream().map(Employee::getId);

Specialized Operations

Specialized streams provide additional operations as compared to the standard Stream – which are quite convenient when dealing with numbers.

For example, sum(), average(), range(), etc.:

@Test
public void whenApplyAverageOnDoubleStream_thenGetAverage() {
    Double avgSal = empList.stream()
      .mapToDouble(Employee::getSalary)
      .average()
      .orElseThrow(NoSuchElementException::new);

    assertEquals(avgSal, new Double(200000));
}

Reduction Operations

A reduction operation (also called a fold) takes a sequence of input elements and combines them into a single summary result by repeated application of a combining operation. We already saw a few reduction operations like findFirst(), min() and max().

Let’s see the general-purpose reduce() operation in action.

reduce

The most common form of reduce() is:

T reduce(T identity, BinaryOperator<T> accumulator)

where identity is the starting value and accumulator is the binary operation we repeatedly apply.

For example:

@Test
public void whenApplyReduceOnStream_thenGetValue() {
    Double sumSal = empList.stream()
      .map(Employee::getSalary)
      .reduce(0.0, Double::sum);

    assertEquals(sumSal, new Double(600000));
}

Here, we start with the initial value of 0 and repeatedly apply Double::sum on the elements of the stream. Effectively, we’ve implemented DoubleStream.sum() by applying reduce() on a Stream.
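For comparison, here is a small sketch of how the same sum could be obtained with the specialized DoubleStream directly:

Double sumSal = empList.stream()
  .mapToDouble(Employee::getSalary)
  .sum();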

Advanced collect

We already saw how we used Collectors.toList() to get a list out of the stream. Let’s now see a few more ways to collect elements from the stream.

joining

@Test
public void whenCollectByJoining_thenGetJoinedString() {
    String empNames = empList.stream()
      .map(Employee::getName)
      .collect(Collectors.joining(", "));

    assertEquals(empNames, "Jeff Bezos, Bill Gates, Mark Zuckerberg");
}

Collectors.joining() will insert the given delimiter between the String elements of the stream. It internally uses a java.util.StringJoiner to perform the joining operation.
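joining() also has an overload that accepts a prefix and a suffix in addition to the delimiter; as a small variation on the test above, the following would produce the same names wrapped in brackets:

String wrappedNames = empList.stream()
  .map(Employee::getName)
  .collect(Collectors.joining(", ", "[", "]"));

// "[Jeff Bezos, Bill Gates, Mark Zuckerberg]"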

toSet

We can also use toSet() to get a set out of stream elements:

@Test
public void whenCollectBySet_thenGetSet() {
    Set<String> empNames = empList.stream()
            .map(Employee::getName)
            .collect(Collectors.toSet());

    assertEquals(empNames.size(), 3);
}

toCollection

We can use Collectors.toCollection() to extract the elements into any other collection by passing in a Supplier. We can also use a constructor reference for the Supplier:

@Test
public void whenToVectorCollection_thenGetVector() {
    Vector<String> empNames = empList.stream()
            .map(Employee::getName)
            .collect(Collectors.toCollection(Vector::new));

    assertEquals(empNames.size(), 3);
}

Here, an empty collection is created internally, and its add() method is called on each element of the stream.

summarizingDouble

summarizingDouble() is another interesting collector – which applies a double-producing mapping function to each input element and returns a special class containing statistical information for the resulting values:

@Test
public void whenApplySummarizing_thenGetBasicStats() {
    DoubleSummaryStatistics stats = empList.stream()
      .collect(Collectors.summarizingDouble(Employee::getSalary));

    assertEquals(stats.getCount(), 3);
    assertEquals(stats.getSum(), 600000.0, 0);
    assertEquals(stats.getMin(), 100000.0, 0);
    assertEquals(stats.getMax(), 300000.0, 0);
    assertEquals(stats.getAverage(), 200000.0, 0);
}

Notice how we can analyze the salary of each employee and get statistical information on that data – such as min, max, average etc.

summaryStatistics() can be used to generate a similar result when we’re using one of the specialized streams:

@Test
public void whenApplySummaryStatistics_thenGetBasicStats() {
    DoubleSummaryStatistics stats = empList.stream()
      .mapToDouble(Employee::getSalary)
      .summaryStatistics();

    assertEquals(stats.getCount(), 3);
    assertEquals(stats.getSum(), 600000.0, 0);
    assertEquals(stats.getMin(), 100000.0, 0);
    assertEquals(stats.getMax(), 300000.0, 0);
    assertEquals(stats.getAverage(), 200000.0, 0);
}

partitioningBy

We can partition a stream into two – based on whether the elements satisfy certain criteria or not.

Let’s split our List of numerical data into evens and odds:

@Test
public void whenStreamPartition_thenGetMap() {
    List<Integer> intList = Arrays.asList(2, 4, 5, 6, 8);
    Map<Boolean, List<Integer>> isEven = intList.stream().collect(
      Collectors.partitioningBy(i -> i % 2 == 0));

    assertEquals(isEven.get(true).size(), 4);
    assertEquals(isEven.get(false).size(), 1);
}

Here, the stream is partitioned into a Map, with the evens and odds stored under the true and false keys.

groupingBy

groupingBy() offers advanced partitioning – where we can partition the stream into more than just two groups.

It takes a classification function as its parameter. This classification function is applied to each element of the stream.

The value returned by the function is used as a key to the map that we get from the groupingBy collector:

@Test
public void whenStreamGroupingBy_thenGetMap() {
    Map<Character, List<Employee>> groupByAlphabet = empList.stream().collect(
      Collectors.groupingBy(e -> new Character(e.getName().charAt(0))));

    assertEquals(groupByAlphabet.get('B').get(0).getName(), "Bill Gates");
    assertEquals(groupByAlphabet.get('J').get(0).getName(), "Jeff Bezos");
    assertEquals(groupByAlphabet.get('M').get(0).getName(), "Mark Zuckerberg");
}

In this quick example, we grouped the employees based on the initial character of their first name.

mapping

groupingBy(), discussed in the section above, groups elements of the stream with the use of a Map.

However, sometimes we might need to group data into a type other than the element type.

Here’s how we can do that; we can use mapping() which can actually adapt the collector to a different type – using a mapping function:

@Test
public void whenStreamMapping_thenGetMap() {
    Map<Character, List<Integer>> idGroupedByAlphabet = empList.stream().collect(
      Collectors.groupingBy(e -> new Character(e.getName().charAt(0)),
        Collectors.mapping(Employee::getId, Collectors.toList())));

    assertEquals(idGroupedByAlphabet.get('B').get(0), new Integer(2));
    assertEquals(idGroupedByAlphabet.get('J').get(0), new Integer(1));
    assertEquals(idGroupedByAlphabet.get('M').get(0), new Integer(3));
}

Here mapping() maps the stream element Employee into just the employee id – which is an Integer – using the getId() mapping function. These ids are still grouped based on the initial character of the employee’s first name.

reducing

reducing() is similar to reduce() – which we explored before. It simply returns a collector which performs a reduction of its input elements:

@Test
public void whenStreamReducing_thenGetValue() {
    Double percentage = 10.0;
    Double salIncrOverhead = empList.stream().collect(Collectors.reducing(
        0.0, e -> e.getSalary() * percentage / 100, (s1, s2) -> s1 + s2));

    assertEquals(salIncrOverhead, 60000.0, 0);
}

Here reducing() gets the salary increment of each employee and returns the sum.

reducing() is most useful when used in a multi-level reduction, downstream of groupingBy() or partitioningBy(). To perform a simple reduction on a stream, use reduce() instead.

For example, let’s see how we can use reducing() with groupingBy():

@Test
public void whenStreamGroupingAndReducing_thenGetMap() {
    Comparator<Employee> byNameLength = Comparator.comparingInt(e -> e.getName().length());

    Map<Character, Optional<Employee>> longestNameByAlphabet = empList.stream().collect(
      Collectors.groupingBy(e -> new Character(e.getName().charAt(0)),
        Collectors.reducing(BinaryOperator.maxBy(byNameLength))));

    assertEquals(longestNameByAlphabet.get('B').get().getName(), "Bill Gates");
    assertEquals(longestNameByAlphabet.get('J').get().getName(), "Jeff Bezos");
    assertEquals(longestNameByAlphabet.get('M').get().getName(), "Mark Zuckerberg");
}

Here we group the employees based on the initial character of their first name. Within each group, we find the employee with the longest name.

Parallel Streams

Using the support for parallel streams, we can perform stream operations in parallel without having to write any boilerplate code; we just have to designate the stream as parallel:

@Test
public void whenParallelStream_thenPerformOperationsInParallel() {
    Employee[] arrayOfEmps = {
      new Employee(1, "Jeff Bezos", 100000.0),
      new Employee(2, "Bill Gates", 200000.0),
      new Employee(3, "Mark Zuckerberg", 300000.0)
    };

    List<Employee> empList = Arrays.asList(arrayOfEmps);

    empList.stream().parallel().forEach(e -> e.salaryIncrement(10.0));

    assertThat(empList, contains(
      hasProperty("salary", equalTo(110000.0)),
      hasProperty("salary", equalTo(220000.0)),
      hasProperty("salary", equalTo(330000.0))
    ));
}

Here salaryIncrement() would get executed in parallel on multiple elements of the stream, simply by adding the parallel() call.

This functionality can, of course, be tuned and configured further, if you need more control over the performance characteristics of the operation.
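For example, one common approach (sketched below; it is not part of the tests above) is to run the terminal operation from a custom ForkJoinPool, so that the pipeline's tasks typically execute in that pool rather than in the common pool. The pool size of 4 here is an arbitrary assumption:

// requires java.util.concurrent.ForkJoinPool and java.util.concurrent.ExecutionException
ForkJoinPool customPool = new ForkJoinPool(4);
try {
    customPool.submit(
      () -> empList.parallelStream().forEach(e -> e.salaryIncrement(10.0))
    ).get();
} catch (InterruptedException | ExecutionException ex) {
    throw new RuntimeException(ex);
} finally {
    customPool.shutdown();
}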

As is the case with writing multi-threaded code, we need to be aware of a few things while using parallel streams:

  1. We need to ensure that the code is thread-safe. Special care needs to be taken if the operations performed in parallel modify shared data.
  2. We should not use parallel streams if the order in which operations are performed or the order returned in the output stream matters. For example, operations like findFirst() may generate a different result with parallel streams.
  3. Also, we should ensure that it is worth making the code execute in parallel. Understanding the performance characteristics of the operation in particular, but also of the system as a whole, is naturally very important here.

Infinite Streams

Sometimes, we might want to perform operations while the elements are still being generated. We might not know beforehand how many elements we’ll need. Unlike a list or map, where all the elements are already populated, we can use infinite streams, also called unbounded streams.

There are two ways to generate infinite streams:

generate

We provide a Supplier to generate() which gets called whenever new stream elements need to be generated:

@Test
public void whenGenerateStream_thenGetInfiniteStream() {
    Stream.generate(Math::random)
      .limit(5)
      .forEach(System.out::println);
}

Here, we pass Math::random as a Supplier, which returns the next random number.

With infinite streams, we need to provide a condition to eventually terminate the processing. One common way of doing this is using limit(). In the above example, we limit the stream to 5 random numbers and print them as they get generated.

Please note that the Supplier passed to generate() could be stateful, and such a stream may not produce the same result when used in parallel.
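For example, here is a sketch of a stateful Supplier built around an AtomicInteger (a hypothetical counter, not part of the original tests); run sequentially it prints 1 to 5, but with a parallel stream the results may differ:

// requires java.util.concurrent.atomic.AtomicInteger
AtomicInteger counter = new AtomicInteger();

Stream.generate(counter::incrementAndGet)
  .limit(5)
  .forEach(System.out::println);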

iterate

iterate() takes two parameters: an initial value, called the seed element, and a function which generates the next element using the previous value. iterate(), by design, is stateful and hence may not be useful in parallel streams:

@Test
public void whenIterateStream_thenGetInfiniteStream() {
    Stream<Integer> evenNumStream = Stream.iterate(2, i -> i * 2);

    List<Integer> collect = evenNumStream
      .limit(5)
      .collect(Collectors.toList());

    assertEquals(collect, Arrays.asList(2, 4, 8, 16, 32));
}

Here, we pass 2 as the seed value, which becomes the first element of our stream. This value is passed as input to the lambda, which returns 4. This value, in turn, is passed as input in the next iteration.

This continues until we generate the number of elements specified by limit() which acts as the terminating condition.

File Operations

Let’s see how we can use streams in file operations.

File Write Operation

@Test
public void whenStreamToFile_thenGetFile() throws IOException {
    String[] words = {
      "hello",
      "refer",
      "world",
      "level"
    };

    try (PrintWriter pw = new PrintWriter(
      Files.newBufferedWriter(Paths.get(fileName)))) {
        Stream.of(words).forEach(pw::println);
    }
}

Here we use forEach() to write each element of the stream into the file by calling PrintWriter.println().

File Read Operation

private List<String> getPalindrome(Stream<String> stream, int length) {
    return stream.filter(s -> s.length() == length)
      .filter(s -> s.compareToIgnoreCase(
        new StringBuilder(s).reverse().toString()) == 0)
      .collect(Collectors.toList());
}

@Test
public void whenFileToStream_thenGetStream() throws IOException {
    List<String> str = getPalindrome(Files.lines(Paths.get(fileName)), 5);
    assertThat(str, contains("refer", "level"));
}

Here Files.lines() returns the lines from the file as a Stream, which is consumed by getPalindrome() for further processing.

getPalindrome() works on the stream, completely unaware of how the stream was generated. This also increases code reusability and simplifies unit testing.
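For instance, here is a sketch of a test (the test name is purely illustrative) that exercises getPalindrome() with an in-memory stream, without touching the file system at all:

@Test
public void whenInMemoryStreamOfWords_thenGetPalindromes() {
    List<String> str = getPalindrome(Stream.of("refer", "hello", "level"), 5);
    assertThat(str, contains("refer", "level"));
}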

Conclusion

In this article, we focused on the details of the new Stream functionality in Java 8. We saw various operations supported and how lambdas and pipelines can be used to write concise code.

We also saw some characteristics of streams like lazy evaluation, parallel and infinite streams.

Functional Programming in Java 8 Using Lambda Expressions

Lambda expressions are the most talked-about feature of Java 8. Lambda expressions drastically change the way we write code in Java 8 as compared to older versions.

Let’s understand what a lambda expression is and how it works.

At first, you could think about lambda expressions as a way of supporting functional programming in Java. Functional programming is a paradigm that allows programming using expressions, i.e. declaring functions, passing functions as arguments and using functions as statements (rightly called expressions in Java 8). If you want to learn more about why they are called lambda expressions, you could look into what Lambda Calculus is.

If you have worked with Groovy or any other programming language which supports functional programming, comprehending lambda expressions shouldn’t be a big deal. However, there is more to lambda expressions than just methods being passed along with other methods.

Consider the following example to know more about Lambda Expressions.

Suppose you have to write a program that welcomes you into a different world of programming language.

A simple implementation could be as:

public class WelcomeToProgrammingWorld {
    public static void main(String[] args) {
        System.out.println("Welcome to world of Java Programming Language!");
    }
}

Now, what if I also need to welcome you in Groovy and Python? A simple implementation could be as below:

public class WelcomeToProgrammingWorld {
    public static void main(String[] args) {
        System.out.println("Welcome to world of Java Programming Language!");
        System.out.println("Welcome to world of Groovy Programming Language!");
        System.out.println("Welcome to world of Python Programming Language!");
    }
}

This is not the right approach; we should be thinking about a more generic implementation. We could create a method as below:

public class WelcomeToProgrammingWorld {
    public static void main(String[] args) {
        welcomeToALanguage("Java");
        welcomeToALanguage("Groovy");
        welcomeToALanguage("Scala");
    }
    static void welcomeToALanguage(String language) {
        System.out.println("Welcome to world of " + language + "Programming Language");
    }
}

Though the code looks concise, we could make it more extensible by writing it as follows:

public class WelcomeToProgrammingWorld {
    public static void main(String[] args) {
        WelcomeToALanguage welcomeToALanguage = new WelcomeToALanguage();
        welcomeToALanguage.welcome("Java");
        welcomeToALanguage.welcome("Groovy");
        welcomeToALanguage.welcome("Scala");
    }
}
interface Welcome {
    abstract void welcome(String string);
}
class WelcomeToALanguage implements Welcome {
    @Override
    public void welcome(String language) {
        System.out.println("Welcome to world of " + language + "Programming Language");
    }
}

Notice that we now have an interface, Welcome, which could have multiple implementations for different ways to welcome. We have created a concrete class WelcomeToALanguage implementing the Welcome interface. Still, the output remains the same, but the code is more generic and allows more ways of welcoming.

Here, if you have to welcome something differently, you need to provide another implementation and so on.

Another solution here could be to use an anonymous inner class, to avoid creating multiple concrete classes for the various implementations.

public class WelcomeToProgrammingWorld {
    public static void main(String[] args) {
        Welcome welcomeToALanguage = new Welcome() {
            @Override
            public void welcome(String language) {
                System.out.println("Welcome to world of " + language + "Programming Language");
            }
        };
        welcomeToALanguage.welcome("Java");
        welcomeToALanguage.welcome("Groovy");
        welcomeToALanguage.welcome("Scala");
    }
}
interface Welcome {
    abstract void welcome(String string);
}

So, now we have robust and generic code which allows declaring and using different implementations with an anonymous inner class.

Now, let us understand Lambda expression and its implementation in the above example.

An implementation for a method, say:

public void welcome(String language) {
      System.out.println("Welcome to world of " + language + "Programming Language");
}

will look like below in a lambda expression:

(String language) -> {
       System.out.println("Welcome to world of " + language + "Programming Language");
}

You may want to forget about the assignment to some reference variable for now (and compilation too).

Notice the differences:

1. We have removed the access specifier and the return type, since the compiler can infer them from the method definition.
2. We have a new operator, ->, which separates the lambda's parameters (the input) from its body (the output).

The rest of the code remains the same.

Rule 1: If you have only one statement inside the expression, then you don’t need curly braces. Hence, the above code would look like:

(String language) -> System.out.println("Welcome to world of " + language + " Programming Language");

This is already more concise than the original method definition.

As there is only one parameter, we can also remove the parentheses and the parameter type from the input argument, which is even more concise:

language -> System.out.println("Welcome to world of " + language + " Programming Language");

Remember that, so far, we have not discussed any reference variable to hold a lambda expression.

You might expect Java to have introduced some dedicated class just for this, but that's not quite how it works. Java says that to assign a lambda expression you need an interface with only one abstract method, and the compiler will automatically detect this and assign the lambda to a reference variable of that interface type.

An interface with a single abstract method is known as a Functional Interface.
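Java 8 also ships the @FunctionalInterface annotation. It is optional, but it documents the intent and makes the compiler complain if a second abstract method is ever added. Applied to the Welcome interface from this example, it would look like this:

@FunctionalInterface
interface Welcome {
    abstract void welcome(String string);
}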

Let us now try to modify our code for the problem that we initially solved using interface Welcome.

public class WelcomeToProgrammingWorld {
    public static void main(String[] args) {
        Welcome welcomeToALanguage = language -> System.out.println("Welcome to world of " + language + " Programming Language");
        welcomeToALanguage.welcome("Java");
        welcomeToALanguage.welcome("Groovy");
        welcomeToALanguage.welcome("Scala");
    }
}
interface Welcome {
    abstract void welcome(String string);
}

You can see that we haven’t declared any anonymous inner class, but only a lambda expression assigned to Welcome’s reference variable; the output is the same, but the code is much more concise.

Several people on the web hold the opinion that a lambda expression is an anonymous inner class, which is untrue; an anonymous inner class is an utterly different concept. A simple way to prove this is to print the welcomeToALanguage object’s reference when we are using an inner class and when we are using the lambda expression. The output will be, for example:

java8.lambda.expressions.WelcomeToProgrammingWorld$1@14ae5a5 

for the inner class, and:

java8.lambda.expressions.WelcomeToProgrammingWorld$$Lambda$1/1096979270@682a0b20

for the lambda expression.

You can see a significant difference here. Though an anonymous inner class with one method is close to a lambda expression, and these expressions may appear like syntactic sugar, there is more underneath.

One thing is sure that “Lambda expression seamlessly binds itself with the existing Java ecosystem.”

Before winding up this blog, have a look at some examples of lambda expressions:

A lambda expression with no parameters

() -> System.out.println("Hi!Lambda Expressions is great.");

A lambda expression with one parameter

(String s) -> System.out.println("Hi!Lambda Expressions is great."+s);

Another lambda expression with one parameter

s -> System.out.println("Hi!Lambda Expressions is great."+s);

A lambda expression with 2 parameters (note that parentheses are required for 2 or more params)

(s1, s2) -> System.out.println("Hi!Lambda Expressions is great." + s1 + s2);

Some of its advantages are listed below:

1. Less boilerplate and more concise code.

2. Enables functional programming. You could read more about Functional programming, Lambda Calculus and Monads to understand Lambda expressions and how these enable functional programming.

3. Binds seamlessly with the Stream API and the existing Java ecosystem, where the implementation varies for different streams (see the short sketch below).
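As a small sketch of that last point, the welcome messages from earlier can be produced by passing a lambda straight into forEach() on a list; Iterable.forEach() and the Stream API both accept such functional arguments:

Arrays.asList("Java", "Groovy", "Scala")
  .forEach(language -> System.out.println("Welcome to world of " + language + " Programming Language"));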

Hopefully, you will now be able to understand the basic use and syntax of lambda expressions. I will be writing about type inference and the Java functional interface API for lambda expressions soon.

Docker Commands with Examples

Docker is a containerization system which packages and runs the application with its dependencies inside a container. There are several docker commands you must know when working with Docker. This article is all about that.

1. Finding the version

One of the first things you want to know is how to find the installed docker version.

saten@satender:/home/satender$ docker --version

Docker version 18.09.6, build 481bc77

2. Downloading image

Let’s say you need to pull a docker image from Docker Hub (the docker repository). The following is an example of pulling the Apache HTTP server image.

saten@satender:/home/satender$ docker pull httpd

Using default tag: latest

latest: Pulling from library/httpd

f5d23c7fed46: Pull complete

b083c5fd185b: Pull complete

bf5100a89e78: Pull complete

98f47fcaa52f: Pull complete

622a9dd8cfed: Pull complete

Digest: sha256:8bd76c050761610773b484e411612a31f299dbf7273763103edbda82acd73642

Status: Downloaded newer image for httpd:latest

saten@satender:/home/satender$

3. Images

List all the docker images pulled on the system, with image details such as TAG, IMAGE ID, SIZE, etc.

saten@satender:/home/satender$ docker images

REPOSITORY                 TAG                 IMAGE ID            CREATED             SIZE

httpd                      latest              ee39f68eb241        2 days ago          154MB

hello-world                latest              fce289e99eb9        6 months ago        1.84kB

sequenceiq/hadoop-docker   2.7.0               789fa0a3b911        4 years ago         1.76GB

4. Run

Run the docker image mentioned in the command. This command will create a docker container in which the Apache HTTP server will run.

saten@satender:/home/satender$ docker run -it -d httpd

09ca6feb6efc0578951a3e2557ed5855b2edda39a795d9703eb54d975930fe6e

5. What’s running?

ps lists all the docker containers that are running, with container details.

saten@satender:/home/satender$ docker ps

CONTAINER ID        IMAGE               COMMAND              CREATED             STATUS              PORTS               NAMES

09ca6feb6efc        httpd               "httpd-foreground"   36 seconds ago      Up 33 seconds       80/tcp              suspicious_bell

As you can see, the Apache server is running in this docker container.

6. ps -a

List all the docker containers (running, exited, or stopped) with container details.

saten@satender:/home/satender$ docker ps -a

CONTAINER ID        IMAGE                            COMMAND                  CREATED             STATUS                     PORTS                                                                                                                                NAMES

09ca6feb6efc        httpd                            "httpd-foreground"       51 seconds ago      Up 49 seconds              80/tcp                                                                                                                               suspicious_bell

2f6fb3381078        sequenceiq/hadoop-docker:2.7.0   "/etc/bootstrap.sh -d"   2 weeks ago         Exited (137) 9 days ago                                                                                                                                         quizzical_raman

9f397feb3a46        sequenceiq/hadoop-docker:2.7.0   "/etc/bootstrap.sh -…"   2 weeks ago         Exited (255) 2 weeks ago   2122/tcp, 8030-8033/tcp, 8040/tcp, 8042/tcp, 8088/tcp, 19888/tcp, 49707/tcp, 50010/tcp, 50020/tcp, 50070/tcp, 50075/tcp, 50090/tcp   determined_ritchie

9b6343d3b5a0        hello-world                      "/hello"                 2 weeks ago         Exited (0) 2 weeks ago                                                                                                                                          peaceful_mclean

7. exec

Access the docker container and run commands inside the container. I am accessing the apache server container in this example.

saten@satender:/home/satender$ docker exec -it 09ca6feb6efc bash

root@09ca6feb6efc:/usr/local/apache2# ls

bin  build  cgi-bin  conf  error  htdocs  icons  include  logs                modules

root@09ca6feb6efc:/usr/local/apache2#

Type exit and press enter to come out of the container.

8. Removing container

Remove the docker container with the container id mentioned in the command.

saten@satender:/home/satender$ docker rm 9b6343d3b5a0

9b6343d3b5a0

Run the below command to check if the container got removed or not.

saten@satender:/home/satender$ docker ps -a

CONTAINER ID        IMAGE                            COMMAND                  CREATED              STATUS                     PORTS                                                                                                                                NAMES

09ca6feb6efc        httpd                            "httpd-foreground"       About a minute ago   Up About a minute          80/tcp                                                                                                                               suspicious_bell

2f6fb3381078        sequenceiq/hadoop-docker:2.7.0   "/etc/bootstrap.sh -d"   2 weeks ago          Exited (137) 9 days ago                                                                                                                                         quizzical_raman

9f397feb3a46        sequenceiq/hadoop-docker:2.7.0   "/etc/bootstrap.sh -…"   2 weeks ago          Exited (255) 2 weeks ago   2122/tcp, 8030-8033/tcp, 8040/tcp, 8042/tcp, 8088/tcp, 19888/tcp, 49707/tcp, 50010/tcp, 50020/tcp, 50070/tcp, 50075/tcp, 50090/tcp   determined_ritchie

9. Removing image

Remove the docker image with the docker image id mentioned in the command.

saten@satender:/home/satender$ docker rmi fce289e99eb9

Untagged: hello-world:latest

Untagged: hello-world@sha256:41a65640635299bab090f783209c1e3a3f11934cf7756b09cb2f1e02147c6ed8

Deleted: sha256:fce289e99eb9bca977dae136fbe2a82b6b7d4c372474c9235adc1741675f587e

Deleted: sha256:af0b15c8625bb1938f1d7b17081031f649fd14e6b233688eea3c5483994a66a3

saten@satender:/home/satender$

10. Restart Docker

Restart the docker container with the container id mentioned in the command.

saten@satender:/home/satender$ docker restart 09ca6feb6efc

09ca6feb6efc

Run the command below and check the STATUS parameter to verify if the container started recently.

saten@satender:/home/satender$ docker ps

CONTAINER ID        IMAGE               COMMAND              CREATED             STATUS              PORTS               NAMES

09ca6feb6efc        httpd               "httpd-foreground"   6 minutes ago       Up 9 seconds        80/tcp              suspicious_bell

11. Stopping Docker

Stop a container with the container id mentioned in the command.

saten@satender:/home/satender$ docker stop 09ca6feb6efc

09ca6feb6efc

Run the below command to check if the container is still running or has stopped.

saten@satender:/home/satender$ docker ps

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

12. Starting Docker

This command starts the docker container with the container id mentioned in the command.

saten@satender:/home/satender$ docker start 09ca6feb6efc

09ca6feb6efc

Run the command below to check if the container started or not.

saten@satender:/home/satender$ docker ps

CONTAINER ID        IMAGE               COMMAND              CREATED             STATUS              PORTS               NAMES

09ca6feb6efc        httpd               "httpd-foreground"   8 minutes ago       Up 3 seconds        80/tcp              suspicious_bell

13. Kill

Stop the docker container immediately. The docker stop command stops the container gracefully; that’s the difference between the kill and stop commands.

saten@satender:/home/satender$ docker kill 09ca6feb6efc

09ca6feb6efc

Run the below command to see if the container got killed or not.

saten@satender:/home/satender$ docker ps

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

14. Commit

Save a new docker image, created from the container id mentioned in the command, on the local system. In the example below, satender is the username and httpd_image is the image name.

saten@satender:/home/satender$ docker commit 09ca6feb6efc satender/httpd_image

sha256:d1933506f4c1686ab1a1ec601b1a03a17b41decbc21d8acd893db090a09bb31c

15. Login

Log in to Docker Hub. You will be asked for your Docker Hub credentials.

saten@satender:/home/satender$ docker login

Login with your Docker ID to push and pull images from Docker Hub. If you don't have a Docker ID, head over to https://hub.docker.com to create one.

Username: satender

Password:

Configure a credential helper to remove this warning. See

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded

16. Push

Upload the docker image with the image name mentioned in the command to Docker Hub.

saten@satender:/home/satender$ docker push satender/httpd_image

The push refers to repository

734d9104a6a2: Pushed

635721fc6973: Mounted from library/httpd

bea448567d6c: Mounted from library/httpd

bfaa5f9c3b51: Mounted from library/httpd

9d542ac296cc: Mounted from library/httpd

d8a33133e477: Mounted from library/httpd

latest: digest: sha256:3904662761df9d76ef04ddfa5cfab764b85e3eedaf10071cfbe2bf77254679ac size: 1574

17. Docker network

The following command lists the details of all the docker networks on the host.

saten@satender:/home/satender$ docker network ls

NETWORK ID          NAME                DRIVER              SCOPE

85083e766f04        bridge              bridge              local

f51d1f3379e0        host                host                local

5e5d9a192c00        none                null                local

There are several other docker network commands.

saten@satender:/home/satender$ docker network

Usage:  docker network COMMAND

Manage networks

Commands:

connect     Connect a container to a network

create      Create a network

disconnect  Disconnect a container from a network

inspect     Display detailed information on one or more networks

ls          List networks

prune       Remove all unused networks

rm          Remove one or more networks

Run 'docker network COMMAND --help' for more information on a command.

18. Docker info

Get detailed information about docker installed on the system including the kernel version, number of containers and images, etc.

saten@satender:/home/satender$ docker info

Containers: 3

Running: 1

Paused: 0

Stopped: 2

Images: 3

Server Version: 18.09.6

Storage Driver: overlay2

Backing Filesystem: extfs

Supports d_type: true

Native Overlay Diff: true

Logging Driver: json-file

Cgroup Driver: cgroupfs

Plugins:

Volume: local

Network: bridge host macvlan null overlay

Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog

Swarm: inactive

Runtimes: runc

Default Runtime: runc

Init Binary: docker-init

containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84

runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30

init version: fec3683

Security Options:

apparmor

seccomp

Profile: default

Kernel Version: 4.18.0-25-generic

Operating System: Ubuntu 18.10

OSType: linux

Architecture: x86_64

CPUs: 1

Total Memory: 4.982GiB

Name: satender

ID: RBCP:YGAP:QG6H:B6XH:JCT2:DTI5:AYJA:M44Z:ETRP:6TO6:OPAY:KLNJ

Docker Root Dir: /var/lib/docker

Debug Mode (client): false

Debug Mode (server): false

Username: satender

Registry: https://index.docker.io/v1/

Labels:

Experimental: false

Insecure Registries:

127.0.0.0/8

Live Restore Enabled: false

Product License: Community Engine

19. Copying file

Copy a file from a docker container to the local system.

In this example, I am copying the httpd.pid file from inside the docker container with id 09ca6feb6efc to /home/satender/

saten@satender:/home/satender$ sudo docker cp 09ca6feb6efc:/usr/local/apache2/logs/httpd.pid /home/satender/

[sudo] password for satender:

Run the command below to check if the file got copied or not.

saten@satender:/home/satender$ ls

Desktop  Documents  example  examples.desktop  httpd.pid  nginx_new.yml  nginx.yml

20. Checking history

Show the history of a docker image with the image name mentioned in the command.

saten@satender:/home/satender$ docker history httpd

IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT

ee39f68eb241        2 days ago          /bin/sh -c #(nop)  CMD ["httpd-foreground"]     0B

<missing>           2 days ago          /bin/sh -c #(nop)  EXPOSE 80                    0B

<missing>           2 days ago          /bin/sh -c #(nop) COPY file:c432ff61c4993ecd…   138B

<missing>           4 days ago          /bin/sh -c set -eux;   savedAptMark="$(apt-m…   49.1MB

<missing>           4 days ago          /bin/sh -c #(nop)  ENV HTTPD_PATCHES=           0B

<missing>           4 days ago          /bin/sh -c #(nop)  ENV HTTPD_SHA256=b4ca9d05…   0B

<missing>           4 days ago          /bin/sh -c #(nop)  ENV HTTPD_VERSION=2.4.39     0B

<missing>           4 days ago          /bin/sh -c set -eux;  apt-get update;  apt-g…   35.4MB

<missing>           4 days ago          /bin/sh -c #(nop) WORKDIR /usr/local/apache2    0B

<missing>           4 days ago          /bin/sh -c mkdir -p "$HTTPD_PREFIX"  && chow…   0B

<missing>           4 days ago          /bin/sh -c #(nop)  ENV PATH=/usr/local/apach…   0B

<missing>           4 days ago          /bin/sh -c #(nop)  ENV HTTPD_PREFIX=/usr/loc…   0B

<missing>           5 days ago          /bin/sh -c #(nop)  CMD ["bash"]                 0B

<missing>           5 days ago          /bin/sh -c #(nop) ADD file:71ac26257198ecf6a…   69.2MB

21. Checking logs

Show the logs of the docker container with the container id mentioned in the command.

saten@satender:/home/satender$ docker logs 09ca6feb6efc

AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message

AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message

[Mon Jul 15 14:01:55.400472 2019] [mpm_event:notice] [pid 1:tid 140299791516800] AH00489: Apache/2.4.39 (Unix) configured -- resuming normal operations

[Mon Jul 15 14:01:55.400615 2019] [core:notice] [pid 1:tid 140299791516800] AH00094: Command line: 'httpd -D FOREGROUND'

[Mon Jul 15 14:08:36.798229 2019] [mpm_event:notice] [pid 1:tid 140299791516800] AH00491: caught SIGTERM, shutting down

AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message

AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message

[Mon Jul 15 14:08:38.259870 2019] [mpm_event:notice] [pid 1:tid 139974087980160] AH00489: Apache/2.4.39 (Unix) configured -- resuming normal operations

[Mon Jul 15 14:08:38.260007 2019] [core:notice] [pid 1:tid 139974087980160] AH00094: Command line: 'httpd -D FOREGROUND'

[Mon Jul 15 14:09:01.540647 2019] [mpm_event:notice] [pid 1:tid 139974087980160] AH00491: caught SIGTERM, shutting down

AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message

AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message

[Mon Jul 15 14:10:43.782606 2019] [mpm_event:notice] [pid 1:tid 140281554879616] AH00489: Apache/2.4.39 (Unix) configured -- resuming normal operations

[Mon Jul 15 14:10:43.782737 2019] [core:notice] [pid 1:tid 140281554879616] AH00094: Command line: 'httpd -D FOREGROUND'

AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message

AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message

[Mon Jul 15 14:14:08.270906 2019] [mpm_event:notice] [pid 1:tid 140595254346880] AH00489: Apache/2.4.39 (Unix) configured -- resuming normal operations

[Mon Jul 15 14:14:08.272628 2019] [core:notice] [pid 1:tid 140595254346880] AH00094: Command line: 'httpd -D FOREGROUND'

22. Searching image

Search for a docker image on Docker Hub with the name mentioned in the command.

saten@satender:/home/satender$ docker search hadoop

NAME                             DESCRIPTION                                     STARS               OFFICIAL            AUTOMATED

sequenceiq/hadoop-docker         An easy way to try Hadoop                       611                                     [OK]

uhopper/hadoop                   Base Hadoop image with dynamic configuration…   98                                      [OK]

harisekhon/hadoop                Apache Hadoop (HDFS + Yarn, tags 2.2 - 2.8)     54                                      [OK]

bde2020/hadoop-namenode          Hadoop namenode of a hadoop cluster             22                                      [OK]

kiwenlau/hadoop                  Run Hadoop Cluster in Docker Containers         19

izone/hadoop                     Hadoop 2.8.5 Ecosystem fully distributed, Ju…   14                                      [OK]

uhopper/hadoop-namenode          Hadoop namenode                                 9                                       [OK]

bde2020/hadoop-datanode          Hadoop datanode of a hadoop cluster             9                                       [OK]

singularities/hadoop             Apache Hadoop                                   8                                       [OK]

uhopper/hadoop-datanode          Hadoop datanode                                 7                                       [OK]

harisekhon/hadoop-dev            Apache Hadoop (HDFS + Yarn) + Dev Tools + Gi…   6                                       [OK]

23. Updating configuration

Update container configurations. This shows all the update options.

saten@satender:/home/saten$ docker update --help

Usage:  docker update [OPTIONS] CONTAINER [CONTAINER...]

Update configuration of one or more containers

Options:

--blkio-weight uint16        Block IO (relative weight), between 10 and 1000, or 0 to disable

(default 0)

--cpu-period int             Limit CPU CFS (Completely Fair Scheduler) period

--cpu-quota int              Limit CPU CFS (Completely Fair Scheduler) quota

--cpu-rt-period int          Limit the CPU real-time period in microseconds

--cpu-rt-runtime int         Limit the CPU real-time runtime in microseconds

-c, --cpu-shares int             CPU shares (relative weight)

--cpus decimal               Number of CPUs

--cpuset-cpus string         CPUs in which to allow execution (0-3, 0,1)

--cpuset-mems string         MEMs in which to allow execution (0-3, 0,1)

--kernel-memory bytes        Kernel memory limit

-m, --memory bytes               Memory limit

--memory-reservation bytes   Memory soft limit

--memory-swap bytes          Swap limit equal to memory plus swap: '-1' to enable unlimited swap

--restart string             Restart policy to apply when a container exits

Run the below command to update the CPU shares of the container with the container ID mentioned in the command.

saten@satender:/home/satender$ docker update -c 1 2f6fb3381078

2f6fb3381078

24. Creating volume

Create a volume which a docker container can use to store data.

saten@satender:/home/satender$ docker volume create

7e7bc886f69bb24dbdbf19402e31102a25db91bb29c56cca3ea8b0c611fd9ad0

Run the below command to check whether the volume got created or not.

saten@satender:/home/satender$ docker volume ls

DRIVER              VOLUME NAME

local               7e7bc886f69bb24dbdbf19402e31102a25db91bb29c56cca3ea8b0c611fd9ad0
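In practice a volume is usually given an explicit name and then mounted into a container. A minimal sketch (the volume name mydata, the mount path /data and the ubuntu image are illustrative):

# create a named volume
docker volume create mydata

# start a container that stores its data on the volume
docker run -it -v mydata:/data ubuntu bash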

25. Installing plugin

Install the docker plugin vieux/sshfs with the DEBUG environment variable set to 1.

saten@satender:/home/satender$ docker plugin install vieux/sshfs DEBUG=1

Plugin "vieux/sshfs" is requesting the following privileges:

- network: [host]

- mount: [/var/lib/docker/plugins/]

- mount: []

- device: [/dev/fuse]

- capabilities: [CAP_SYS_ADMIN]

Do you grant the above permissions? [y/N] y

latest: Pulling from vieux/sshfs

52d435ada6a4: Download complete

Digest: sha256:1d3c3e42c12138da5ef7873b97f7f32cf99fb6edde75fa4f0bcf9ed277855811

Status: Downloaded newer image for vieux/sshfs:latest

Installed plugin vieux/sshfs

Run the below command to list the docker plugins.

saten@satender:/home/satender$ docker plugin ls

ID                  NAME                 DESCRIPTION               ENABLED

2a32d1fb95af        vieux/sshfs:latest   sshFS plugin for Docker   true

26. Logout

Log out from Docker Hub.

saten@satender:/home/satender$ docker logout

Removing login credentials for https://index.docker.io/v1/

Conclusion

I hope you have got a fair understanding of docker commands by now. Try out those commands in your dev or lab environment to practice and learn.

Gradle Maven Plugin

GradleAndMaven

Gradle can be used to install artifacts to a Maven repository. Gradle provides the maven plugin, which installs the generated artifacts in a local or remote repository.

Maven Plugin

To configure the maven plugin, we need to add the following line to the build.gradle file in the project root:


apply plugin: 'maven'

Also, Maven requires a groupId and a version (which are not set by default):


group = "com.gradle.demo"

version = "1.0"
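Putting it together, a minimal build.gradle might look like this (a sketch; it assumes the project also applies the java plugin to produce a JAR):

apply plugin: 'java'
apply plugin: 'maven'

group = "com.gradle.demo"
version = "1.0"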

Install in Local Repository

To install the generated artifact in the local Maven repository, execute the following command from the project root:


gradle -q install

Install in Remote Repository

To install the artifact to a remote repository, we need to add the following task to the build configuration:


uploadArchives {
    repositories {
        mavenDeployer {
            repository(url: "file://localhost/tmp/myRepo/")
        }
    }
}

If the repository requires authentication, credentials should be provided:


uploadArchives {
    repositories.mavenDeployer {
        configuration = configurations.deployerJars
        repository(url: "scp://repos.mycompany.com/releases") {
            authentication(userName: "me", password: "myPassword")
        }
    }
}
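The deployerJars configuration referenced above is not defined by default; it has to be declared and given a Wagon provider that can handle scp transfers. A sketch, where the wagon-ssh version shown is illustrative:

configurations {
    // holds the Wagon implementation used by mavenDeployer for scp uploads
    deployerJars
}

dependencies {
    deployerJars "org.apache.maven.wagon:wagon-ssh:2.2"
}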

 

Introduction to Gradle

index

What is Gradle?

Gradle is an open source, advanced, general-purpose build management system. It builds on the ideas of Ant and Maven and can resolve dependencies from Maven and Ivy repositories. It uses a Groovy-based Domain Specific Language (DSL) instead of XML.

Download Gradle

You can download gradle from: https://gradle.org/releases/

Or,

follow steps from this location: https://gradle.org/install/

Extract downloaded ZIP file.

Eclipse Configuration

To configure Gradle in Eclipse, download the Gradle plugin from the Eclipse Marketplace:

gradle-eclipse

Complete the installation.

Create a Gradle Project

  • Go to File -> New -> Project
  • Select Gradle Project
  • Provide a project name. Click Next.
  • Choose Local Installation Directory and provide the path to the extracted Gradle directory.
  • Provide the path to the JDK home directory.
  • Click Finish to create the project.

Project structure

Initial project structure will look like:

gradle-project-structure

Configuration files

There are two main configuration files for gradle:

  • build.gradle
  • settings.gradle

build.gradle: Defines all configuration required for project build. Initial structure will look like:

build.gradle

settings.gradle: Defines all configuration related to project structure. Required for multi-module projects.

Building the project

From Eclipse: to build the project, you can use the Gradle build tab available near the Console tab:

gradle-eclipse-build

Right-click on the build task and select Run Gradle Tasks.

See the Console tab for the build logs.
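From the command line, the same can be done by running the build task from the project root (assuming the java plugin is applied, which is the case for a new Gradle project created in Eclipse):

# list the available tasks
gradle tasks

# compile, test and assemble the project
gradle build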

Spring AspectJ Compile-Time and Load-time weaving

The Spring container instantiates and configures the beans defined in your application context. Those beans should be marked with @Component so that they are picked up by Spring's component scanning, and their dependencies are marked with the @Autowired annotation.

QUESTION: What about objects instantiated with the new operator or created by Hibernate?

Solution:

First, we need to configure the class to make it eligible for dependency injection, using the @Configurable annotation. For example,

import org.springframework.beans.factory.annotation.Configurable;

@Configurable
public class Account {

// ...

}
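Once weaving is set up (steps below), fields annotated with @Autowired inside such a class are injected even when the object is created with new. A sketch, assuming a hypothetical AccountRepository bean:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Configurable;

@Configurable
public class Account {

    // injected by Spring after weaving, even for objects created with `new Account()`
    @Autowired
    private AccountRepository accountRepository; // hypothetical repository bean

    public void save() {
        accountRepository.save(this); // hypothetical call on the injected bean
    }
}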

Second, we need to enable the aspect configuration in the Spring context. In XML configuration this is done by adding the following line:

<context:spring-configured/>

Programmatically, it can be done by adding @EnableSpringConfigured to the application's @Configuration class. For example,

@Configuration
@EnableSpringConfigured
public class AppConfig {
// ....
} 

Third, we need to use the AspectJ weaver to weave the dependency injection into such classes. Spring supports this in two ways:

  • Compile-Time Weaving
  • Load-Time Weaving

Compile-Time Weaving:

Compile-time weaving means the aspects are woven into the classes at build/compile time. For a Maven project, we need to add the following dependencies to our pom.xml:

<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-aspects</artifactId>
</dependency>
<dependency>
    <groupId>org.aspectj</groupId>
    <artifactId>aspectjrt</artifactId>
    <version>${aspectj.version}</version>
</dependency>
<dependency>
    <groupId>org.aspectj</groupId>
    <artifactId>aspectjtools</artifactId>
    <version>${aspectj.version}</version>
</dependency>
<dependency>
    <groupId>org.aspectj</groupId>
    <artifactId>aspectjweaver</artifactId>
    <version>${aspectj.version}</version>
</dependency>

And the aspectj-maven-plugin in the plugins section:


<plugins>
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
            <compilerVersion>${complianceLevel}</compilerVersion>
        </configuration>
    </plugin>
    <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>aspectj-maven-plugin</artifactId>
        <configuration>
            <showWeaveInfo>true</showWeaveInfo>
            <source>${complianceLevel}</source>
            <target>${complianceLevel}</target>
            <Xlint>ignore</Xlint>
            <complianceLevel>${complianceLevel}</complianceLevel>
            <encoding>UTF-8</encoding>
            <verbose>true</verbose>
            <aspectLibraries>
                <aspectLibrary>
                    <groupId>org.springframework</groupId>
                    <artifactId>spring-aspects</artifactId>
                </aspectLibrary>
            </aspectLibraries>
        </configuration>
        <executions>
            <execution>
                <goals>
                    <goal>compile</goal>
                    <goal>test-compile</goal>
                </goals>
            </execution>
        </executions>
        <dependencies>
            <dependency>
                <groupId>org.aspectj</groupId>
                <artifactId>aspectjrt</artifactId>
                <version>${aspectj.version}</version>
            </dependency>
            <dependency>
                <groupId>org.aspectj</groupId>
                <artifactId>aspectjtools</artifactId>
                <version>${aspectj.version}</version>
            </dependency>
            <dependency>
                <groupId>org.aspectj</groupId>
                <artifactId>aspectjweaver</artifactId>
                <version>${aspectj.version}</version>
            </dependency>
        </dependencies>
    </plugin>
</plugins>

During the build, we can see the weaving logs.

Load-Time Weaving:

Load-time weaving (LTW) refers to the process of weaving AspectJ aspects into an application’s class files as they are being loaded into the Java virtual machine (JVM).

To enable it, we need to add the following line to the Spring context file:


<context:load-time-weaver aspectj-weaving="autodetect"/>

Or, add the @EnableLoadTimeWeaving annotation to the application's @Configuration class.
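For a pure Java configuration, both annotations can sit on the same class; a minimal sketch:

import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.EnableLoadTimeWeaving;
import org.springframework.context.annotation.aspectj.EnableSpringConfigured;

@Configuration
@EnableSpringConfigured
@EnableLoadTimeWeaving
public class AppConfig {
    // ...
}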

Another thing we need to do is provide a javaagent configuration to the JVM. For example, when starting the application, the following should be added to the JVM arguments:


-javaagent:PATH_TO_MAVEN_REPO_DIR/org/springframework/spring-instrument/{version}/spring-instrument-{version}.jar

Or, another JAR can be used for this purpose:


-javaagent:PATH_TO_USER_MAVEN_REPO_DIR/org/aspectj/aspectjweaver/{version}/aspectjweaver-{version}.jar

 

THAT'S ALL!

ENJOY SPRING!

Apache Maven

What is Maven?

Apache Maven is a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project’s build, reporting and documentation from a central piece of information.

What does Maven do?

  • Making the build process easy
  • Providing a uniform build system
  • Providing quality project information
  • Providing guidelines for best practices development
  • Allowing transparent migration to new features
Enough theory, let's do some real work!

Installing Maven

Maven can be downloaded from the Apache website: https://maven.apache.org/download.cgi
Download the latest binary ZIP file.

Configure Maven

  • First of all, extract the downloaded ZIP.
  • Set the environment variables by putting the following lines in the .bashrc file on an Ubuntu system (the PATH addition is shown after this list): export M2_HOME=/home/user/apache-maven-3.1.1
    export M2=$M2_HOME/bin
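To be able to run mvn from any directory, the Maven bin directory also needs to be on the PATH, and the setup can then be verified; a short sketch to append to the same .bashrc:

export PATH=$M2:$PATH

# verify the installation
mvn -version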

Creating a maven based project

Creating a Maven project is as simple as creating a normal Java project. We are using Eclipse here to create a Maven project.
  1. Go to New -> Project (maven_project)
  2. Check the Create a Simple Project checkbox (new_project)
  3. Set the project Group ID and Artifact ID (project_name)
  4. Click Finish to create the project.
  5. The project structure will look like: project_structure
  6. Open pom.xml. It will look like: project_pom
  7. Now the project is ready for development.

Pentaho BI Server clustering and load balancing

Why do we need a cluster of BI servers?

Pentaho's BI Server (BA platform) allows you to access business data in the form of dashboards, reports or OLAP cubes via a convenient web interface. But when it has to serve a large number of users at the same time, a single instance quickly becomes a bottleneck. We can resolve this problem by serving Pentaho requests from multiple BI servers using clustering. Since the Pentaho BI server doesn't support session sharing out of the box, we have to make some modifications to the Pentaho APIs. Follow the steps below to serve BI server requests from multiple instances.

Download Pentaho BI server

Version 4.5 can be downloaded from:

http://sourceforge.net/projects/pentaho/files/Business%20Intelligence%20Server/4.5.0-stable/

Note: We assume two machines, each running a BI server instance, with IP addresses referred to as Server1_IP and Server2_IP. BI-server-related configuration is done on both servers; the Apache web server is configured only on the parent server, Server1.

Configure Apache web server

1. Install Apache2 with mod_proxy using following command:

sudo apt-get install apache2 libapache2-mod-proxy-html libxml2-dev

NOTE: Please use OS specific command to install apache2 and mod_proxy.

2. Edit the virtual host configuration and add the following in /etc/apache2/sites-enabled/000-default:


<VirtualHost *:443>

DocumentRoot /var/www/pentaho
 ServerName mypentaho.com
 ServerAlias mypentaho.com

LogFormat "%h %l %u %t \"%r\" %>s %b %D %{BALANCER_WORKER_ROUTE}e"
 ErrorLog /var/log/apache2/pentaho_ssl_error_log
 TransferLog /var/log/apache2/pentaho_ssl_access_log
 LogLevel warn

# SSL Config
 SSLEngine on
 SSLCertificateFile /etc/apache2/ssl/server.crt
 SSLCertificateKeyFile /etc/apache2/ssl/server.key
 Header unset 'X-Powered-By'

ProxyPreserveHost on
 SSLProxyEngine On

<Location "/reports">
 SecRuleRemoveById 950120
 ProxyPass balancer://reports
 ProxyPassReverseCookiePath / /reports

</Location>

<Location "/pentaho-style">
 SecRuleEngine Off
 ProxyPass balancer://pentahostyle
 ProxyPassReverseCookiePath / /reports
 </Location>

</VirtualHost>

<Proxy balancer://reports>

ProxySet lbmethod=bybusyness stickysession=JSESSIONID|jsessionid scolonpathdelim=On timeout=1800 failonstatus=502,503

BalancerMember ajp://Server1_IP:8009/reports loadfactor=10 route=worker1 connectiontimeout=1800

BalancerMember ajp://Server2_IP:8009/reports loadfactor=10 route=worker2 connectiontimeout=1800

</Proxy>

<Proxy balancer://pentahostyle>

ProxySet lbmethod=bybusyness stickysession=JSESSIONID|jsessionid scolonpathdelim=On timeout=10 failonstatus=502,503

BalancerMember ajp://Server1_IP:8009/pentaho-style loadfactor=10 route=worker1 connectiontimeout=1800

BalancerMember ajp://Server2_IP:18009/pentaho-style loadfactor=10 route=worker2 connectiontimeout=1800

</Proxy>

 

Note: Sticky session has been used.

 

Configure tomcat servers

Create two instances of the BI server, one on each machine. To cluster them, modify conf/server.xml in the tomcat directory of each BI server, changing the <Engine> element as follows:

1. Add a jvmRoute attribute to the <Engine> tag on both servers.


<Engine name="Catalina" defaultHost="localhost" jvmRoute="worker1">

Note: The jvmRoute value must match the route defined in the Apache balancer configuration (worker1/worker2) and must be different on each server.

2. Add the following lines inside the <Engine> tag:

<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster" channelSendOptions="8">

<Manager className="org.apache.catalina.ha.session.DeltaManager" expireSessionsOnShutdown="false" notifyListenersOnReplication="true"/>

<Channel className="org.apache.catalina.tribes.group.GroupChannel">

<Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">

<Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>

</Sender>

<Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver" address="auto" port="4000" autoBind="100" selectorTimeout="5000" maxThreads="6"/>

<Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>

<Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>

<Interceptor className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">

<Member className="org.apache.catalina.tribes.membership.StaticMember" securePort="-1" host="Server2_IP" port="4000" uniqueId="{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10}"/>

</Interceptor>

</Channel>

<Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter=""/>

<Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>

<ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>

<ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>

</Cluster>

Note: We have used unicast to send and receive messages for session sharing.

Note: We have considered the server Server1_IP as the parent server, so its database will be used and both servers will use the same database.

3. Modify /biserver-ce/tomcat/webapps/reports/WEB-INF/web.xml and add the following element at the end of the file, inside the <web-app> tag:

<distributable/>

4. Modify biserver-ce/tomcat/conf/context.xml and comment out the <Manager> tag.

Configure Pentaho system

1. Modify pentaho-solutions/system/application-spring-security.xml on server Server2_IP to point to the database on the parent server by adding the following lines:


<property name="url" value="jdbc:mysql://Server1_IP:3306/db_name" />

<property name="username" value="username" />

<property name="password" value="password" />

2. Replace the following Pentaho jars in tomcat/webapps/reports/WEB-INF/lib/ with their modified versions:

pentaho-actionsequence-dom-2.3.4.jar
pentaho-bi-platform-api-4.5.0-stable.jar
pentaho-bi-platform-engine-core-4.5.0-stable.jar
pentaho-bi-platform-engine-services-4.5.0-stable.jar
pentaho-bi-platform-plugin-actions-4.5.0-stable.jar
pentaho-bi-platform-repository-4.5.0-stable.jar
pentaho-bi-platform-web-4.5.0-stable.jar
pentaho-bi-platform-util-4.5.0-stable.jar

NOTE: These jars are modified to make the code serializable and support a distributable application.

3. Replace the following Saiku jar with its modified version in pentaho-solutions/system/saiku-adhoc/lib/:

saiku-adhoc-core-1.0-GA.jar

NOTE: The Saiku plugin creates its metamodel at runtime and stores the models in memory, so an error may appear while creating a Saiku ad hoc custom report. We have modified the error message that appears during editing.

Sync pentaho directories using Rsync

Create an rsync job using crontab to sync custom reports between the servers.

First, configure SSH for passwordless access to the remote server using the following commands:

ssh-keygen -t dsa

Note: Using the default file to store the key is recommended.

ssh-copy-id -i ~/.ssh/id_dsa.pub user@Server2_IP

Open crontab using:

crontab -e

Add the following lines to the file and save it:


*/1 * * * * rsync -avuz --log-file=path/to/log/file/cron.log --include=pentaho_* --exclude="*" path/to/parent/pentaho/server/biserver-ce-4.5.0-stable/biserver-ce/pentaho-solutions/ user@Server2_IP:path/to/remote/pentaho/server/biserver-ce-4.5.0-stable/biserver-ce/pentaho-solutions/

*/1 * * * * rsync -avuz --log-file=path/to/log/file/cron.log --include="pentaho_*/**" --exclude="*" user@Server2_IP:path/to/remote/pentaho/server/biserver-ce-4.5.0-stable/biserver-ce/pentaho-solutions/ path/to/parent/pentaho/server/biserver-ce-4.5.0-stable/biserver-ce/pentaho-solutions/

NOTE: This expression will sync custom reports for all tenants in pentaho-solutions directory on both servers.

Restart the tomcat servers and the Apache web server.

Restart the apache2 server using the following command:

service apache2 restart
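The BI server instances themselves are typically restarted with the scripts shipped in the biserver-ce directory (run from the BI server installation directory; paths may differ on your setup):

./stop-pentaho.sh
./start-pentaho.sh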

 

Hope this helps. Please let me know if you have any queries.

Install Oracle Sun JDK in Ubuntu and set JAVA_HOME

To install the JDK from Oracle, use the following steps:

1. Download JDK from Oracle web site:

http://www.oracle.com/technetwork/java/javase/downloads/index.html

2. The downloaded file will be in .bin format; for example, for a 64-bit system it is jdk-6u37-linux-x64.bin.

3. Put this bin file in your home directory.

4. Now we need to extract this file to use the Java APIs.

5. Extract BIN file:

  1. Make the BIN file executable using this command: chmod +x jdk***.bin
  2. Then execute it: ./jdk***.bin

6. A folder with the same name will be created in the same directory.

7. Now we need to provide the JDK path in the JAVA_HOME variable in the .bashrc file.

8. Edit the .bashrc file in an editor using the following command:

gedit .bashrc

9. Now put this code in your .bashrc file:

export JAVA_HOME=/home/jdk***
export PATH=$JAVA_HOME/bin:$PATH

10. Save file.

11. To refresh your bash settings, run this command and hit enter:

bash

12. To verify the configured JDK path, use this command:

which java

This will display your configured path, e.g. /home/jdk***/bin/java.

Install Java in Chrome, Chromium or Firefox in Ubuntu

To enable Java plugin in your Linux browsers, just copy these lines into a script, and run it!

JAVA_HOME=/usr/lib/jvm/jdk1.7.0   # or the path where you have installed Java

MOZILLA_HOME=~/.mozilla
mkdir -p $MOZILLA_HOME/plugins

For 32-bit systems :

ln -s $JAVA_HOME/jre/lib/i386/libnpjp2.so $MOZILLA_HOME/plugins

For 64-bit systems:

ln -s $JAVA_HOME/jre/lib/amd64/libnpjp2.so $MOZILLA_HOME/plugins
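Put together in a single file, the script might look like this (a sketch for a 64-bit system; adjust JAVA_HOME to your installation path):

#!/bin/bash
# path where the JDK is installed (adjust to your system)
JAVA_HOME=/usr/lib/jvm/jdk1.7.0
MOZILLA_HOME=~/.mozilla
mkdir -p $MOZILLA_HOME/plugins
# use jre/lib/i386/libnpjp2.so instead on a 32-bit system
ln -s $JAVA_HOME/jre/lib/amd64/libnpjp2.so $MOZILLA_HOME/plugins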

To run a script in Linux, use the following steps:

1. Open terminal.

2. Go to the directory where you have created the script. (Note: your file must have the .sh extension, for example file_name.sh.)

3. Run the following command: chmod +x <file_name>.sh

4. Now run the script: ./<file_name>.sh

5. Restart your browser.