Algorithmics are Important

We live in the digital era, an era where big companies rely on software to automate and facilitate complex business processes.

Our mission is to deliver solutions that satisfy our client’s needs as efficiently as software developers. Remember, time is money! So the less time it takes, the more money our client saves. And this is where Algorithmics plays a significant role.

Let’s suppose our client deals with extensive lists of items that are currently processed manually by their agents. This has proven to be time-consuming and error-prone, so the company wants to automate it to save time and money for providing better service.

Each item has several attributes like ID, name or price, and a priority. When a batch of items arrives, an item with maximum priority is taken, processed, and then returned to the list with the immediate lower priority found. For example, if 1, 5, 3, 5 were the batch’s priorities, an item with a priority of 5 would be taken, processed, and returned with a priority of 3. This is repeated until all have the same minimum priority.

The complexity of how the items are processed will depend on the actual business logic. We are not expected to understand it fully. Let’s say, a coworker has been assigned to leverage it. What has been given to us is to develop a program that, given a list of items, calculates the total number of times items will be processed until all have the same minimum priority. 

Why is this desirable? Multiplying this number by the average time an item takes to be processed will give us a first estimation of the average total time it would take to process a whole list of production items without actually doing it, which could be more expensive to calculate.

Typically this would be trusted in the hands of the good old and robust Java. So let’s code on!

First, we need an Item object. 


public class Item { 
  
    private int priority; 
  
    public int getPriority() { 
        return priority; 
   } 
  
    public Item setPriority(int priority) { 
        this.priority = priority; 
        return this; 
   } 
}   

It is nothing fancy, just a priority private field with the corresponding getter and setter following the good practices of encapsulation. Of course, this object would have many more members like ID, name, or price, as we said before. For now, this is enough. Later, it will be replaced by the real one in a compatible way. This is called a “mock” object, a temporary simulated one.

Then, our method would probably live inside a helper class containing other useful tools. ProcessUtils sounds suitable.

Now, the method takes a List of Item-s and returns a number. Since this number could be big, let’s use long instead of int as a return type.

public class ProcessUtils { 
  
    public static long countProcesses(List<Item> list) { 
     
   ... 
     
   } 
} 

Let’s take care and not modify the List we are using as input. Since Java passes object variables as a reference, we need to make a copy. Fortunately, we only care about the priority.

List<Item> items = list.stream() 
           .map(item -> new Item().setPriority(item.getPriority())) 
           .collect(Collectors.toList()); 

All setup! Time to implement the actual logic. Well, we can just follow the business logic without processing the Item-s and add a counter somewhere. Right?

int count = 0; 
  
        while(true) { 
            int maxPriority = -1; 
            int maxPriorityIndex = -1; 
            int subMaxPriority = -1; 
  
            for (int i = 0; i < items.size(); i++) { 
                int priority = items.get(i).getPriority(); 
                if (maxPriority < priority) { 
                    subMaxPriority = maxPriority; 
                    maxPriority = priority; 
                    maxPriorityIndex = i; 
               } else if (subMaxPriority < priority && priority < maxPriority) { 
                    subMaxPriority = priority; 
               } 
           } 
  
            if (subMaxPriority == -1) { 
               break; 
           } else { 
                count ++; 
                items.get(maxPriorityIndex).setPriority(subMaxPriority); 
           } 
       } 
  
        return count; 

Let me explain. We loop through the list over and over again and count. As we scan, we update the maximum priority found, where it was found, and the priority immediately below. If the last one is not found, meaning all Item-s have the same priority, we break the loop.

Ok, nice, done. We should test the performance precisely because our method’s purpose is to calculate something faster in advance. We will use JMH (Java Microbenchmark Harness) to build a benchmark.

public class MyBenchmark { 
  
   final static int SIZE = 1000; 
   final static int MAX_PRIORITY = 10; 
   final static int SAMPLES = 10; 
  
   @State(Scope.Benchmark) 
    public static class Data { 
        static Random random = new Random(); 
        static List<List<Item>> samples = new ArrayList<>(SAMPLES); 
  
        static { 
            List<Item> items = new ArrayList<>(SIZE); 
            for (int i=0; i < SAMPLES; i++){ 
                for (int j = 0; j < SIZE; j++) { 
                    items.add(new Item().setPriority(random.nextInt(MAX_PRIORITY))); 
               } 
                samples.add(items); 
           } 
       } 
   } 
  
   @Benchmark 
   @Fork(value = 5, warmups = 2) 
   @Warmup(iterations = 3) 
   @BenchmarkMode(Mode.AverageTime) 
   @OutputTimeUnit(TimeUnit.MILLISECONDS) 
    public void testCountProcesses(Data data, Blackhole blackhole) { 
        for (List<Item> items : data.samples){ 
            blackhole.consume(ProcessUtils.countProcesses(items)); 
       } 
   } 
}

As a data set, we randomly generate 10 List-s of 1000 Item-s each with priorities from 0 to 10. We run the code and discard the result a few times to warm up and then calculate an average execution time of 5 runs in seconds. And here we have the result:

Benchmark                          Mode  	Cnt   	Score    	Error  	   	Units 
MyBenchmark.testCountProcesses     avgt   	25  	17009.468 	± 84.458  	ms/op 
  
17009.468 ±(99.9%) 84.458 ms/op [Average] 
(min, avg, max) = (16856.688, 17009.468, 17195.859), stdev = 112.748 
CI (99.9%): [16925.010, 17093.926] (assumes normal distribution)

What? Come on! 17 seconds per run is like centuries in computational time. But we did everything right! Didn’t we?

To be honest, we jumped straight into coding without meditating on getting what we want with minimum resources or better performance. And we got a not-so-good solution. 

That is the point: algorithmics is essential. Now let’s meditate! Do we need to loop through the List over and over again? Notice we don’t care in which order the Item-s will be processed, so why not first get each priority frequency?

public static long countProcesses(List<Item> items) { 
  
        Map<Integer, Integer> occurrences = new HashMap<>(); 
        items.stream() 
               .map(Item::getPriority) 
               .forEach(i -> { 
                    if(occurrences.containsKey(i)){ 
                        occurrences.put(i, occurrences.get(i)+1); 
                   } else { 
                        occurrences.put(i, 1); 
                   } 
               }); 
                 
               ... 

Again we scan through the List but just once. When a new priority is found, we “write it down” with a frequency of 1; otherwise, we just increment the frequency by 1. Using a map is simple, and we are looping a lot less. We don’t even need to make a copy of the List since we are only reading it!

The remaining logic simplifies: Decrement by one the maximum priority frequency and increment by 1 the priority immediately below until we can no more. But wait! Again, do we need to count it that way? Let’s meditate.

Picture it as a stairway. Each priority value being represented by a step and its frequency by balls on that step. The balls from the highest step are falling one by one to the step below. In the end, all balls will fall the same number of times no matter the order. So let’s just multiply each frequency by the priority’s “height” and sum it all.

       ...    
                 
        int count = 0; 
  
        int index = 0; 
        for (int frequency : occurrences.values()){ 
            count += index*frequency; 
            index++; 
       } 
  
        return count; 
   }

Much simpler that we were about to code, right? And finally, the moment of truth: performance!

Benchmark                        Mode	   Cnt	        Score		Error	  	Units 
MyBenchmark.testCountProcesses   avgt      25   	1.428   	± 0.016  	ms/op 
  
1.428 ±(99.9%) 0.016 ms/op [Average] 
(min, avg, max) = (1.390, 1.428, 1.447), stdev = 0.021 
CI (99.9%): [1.412, 1.444] (assumes normal distribution) 

Around 1.4 milliseconds per run. HUGE difference. Convinced?

Is this the best solution? Who knows? Not me, but I’m sure it will take time to find it and then prove that it is the best, and the last one did a pretty good job. Is it worthy of going deeper? As we said, time is money; it also applies this way. And as software developers, our mission is to deliver solutions that satisfy our client needs as efficiently as possible.

Pump algorithms to that brain! There are many books like Introduction to Algorithms by Thomas H. Corman, online courses, and sites like www.codewars.com or www.codesignal.com where you can find various challenges with various difficulties. You can even compare solutions and post your problems. Solution by solution, your algorithmics grows stronger, it is fun, and you will learn a lot. And if you are into mathematics like me, take a look at projecteuler.net.

You can find the code at https://gitlab.com/tkg-eb-group/articles/algorithmics-are-important

15.5 GiB

Intel® Core™ i7-8550U CPU @ 1.80GHz × 8

AMD® Iceland / Mesa Intel® UHD Graphics 620 (KBL GT2) Ubuntu 20.04.1 LTS – 64-bit

Written by 

 

Let's get to work!

Simply fill out the form and we will get in touch! Your digital solution partner is just a few clicks away!

"*" indicates required fields

This field is for validation purposes and should be left unchanged.