Thursday, February 2, 2017

limited size priorityqueue implementation using core JAVA and Guava

Example of implement limited size priorityqueue using Guava's  MinMaxPriorityQueue

For example, Let’s say if I want to keep track of the top 10 most frequent elements. The following would be the implementation if I use Java’s native PriorityQueue.
//using Java's priorityQueue
public List<Integer> getTopKFrequent(int[] nums, int k) {
  List<Integer> result = new ArrayList<>();
  List<Integer> distinctList = new ArrayList<>(); //distinct numbers
  HashMap<Integer, Integer> map = new HashMap<Integer, Integer>(); //number, frequency
  for(int num: nums){
      if(map.containsKey(num)){
          map.put(num, map.get(num)+1);
      }else{
          map.put(num, 1);
          distinctList.add(num);
      }
  }
  //PriorityQuueue with custom comparator,  with the head of queue being the element with highest frequency.
  PriorityQueue<Integer> maxHeap = new PriorityQueue<>((a,b) -> (map.get(b)-map.get(a)));//offer first k from distinct number list
  for(int i =0; i<distinctList.size(); i++){
      maxHeap.offer(distinctList.get(i));
  }
  for(int i=0; i<k; i++){
      result.add(maxHeap.poll());
  }
  return result;
}

the priorityqueue will hold N elements ( n being the number of distinct elements in the list). it can get very memory intensive when N gets big (let’s say 200k items).  
so, We currently have two options if we need to save on memory.

Method 1: use minHeap. we add the elements to the queue until the size of queue reaches k, then we can compare each of the rest of the  elements with the head of queue. ( see code_snippet_2 below)
Method 2: use Guava’s minMaxQueue , which can be configured with a maximum size. each time the size of queue exceeds the configured maximum size, the queue automatically removes its greatest element according to its comparator.  ( see code_snippet_3 below)

code_snippet_2
//using Java's priorityQueue -minHeap
public List<Integer> getTopKFrequent(int[] nums, int k) {
  List<Integer> result = new ArrayList<>();
  List<Integer> distinctList = new ArrayList<>(); //distinct numbers
  HashMap<Integer, Integer> map = new HashMap<Integer, Integer>(); //number, frequency
  for(int num: nums){
      if(map.containsKey(num)){
          map.put(num, map.get(num)+1);
      }else{
          map.put(num, 1);
          distinctList.add(num);
      }
  }
  //PriorityQuueue with custom comparator,  with the head of queue being the element with lowest frequency.
  PriorityQueue<Integer> minHeap = new PriorityQueue<>((a, b) -> (map.get(a)-map.get(b)));//offer first k from distinct number list
  int index = 0;
  for(int i =0; i<k; i++){
      minHeap.offer(distinctList.get(i));
      index++;
  }
  while(index < distinctList.size()){
      if(map.get(minHeap.peek())< map.get(distinctList.get(index))){
          minHeap.poll();
          minHeap.offer(distinctList.get(index));
      }
      index ++;
  }
  //add to the result in reverse order
  for(int i=0; i<k; i++){
      result.add(0, minHeap.poll());
  }
  return result;
}

code_snippet_3
//using Guava's minMaxPriorityQueue
public List<Integer> getTopKFrequent(int[] nums, int k) {
  List<Integer> result = new ArrayList<>();
  List<Integer> temp = new ArrayList<>(); //distinct numbers
  HashMap<Integer, Integer> map = new HashMap<>(); //number, frequency
  for(int num: nums){
      if(map.containsKey(num)){
          map.put(num, map.get(num)+1);
      }else{
          map.put(num, 1);
          temp.add(num);
      }
  }
//Guava's MinMaxPriorityQueue
  MinMaxPriorityQueue<Integer> maxHeap = MinMaxPriorityQueue.orderedBy(new Comparator<Integer>(){
      public int compare(Integer a, Integer b){
          return Integer.compare(map.get(b), map.get(a));
      }
  }).maximumSize(k).create();
//offer all elements, MinMaxPriorityQueue will envict based on comparators
  for(int i =0; i<temp.size(); i++){
      maxHeap.add(temp.get(i));
  }
  //add to the result
  for(int i=0; i<k; i++){
      result.add(maxHeap.poll());
  }
  return result;
}

At my test, when the number of distinct element is 7, the running time using  minHeap is 38ms, while the running time using Guava’s MinMaxPriorityQueue is 8 ms.