CS220, The City College of New York May 30th, 2014
Introduction
The problems studied in this paper come from the family of knapsack problems: the fractional and the 0/1 knapsack problem. In the fractional knapsack problem we are given a knapsack with capacity M and a set N of n objects. Each object i has a given weight wi and a profit pi. The goal is to pack objects into the knapsack such that the profit of the objects in the knapsack is maximized while the total weight does not exceed the capacity M. In the fractional knapsack problem you may choose a fraction xi, 0 <= xi <= 1, of object i and place it into the knapsack, earning a profit of pi * xi. If we limit xi to only 1 or 0 (take it or leave it), the result is the 0/1 knapsack problem.
Fractional Knapsack-- Greedy Method
The fractional knapsack problem can be solved using the greedy method because we can work in stages, considering one input at a time. At each stage we consider one input and decide whether it will be included in the optimal solution being built. If including the input makes the partial solution infeasible, the input is discarded; otherwise it is added to the partial optimal solution.
We first compute the ratio ri = pi/wi of profit to weight for each item and then sort the objects so that pi/wi >= pi+1/wi+1 for 1 <= i <= n - 1. Afterward, we greedily take objects in this order and add them to the knapsack as long as doing so does not exceed the capacity of the knapsack. If the weight of an object exceeds the remaining knapsack capacity, we take only a fraction of that object.
An example that performs this greedy algorithm is shown below. Find an optimal solution to the knapsack instance n = 7, M = 15, (p1, p2, ..., p7) = (10, 5, 15, 7, 6, 18, 3) and (w1, w2, ..., w7) = (2, 3, 5, 7, 1, 4, 1).

Step 1. Compute the ratio pi/wi for each object.
Step 2. Sort the ratios in non-increasing order:

Object:   5     1     6     3     7     2     4
pi:       6     10    18    15    3     5     7
wi:       1     2     4     5     1     3     7
pi/wi:    6.0   5.0   4.5   3.0   3.0   1.67  1.0

Step 3. Greedily add objects to the knapsack while the weight does not exceed the capacity. Objects 5, 1, 6, 3 and 7 are included, for a total weight of 1 + 2 + 4 + 5 + 1 = 13. The next object considered is object 2, which has weight 3. If we include object 2 entirely, the sum of weights becomes 1 + 2 + 4 + 5 + 1 + 3 = 16, which exceeds the knapsack capacity of 15. Only 2 weight units remain, so the fraction 2/3 of object 2 is included to fill the knapsack. The profit obtained is 6 + 10 + 18 + 15 + 3 + 5(2/3) = 55.33.
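The steps above can be sketched in Python. This is a minimal illustration of the greedy method described in this section, not the authors' actual program; the function name and signature are our own.

```python
def fractional_knapsack(profits, weights, capacity):
    """Greedy fractional knapsack: returns the maximum achievable profit."""
    # Sort object indices by profit/weight ratio, largest first (Steps 1 and 2).
    order = sorted(range(len(profits)),
                   key=lambda i: profits[i] / weights[i], reverse=True)
    total_profit = 0.0
    remaining = capacity
    for i in order:
        if weights[i] <= remaining:
            # The whole object fits: take all of it (Step 3).
            total_profit += profits[i]
            remaining -= weights[i]
        else:
            # Take only the fraction that still fits, then stop.
            total_profit += profits[i] * (remaining / weights[i])
            break
    return total_profit

# The instance from the example: n = 7, M = 15.
p = [10, 5, 15, 7, 6, 18, 3]
w = [2, 3, 5, 7, 1, 4, 1]
print(round(fractional_knapsack(p, w, 15), 2))  # 55.33
```

On the example instance this takes objects 5, 1, 6, 3 and 7 whole, plus 2/3 of object 2, matching the hand calculation above.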
0/1 Knapsack
The 0/1 knapsack problem has been proven to be NP-complete, meaning there is no known algorithm that solves it in polynomial time, and it is believed (but not proven) that none can exist. So we are left to work with algorithms that are slower than polynomial time. The naive solution to this problem is a brute-force approach in which every combination of every object being chosen or not chosen is tested, and the combination with the maximum profit whose weight does not exceed the capacity of the knapsack is chosen as the answer. The problem with this algorithm, of course, is that it runs in exponential time: for n objects there are 2^n ways to choose them. This results in an exponential algorithmic complexity, which, for all intents and purposes, is bad. We studied and implemented two algorithms that improve on the naive approach using two well-known problem-solving patterns: dynamic programming and backtracking.
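For reference, the brute-force approach can be sketched as follows. This is an illustrative version with our own names, feasible only for small n since it enumerates all 2^n combinations.

```python
from itertools import product

def knapsack_brute_force(profits, weights, capacity):
    """Naive 0/1 knapsack: try every take/leave combination of the n objects."""
    best = 0
    # Each `choice` is a tuple of 0/1 flags, one per object: 2^n tuples in total.
    for choice in product([0, 1], repeat=len(profits)):
        weight = sum(x * w for x, w in zip(choice, weights))
        if weight <= capacity:  # feasible combination
            best = max(best, sum(x * p for x, p in zip(choice, profits)))
    return best

# The paper's instance (n = 7, so only 128 combinations to test).
p = [10, 5, 15, 7, 6, 18, 3]
w = [2, 3, 5, 7, 1, 4, 1]
print(knapsack_brute_force(p, w, 15))  # 54
```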
Dynamic Programming
To solve this problem we start with a two-dimensional array X, where the row index i represents the number of items considered and the column index w represents the capacity of the knapsack; i ranges from 0 to n and w ranges from 0 to M. The entry X(i, w) holds the maximum profit achievable using the first i objects with capacity w. If we have no objects to choose, then X(0, w) = 0, and if the capacity of the knapsack is 0, then X(i, 0) = 0. To populate the 2D array, we use the following recurrence relation:

X(i, w) = X(i - 1, w)                                   if wi > w
X(i, w) = max( X(i - 1, w), pi + X(i - 1, w - wi) )     otherwise
If the capacity w is less than the weight of the i-th object, that object cannot be included in the knapsack, so X(i, w) simply equals X(i - 1, w), the value obtained without it. If the capacity is at least the weight of the i-th object, the value of that position is the maximum of two quantities:
1. The value X(i - 1, w) obtained by rejecting the i-th object, if including it would not increase the total profit, and continuing with the remaining objects.
2. The value pi + X(i - 1, w - wi) obtained by including the i-th object, adding its profit to the best solution for the remaining objects and the remaining weight.
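The recurrence can be sketched directly in Python. This is a minimal version under our own naming, filling the table bottom-up exactly as the relation above prescribes.

```python
def knapsack_dp(profits, weights, capacity):
    """0/1 knapsack by dynamic programming; returns the maximum profit."""
    n = len(profits)
    # X[i][w] = best profit using the first i objects with capacity w.
    # Row 0 and column 0 stay zero: no objects or no capacity.
    X = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(1, capacity + 1):
            if weights[i - 1] > w:
                # Object i cannot fit: inherit the value without it.
                X[i][w] = X[i - 1][w]
            else:
                # Maximum of leaving object i out vs. taking it.
                X[i][w] = max(X[i - 1][w],
                              profits[i - 1] + X[i - 1][w - weights[i - 1]])
    return X[n][capacity]

p = [10, 5, 15, 7, 6, 18, 3]
w = [2, 3, 5, 7, 1, 4, 1]
print(knapsack_dp(p, w, 15))  # 54
```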
An illustration of this dynamic approach is shown below:
Find an optimal solution to the knapsack instance n = 7, M = 15, (p1, p2, ..., p7) = (10, 5, 15, 7, 6, 18, 3) and (w1, w2, ..., w7) = (2, 3, 5, 7, 1, 4, 1). The 2D array is shown below. When we have 0 items to put in the knapsack, or the capacity of the knapsack is zero, X(0, w) = 0 and X(i, 0) = 0.
X(1, 1): w1 = 2 > w = 1, so X(1, 1) = X(0, 1) = 0
X(1, 2): w1 = 2 <= w = 2, so X(1, 2) = max( X(0, 2), p1 + X(0, 0) ) = max(0, 10) = 10
X(1, 3): w1 = 2 <= w = 3, so X(1, 3) = max( X(0, 3), p1 + X(0, 1) ) = max(0, 10) = 10
(and the same for the rest of the row).
X(2, 1): w2 = 3 > w = 1, so X(2, 1) = X(1, 1) = 0
X(2, 2): w2 = 3 > w = 2, so X(2, 2) = X(1, 2) = 10
X(2, 3): w2 = 3 <= w = 3, so X(2, 3) = max( X(1, 3), p2 + X(1, 0) ) = max(10, 5) = 10
X(2, 4): w2 = 3 <= w = 4, so X(2, 4) = max( X(1, 4), p2 + X(1, 1) ) = max(10, 5) = 10
X(2, 5): w2 = 3 <= w = 5, so X(2, 5) = max( X(1, 5), p2 + X(1, 2) ) = max(10, 15) = 15
(and the same for the rest of the row).
This process can be continued for the entire table, though it would take a long time on paper; we used a Python program to generate the results and verified them on paper as well. The final result is stored in the last row and last column, X[7][15] = 54. To find out which objects are included in the final result we keep another 2D array, call it Y, of the same size as the previous array, holding binary values: 1 indicates that the object is taken at that cell and 0 indicates that it is not. For an object to be taken it must fit in the remaining capacity, and taking it must yield a higher profit than leaving it out. The keep array for the previous example is shown below:

          w:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Object 0:     0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Object 1:     0  0  1  1  1  1  1  1  1  1  1  1  1  1  1  1
Object 2:     0  0  0  0  0  1  1  1  1  1  1  1  1  1  1  1
Object 3:     0  0  0  0  0  0  0  1  1  1  1  1  1  1  1  1
Object 4:     0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1
Object 5:     0  1  0  1  1  1  1  0  1  1  1  1  1  1  1  1
Object 6:     0  0  0  0  1  1  1  1  1  1  1  1  1  1  1  1
Object 7:     0  0  0  0  1  0  0  0  1  1  0  0  0  1  1  0
Once the keep array is generated, we can backtrack from the lower right corner of the array, Y[7][15]. If that position is 0, the object is not taken: we decrement i by 1 and look at that position. If the position is 1, we know that object has been taken: we decrement i by 1 and also decrement w by the weight wi of the object at that index. The objects included are the following: object 6 with profit 18, object 5 with profit 6, object 3 with profit 15, object 2 with profit 5, and object 1 with profit 10.
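The keep array and the backtrace can be sketched together in Python. As before, this is an illustrative version with our own names, not the paper's actual program; it returns both the maximum profit and the 1-based object numbers chosen.

```python
def knapsack_dp_with_keep(profits, weights, capacity):
    """0/1 knapsack DP that also records which objects are taken."""
    n = len(profits)
    X = [[0] * (capacity + 1) for _ in range(n + 1)]  # profit table
    Y = [[0] * (capacity + 1) for _ in range(n + 1)]  # keep array
    for i in range(1, n + 1):
        for w in range(1, capacity + 1):
            X[i][w] = X[i - 1][w]  # default: leave object i out
            if (weights[i - 1] <= w and
                    profits[i - 1] + X[i - 1][w - weights[i - 1]] > X[i][w]):
                X[i][w] = profits[i - 1] + X[i - 1][w - weights[i - 1]]
                Y[i][w] = 1  # taking object i is strictly better here
    # Backtrack from the lower right corner of the keep array.
    chosen, w = [], capacity
    for i in range(n, 0, -1):
        if Y[i][w]:
            chosen.append(i)        # 1-based object number
            w -= weights[i - 1]     # reduce remaining capacity
    return X[n][capacity], sorted(chosen)

p = [10, 5, 15, 7, 6, 18, 3]
wt = [2, 3, 5, 7, 1, 4, 1]
print(knapsack_dp_with_keep(p, wt, 15))  # (54, [1, 2, 3, 5, 6])
```

The backtrace recovers objects 1, 2, 3, 5 and 6, agreeing with the hand-traced result above (profits 10 + 5 + 15 + 6 + 18 = 54).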
Dynamic programming proved a useful technique for solving this kind of problem. The running time of the dynamic programming algorithm vs. the naive algorithm for the 0/1 knapsack problem is O(nM) vs. O(2^n), an improvement from exponential to pseudo-polynomial time.
Backtracking
Backtracking improves on the naive approach, in which all combinations are tested, by avoiding combinations that can at some point be determined to be infeasible. For example, if a programmer wants to solve the problem with a knapsack of capacity 10 and one of the candidate items has a weight of 15, it is clear that the item will not appear in the solution, and neither will any combination of items that includes it; those combinations can be ignored to improve the running time of the algorithm. Backtracking achieves this optimization by performing a depth-first search on an implicit state-space tree. As the algorithm searches, some logic decides whether or not the current node can lead to a feasible solution. If it is determined that the node cannot (for example, if the total weight of objects chosen exceeds the capacity), the node is ignored (pruned from the state-space tree). The leaves of the state-space tree represent possible solutions, and the best solution is the leaf with the largest value.
Two factors determine whether or not a node in the state-space tree is promising. The first is trivial: if the total weight of the objects chosen so far exceeds the capacity of the bag, the partial solution is infeasible and the tree is pruned at that node. The second is more abstract: we calculate an upper bound on the total profit achievable from the solution so far plus all possible child nodes under the current node. If that upper bound, the potential profit of the sub-tree rooted at the current node plus the profit accumulated up to that point, is less than the maximum profit we have observed so far, there is no point in continuing the search below that node, and the tree is therefore pruned there. The big problem, then, is how to efficiently calculate that upper bound. For the upper bound we used the maximum profit computed by our greedy algorithm for the fractional knapsack problem on the remaining objects and capacity. This quickly calculates a profit that cannot possibly be exceeded by any 0/1 solution below that node, and it therefore functions as a good upper bound.
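The pruning strategy can be sketched as follows. This is our own condensed illustration of the approach described above (names and structure are ours): a depth-first search over take/leave decisions, pruned with the fractional-knapsack bound.

```python
def knapsack_backtrack(profits, weights, capacity):
    """0/1 knapsack by backtracking with a fractional-knapsack upper bound."""
    n = len(profits)
    # Sort by profit/weight ratio so the greedy bound is cheap to compute.
    order = sorted(range(n), key=lambda i: profits[i] / weights[i], reverse=True)
    p = [profits[i] for i in order]
    w = [weights[i] for i in order]
    best = 0

    def bound(i, weight, profit):
        # Fractional-knapsack upper bound on profit reachable from node i.
        remaining = capacity - weight
        b = profit
        while i < n and w[i] <= remaining:
            remaining -= w[i]
            b += p[i]
            i += 1
        if i < n:
            b += p[i] * remaining / w[i]  # fraction of the next object
        return b

    def search(i, weight, profit):
        nonlocal best
        best = max(best, profit)
        if i == n or bound(i, weight, profit) <= best:
            return  # prune: this sub-tree cannot beat the best leaf seen so far
        if weight + w[i] <= capacity:
            search(i + 1, weight + w[i], profit + p[i])  # take object i
        search(i + 1, weight, profit)                    # leave object i

    search(0, 0, 0)
    return best

p = [10, 5, 15, 7, 6, 18, 3]
w = [2, 3, 5, 7, 1, 4, 1]
print(knapsack_backtrack(p, w, 15))  # 54
```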
Conclusion
The greedy algorithm that solves the fractional knapsack problem is relatively trivial, and it is intuitive: if we are allowed to take fractions of objects, we have the opportunity to fill 100% of the knapsack as long as there are enough objects to do so. So we fill it with the most profitable objects until a full object no longer fits, at which point we take the maximum possible fraction of the most profitable object not yet used. Unfortunately, greedy approaches to the 0/1 problem are not guaranteed to be correct. In an application where runtime is more important than accuracy, it may be acceptable to use the greedy approach as an approximation. We have also already seen that the solution to the fractional knapsack problem can serve as an upper bound when testing whether a node is promising in the backtracking algorithm. The runtime for the greedy solution to the fractional knapsack problem is O(n log n), dominated by the initial sort.
The dynamic programming algorithm has a complexity of O(nM), where M is the capacity of the knapsack. This is pseudo-polynomial time. Because a pseudo-polynomial time algorithm exists, the knapsack problem is considered weakly NP-complete. The dynamic programming algorithm is good for finding solutions to instances where the capacity is small, since the algorithm is exponential relative to the length (the number of bits) of M, the capacity. It can basically be thought of as the naive recursive top-down approach converted to an iterative bottom-up approach in which sub-problems are memoized.
It is difficult to accurately predict the runtime of the backtracking algorithm because it depends on the input, since non-promising solutions are never considered. The worst case, assuming the profits and weights are already sorted (the greedy algorithm we use to determine whether a node is promising relies on sorted input), is O(2^n), because the full state-space tree has 2^n leaves. However, because the algorithm prunes infeasible solutions, the runtime is usually much smaller in the average case. To remove the requirement of sorted input, our algorithm sorts the input itself, which adds to the runtime. We used the sort built into Python, an implementation of Timsort, which has an average-case complexity of O(n log n); this is irrelevant to the prediction because O(2^n) is much larger.
In this exercise we compared the running times of the dynamic programming and backtracking algorithms by measuring execution time with the system clock. This approach gives a rough idea of how the algorithms perform relative to each other. Of course, it tells us nothing about why one algorithm performs better than another: is one algorithm more efficient, or does one of them contain a bug that unreasonably inflates its runtime? Assuming our code has no anomalies that make runtimes abnormally slow, this is an insightful comparison because both algorithms have advantages and disadvantages. The dynamic programming algorithm requires no sorting and makes no recursive function calls, but it depends heavily on the capacity of the knapsack. The backtracking algorithm requires sorting and makes recursive calls, but pruning of the state-space tree makes the worst case rare.
To make a fair comparison, we ran both algorithms on several problem sizes (10, 50, 100, 200, 500, 1000, 2000) and plotted the running times. Because one dimension of the table computed by the dynamic programming algorithm is bounded by the capacity of the knapsack, that algorithm's running time depends on the capacity in the problem instance. We therefore also tested the algorithms with several capacities for each problem size (0.25n, 0.5n, 1n, 2n, 4n, where n is the problem size). The profits and weights for each problem were generated randomly: all profits are random numbers between 10 and 100, and all weights are random numbers between 2 and 10.
At first glance, our tests show that the backtracking algorithm performs better (the generated plot is attached on the last page of this report), and indeed, in most cases it does. In theory, however, the dynamic programming algorithm should be faster for knapsacks with small capacities. Though this is difficult to see on our plot, skimming the program output shows several cases where, when the capacity is small, the dynamic programming algorithm completes faster. In the future it would be worth tweaking the test sizes and capacities to obtain a more definitive picture of how well the two algorithms perform relative to each other.