Comments | iluvsheep: A lesson on overthinking

iluvsheep

A lesson on overthinking

Nov 07, 2007 14:58

I recently came across a random text file that I had written earlier in the year. After a little inspection, it appears to contain two things: 1) The pseudo-code for two very different ways of determining which 3 numbers in a list (confusingly referred to as "string") sum to 0, and 2) A poignant lesson on the dangers of unnecessary complication.

( Read more... )

Comments 14

arisrabkin November 7 2007, 21:10:33 UTC

I think there's an O(N) /O(N^2) algorithm:

Make a hashtable h, sized for n elements.
Hash each number and stick it in the bucket.
For each pair of numbers, add them, and and stick them in the corresponding bucket.

Scan through the table
for each bucket:
Let x be the label of the bucket
If there's a number or pair in the bucket, look at the bucket for -x: if there's a pair or number [respectively]
then they sum to zero, return true.

If no matches, fail.

iluvsheep November 8 2007, 00:29:10 UTC

I think this was sort of what I was originally trying with the first crazy bit of code above. Looking at it again, perhaps I was on the right track and just didn't get it quite right.

I'm not sure if your solution is quite correct yet, say I gave you a list of positive numbers and two zeroes. You would hash the pair of both zeroes as summing to 0, and each 0 itself would hash into the single 0 value. Then you would find that there is a singleton 0 that matched a pair summing to 0, specifically (0,0), even though there aren't, in fact, three separate entries that sum to 0.

Also, for what it's worth, I think it takes more than O(N) memory. Say I gave you a list of [0,1,2,4,8,16]. Those 6 numbers would result in pairs summing to 15 different values. I think the memory usage is actually O(N^2).

arisrabkin November 8 2007, 00:41:13 UTC

*thinks*.

I see that I have N^2 elements, and probably need an O(N^2) size hash table. that's probably okay; also, I may be able to pick a better data structure. Something treeish, for instance.

And yes, I need to do something to detect duplicates. The things that go in the buckets are probably not the numbers themselves, but number+index in input string. That lets me detect duplicates.

jibb November 8 2007, 08:35:14 UTC

Hmm... But you don't really need to put the pair-sums in the hash table. If you have a hash table keyed by number containing the index(es) of the number in the list, then go through all pairs of i and j, checking if hashtable[-(list[i] + list[j])] contains an index other than i and j, you should be able to get an answer in O(n^2) time and O(n) space. And now no trouble with duplicates.

def triplets(nums):
indexes = {}
for index, num in enumerate(nums):
if num not in indexes:
indexes[num] = []
indexes[num].append(index)
for index_a, a in enumerate(nums):
for offset_b, b in enumerate(nums[index_a+1:]):
index_b = index_a + 1 + offset_b
c = -(a+b)
if c not in indexes:
continue
for index_c in indexes[c]:
if index_c not in (index_a, index_b):
return ((a, b, c), (index_a, index_b, index_c))
return None

A bit of lazy testing seems to indicate it works.

Thread 7

zwilichkl November 7 2007, 21:32:38 UTC

That looks like gibberish to me, but I like the icon!

iluvsheep November 8 2007, 03:50:09 UTC

The icon is quite awesome. You can tell that he really is going to fix it too!

derakon November 7 2007, 22:03:38 UTC

Are you assuming that every value in the string is unique? If they aren't, then your second solution fails if there's three zeroes in the list. :)

You can also limit your iteration; one iteration over the entire loop, with an inner loop iterating from the current index to the end, and an inner inner loop doing the same. Addition is commutative, so order is unimportant; hence, there's no need to iterate over everything.

Though, this is after-the-fact optimization. Correctness is definitely more important than cleverness or speed in a first-pass solution.

iluvsheep November 7 2007, 23:54:12 UTC

The values in the list certainly don't need to be unique. However, I am fairly confident that the second simpler solution (but not the first convoluted one) will correctly detect those.

It's true. The loops do a lot of repeated work. For an array of N numbers they could go:
i: 0->N
j: i->N
k: j->N

But that doesn't actually affect the asymptotic run-time analysis.

derakon November 8 2007, 01:06:27 UTC

Asymptotically, no, but it's a pretty significant constant factor improvement. Big-O analysis is useful, but it can be overused too.

And will your second solution catch three zeroes? It'll never reach the return statement, because the values are the same; it always ends up skipping whenever it encounters two zeroes.

iluvsheep November 8 2007, 03:46:53 UTC

i, j and k above are indexes into the array, not elements. The element values themselves are $string[i], $string[j], and $string[k], respectively.

Thread 5