[UFO Chicago] which is faster when dealing with large data sets

Brian Sobolak brian at planetshwoop.com
Sun Nov 15 18:50:39 PST 2015


I have a list of 4400 elements.  I have to find those items in a list of
4M elements.  The matching is not exact, i.e., I won't have keys that match
exactly, so I have to use simple checks (e.g. do the first 3 characters
match?  Do I find 2 words from X in the other set Y?).
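
To make the second kind of check concrete, here is a minimal sketch in awk
that counts how many distinct words two strings share.  It assumes words
are whitespace-separated and compared case-sensitively; the function name
shared_words is hypothetical, chosen only for illustration.

```shell
# shared_words A B: print how many distinct words of A also occur in B.
# (Hypothetical helper; assumes whitespace-separated, case-sensitive words.)
shared_words() {
    awk -v a="$1" -v b="$2" '
        BEGIN {
            n = split(a, wa, " ")          # words of A
            m = split(b, wb, " ")          # words of B
            for (i = 1; i <= m; i++)
                inb[wb[i]] = 1             # index the words of B
            for (i = 1; i <= n; i++)
                if ((wa[i] in inb) && !(wa[i] in counted)) {
                    counted[wa[i]] = 1     # count each shared word once
                    c++
                }
            print c + 0                    # force numeric output when c is unset
        }'
}
```

A pair would then count as a candidate match when the printed number is 2
or more.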

I'm using awk, of course.

I'm sure the precise answer depends on the exact operations I choose, but
is it generally faster to loop over the 4400 and scan the 4M for each one,
or to loop over the 4M and check each against the 4400?  I'm guessing
which one you start with matters.
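
For illustration, one common arrangement in awk is to load the smaller list
into an array and stream the larger file past it once, so the 4M lines are
read a single time and only the 4400 entries sit in memory.  A minimal
sketch, assuming one item per line and the first-3-characters check; the
function name prefix_match and its two file arguments are hypothetical.

```shell
# prefix_match SMALL BIG: print the lines of BIG whose first 3 characters
# equal the first 3 characters of some line in SMALL.
# (Hypothetical helper; assumes one item per line and a non-empty SMALL,
# since the NR == FNR test cannot tell the files apart if SMALL is empty.)
prefix_match() {
    awk '
        # First file (the small list): remember each 3-character prefix.
        NR == FNR { seen[substr($0, 1, 3)] = 1; next }
        # Second file (the big list): one array lookup per line,
        # printing the line when its prefix was seen.
        substr($0, 1, 3) in seen
    ' "$1" "$2"
}
```

Because the prefix is an array key, each of the 4M lines costs a single
lookup.  A check that cannot be turned into a key, such as the
two-words-in-common test, still needs an inner loop over the 4400 entries,
which is about 4400 x 4M = 17.6 billion comparisons in either order.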

brian
