[UFO Chicago] which is faster when dealing with large data sets
Brian Sobolak
brian at planetshwoop.com
Sun Nov 15 18:50:39 PST 2015
I have a list of 4400 elements, and I need to find those items in a list of
4M elements. The matching is fuzzy, i.e., I won't have keys that match
exactly, so I have to use simple checks instead (e.g., do the first 3
characters match? Do I find 2 words from X in the other set Y?).
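The two checks described above can be written as small predicates. This is a minimal Python sketch, not the author's awk code. The function names `prefix_match` and `shared_words`, the split on whitespace, and the reading of "2 words" as "at least 2 distinct words" are all assumptions, and the sample strings are hypothetical.

```python
def prefix_match(x: str, y: str, n: int = 3) -> bool:
    """Hypothetical check: do x and y share the same first n characters?"""
    return x[:n] == y[:n]


def shared_words(x: str, y: str, k: int = 2) -> bool:
    """Hypothetical check: do at least k distinct whitespace-separated words of x occur in y?"""
    return len(set(x.split()) & set(y.split())) >= k


# Hypothetical sample strings, for illustration only.
print(prefix_match("Chicago Tribune", "Chicken Shack"))           # True: both start with "Chi"
print(shared_words("acme widget company", "the acme widget co"))  # True: "acme" and "widget"
```

Both checks are case-sensitive as written. Lowercasing the inputs first would be one way to loosen them.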
I'm using awk, of course.
I'm sure the precise answer depends on the exact operations I choose, but
is it generally faster to compare the 4400 against the 4M, or the 4M
against the 4400? I'm guessing the order I start with matters.
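One rough way to reason about the loop-order question is to count checks. The sketch below is in Python rather than awk, uses only the 3-character prefix check, and its function names (`brute_force`, `indexed`) and sample data are hypothetical. A plain nested loop performs len(outer) * len(inner) checks whichever list is outer. By contrast, building a lookup table from the smaller list first means each element of the larger list is checked only against entries that share its prefix.

```python
from collections import defaultdict


def brute_force(outer, inner, n=3):
    """Compare every pair: len(outer) * len(inner) checks, whichever list is outer."""
    hits, checks = [], 0
    for a in outer:
        for b in inner:
            checks += 1
            if a[:n] == b[:n]:
                hits.append((a, b))
    return hits, checks


def indexed(small, big, n=3):
    """Map prefix -> items from the small list once, then stream the big list."""
    index = defaultdict(list)
    for s in small:
        index[s[:n]].append(s)
    hits, checks = [], 0
    for b in big:
        # .get() avoids inserting empty lists for prefixes absent from the small list
        for s in index.get(b[:n], []):
            checks += 1
            hits.append((s, b))
    return hits, checks


# Hypothetical sample data standing in for the 4400- and 4M-element lists.
small = ["apple pie", "banana split"]
big = ["apricot jam", "apple cider", "bandana", "cherry", "banjo"]

print(brute_force(small, big)[1])  # 10
print(brute_force(big, small)[1])  # 10: swapping the loop order changes nothing
print(indexed(small, big)[1])      # 3: only same-prefix candidates are checked
```

In awk, the usual equivalent is to read the smaller file into an associative array inside an `NR == FNR { ...; next }` block, then test each line of the larger file against that array. The word-overlap check could be indexed the same way, keyed by word instead of prefix.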
brian