[UFO Chicago] which is faster when dealing with large data sets
jay at m5.chicago.il.us
Sun Nov 15 19:03:45 PST 2015
Centuries ago, Nostradamus predicted that Brian Sobolak would write on Sun Nov 15 20:50:39 2015:
>
> I have a list of 4400 elements. I have to find those items in a list of
> 4M elements. It's not deterministic, i.e., I won't have keys that match
> exactly, but have to use simple checks (e.g. do the first 3 characters
> match? Do I find 2 words from X in the other set Y?), etc.
>
> I'm using awk, of course.
>
> I'm sure the precise answer depends on the exact operations I choose, but
> is it generally faster to compare 4400 vs 4M, or 4M vs 4400? I'm guessing
> which one you start with matters.
>
> brian
>
Beats heck out of me, but it's an interesting question, so if you get
a plausible answer off-list please share it with the rest of us.
Jay F. Shachter
6424 N Whipple St
Chicago IL 60645-4111
(1-773)7613784 landline
(1-410)9964737 GoogleVoice
jay at m5.chicago.il.us
http://m5.chicago.il.us
"Quidquid latine dictum sit, altum videtur"