[UFO Chicago] which is faster when dealing with large data sets

jay at m5.chicago.il.us
Sun Nov 15 19:03:45 PST 2015


Centuries ago, Nostradamus predicted that Brian Sobolak would write on Sun Nov 15 20:50:39 2015:


> 
> I have a list of 4400 elements.  I have to find those items in a list of
> 4M elements.  It's not deterministic, ie, I won't have keys that match
> exactly, but have to use simple checks (e.g. do the first 3 characters
> match?  Do I find 2 words from from X in the other set Y?) etc.
> 
> I'm using awk, of course.
> 
> I'm sure the precise answer depends on the exact operations I choose, but
> is it generally faster to compare 4400 vs 4M, or 4M vs 4400?  I'm guessing
> which one you start with matters.
> 
> brian
> 

Beats heck out of me, but it's an interesting question, so if you get
a plausible answer off-list please share it with the rest of us.


                        Jay F. Shachter
                        6424 N Whipple St
                        Chicago IL  60645-4111
                                (1-773)7613784   landline
                                (1-410)9964737   GoogleVoice
                                jay at m5.chicago.il.us
                                http://m5.chicago.il.us

                        "Quidquid latine dictum sit, altum videtur"

