C#: Linq/Lambda remove duplicates from list

I have a list as a result of a join, and it contains duplicates.

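private IEnumerable<Sak> LagListeMedSaker(IEnumerable<KorrespondanseInfo> korr, IEnumerable<Saksinfo> liste)
{
    // Inner join: one Sak per matching (liste, korr) pair, so a Saksnummer
    // that occurs several times in korr yields several identical Sak objects.
    var result = from l in liste
                 join k in korr on l.Saksnummer equals k.Saksnummer
                 select new Sak
                 {
                     Saksnr = l.Saksnummer,
                     Tag = l.Tag
                 };
    return result;
}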

The list korr can contain several entries with the same Saksnummer, even though each korr entry is unique in itself, i.e. the entries differ in other properties. Because the join produces one Sak per matching pair, the result ends up with duplicate Sak entries.
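To make the problem concrete, here is a small sketch that reproduces it. The stub types are hypothetical stand-ins for the post's Saksinfo, KorrespondanseInfo and Sak: the string property types, the extra Avsender property, the JoinDemo name and the sample values are all assumptions. Two korr entries that share one Saksnummer give two identical Sak objects after the join.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stubs for the post's types; string property types are assumed.
public class Saksinfo
{
    public string Saksnummer { get; set; }
    public string Tag { get; set; }
}

public class KorrespondanseInfo
{
    public string Saksnummer { get; set; }
    public string Avsender { get; set; } // hypothetical extra property that differs per entry
}

public class Sak
{
    public string Saksnr { get; set; }
    public string Tag { get; set; }
}

public static class JoinDemo
{
    // Same inner join as in the post, materialized to a list.
    public static List<Sak> Join(IEnumerable<KorrespondanseInfo> korr, IEnumerable<Saksinfo> liste) =>
        (from l in liste
         join k in korr on l.Saksnummer equals k.Saksnummer
         select new Sak { Saksnr = l.Saksnummer, Tag = l.Tag }).ToList();

    public static List<Sak> Example()
    {
        var liste = new List<Saksinfo> { new Saksinfo { Saksnummer = "2009/1", Tag = "A" } };

        // Two distinct correspondence entries that share one Saksnummer.
        var korr = new List<KorrespondanseInfo>
        {
            new KorrespondanseInfo { Saksnummer = "2009/1", Avsender = "Ola" },
            new KorrespondanseInfo { Saksnummer = "2009/1", Avsender = "Kari" }
        };

        // Returns two Sak objects, both with Saksnr "2009/1" and Tag "A".
        return Join(korr, liste);
    }
}
```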

Now I want to remove the duplicates from the result list.
I wanted to use .Distinct(), something like this:

            //remove duplicates, not working (nor compiling..)
            result= result.Distinct(t=>t.Saksnr);

but Distinct doesn't accept a Func key selector; instead it requires you to create an implementation of IEqualityComparer<T>.
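For reference, a minimal sketch of such a comparer, assuming Saksnr is a string (the SaksnrComparer and ComparerDemo names are hypothetical). It treats two Sak objects as equal when their Saksnr matches and ignores every other property.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Sak
{
    public string Saksnr { get; set; }
    public string Tag { get; set; }
}

// Treats two Sak objects as equal when Saksnr matches; other properties are ignored.
public class SaksnrComparer : IEqualityComparer<Sak>
{
    public bool Equals(Sak x, Sak y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.Saksnr, y.Saksnr, StringComparison.Ordinal);
    }

    // Must agree with Equals: equal Saksnr values give equal hash codes.
    public int GetHashCode(Sak obj) =>
        obj.Saksnr is null ? 0 : StringComparer.Ordinal.GetHashCode(obj.Saksnr);
}

public static class ComparerDemo
{
    public static List<Sak> RemoveDuplicates(IEnumerable<Sak> result) =>
        result.Distinct(new SaksnrComparer()).ToList();
}
```

On .NET 6 or later, the built-in Enumerable.DistinctBy covers this case without a custom comparer: result.DistinctBy(t => t.Saksnr).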

By the way, the default .Distinct() without parameters won't work correctly for your custom objects unless you correctly override Equals and GetHashCode, because by default two class instances are only equal if they are the same reference.
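A minimal sketch of what such overrides could look like, assuming Sak only has string Saksnr and Tag properties and that two objects count as equal when both match. HashCode.Combine needs .NET Core 2.1 or later; the DistinctDemo helper is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical Sak with value equality: equal when both Saksnr and Tag match.
public class Sak : IEquatable<Sak>
{
    public string Saksnr { get; set; }
    public string Tag { get; set; }

    public bool Equals(Sak other) =>
        other is not null && Saksnr == other.Saksnr && Tag == other.Tag;

    public override bool Equals(object obj) => Equals(obj as Sak);

    // Equal objects must return equal hash codes, or Distinct() will not detect them.
    public override int GetHashCode() => HashCode.Combine(Saksnr, Tag);
}

public static class DistinctDemo
{
    // Parameterless Distinct() now uses the overrides above.
    public static int CountDistinct(params Sak[] items) => items.Distinct().Count();
}
```

Because the hash code is computed from mutable properties, avoid changing Saksnr or Tag on an object while it sits in a hash-based collection.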

So I'm feeling lazy and want to do this in just a line or two. What are the options?

I went with this solution someone posted on Stack Overflow (I lost the link, sorry):

                result= result.GroupBy(sak => sak.Saksnr).Select(y => y.First());

I really like this because it is quick to implement and easy to change according to your needs.
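Spelled out, the GroupBy approach looks roughly like this (Sak is a stub with assumed string properties; GroupByDemo and FirstPerSaksnr are hypothetical names). It keeps the first Sak seen for each Saksnr: GroupBy yields groups in the order their keys first appear in the source and keeps each group's elements in source order, so g.First() is the earliest entry.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Sak
{
    public string Saksnr { get; set; }
    public string Tag { get; set; }
}

public static class GroupByDemo
{
    // One group per Saksnr, then keep the first Sak in each group.
    public static List<Sak> FirstPerSaksnr(IEnumerable<Sak> result) =>
        result.GroupBy(sak => sak.Saksnr)
              .Select(g => g.First())
              .ToList();
}
```

To keep a different entry per key, swap g.First() for g.Last() or for something like g.OrderBy(...).First().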

You can also group on multiple values by using an anonymous type, e.g.:

            result = result.GroupBy(sak => new { sak.Saksnr, sak.Tag }).Select(y => y.First());

See for example http://www.devcurry.com/2009/02/groupby-multiple-values-in-linq.html for a full example of grouping on multiple values.
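The anonymous-type version works because the C# compiler generates Equals and GetHashCode for anonymous types that compare all property values, so { Saksnr, Tag } behaves as a composite key. A small sketch, again with an assumed stub Sak and hypothetical MultiKeyDemo/FirstPerSaksnrAndTag names:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Sak
{
    public string Saksnr { get; set; }
    public string Tag { get; set; }
}

public static class MultiKeyDemo
{
    // The anonymous type { Saksnr, Tag } compares by value, so it acts as a composite key.
    public static List<Sak> FirstPerSaksnrAndTag(IEnumerable<Sak> result) =>
        result.GroupBy(sak => new { sak.Saksnr, sak.Tag })
              .Select(g => g.First())
              .ToList();
}
```

A value tuple, GroupBy(sak => (sak.Saksnr, sak.Tag)), also compares by value and works the same way on C# 7 or later.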

PS. I found another post on Stack Overflow using this kind of solution; it covers both of these concepts (see the post from David B).

PSPS. One could use HashSet<T> to get unique entries, but that requires mostly the same as .Distinct(), i.e. either providing an IEqualityComparer<T> or overriding GetHashCode and Equals.
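A related trick, which is a different technique from putting the Sak objects themselves in a HashSet<Sak>: keep a HashSet of the keys only. Since string already has value equality, no comparer or overrides are needed, and HashSet<T>.Add returns false for a key that is already present. A sketch assuming string Saksnr; HashSetDemo is a hypothetical name.

```csharp
using System.Collections.Generic;

public class Sak
{
    public string Saksnr { get; set; }
    public string Tag { get; set; }
}

public static class HashSetDemo
{
    // Keeps the first Sak for each Saksnr by remembering which keys were already seen.
    public static List<Sak> FirstPerSaksnr(IEnumerable<Sak> result)
    {
        var seen = new HashSet<string>();
        var unique = new List<Sak>();
        foreach (var sak in result)
        {
            // Add returns false when the key is already in the set.
            if (seen.Add(sak.Saksnr))
                unique.Add(sak);
        }
        return unique;
    }
}
```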

PSPSPS. Yes, I should have removed the duplicates before the join, …sorry :)
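For completeness, a sketch of removing the duplicates before the join: since the join only uses Saksnummer from korr, reducing korr to its distinct Saksnummer values first means each Saksinfo matches at most once. This assumes Saksnummer is unique within liste itself (otherwise duplicates come from that side too); the stub types, string keys and the JoinFirstDemo/JoinOnDistinctKeys names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stubs; string keys are assumed.
public class Saksinfo { public string Saksnummer { get; set; } public string Tag { get; set; } }
public class KorrespondanseInfo { public string Saksnummer { get; set; } }
public class Sak { public string Saksnr { get; set; } public string Tag { get; set; } }

public static class JoinFirstDemo
{
    public static List<Sak> JoinOnDistinctKeys(IEnumerable<KorrespondanseInfo> korr, IEnumerable<Saksinfo> liste)
    {
        // Only Saksnummer is needed from korr, so reduce it to distinct keys first.
        var saksnumre = korr.Select(k => k.Saksnummer).Distinct();

        return (from l in liste
                join s in saksnumre on l.Saksnummer equals s
                select new Sak { Saksnr = l.Saksnummer, Tag = l.Tag }).ToList();
    }
}
```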
