Wednesday, 26 February 2014

Duplicate Table

Has this ever happened to you?  You are doing left joins to tables and somehow your most recent table has more records than it should.  What happened?  You were careful to ensure your ID field is unique, but for some unexplained reason you have too many records.  The best way to solve the problem is to identify if your table has duplicates and what those duplicates are.  I like to create a sub-table that lists all of the fields of every record that is part of a duplicate.  The code below will quickly create a table called "dups" that contains every record that is part of a duplicate and sort it by the field in question.

Create a table of duplicate records


%Let TableName = TableName;
%Let FieldName = FieldName;

proc sql;
      create table dups as
      select *
      from &TableName
      where ((&FieldName in(select &FieldName from &TableName as tmp
      group by &FieldName having Count(*)>1)))
      order by &FieldName
      ;
Quit;


This query uses the in statement to query against itself and pulls all fields from all records where your field name occurs more than once.  The limitation of this query is that it assumes you are dealing with one field that is supposed to be unique.

Macro Names

Use the %Let function to identify your table name and your field name so you don't have to type the table and field names multiple times


TableName is the table you think has duplicate records
FieldName is the field that is supposed to be unique

Usage

I keep this code in a macro abbreviation (see previous blog) under the shortcut text “dups”.  This way it is always very easy to do a quick duplicate check.  Use this code to identify the records where duplicate occur.  Usually once you see the value of all of the fields that are part of a duplicate is becomes obvious why the duplicates occur.  I like to use this code to confirm there are no duplicates in my tables.  Run the code and check the log screen to ensure the number of records for the dups is 0.  It takes less than a minute and it is a good check.



I would love to hear how you check for duplicates in your tables.

No comments:

Post a Comment