Introduction
Most of the times, we use primary key or unique key for preventing insertion of duplicate rows in SQL Server. But if we don't use these keys, then it's obvious that duplicate rows could be entered by the user. After inserting duplicate rows into table, it becomes a major issue to delete those duplicate rows. In that time, we need to delete those duplicate rows to resolve the issue. So this topic will help us to delete those duplicate rows from the specific table.
Background
I used some basic T-SQL code to accomplish the target. So you don't need to worry to understand this code.
Problem
Firstly, we will create a table, where we will insert some duplicate rows to understand the topic properly. Create a table called
ATTENDANCE
by using the following code:
Hide Copy Code
CREATE TABLE [dbo].[ATTENDANCE](
[EMPLOYEE_ID] [varchar](50) NOT NULL,
[ATTENDANCE_DATE] [date] NOT NULL
) ON [PRIMARY]
Now insert some data into this table.
Hide Copy Code
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES
('A001',CONVERT(DATETIME,'01-01-11',5))
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES
('A001',CONVERT(DATETIME,'01-01-11',5))
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES
('A002',CONVERT(DATETIME,'01-01-11',5))
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES
('A002',CONVERT(DATETIME,'01-01-11',5))
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES
('A002',CONVERT(DATETIME,'01-01-11',5))
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES
('A003',CONVERT(DATETIME,'01-01-11',5))
After inserting the data, check the data of the below table. If we grouped the
employee_id
andattendance_date
, then A001
and A002
become duplicates.EMPLOYEE_ID | ATTENDANCE_DATE |
A001 | 2011-01-01 |
A001 | 2011-01-01 |
A002 | 2011-01-01 |
A002 | 2011-01-01 |
A002 | 2011-01-01 |
A003 | 2011-01-01 |
So how can we delete those duplicate data?
Solution
First, insert an
identity
column in that table by using the following code:
Hide Copy Code
ALTER TABLE dbo.ATTENDANCE ADD AUTOID INT IDENTITY(1,1)
Now the table data will be like the following table:
EMPLOYEE_ID | ATTENDANCE_DATE | AUTOID |
A001 | 2011-01-01 | 1 |
A001 | 2011-01-01 | 2 |
A002 | 2011-01-01 | 3 |
A002 | 2011-01-01 | 4 |
A002 | 2011-01-01 | 5 |
A003 | 2011-01-01 | 6 |
Check the
AUTOID
column. Now we will start playing the game with this column.
Now use the following code to find out the duplicate rows that exist in the table.
Hide Copy Code
SELECT * FROM dbo.ATTENDANCE WHERE AUTOID NOT IN (SELECT MIN(AUTOID) _
FROM dbo.ATTENDANCE GROUP BY EMPLOYEE_ID,ATTENDANCE_DATE)
The above code will give us the following result:
EMPLOYEE_ID | ATTENDANCE_DATE | AUTOID |
A001 | 2011-01-01 | 2 |
A002 | 2011-01-01 | 4 |
A002 | 2011-01-01 | 5 |
Ultimately, these are the duplicate rows which we want to delete to resolve the issue. Use the following code to resolve it.
Hide Copy Code
DELETE FROM dbo.ATTENDANCE WHERE AUTOID NOT IN (SELECT MIN(AUTOID) _
FROM dbo.ATTENDANCE GROUP BY EMPLOYEE_ID,ATTENDANCE_DATE)
Now check the data. No duplicate rows exist in the table.
No comments:
Post a Comment