I ran into a problem the other day where I had an enormous table (several million rows) where I needed to eliminate duplicate titles. However, I wanted to preserve other information like the id (i.e. primary key). So without resorting to a PHP or perl script, I wanted to everything natively in the database for speed. Here's what I did.
First, I created a new table to hold the information. I made certain that the table was built similar to the original table. However, I placed a "unique" constraint on the "title" column to eliminate duplicates. Next, I employed the INSERT IGNORE INTO <TABLE_NAME> (col1, col2, etc.) SELECT (col1, col2, etc.) FROM <OLD_TABLE> query. This query is fairly efficient and helped eliminate the redundant rows with the duplicate titles and allowing me to preserve the other column information. Note that this query only works if other rows that have the same title are irrelevant to your application. In my case, I didn't need them.
Trackbacks: (Trackback URL)