Jump to content
  • 0

Spotting duplicates in Excel file


geogphotos

Question

Somehow I have uploaded a few duplicates over the years. 

 

If I download my data from AIM is there an easy way to spot my duplicate files refs in the Excel file without doing it by eye?

 

Some sort of command that operates down a row of data to spot my duplicate file names?

 

Any ideas?

Link to comment
Share on other sites

20 answers to this question

Recommended Posts

  • 0
12 minutes ago, spacecadet said:

I haven't looked at it in detail, but this

https://analysistabs.com/vba/find-duplicate-values-column/

has a macro for finding duplicate strings.

 

 

Thanks for the link though it looks hard to me. I did mean column not row, and actually I use Numbers not Excel.

 

So in principle it is possible. I found this:

 

How do I find duplicates in the same column?
To do so:
  1. Select the range of cells you wish to test. ... 
  2. On Excel's Home tab, choose Conditional Formatting, Highlight Cells Rules, and then Duplicate Values.
  3. Click OK within the Duplicate Values dialog box to identify the duplicate values.
  4. Duplicate values in the list will now be identified.

 

Edited by geogphotos
Link to comment
Share on other sites

  • 0
41 minutes ago, geogphotos said:

Somehow I have uploaded a few duplicates over the years. 

 

If I download my data from AIM is there an easy way to spot my duplicate files refs in the Excel file without doing it by eye?

 

Some sort of command that operates down a row of data to spot my duplicate file names?

 

Any ideas?

 

1) Sort on the column containing the filenames, so that the duplicates will appear on successive rows

2) Add a new column

3) Fill with equation from row 2 downwards along the lines of
       Cell = IF (Filename in Row1 = Filename in Row2 , "X","")

       In Excel, if the filenames are in Column A, the equation in Row 2 of the new column would look like this =IF(A1=A2,"X","")

       Copy and paste the formula to fill the column and the equation will auto-increment the row numbers for you.

4) This will put an "X" in the rows containing a filenames that is a duplicate of the row before, making them easy to spot

5) You can then sort on new column to get all the duplicates in a condensed list

 

Mark

Edited by M.Chapman
  • Like 2
Link to comment
Share on other sites

  • 0
1 hour ago, M.Chapman said:

 

1) Sort on the column containing the filenames, so that the duplicates will appear on successive rows

2) Add a new column

3) Fill with equation from row 2 downwards along the lines of
       Cell = IF (Filename in Row1 = Filename in Row2 , "X","")

       In Excel, if the filenames are in Column A, the equation in Row 2 of the new column would look like this =IF(A1=A2,"X","")

       Copy and paste the formula to fill the column and the equation will auto-increment the row numbers for you.

4) This will put an "X" in the rows containing a filenames that is a duplicate of the row before, making them easy to spot

5) You can then sort on new column to get all the duplicates in a condensed list

 

Mark

 

more or less what i was going to propose (using 0s and 1s) , but using filter in point 5, instead on re sort 

Link to comment
Share on other sites

  • 0

In Excel i’d say COUNTIF function on the column holding the identifying values. Then a simple filter for results >1. I don’t know Numbers but assume it has a help index and a similar function?

 

=COUNTIF (A:A,A1)  

 

A:A = range, A1 (the value in A1) = criteria

Link to comment
Share on other sites

  • 0
3 hours ago, ACC said:

In Excel i’d say COUNTIF function on the column holding the identifying values. Then a simple filter for results >1. I don’t know Numbers but assume it has a help index and a similar function?

 

=COUNTIF (A:A,A1)  

 

A:A = range, A1 (the value in A1) = criteria

 

That's neat. In Excel I think the range would need the numeric row number references of the first and last row prefixed with a dollar symbol. For example, assuming there are 5000 rows of data with the filename in 1st column (A) and there's 1 header row, these equations will be in in rows 2 and 3.

 

=COUNTIF(A$2:A$5000, A2)

=COUNTIF(A$2:A$5000, A3)

 

The $ prefix will ensure that when cut and paste is used to fill the column with the equation the range reference remains locked.

 

Mark

Edited by M.Chapman
Link to comment
Share on other sites

  • 0

Two good approaches from Mark and Angela (ACC).  I'll agree with Mark's COUNTIF modification.  I've found that using A:A rather than absolute references can occasionally cause problems in Excel (though it does work most of the time).

 

Another alternative is to use a Pivot table to identify duplicates.  You'd only need the column containing the duplicate references and a count column sorted by high to low values to easily pick out the culprits. 

 

Once you've found them in the main list with any of the methods it's then easy to pick up the Alamy ID and search and then delete in AIM.

Link to comment
Share on other sites

  • 0

I'm overwhelmed.

 

Would this be something that Alamy could easily do and then inform those contributors who ask?

 

Or would it be a lot of work?  How easy is easy?

Edited by geogphotos
Link to comment
Share on other sites

  • 0
4 minutes ago, geogphotos said:

I'm overwhelmed.

 

Would this be something that Alamy could easily do and then inform those contributors who ask?

 

Or would it be a lot of work?  How easy is easy?

From a data analytics point of view it’s very easy. I think the question is whether or not Alamy think thats a service they want to provide. If you want some XL type coaching I’m just across the river! I don’t know the Numbers spreadsheet though.

Link to comment
Share on other sites

  • 0
18 minutes ago, ACC said:

From a data analytics point of view it’s very easy. I think the question is whether or not Alamy think thats a service they want to provide. If you want some XL type coaching I’m just across the river! I don’t know the Numbers spreadsheet though.

 

Ian i’ve just done it in Numbers. Different to Excel but I got there, same COUNTIF function. Unfortunately no way to attach a screenshot. Email me if you want to take this offline before Alamy notice we’ve turned this board into an IT helpdesk!  (sorry Alamy)    ange.chalmers@btinternet.com 

 

 

 

 

  • Love 1
Link to comment
Share on other sites

  • 0
25 minutes ago, geogphotos said:

I'm overwhelmed.

 

Would this be something that Alamy could easily do and then inform those contributors who ask?

 

Or would it be a lot of work?  How easy is easy?

Very easy - if you know what you are doing.  5 minutes max.  But I don't see Alamy offering the service unless they're prepared to code it into their download facility.

 

In the meantime if you want to send a copy of the download file to john@tibouchinatraining.co.uk I'll find the 5 minutes over the weekend.

 

 

Link to comment
Share on other sites

  • 0
58 minutes ago, geogphotos said:

Would this be something that Alamy could easily do and then inform those contributors who ask?

 

Certainly for me duplicate filenames do not (necessarily) mean duplicate images, I have a lot of duplicate filenames on Alamy as both my Canon & Fuji have gone round the clock. However I use Lightroom collections for my Alamy uploads so it's relatively easy to identify any that have been uploaded by mistake, and relatively difficult to do so in the first place, though I have a tiny fraction of your port.

Link to comment
Share on other sites

  • 0
11 hours ago, Harry Harrison said:

 

Certainly for me duplicate filenames do not (necessarily) mean duplicate images, I have a lot of duplicate filenames on Alamy as both my Canon & Fuji have gone round the clock. However I use Lightroom collections for my Alamy uploads so it's relatively easy to identify any that have been uploaded by mistake, and relatively difficult to do so in the first place, though I have a tiny fraction of your port.

Same here. I have three or four of some, but inlike we two most here probably don't use OOC filenames unchanged for uploading. LR will change them on export for you.

My only defence against duplicates is working through images when new, and even then it doesn't always work.

Link to comment
Share on other sites

  • 0
19 minutes ago, spacecadet said:

My only defence against duplicates is working through images when new

 

I imagine that I could/should have changed my camera settings to to roll over at 99,999 rather than 9,999 but in practice I don't find it a workflow problem so I've never bothered, a bit late to do so now anyway. On the other hand I believe that the LR/Alamy plugin relies on unique filenames in Lightroom to match up with those on Alamy so that is a limitation as I quite fancy trying it.

 

Lightroom itself doesn't of course treat images with the same filename as a duplicate when importing with the "Don't import suspected duplicates" check-box checked as it looks at other parameters as well, "Lightroom determines a photo is a duplicate of another file in the catalog if it has the same, original filename; the same Exif capture date and time; and the same file size." 

Link to comment
Share on other sites

  • 0
44 minutes ago, Harry Harrison said:

 

I imagine that I could/should have changed my camera settings to to roll over at 99,999 rather than 9,999 but in practice I don't find it a workflow problem so I've never bothered, a bit late to do so now anyway. On the other hand I believe that the LR/Alamy plugin relies on unique filenames in Lightroom to match up with those on Alamy so that is a limitation as I quite fancy trying it.

 

Lightroom itself doesn't of course treat images with the same filename as a duplicate when importing with the "Don't import suspected duplicates" check-box checked as it looks at other parameters as well, "Lightroom determines a photo is a duplicate of another file in the catalog if it has the same, original filename; the same Exif capture date and time; and the same file size." 

 

Lightroom may not get confused with duplicate names but it is possible that you could mess up a backup (manual or automatic) with duplicate filenames. It is not a good idea in general in my opinion. You can use the Rename function in Lightroom to rename a batch with Custom Name - Sequence in a matter of seconds. 

Link to comment
Share on other sites

  • 0
53 minutes ago, MDM said:

 

Lightroom may not get confused with duplicate names but it is possible that you could mess up a backup

Thanks, I take your point but my backup software hasn't had a problem so far fortunately. I did consider adding a suffix to the 'rolled over' images but it wouldn't help with those already up there on Alamy so I think I still couldn't use the LR/Alamy plugin though I may need to read up about that again, I think he does describe some kind of workaround.

 

Incidentally, in Filemaker finding duplicates is very easy, just search with a '!' in the desired field.

 

In fact looking at the manual for Alamy-Lightroom Bridge here I can see that although I have made life difficult for myself I can get around it. From the csv export I will know which dupes I have to contend with and then I simply (?) manually enter the unique Alamy ref into the requisite field in Lightroom using a visual comparison in AIM. Once that is done I can do as you suggest and add a suffix so all my LR filenames are unique and then make sure my cameras don't add more dupes.

Edited by Harry Harrison
Link to comment
Share on other sites

  • 0
2 minutes ago, geogphotos said:

Thanks to all of you for your help and comments.

 

Angela has done the deed and I have been able to quickly delete loads of duplicates.

 

Thank you John for your kind offer to do this for me. 

 

 

No problem, Ian.  And well done Angela.

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.