From 20,000 Wedding Photos to One Clean Album: My Journey Building an Image Deduper

thebuildlog@newsletter.paragraph.com (The Build Log) — Sat, 29 Nov 2025 13:03:54 GMT

It all started during my cousin’s wedding. The photographers took around twenty thousand photos, and after the event, they handed everything over for my cousin and her husband to sort through. They were supposed to pick the best photos and send them back. You can imagine how overwhelming that sounded.

I was chatting with my cousin’s husband about it, and that’s when I mentioned how software could probably automate this kind of task. He told me there were tools for it, but most of them were expensive. That got me thinking. Why not try building one myself? It would be a useful project, and it’d make for a good story to share during interviews too.

Starting the project

I asked my cousin to share the wedding photos with me, and she sent me about 60 GB worth of images. As I went through them, I realized what the real problem was. The photographers had taken several shots of the same moment, often within seconds of each other. My goal became clear: group similar photos together and automatically select the best one from each group.

So I had two main tasks:

Group similar photos together
Score the photos and pick the best one

Figuring out the approach

This was around the time I had started learning about vector embeddings. I realized I could use them to represent images numerically and compare how similar they were. After doing some research, I landed on the ResNet-50 model to convert images into embeddings.
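
For readers who want to see what the embedding step can look like, here is a minimal sketch using torchvision's pretrained ResNet-50. It is not the project's exact code. The function name `load_embedder`, the choice of the default ImageNet weights, and replacing the final classification layer with an identity (so the model returns its 2048-dimensional pooled features) are all assumptions made for illustration.

```python
from pathlib import Path


def load_embedder():
    """Build a function that maps an image path to a ResNet-50 feature vector.

    Hypothetical helper: the name and structure are illustrative only.
    """
    # Heavy dependencies are imported lazily so this module loads without them.
    import torch
    from PIL import Image
    from torchvision.models import ResNet50_Weights, resnet50

    weights = ResNet50_Weights.DEFAULT       # pretrained ImageNet weights (assumption)
    model = resnet50(weights=weights)
    model.fc = torch.nn.Identity()           # drop the classifier, keep pooled features
    model.eval()
    preprocess = weights.transforms()        # resize, crop, and normalize for these weights

    def embed(path: Path) -> list:
        image = Image.open(path).convert("RGB")
        batch = preprocess(image).unsqueeze(0)   # add a batch dimension
        with torch.no_grad():                    # inference only, no gradients
            features = model(batch).squeeze(0)   # 2048 values per image
        return features.tolist()

    return embed
```

Loading the model once and reusing the returned `embed` function matters here, because rebuilding the network for each of thousands of photos would dominate the run time.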

For finding similar images, I went with cosine similarity. Through some testing, I found that a similarity threshold of around 0.87 worked best for my use case.
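
Cosine similarity measures the angle between two vectors and ignores their length, so two embeddings pointing in nearly the same direction score close to 1. A small sketch of the comparison, in plain Python, could look like this. The function names are hypothetical, and the 0.87 default is simply the threshold mentioned above, which may not suit other photo sets.

```python
import math

SIMILARITY_THRESHOLD = 0.87  # value from the post; tune it for your own photos


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def is_near_duplicate(a, b, threshold=SIMILARITY_THRESHOLD):
    """True when two embeddings are at least `threshold` similar."""
    return cosine_similarity(a, b) >= threshold
```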

The process was simple in theory. I’d take a directory of images, loop through them to generate embeddings, and store them. Then, I’d loop again, compare images using cosine similarity, and when I found matches, I’d move those images into a separate folder. Each time I grouped similar images, I also removed their vectors from the vector database so the same images wouldn’t be processed again.
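
The grouping loop described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: it uses an in-memory dictionary in place of the vector database, the names `group_similar` and `cosine` are made up, and photos with no match are kept as single-item groups.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def group_similar(embeddings, threshold=0.87):
    """Greedily group image names whose embeddings match a seed image.

    `embeddings` maps an image name to its vector. Each image lands in
    exactly one group because it is removed from `remaining` once placed.
    """
    remaining = dict(embeddings)  # copy so the caller's dict is not mutated
    groups = []
    while remaining:
        # Take the next unprocessed image as the seed of a new group.
        seed_name, seed_vec = next(iter(remaining.items()))
        del remaining[seed_name]
        group = [seed_name]
        # Compare the seed against everything still unprocessed.
        for name, vec in list(remaining.items()):
            if cosine(seed_vec, vec) >= threshold:
                group.append(name)
                del remaining[name]  # never process this image again
        groups.append(group)
    return groups
```

One consequence of comparing only against the seed is that the result depends on processing order: a photo that resembles a later group member but not the seed starts its own group. The comparison count also grows quadratically in the worst case, which is worth keeping in mind at twenty thousand images.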

Scoring the best photo

Once I had groups of similar photos, the next step was to pick the best one from each group. While researching, I found this GitHub repo:
https://github.com/shunk031/simple-aesthetics-predictor

It uses a pre-trained model from Hugging Face that scores images based on their aesthetics. I used that model to score all the images in each group, then kept the one with the highest score. The best ones were moved to a new output directory.
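
The selection step itself is small once a scoring function exists. In the sketch below, `score` is a hypothetical stand-in for a call into the aesthetics model (its real API is not shown here), and `pick_best` is an illustrative name.

```python
def pick_best(groups, score):
    """Return the highest-scoring image from each non-empty group.

    `score` is any callable that maps an image name to a number, for
    example a wrapper around an aesthetics model (assumed, not shown).
    """
    return [max(group, key=score) for group in groups if group]


# Usage with made-up scores in place of a real model:
fake_scores = {"a.jpg": 4.2, "b.jpg": 5.1, "c.jpg": 3.3}
winners = pick_best([["a.jpg", "b.jpg"], ["c.jpg"]], fake_scores.get)
```

Passing the scorer in as a parameter keeps the grouping and selection logic testable without loading a model.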

After everything was done, the app compressed the output folder and generated a download link for the user. Once the download finished, all temporary files were deleted immediately to protect user privacy and free up space.
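
A rough sketch of the packaging and cleanup step, using only Python's standard library, might look like this. The function names and the zip format are assumptions. How the download link is generated depends on the web framework, so that part is left out.

```python
import shutil
from pathlib import Path


def package_results(output_dir: Path, archive_base: Path) -> Path:
    """Zip the contents of `output_dir` and return the archive path.

    `archive_base` is the target path without the ".zip" suffix. Keep it
    outside `output_dir` so the archive does not end up inside itself.
    """
    archive = shutil.make_archive(str(archive_base), "zip", root_dir=str(output_dir))
    return Path(archive)


def cleanup(*paths: Path) -> None:
    """Delete temporary files and folders, ignoring ones already gone."""
    for path in paths:
        if path.is_dir():
            shutil.rmtree(path, ignore_errors=True)
        else:
            path.unlink(missing_ok=True)
```

Calling `cleanup` from a `finally` block, or a background task that runs after the response is sent, helps make sure uploads are removed even if something fails partway through.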

Hosting and making it public

At first, I wanted to host this project on Render, but that didn’t work out well because of resource limitations. So I turned to Hugging Face Spaces for the backend and quickly built a simple frontend using Vite. Together, they worked smoothly.

You can check it out here:
Live site: https://imagededuper.netlify.app/
Backend: https://github.com/basilbenny1002/ImageDeduper-backend
Frontend: https://github.com/basilbenny1002/image-selector-front-end
Base script: https://github.com/basilbenny1002/Image-Selecter

Final thoughts

This project started as a random idea during a wedding conversation, but it turned out to be one of my favorite builds. It taught me a lot about embeddings, similarity search, and model deployment, and it also showed how a simple real-life problem can turn into a fun and practical project.