<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>The Build Log</title>
        <link>https://paragraph.com/@thebuildlog</link>
        <description></description>
        <lastBuildDate>Sun, 24 May 2026 19:27:30 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <copyright>All rights reserved</copyright>
        <item>
            <title><![CDATA[From 20,000 Wedding Photos to One Clean Album: My Journey Building an Image Deduper]]></title>
            <link>https://paragraph.com/@thebuildlog/from-20000-wedding-photos-to-one-clean-album-my-journey-building-an-image-deduper</link>
            <guid>KsyKQXej2HW3YfInIDC8</guid>
            <pubDate>Sat, 29 Nov 2025 13:03:54 GMT</pubDate>
            <description><![CDATA[It all started during my cousin’s wedding. The photographers took around twenty thousand photos, and after the event, they handed everything over for my cousin and her husband to sort through. They were supposed to pick the best photos and send them back. You can imagine how overwhelming that sounded. I was chatting with my cousin’s husband about it, and that’s when I mentioned how software could probably automate this kind of task. He told me there were tools for it, but most of them were ex...]]></description>
            <content:encoded><![CDATA[<p>It all started during my cousin’s wedding. The photographers took around twenty thousand photos, and after the event, they handed everything over for my cousin and her husband to sort through. They were supposed to pick the best photos and send them back. You can imagine how overwhelming that sounded.</p><p>I was chatting with my cousin’s husband about it, and that’s when I mentioned how software could probably automate this kind of task. He told me there were tools for it, but most of them were expensive. That got me thinking. Why not try building one myself? It would be a useful project, and it’d make for a good story to share during interviews too.</p><h2 id="h-starting-the-project" class="text-3xl font-header !mt-8 !mb-4 first:!mt-0 first:!mb-0"><strong>Starting the project</strong></h2><p>I asked my cousin to share the wedding photos with me, and she sent me about 60GB worth of images. As I went through them, I realized what the real problem was. The photographer had taken several shots of the same moment, often within seconds of each other. My goal became clear: group similar photos together and automatically select the best one from each group.</p><p>So I had two main tasks:</p><ol><li><p>Group similar photos together</p></li><li><p>Score the photos and pick the best one</p></li></ol><h2 id="h-figuring-out-the-approach" class="text-3xl font-header !mt-8 !mb-4 first:!mt-0 first:!mb-0"><strong>Figuring out the approach</strong></h2><p>This was around the time I had started learning about vector embeddings. I realized I could use them to represent images numerically and compare how similar they were. After doing some research, I landed on the ResNet-50 model to convert images into embeddings.</p><p>For finding similar images, I went with cosine similarity. Through some testing, I found that a similarity threshold of around 0.87 worked best for my use case.</p><p>The process was simple in theory. 
I’d take a directory of images, loop through them to generate embeddings, and store them. Then, I’d loop again, compare images using cosine similarity, and when I found matches, I’d move those images into a separate folder. Each time I grouped similar images, I also removed their vectors from the vector database so the same images wouldn’t be processed again.</p><h2 id="h-scoring-the-best-photo" class="text-3xl font-header !mt-8 !mb-4 first:!mt-0 first:!mb-0"><strong>Scoring the best photo</strong></h2><p>Once I had groups of similar photos, the next step was to pick the best one from each group. While researching, I found this GitHub repo:<br><a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://github.com/shunk031/simple-aesthetics-predictor">https://github.com/shunk031/simple-aesthetics-predictor</a></p><p>It uses a pre-trained model from Hugging Face that scores images based on their aesthetics. I used that model to score all the images in each group, then kept the one with the highest score. The best ones were moved to a new output directory.</p><p>After everything was done, the app compressed the output folder and generated a download link for the user. Once the download finished, all temporary files were deleted immediately to protect user privacy and free up space.</p><h2 id="h-hosting-and-making-it-public" class="text-3xl font-header !mt-8 !mb-4 first:!mt-0 first:!mb-0"><strong>Hosting and making it public</strong></h2><p>At first, I wanted to host this project on Render, but that didn’t work out well because of resource limitations. So I turned to Hugging Face Spaces for the backend and quickly built a simple frontend using Vite. 
Together, they worked smoothly.</p><p>You can check it out here:<br>Live site: <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://imagededuper.netlify.app/"><u>https://imagededuper.netlify.app/</u></a><br>Backend: <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://github.com/basilbenny1002/ImageDeduper-backend"><u>https://github.com/basilbenny1002/ImageDeduper-backend</u></a><br>Frontend: <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://github.com/basilbenny1002/image-selector-front-end"><u>https://github.com/basilbenny1002/image-selector-front-end</u></a><br>Base script: <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://github.com/basilbenny1002/Image-Selecter"><u>https://github.com/basilbenny1002/Image-Selecter</u></a></p><h2 id="h-final-thoughts" class="text-3xl font-header !mt-8 !mb-4 first:!mt-0 first:!mb-0"><strong>Final thoughts</strong></h2><p>This project started as a random idea during a wedding conversation, but it turned out to be one of my favorite builds. It taught me a lot about embeddings, similarity search, and model deployment, and it also showed how a simple real-life problem can turn into a fun and practical project.</p>]]></content:encoded>
            <author>thebuildlog@newsletter.paragraph.com (The Build Log)</author>
            <category>python</category>
            <category>ai</category>
            <category>computervision</category>
            <category>computer</category>
            <category>vision</category>
            <category>opencv</category>
            <category>imagededuper</category>
            <category>aitools</category>
        </item>
    </channel>
</rss>