Find Duplicates

Upload multiple images to detect duplicates and near-duplicates using perceptual hashing (dHash). The tool groups similar images together, identifies the best-quality version in each group, and calculates the potential space savings.

API Endpoint

POST /api/v1/tools/find-duplicates

Accepts multipart form data with multiple image files and an optional JSON settings field.

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| threshold | number | No | 8 | Maximum Hamming distance at which two images are considered duplicates (0 to 20). Lower values mean stricter matching. |

File Fields

Upload at least 2 image files in the multipart request. File parts can all use the file field name, or any other field name; every file part is treated as an image.

Example Request

bash
curl -X POST http://localhost:1349/api/v1/tools/find-duplicates \
  -H "Authorization: Bearer si_your-api-key" \
  -F "file=@photo1.jpg" \
  -F "file=@photo2.jpg" \
  -F "file=@photo3.jpg" \
  -F "file=@photo4.jpg" \
  -F 'settings={"threshold": 8}'
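
For callers that cannot shell out to curl, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names `build_multipart` and `find_duplicates` are hypothetical, and the base URL and key format are simply copied from the curl example above.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(files, settings, boundary=None):
    """Encode (filename, bytes) pairs plus a JSON settings field as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for filename, data in files:
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n"
        )
        parts.append(header.encode() + data + b"\r\n")
    # The optional settings field carries the JSON-encoded options.
    parts.append(
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="settings"\r\n\r\n'
            f"{json.dumps(settings)}\r\n"
        ).encode()
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def find_duplicates(paths, api_key, threshold=8, base_url="http://localhost:1349"):
    """POST the given image files and return the decoded JSON response."""
    files = [(Path(p).name, Path(p).read_bytes()) for p in paths]
    body, content_type = build_multipart(files, {"threshold": threshold})
    request = urllib.request.Request(
        f"{base_url}/api/v1/tools/find-duplicates",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `find_duplicates(["photo1.jpg", "photo2.jpg"], "si_your-api-key")` would send two file parts and the default threshold.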

Example Response

json
{
  "totalImages": 4,
  "duplicateGroups": [
    {
      "groupId": 1,
      "files": [
        {
          "filename": "photo1.jpg",
          "similarity": 100,
          "width": 4032,
          "height": 3024,
          "fileSize": 2450000,
          "format": "jpeg",
          "isBest": true,
          "thumbnail": "data:image/jpeg;base64,/9j/..."
        },
        {
          "filename": "photo2.jpg",
          "similarity": 96.88,
          "width": 1920,
          "height": 1440,
          "fileSize": 850000,
          "format": "jpeg",
          "isBest": false,
          "thumbnail": "data:image/jpeg;base64,/9j/..."
        }
      ]
    }
  ],
  "uniqueImages": 2,
  "spaceSaveable": 850000,
  "skippedFiles": []
}
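
A response like the one above can be turned into a cleanup list by collecting every file that is not marked as best. The sketch below is illustrative only: `removable_files` is a hypothetical helper, and it assumes that spaceSaveable corresponds to the summed fileSize of the non-best files, as the example values suggest.

```python
from typing import Dict, List, Tuple


def removable_files(response: Dict) -> Tuple[List[str], int]:
    """Return the filenames of non-best duplicates and their combined size in bytes."""
    names: List[str] = []
    total_bytes = 0
    for group in response.get("duplicateGroups", []):
        for entry in group.get("files", []):
            if not entry.get("isBest"):
                names.append(entry["filename"])
                total_bytes += entry["fileSize"]
    return names, total_bytes


# Trimmed copy of the example response (thumbnails and dimensions omitted).
sample = {
    "duplicateGroups": [
        {
            "groupId": 1,
            "files": [
                {"filename": "photo1.jpg", "fileSize": 2450000, "isBest": True},
                {"filename": "photo2.jpg", "fileSize": 850000, "isBest": False},
            ],
        }
    ],
    "spaceSaveable": 850000,
}

names, total_bytes = removable_files(sample)
```

For the trimmed sample, `names` holds only photo2.jpg and `total_bytes` equals the reported spaceSaveable value.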

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| totalImages | number | Number of images successfully analyzed |
| duplicateGroups | array | Groups of duplicate images |
| uniqueImages | number | Number of images not part of any duplicate group |
| spaceSaveable | number | Total bytes that could be saved by removing non-best duplicates |
| skippedFiles | array | Files that could not be processed (with filename and reason) |

Duplicate Group Object

| Field | Type | Description |
| --- | --- | --- |
| groupId | number | Group identifier |
| files | array | Images in this duplicate group |

File Object (within a group)

| Field | Type | Description |
| --- | --- | --- |
| filename | string | Original filename |
| similarity | number | Similarity percentage to the reference image (first in group) |
| width | number | Image width in pixels |
| height | number | Image height in pixels |
| fileSize | number | File size in bytes |
| format | string | Image format |
| isBest | boolean | Whether this is the highest-quality version in the group (most pixels, with file size as the tiebreaker) |
| thumbnail | string or null | Base64 JPEG thumbnail (200px wide) for preview |
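
The isBest rule (most pixels, then largest file) can be reproduced on the client side, for example to double-check a group. This is a sketch under that stated rule only; `pick_best` is a hypothetical helper and the server may apply additional logic not described here.

```python
from typing import Dict, List


def pick_best(files: List[Dict]) -> Dict:
    """Pick the file with the most pixels; break ties with the larger file size."""
    return max(files, key=lambda f: (f["width"] * f["height"], f["fileSize"]))
```

Because Python compares tuples element by element, fileSize only matters when two files have the same pixel count.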

Notes

  • Uses a 128-bit dHash (64-bit row + 64-bit column) for perceptual similarity detection. This catches duplicates even across resizes, recompression, and minor edits.
  • The threshold is the maximum Hamming distance between two hashes for the images to count as duplicates. The default of 8 catches near-duplicates while limiting false positives. Use 0 to match only images whose hashes are identical, or 15-20 for very loose matching.
  • The "best" image in each group is the one with the most pixels (width x height), with file size as a tiebreaker.
  • At least 2 images are required. Files that fail validation or decoding are reported in skippedFiles rather than causing the entire request to fail.
  • Thumbnails are 200px-wide JPEG previews encoded as data URIs.
  • All common image formats are supported. HEIC, RAW, PSD, and SVG files are decoded automatically.
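
To make the hashing notes concrete, here is a rough sketch of a 128-bit difference hash and the Hamming-distance check. Several details are assumptions, not documented behavior of this endpoint: the input is taken to be a 9x9 grayscale grid that has already been downscaled, the bit order and comparison direction are arbitrary choices, and the similarity formula (share of matching bits) is inferred from the example response, where a distance of 4 would give 96.875.

```python
from typing import List

Grid = List[List[int]]  # 9 rows x 9 columns of grayscale values (assumed input shape)
HASH_BITS = 128


def dhash128(grid: Grid) -> int:
    """Build a 128-bit hash: 64 horizontal comparisons followed by 64 vertical ones."""
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        raise ValueError("expected a 9x9 grayscale grid")
    bits = 0
    # Row hash: is each pixel brighter than its right-hand neighbor? (8 x 8 = 64 bits)
    for y in range(8):
        for x in range(8):
            bits = (bits << 1) | int(grid[y][x] > grid[y][x + 1])
    # Column hash: is each pixel brighter than the one below it? (8 x 8 = 64 bits)
    for y in range(8):
        for x in range(8):
            bits = (bits << 1) | int(grid[y][x] > grid[y + 1][x])
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def similarity(distance: int) -> float:
    """Assumed formula: percentage of the 128 bits that match."""
    return 100 * (HASH_BITS - distance) / HASH_BITS


def is_duplicate(a: int, b: int, threshold: int = 8) -> bool:
    """Two images count as duplicates when their distance is within the threshold."""
    return hamming(a, b) <= threshold
```

Under these assumptions, the default threshold of 8 corresponds to a similarity of at least 93.75 percent.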