Case Study · Property Finder UAE

Fixing PropertyFinder's
Listing Quality Problem

A backend deduplication service built in a few hours to detect and reject duplicate property listings before they hit the database.

The Problem

Duplicate listings are PropertyFinder's most complained-about issue

PropertyFinder now hosts well over 400,000 listings across the UAE, and the pace of activity has only accelerated. In Dubai, transaction volumes pushed past 200,000 in 2025, continuing strong double-digit growth. As supply scales, listing quality is under increasing pressure.

Agents frequently re-upload the same unit with minor variations. Titles get reworded, prices shift slightly within a narrow band, and identical photos are reused across entries. The outcome is a flood of near-duplicate listings that clutter search results and dilute relevance.

This goes beyond a surface-level UX issue. It directly impacts trust, distorts agent performance metrics, and undermines PropertyFinder's core value of helping users discover genuine, available homes. App store feedback and user sentiment throughout 2025 consistently highlight duplicate and misleading listings as a primary frustration.

The engineering challenge is to stop the problem at the source. Detection needs to happen at ingestion time, before listings enter the system. A robust multi-signal scoring model is required — one that combines text similarity, image hashing, geospatial proximity, and price variance. It must be precise enough to catch sophisticated duplicates while avoiding false positives on legitimately similar properties within the same building or community.

400,000+

Active listings in UAE

200,000+

Dubai transactions in 2025

#1

User complaint: fake & duplicate listings

The Solution

A multi-signal deduplication pipeline built on NestJS

Every incoming listing passes through a three-stage pipeline before being approved. A normalizer strips noise from the raw data. A scorer computes a weighted composite similarity score against existing listings. An image hasher compares perceptual fingerprints of listing photos. The result is a transparent, auditable decision — not a black box.

Ingestion Queue

RabbitMQ-style job queue

Normalizer

Strip noise, standardize fields

Scorer + Hasher

Levenshtein + Haversine + pHash

Decision Router

Approve / Review / Reject

Architecture

How the pipeline works

Agent CRM · Manual Entry · CSV Import · 3rd Party API
    ↓
Ingestion Queue (RabbitMQ-style job queue)
    ↓
Dedup Engine
    Normalizer: lowercase, strip, format
    Similarity Scorer: fuzzy title + price / size / geo
    Image Hasher: perceptual hash photo comparison
    ↓
Decision Router: score ≥ 0.8 → duplicate | score 0.6–0.8 → review | score < 0.6 → unique
    ↓
Flag & Merge: notify agent, merge or reject
Audit Log: every decision stored
Approved: write to DB, index for search

Scoring Breakdown

| Signal | Weight | Method | Tolerance |
|---|---|---|---|
| Title similarity | 35% | Levenshtein distance ratio | – |
| Price proximity | 25% | Numeric decay function | ±5% |
| Size proximity | 20% | Numeric decay function | ±8% |
| Geo proximity | 15% | Haversine formula | 200 m full score, 0 at 2 km |
| Bedroom count | 5% | Exact match | – |

Score ≥ 0.80 → Duplicate: listing rejected, agent notified

Score 0.60–0.79 → Review: listing held for human moderation

Score < 0.60 → Approved: listing indexed and live
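The weights and thresholds above can be sketched as a plain scoring function. This is an illustrative reconstruction, not the service's actual code: the linear decay shapes and the `Listing` interface are assumptions, while the weights, tolerances, and thresholds come straight from the table.

```typescript
interface Listing {
  title: string;
  price: number;
  sizeSqft: number;
  lat: number;
  lng: number;
  bedrooms: number;
}

const WEIGHTS = { title: 0.35, price: 0.25, size: 0.2, geo: 0.15, bedrooms: 0.05 };

// Levenshtein distance ratio: 1 = identical titles, 0 = nothing shared.
function titleSimilarity(a: string, b: string): number {
  const m = a.length, n = b.length;
  if (m === 0 && n === 0) return 1;
  const d = Array.from({ length: m + 1 }, (_, i) => [i, ...Array(n).fill(0)]);
  for (let j = 0; j <= n; j++) d[0][j] = j;
  for (let i = 1; i <= m; i++)
    for (let j = 1; j <= n; j++)
      d[i][j] = Math.min(
        d[i - 1][j] + 1,
        d[i][j - 1] + 1,
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),
      );
  return 1 - d[m][n] / Math.max(m, n);
}

// Assumed decay: full score at 0% difference, falling linearly to zero
// at the tolerance bound (±5% for price, ±8% for size).
function proximity(a: number, b: number, tolerancePct: number): number {
  const diff = Math.abs(a - b) / Math.max(a, b);
  return Math.max(0, 1 - diff / tolerancePct);
}

// Haversine distance in metres; full score under 200 m, zero beyond 2 km.
function geoScore(aLat: number, aLng: number, bLat: number, bLng: number): number {
  const R = 6371000, toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(bLat - aLat), dLng = toRad(bLng - aLng);
  const h = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(aLat)) * Math.cos(toRad(bLat)) * Math.sin(dLng / 2) ** 2;
  const metres = 2 * R * Math.asin(Math.sqrt(h));
  if (metres <= 200) return 1;
  if (metres >= 2000) return 0;
  return 1 - (metres - 200) / 1800;
}

function compositeScore(a: Listing, b: Listing): number {
  return (
    WEIGHTS.title * titleSimilarity(a.title, b.title) +
    WEIGHTS.price * proximity(a.price, b.price, 0.05) +
    WEIGHTS.size * proximity(a.sizeSqft, b.sizeSqft, 0.08) +
    WEIGHTS.geo * geoScore(a.lat, a.lng, b.lat, b.lng) +
    WEIGHTS.bedrooms * (a.bedrooms === b.bedrooms ? 1 : 0)
  );
}

type Decision = 'duplicate' | 'review' | 'unique';
function route(score: number): Decision {
  if (score >= 0.8) return 'duplicate';
  if (score >= 0.6) return 'review';
  return 'unique';
}
```

Because the weights sum to 1 and each signal is normalized to [0, 1], an exact self-match scores exactly 1.0 and routes to `duplicate`.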

Normalizer Detail

Before normalization

"Amazing 2 Bedroom Apartment
Dubai Marina Sea View
Motivated Seller"

Price: 1,199,999 AED
Size: 1,155 sqft

After normalization

"2br apartment dubai marina sea view"

Price: 1,200,000 AED  (rounded to nearest 1,000)
Size: 1,160 sqft     (rounded to nearest 10)
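A minimal sketch of the normalization rules implied by this before/after pair. The stopword list and the bedroom-count rewrite are assumptions chosen to reproduce the example; the rounding steps match the annotations above.

```typescript
// Hypothetical stopword list; the real service likely maintains a longer one.
const STOPWORDS = new Set(['amazing', 'motivated', 'seller', 'luxury', 'stunning']);

function normalizeTitle(raw: string): string {
  return raw
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')                         // strip punctuation
    .replace(/(\d+)\s*(?:bed(?:room)?s?|br)\b/g, '$1br')  // "2 bedroom" -> "2br"
    .split(/\s+/)
    .filter((w) => w && !STOPWORDS.has(w))
    .join(' ');
}

// Round price to the nearest 1,000 and size to the nearest 10, as above.
const roundTo = (value: number, step: number) => Math.round(value / step) * step;

normalizeTitle('Amazing 2 Bedroom Apartment Dubai Marina Sea View Motivated Seller');
// → "2br apartment dubai marina sea view"
roundTo(1_199_999, 1000); // → 1200000
roundTo(1155, 10);        // → 1160
```

Rounding before comparison is what lets the ±5% price decay treat 1,199,999 and 1,200,000 as the same number rather than a near miss.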
API

Seven endpoints. That's the whole surface area.

Interactive docs at http://localhost:3000/api (Swagger UI).

Scope of Work

Built in a few hours. Here's everything that went into it.

Dedup Pipeline

Three-stage pipeline — Normalizer, Scorer, Hasher — each as its own injectable NestJS service with a single responsibility.

Weighted Scoring Engine

Five-signal composite score using Levenshtein distance, Haversine geo-proximity formula, numeric decay functions, and perceptual hash comparison.
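The perceptual-hash comparison step ultimately reduces to a Hamming distance between fixed-width fingerprints. A hedged sketch, assuming 64-bit hashes produced by an earlier step not shown here (the prototype simulates them with FNV):

```typescript
// Similarity as normalized Hamming distance between two 64-bit hashes:
// 1 = identical fingerprints, 0 = all 64 bits differ.
function hammingSimilarity(a: bigint, b: bigint): number {
  let diff = a ^ b; // set bits mark positions where the hashes disagree
  let bits = 0;
  while (diff > 0n) {
    bits += Number(diff & 1n);
    diff >>= 1n;
  }
  return 1 - bits / 64;
}
```

In practice a threshold on this value (rather than exact equality) is what catches re-uploaded photos that were recompressed or lightly cropped.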

Job Queue

Custom in-memory queue service simulating RabbitMQ semantics — job tracking, failed job capture, sequential processing with an isProcessing guard.
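The queue semantics described here fit in a few lines. The `isProcessing` guard follows the text; the job shape and method names are illustrative, not the actual `QueueService` API:

```typescript
type Job<T> = { id: number; payload: T };

class InMemoryQueue<T> {
  private jobs: Job<T>[] = [];
  private failed: Job<T>[] = [];
  private isProcessing = false;
  private nextId = 1;

  constructor(private readonly handler: (payload: T) => Promise<void>) {}

  enqueue(payload: T): number {
    const job = { id: this.nextId++, payload };
    this.jobs.push(job);
    void this.drain(); // kick the worker; no-op if one is already running
    return job.id;
  }

  // Sequential processing: the isProcessing guard ensures only one drain
  // loop runs at a time, mimicking a single RabbitMQ consumer.
  private async drain(): Promise<void> {
    if (this.isProcessing) return;
    this.isProcessing = true;
    try {
      while (this.jobs.length > 0) {
        const job = this.jobs.shift()!;
        try {
          await this.handler(job.payload);
        } catch {
          this.failed.push(job); // failed-job capture in place of a DLQ
        }
      }
    } finally {
      this.isProcessing = false;
    }
  }

  get failedJobs(): readonly Job<T>[] {
    return this.failed;
  }
}
```

A failed handler does not halt the queue: the job moves to `failed` and the loop continues, which is the in-memory stand-in for a dead-letter queue.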

Input Validation

NestJS ValidationPipe with class-validator DTOs. Whitelist mode enabled. Geo coordinates validated with min/max bounds. Invalid payloads rejected before reaching the pipeline.


State Persistence

PersistenceService saves the full approved index, pending review queue, and all dedup results to data/state.json on shutdown and restores on restart.


Human Review Queue

Listings scoring 0.60–0.79 aren't silently rejected — they're held in a pending queue. Reviewers can PATCH /listings/:id/approve to index or PATCH /listings/:id/reject to drop them.

Code

Clean module boundaries. One job per service.

src/
├── dedup/
│   ├── dedup.types.ts         ← shared interfaces
│   ├── dedup.module.ts
│   ├── dedup.service.ts       ← pipeline orchestrator
│   ├── normalizer.service.ts  ← data cleaning
│   ├── scorer.service.ts      ← similarity scoring
│   └── hasher.service.ts      ← perceptual hash comparison
├── listings/
│   ├── dto/
│   │   └── create-listing.dto.ts  ← validated input
│   ├── listings.controller.ts
│   ├── listings.module.ts
│   └── listings.service.ts
├── queue/
│   └── queue.service.ts       ← job queue with failure tracking
├── persistence/
│   ├── persistence.module.ts
│   └── persistence.service.ts ← state save/restore
└── main.ts
data/
└── seed-listings.json         ← 5 UAE listings pre-loaded

Each module is independently scoped. The DedupModule knows nothing about HTTP or queuing — it only processes listings. The ListingsModule orchestrates the HTTP layer and queue registration. This mirrors how I'd structure this in a real microservices environment.

Production Notes

This is a prototype. Here's what production looks like.

| Concern | This Prototype | Production |
|---|---|---|
| Storage | In-memory array | PostgreSQL + Elasticsearch |
| Candidate retrieval | O(n) linear scan | Geo bounding box pre-filter (~2 km radius) reduces 350k → ~200 candidates |
| Queue | Custom in-memory (async pattern implemented, not durable) | RabbitMQ / SQS: durable broker, DLQ for failed jobs, horizontally scalable workers |
| Image hashing | FNV hash simulation | Real pHash via sharp + DCT (64-bit hash) |
| Auth | None | Agent JWT + API key per integration |
| Scalability | Single process | Horizontally scalable workers consuming from queue |
| Dedup index | In-memory | Elasticsearch with geo_distance + text similarity queries |
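The candidate-retrieval row is worth a concrete illustration. A hedged sketch of a geo bounding-box pre-filter over a plain array of listings: the 2 km radius matches the table, the metres-per-degree conversion is approximate, and the real version would be a database query rather than a filter.

```typescript
// Convert a radius in metres to lat/lng bounds around a point.
// 111,320 m per degree of latitude is a standard approximation.
function boundingBox(lat: number, lng: number, radiusMetres: number) {
  const dLat = radiusMetres / 111_320;
  const dLng = radiusMetres / (111_320 * Math.cos((lat * Math.PI) / 180));
  return { minLat: lat - dLat, maxLat: lat + dLat, minLng: lng - dLng, maxLng: lng + dLng };
}

// Cheap rectangular pre-filter: only listings inside the box go on to the
// expensive multi-signal scorer.
function candidates<T extends { lat: number; lng: number }>(
  all: T[],
  lat: number,
  lng: number,
  radiusMetres = 2000,
): T[] {
  const b = boundingBox(lat, lng, radiusMetres);
  return all.filter(
    (l) => l.lat >= b.minLat && l.lat <= b.maxLat && l.lng >= b.minLng && l.lng <= b.maxLng,
  );
}
```

The box deliberately over-selects slightly at its corners; the Haversine signal inside the scorer then applies the exact distance cutoff.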

Built by Yazan Ali

I built this case study in a few hours to apply for a software engineering role at PropertyFinder. I had never done a case study before; I chose to build one because I wanted to show how I actually think about engineering problems, not just what I've done before.

The problem I picked is real. The architecture is production-informed. The code is clean, passes lint, and is structured the way I'd structure real work.