Fixing PropertyFinder's
Listing Quality Problem
A backend deduplication service built in a few hours to detect and reject duplicate property listings before they hit the database.
Duplicate listings are PropertyFinder's most complained-about issue
PropertyFinder now hosts well over 400,000 listings across the UAE, and the pace of activity has only accelerated. In Dubai, transaction volumes pushed past 200,000 in 2025, continuing strong double-digit growth. As supply scales, listing quality is under increasing pressure.
Agents frequently re-upload the same unit with minor variations. Titles get reworded, prices shift slightly within a narrow band, and identical photos are reused across entries. The outcome is a flood of near-duplicate listings that clutter search results and dilute relevance.
This goes beyond a surface-level UX issue. It directly impacts trust, distorts agent performance metrics, and undermines PropertyFinder's core value of helping users discover genuine, available homes. App store feedback and user sentiment throughout 2025 consistently highlight duplicate and misleading listings as a primary frustration.
The engineering challenge is to stop the problem at the source. Detection needs to happen at ingestion time, before listings enter the system. A robust multi-signal scoring model is required — one that combines text similarity, image hashing, geospatial proximity, and price variance. It must be precise enough to catch sophisticated duplicates while avoiding false positives on legitimately similar properties within the same building or community.
400,000+
Active listings in UAE
200,000+
Dubai transactions in 2025
#1
User complaint: fake & duplicate listings
A multi-signal deduplication pipeline built on NestJS
Every incoming listing passes through a three-stage pipeline before being approved. A normalizer strips noise from the raw data. A scorer computes a weighted composite similarity score against existing listings. An image hasher compares perceptual fingerprints of listing photos. The result is a transparent, auditable decision — not a black box.
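The three-stage flow can be sketched as a small orchestrator. This is an illustrative sketch, not the actual `DedupService` API — the `Listing` shape, interface names, and thresholds below are assumptions based on the description above.

```typescript
// Illustrative pipeline orchestrator: normalize → score against index → route.
type Listing = {
  title: string; price: number; sizeSqft: number;
  lat: number; lng: number; bedrooms: number;
};
type Decision = "approve" | "review" | "reject";

interface Normalizer { normalize(l: Listing): Listing }
interface Scorer { score(candidate: Listing, existing: Listing): number }

// Thresholds mirror the decision table: ≥0.80 reject, 0.60–0.79 review.
function decide(score: number): Decision {
  if (score >= 0.8) return "reject";
  if (score >= 0.6) return "review";
  return "approve";
}

function processListing(
  raw: Listing, index: Listing[],
  normalizer: Normalizer, scorer: Scorer,
): Decision {
  const clean = normalizer.normalize(raw);
  // Keep the highest similarity against any already-indexed listing.
  const best = index.reduce(
    (max, existing) => Math.max(max, scorer.score(clean, existing)), 0,
  );
  return decide(best);
}
```

Because each stage is an injected dependency, the decision logic stays trivially unit-testable — the transparency claimed above falls out of the structure.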
Ingestion Queue
RabbitMQ-style job queue
Normalizer
Strip noise, standardize fields
Scorer + Hasher
Levenshtein + Haversine + pHash
Decision Router
Approve / Review / Reject
How the pipeline works
Scoring Breakdown
| Signal | Weight | Method | Tolerance |
|---|---|---|---|
| Title similarity | 35% | Levenshtein distance ratio | — |
| Price proximity | 25% | Numeric decay function | ±5% |
| Size proximity | 20% | Numeric decay function | ±8% |
| Geo proximity | 15% | Haversine formula | 200m full score, 0 at 2km |
| Bedroom count | 5% | Exact match | — |
- Score ≥ 0.80 → **Duplicate** — listing rejected, agent notified
- Score 0.60–0.79 → **Review** — listing held for human moderation
- Score < 0.60 → **Approved** — listing indexed and live
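Putting the table and thresholds together, the composite score is a weighted sum of the five signals. The sketch below is a plausible reading of that table — the decay shape (linear to zero at the tolerance edge) and geo falloff are assumptions; the real `scorer.service.ts` may differ.

```typescript
type Listing = {
  title: string; price: number; sizeSqft: number;
  lat: number; lng: number; bedrooms: number;
};

// Standard Levenshtein edit distance (dynamic programming).
function levenshtein(a: string, b: string): number {
  const dp: number[][] = Array.from(
    { length: a.length + 1 }, () => new Array<number>(b.length + 1).fill(0),
  );
  for (let i = 0; i <= a.length; i++) dp[i][0] = i;
  for (let j = 0; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,
        dp[i][j - 1] + 1,
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),
      );
  return dp[a.length][b.length];
}

const titleSim = (a: string, b: string) =>
  1 - levenshtein(a, b) / Math.max(a.length, b.length, 1);

// Linear decay: full score at 0% difference, zero at the tolerance edge (±5% price, ±8% size).
const decay = (a: number, b: number, tolerance: number) => {
  const diff = Math.abs(a - b) / Math.max(a, b, 1);
  return Math.max(0, 1 - diff / tolerance);
};

// Great-circle distance in metres (Haversine formula).
function haversineMeters(lat1: number, lng1: number, lat2: number, lng2: number): number {
  const R = 6_371_000, toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1), dLng = toRad(lng2 - lng1);
  const h = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// Full score within 200 m, linearly down to zero at 2 km.
const geoSim = (m: number) => (m <= 200 ? 1 : Math.max(0, 1 - (m - 200) / 1800));

function compositeScore(a: Listing, b: Listing): number {
  return (
    0.35 * titleSim(a.title, b.title) +
    0.25 * decay(a.price, b.price, 0.05) +
    0.20 * decay(a.sizeSqft, b.sizeSqft, 0.08) +
    0.15 * geoSim(haversineMeters(a.lat, a.lng, b.lat, b.lng)) +
    0.05 * (a.bedrooms === b.bedrooms ? 1 : 0)
  );
}
```

Since the weights sum to 1.0, a listing identical on every signal scores exactly 1.0, which makes the 0.80 and 0.60 cutoffs easy to reason about.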
Normalizer Detail
Before normalization
Title: "Amazing 2 Bedroom Apartment Dubai Marina Sea View Motivated Seller"
Price: 1,199,999 AED
Size: 1,155 sqft
After normalization
Title: "2br apartment dubai marina sea view"
Price: 1,200,000 AED (rounded to nearest 1,000)
Size: 1,160 sqft (rounded to nearest 10)
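The before/after transformation above can be sketched as follows. The stop-word list and regex are illustrative guesses at what `normalizer.service.ts` does, not its actual contents.

```typescript
// Marketing noise words to strip — an assumed list, extend as needed.
const NOISE_WORDS = new Set(["amazing", "stunning", "luxury", "motivated", "seller"]);

function normalizeTitle(title: string): string {
  return title
    .toLowerCase()
    .replace(/(\d+)\s*(bedroom|bed|br)s?\b/g, "$1br") // "2 Bedroom" -> "2br"
    .split(/\s+/)
    .filter((word) => !NOISE_WORDS.has(word))
    .join(" ");
}

// Round to the nearest step: 1,199,999 -> 1,200,000 (step 1000); 1,155 -> 1,160 (step 10).
const roundTo = (value: number, step: number) => Math.round(value / step) * step;
```

Rounding before comparison is what lets the ±5% price and ±8% size tolerances operate on stable values rather than deliberately jittered ones.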
Seven endpoints. That's the whole surface area.
Interactive docs at http://localhost:3000/api (Swagger UI).
Built in a few hours. Here's everything that went into it.
Dedup Pipeline
Three-stage pipeline — Normalizer, Scorer, Hasher — each as its own injectable NestJS service with a single responsibility.
Weighted Scoring Engine
Five-signal composite score using Levenshtein distance, Haversine geo-proximity formula, numeric decay functions, and perceptual hash comparison.
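For the image signal, perceptual hashes are typically compared by Hamming distance over the 64-bit fingerprints. A minimal sketch, assuming hashes arrive as hex strings — the 10-bit cutoff is an illustrative threshold, not the project's actual value.

```typescript
// Count differing bits between two hashes given as hex strings.
function hammingDistance(hashA: string, hashB: string): number {
  let x = BigInt("0x" + hashA) ^ BigInt("0x" + hashB);
  let count = 0;
  while (x > 0n) {
    count += Number(x & 1n);
    x >>= 1n;
  }
  return count;
}

// Near-duplicate if few bits differ; maxBits is an assumed tuning knob.
const isNearDuplicateImage = (a: string, b: string, maxBits = 10) =>
  hammingDistance(a, b) <= maxBits;
```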
Job Queue
Custom in-memory queue service simulating RabbitMQ semantics — job tracking, failed job capture, sequential processing with an isProcessing guard.
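The sequential-processing guard described here can be captured in a few lines. This is an illustrative reimplementation of the described semantics, not the actual `QueueService`.

```typescript
type Job<T> = { id: number; payload: T };

class InMemoryQueue<T> {
  private jobs: Job<T>[] = [];
  private failed: Job<T>[] = [];
  private isProcessing = false; // guard: only one drain loop at a time
  private nextId = 1;

  constructor(private handler: (payload: T) => Promise<void>) {}

  enqueue(payload: T): void {
    this.jobs.push({ id: this.nextId++, payload });
    void this.drain();
  }

  private async drain(): Promise<void> {
    if (this.isProcessing) return; // a loop is already consuming jobs
    this.isProcessing = true;
    while (this.jobs.length > 0) {
      const job = this.jobs.shift()!;
      try {
        await this.handler(job.payload);
      } catch {
        this.failed.push(job); // failed-job capture; no retry in this sketch
      }
    }
    this.isProcessing = false;
  }

  get failedJobs(): readonly Job<T>[] { return this.failed; }
}
```

Jobs enqueued while a drain is in flight simply accumulate and are picked up by the running loop, which is what gives RabbitMQ-like sequential consumption without a broker.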
Input Validation
NestJS ValidationPipe with class-validator DTOs. Whitelist mode enabled. Geo coordinates validated with min/max bounds. Invalid payloads rejected before reaching the pipeline.
State Persistence
PersistenceService saves the full approved index, pending review queue, and all dedup results to data/state.json on shutdown and restores on restart.
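The save/restore round trip is a straightforward JSON dump. A minimal sketch, assuming the state shape below — the real `PersistenceService` fields may differ.

```typescript
import { writeFileSync, readFileSync, existsSync } from "node:fs";

// Assumed state shape: approved index, pending review queue, dedup results.
interface DedupState {
  approved: unknown[];
  pendingReview: unknown[];
  results: unknown[];
}

// Called on shutdown (e.g. from an onApplicationShutdown hook).
function saveState(path: string, state: DedupState): void {
  writeFileSync(path, JSON.stringify(state, null, 2), "utf8");
}

// Called on boot; falls back to an empty state on first run.
function restoreState(path: string): DedupState {
  if (!existsSync(path)) return { approved: [], pendingReview: [], results: [] };
  return JSON.parse(readFileSync(path, "utf8")) as DedupState;
}
```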
Human Review Queue
Listings scoring 0.60–0.79 aren't silently rejected — they're held in a pending queue. Reviewers can PATCH /listings/:id/approve to index or PATCH /listings/:id/reject to drop them.
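The state transitions behind those two PATCH endpoints can be modeled without the framework layer. This sketch is illustrative — the actual `ListingsService` internals are not shown in this document.

```typescript
// Pending listings either get promoted to the live index or dropped.
class ReviewQueue {
  private pending = new Map<string, unknown>();
  private index = new Map<string, unknown>();

  hold(id: string, listing: unknown): void {
    this.pending.set(id, listing); // scored 0.60–0.79: await a human decision
  }

  // PATCH /listings/:id/approve — move from pending to the live index.
  approve(id: string): boolean {
    const listing = this.pending.get(id);
    if (listing === undefined) return false; // unknown or already decided
    this.pending.delete(id);
    this.index.set(id, listing);
    return true;
  }

  // PATCH /listings/:id/reject — drop from pending, never indexed.
  reject(id: string): boolean {
    return this.pending.delete(id);
  }

  isLive(id: string): boolean { return this.index.has(id); }
}
```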
Clean module boundaries. One job per service.
```
src/
├── dedup/
│   ├── dedup.types.ts             ← shared interfaces
│   ├── dedup.module.ts
│   ├── dedup.service.ts           ← pipeline orchestrator
│   ├── normalizer.service.ts      ← data cleaning
│   ├── scorer.service.ts          ← similarity scoring
│   └── hasher.service.ts          ← perceptual hash comparison
├── listings/
│   ├── dto/
│   │   └── create-listing.dto.ts  ← validated input
│   ├── listings.controller.ts
│   ├── listings.module.ts
│   └── listings.service.ts
├── queue/
│   └── queue.service.ts           ← job queue with failure tracking
├── persistence/
│   ├── persistence.module.ts
│   └── persistence.service.ts     ← state save/restore
└── main.ts
data/
└── seed-listings.json             ← 5 UAE listings pre-loaded
```
Each module is independently scoped. The DedupModule knows nothing about HTTP or queuing — it only processes listings. The ListingsModule orchestrates the HTTP layer and queue registration. This mirrors how I'd structure this in a real microservices environment.
This is a prototype. Here's what production looks like.
| Concern | This Prototype | Production |
|---|---|---|
| Storage | In-memory array | PostgreSQL + Elasticsearch |
| Candidate retrieval | O(n) linear scan | Geo bounding box pre-filter (~2km radius) reduces 350k → ~200 candidates |
| Queue | Custom in-memory (async pattern implemented, not durable) | RabbitMQ / SQS — durable broker, DLQ for failed jobs, horizontally scalable workers |
| Image hashing | FNV hash simulation | Real pHash via sharp + DCT (64-bit hash) |
| Auth | None | Agent JWT + API key per integration |
| Scalability | Single process | Horizontally scalable workers consuming from queue |
| Dedup index | In-memory | Elasticsearch with geo_distance + text similarity queries |
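The candidate-retrieval row above hinges on a cheap geographic pre-filter before the expensive per-pair scoring. A sketch of the idea, assuming a ~2 km radius; in production this window would be an Elasticsearch `geo_distance` query rather than an array filter.

```typescript
// Approximate degree deltas for a radius in metres:
// 1° latitude ≈ 111 km; longitude degrees shrink by cos(latitude).
function boundingBox(lat: number, lng: number, radiusMeters: number) {
  const dLat = radiusMeters / 111_000;
  const dLng = radiusMeters / (111_000 * Math.cos((lat * Math.PI) / 180));
  return { minLat: lat - dLat, maxLat: lat + dLat, minLng: lng - dLng, maxLng: lng + dLng };
}

// Keep only listings inside the window; exact Haversine scoring runs on survivors.
function candidates<T extends { lat: number; lng: number }>(
  all: T[], lat: number, lng: number, radiusMeters = 2000,
): T[] {
  const b = boundingBox(lat, lng, radiusMeters);
  return all.filter(
    (l) => l.lat >= b.minLat && l.lat <= b.maxLat && l.lng >= b.minLng && l.lng <= b.maxLng,
  );
}
```

A bounding box overshoots a true circle at the corners, which is fine here: it only admits a few extra candidates that the exact Haversine check then scores honestly.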
Built by Yazan Ali
I built this case study in a few hours to apply for a software engineering role at PropertyFinder. I've never done a case study before; I chose to build one because I wanted to show how I actually think about engineering problems, not just what I've done in the past.
The problem I picked is real. The architecture is production-informed. The code is clean, tested against lint, and structured the way I'd structure real work.