Fixing PropertyFinder's
Listing Quality Problem
A backend deduplication service built in a few hours to detect and reject duplicate property listings before they hit the database.
Duplicate listings are PropertyFinder's most complained-about issue
PropertyFinder now hosts well over 400,000 listings across the UAE, and the pace of activity has only accelerated. In Dubai, transaction volumes pushed past 200,000 in 2025, continuing strong double-digit growth. As supply scales, listing quality is under increasing pressure.
Agents frequently re-upload the same unit with minor variations. Titles get reworded, prices shift slightly within a narrow band, and identical photos are reused across entries. The outcome is a flood of near-duplicate listings that clutter search results and dilute relevance.
This goes beyond a surface-level UX issue. It directly impacts trust, distorts agent performance metrics, and undermines PropertyFinder's core value of helping users discover genuine, available homes. App store feedback and user sentiment throughout 2025 consistently highlight duplicate and misleading listings as a primary frustration.
The engineering challenge is to stop the problem at the source. Detection needs to happen at ingestion time, before listings enter the system. A robust multi-signal scoring model is required — one that combines text similarity, image hashing, geospatial proximity, and price variance. It must be precise enough to catch sophisticated duplicates while avoiding false positives on legitimately similar properties within the same building or community.
400,000+
Active listings in UAE
200,000+
Dubai transactions in 2025
#1
User complaint: fake & duplicate listings
A multi-signal deduplication pipeline built on NestJS
Every incoming listing passes through a three-stage pipeline before being approved. A normalizer strips noise from the raw data. A scorer computes a weighted composite similarity score against existing listings. An image hasher compares perceptual fingerprints of listing photos. The result is a transparent, auditable decision — not a black box.
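The three-stage flow can be sketched as a small orchestrator. This is an illustrative sketch, not the actual `DedupService` API — the `Listing` shape, interface names, and thresholds below are assumptions based on the description above.

```typescript
// Illustrative pipeline orchestrator: normalize → score against index → route.
type Listing = {
  title: string; price: number; sizeSqft: number;
  lat: number; lng: number; bedrooms: number;
};
type Decision = "approve" | "review" | "reject";

interface Normalizer { normalize(l: Listing): Listing }
interface Scorer { score(candidate: Listing, existing: Listing): number }

// Thresholds mirror the decision table: ≥0.80 reject, 0.60–0.79 review.
function decide(score: number): Decision {
  if (score >= 0.8) return "reject";
  if (score >= 0.6) return "review";
  return "approve";
}

function processListing(
  raw: Listing, index: Listing[],
  normalizer: Normalizer, scorer: Scorer,
): Decision {
  const clean = normalizer.normalize(raw);
  // Keep the highest similarity against any already-indexed listing.
  const best = index.reduce(
    (max, existing) => Math.max(max, scorer.score(clean, existing)), 0,
  );
  return decide(best);
}
```

Because each stage is an injected dependency, the decision logic stays trivially unit-testable — the transparency claimed above falls out of the structure.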
Ingestion Queue
RabbitMQ-style job queue
Normalizer
Strip noise, standardize fields
Scorer + Hasher
Levenshtein + Haversine + pHash
Decision Router
Approve / Review / Reject
How the pipeline works
Scoring Breakdown
| Signal | Weight | Method | Tolerance |
|---|---|---|---|
| Title similarity | 35% | Levenshtein distance ratio | — |
| Price proximity | 25% | Numeric decay function | ±5% |
| Size proximity | 20% | Numeric decay function | ±8% |
| Geo proximity | 15% | Haversine formula | 200m full score, 0 at 2km |
| Bedroom count | 5% | Exact match | — |
- Score ≥ 0.80 → **Duplicate** — listing rejected, agent notified
- Score 0.60–0.79 → **Review** — listing held for human moderation
- Score < 0.60 → **Approved** — listing indexed and live
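Putting the table and thresholds together, the composite score is a weighted sum of the five signals. The sketch below is a plausible reading of that table — the decay shape (linear to zero at the tolerance edge) and geo falloff are assumptions; the real `scorer.service.ts` may differ.

```typescript
type Listing = {
  title: string; price: number; sizeSqft: number;
  lat: number; lng: number; bedrooms: number;
};

// Standard Levenshtein edit distance (dynamic programming).
function levenshtein(a: string, b: string): number {
  const dp: number[][] = Array.from(
    { length: a.length + 1 }, () => new Array<number>(b.length + 1).fill(0),
  );
  for (let i = 0; i <= a.length; i++) dp[i][0] = i;
  for (let j = 0; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,
        dp[i][j - 1] + 1,
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),
      );
  return dp[a.length][b.length];
}

const titleSim = (a: string, b: string) =>
  1 - levenshtein(a, b) / Math.max(a.length, b.length, 1);

// Linear decay: full score at 0% difference, zero at the tolerance edge (±5% price, ±8% size).
const decay = (a: number, b: number, tolerance: number) => {
  const diff = Math.abs(a - b) / Math.max(a, b, 1);
  return Math.max(0, 1 - diff / tolerance);
};

// Great-circle distance in metres (Haversine formula).
function haversineMeters(lat1: number, lng1: number, lat2: number, lng2: number): number {
  const R = 6_371_000, toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1), dLng = toRad(lng2 - lng1);
  const h = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// Full score within 200 m, linearly down to zero at 2 km.
const geoSim = (m: number) => (m <= 200 ? 1 : Math.max(0, 1 - (m - 200) / 1800));

function compositeScore(a: Listing, b: Listing): number {
  return (
    0.35 * titleSim(a.title, b.title) +
    0.25 * decay(a.price, b.price, 0.05) +
    0.20 * decay(a.sizeSqft, b.sizeSqft, 0.08) +
    0.15 * geoSim(haversineMeters(a.lat, a.lng, b.lat, b.lng)) +
    0.05 * (a.bedrooms === b.bedrooms ? 1 : 0)
  );
}
```

Since the weights sum to 1.0, a listing identical on every signal scores exactly 1.0, which makes the 0.80 and 0.60 cutoffs easy to reason about.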
Normalizer Detail
Before normalization
Title: "Amazing 2 Bedroom Apartment Dubai Marina Sea View Motivated Seller"
Price: 1,199,999 AED
Size: 1,155 sqft
After normalization
Title: "2br apartment dubai marina sea view"
Price: 1,200,000 AED (rounded to nearest 1,000)
Size: 1,160 sqft (rounded to nearest 10)
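The before/after transformation above can be sketched as follows. The stop-word list and regex are illustrative guesses at what `normalizer.service.ts` does, not its actual contents.

```typescript
// Marketing noise words to strip — an assumed list, extend as needed.
const NOISE_WORDS = new Set(["amazing", "stunning", "luxury", "motivated", "seller"]);

function normalizeTitle(title: string): string {
  return title
    .toLowerCase()
    .replace(/(\d+)\s*(bedroom|bed|br)s?\b/g, "$1br") // "2 Bedroom" -> "2br"
    .split(/\s+/)
    .filter((word) => !NOISE_WORDS.has(word))
    .join(" ");
}

// Round to the nearest step: 1,199,999 -> 1,200,000 (step 1000); 1,155 -> 1,160 (step 10).
const roundTo = (value: number, step: number) => Math.round(value / step) * step;
```

Rounding before comparison is what lets the ±5% price and ±8% size tolerances operate on stable values rather than deliberately jittered ones.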
Seven endpoints. That's the whole surface area.
Interactive docs at http://localhost:3000/api (Swagger UI).
Built in a few hours. Here's everything that went into it.
Dedup Pipeline
Three-stage pipeline — Normalizer, Scorer, Hasher — each as its own injectable NestJS service with a single responsibility.
Weighted Scoring Engine
Five-signal composite score using Levenshtein distance, Haversine geo-proximity formula, numeric decay functions, and perceptual hash comparison.
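For the image signal, perceptual hashes are typically compared by Hamming distance over the 64-bit fingerprints. A minimal sketch, assuming hashes arrive as hex strings — the 10-bit cutoff is an illustrative threshold, not the project's actual value.

```typescript
// Count differing bits between two hashes given as hex strings.
function hammingDistance(hashA: string, hashB: string): number {
  let x = BigInt("0x" + hashA) ^ BigInt("0x" + hashB);
  let count = 0;
  while (x > 0n) {
    count += Number(x & 1n);
    x >>= 1n;
  }
  return count;
}

// Near-duplicate if few bits differ; maxBits is an assumed tuning knob.
const isNearDuplicateImage = (a: string, b: string, maxBits = 10) =>
  hammingDistance(a, b) <= maxBits;
```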
Job Queue
Custom in-memory queue service simulating RabbitMQ semantics — job tracking, failed job capture, sequential processing with an isProcessing guard.
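The sequential-processing guard described here can be captured in a few lines. This is an illustrative reimplementation of the described semantics, not the actual `QueueService`.

```typescript
type Job<T> = { id: number; payload: T };

class InMemoryQueue<T> {
  private jobs: Job<T>[] = [];
  private failed: Job<T>[] = [];
  private isProcessing = false; // guard: only one drain loop at a time
  private nextId = 1;

  constructor(private handler: (payload: T) => Promise<void>) {}

  enqueue(payload: T): void {
    this.jobs.push({ id: this.nextId++, payload });
    void this.drain();
  }

  private async drain(): Promise<void> {
    if (this.isProcessing) return; // a loop is already consuming jobs
    this.isProcessing = true;
    while (this.jobs.length > 0) {
      const job = this.jobs.shift()!;
      try {
        await this.handler(job.payload);
      } catch {
        this.failed.push(job); // failed-job capture; no retry in this sketch
      }
    }
    this.isProcessing = false;
  }

  get failedJobs(): readonly Job<T>[] { return this.failed; }
}
```

Jobs enqueued while a drain is in flight simply accumulate and are picked up by the running loop, which is what gives RabbitMQ-like sequential consumption without a broker.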
Input Validation
NestJS ValidationPipe with class-validator DTOs. Whitelist mode enabled. Geo coordinates validated with min/max bounds. Invalid payloads rejected before reaching the pipeline.
State Persistence
PersistenceService saves the full approved index, pending review queue, and all dedup results to data/state.json on shutdown and restores on restart.
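The save/restore round trip is a straightforward JSON dump. A minimal sketch, assuming the state shape below — the real `PersistenceService` fields may differ.

```typescript
import { writeFileSync, readFileSync, existsSync } from "node:fs";

// Assumed state shape: approved index, pending review queue, dedup results.
interface DedupState {
  approved: unknown[];
  pendingReview: unknown[];
  results: unknown[];
}

// Called on shutdown (e.g. from an onApplicationShutdown hook).
function saveState(path: string, state: DedupState): void {
  writeFileSync(path, JSON.stringify(state, null, 2), "utf8");
}

// Called on boot; falls back to an empty state on first run.
function restoreState(path: string): DedupState {
  if (!existsSync(path)) return { approved: [], pendingReview: [], results: [] };
  return JSON.parse(readFileSync(path, "utf8")) as DedupState;
}
```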
Human Review Queue
Listings scoring 0.60–0.79 aren't silently rejected — they're held in a pending queue. Reviewers can PATCH /listings/:id/approve to index or PATCH /listings/:id/reject to drop them.
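The state transitions behind those two PATCH endpoints can be modeled without the framework layer. This sketch is illustrative — the actual `ListingsService` internals are not shown in this document.

```typescript
// Pending listings either get promoted to the live index or dropped.
class ReviewQueue {
  private pending = new Map<string, unknown>();
  private index = new Map<string, unknown>();

  hold(id: string, listing: unknown): void {
    this.pending.set(id, listing); // scored 0.60–0.79: await a human decision
  }

  // PATCH /listings/:id/approve — move from pending to the live index.
  approve(id: string): boolean {
    const listing = this.pending.get(id);
    if (listing === undefined) return false; // unknown or already decided
    this.pending.delete(id);
    this.index.set(id, listing);
    return true;
  }

  // PATCH /listings/:id/reject — drop from pending, never indexed.
  reject(id: string): boolean {
    return this.pending.delete(id);
  }

  isLive(id: string): boolean { return this.index.has(id); }
}
```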
Clean module boundaries. One job per service.
```
src/
├── dedup/
│   ├── dedup.types.ts             ← shared interfaces
│   ├── dedup.module.ts
│   ├── dedup.service.ts           ← pipeline orchestrator
│   ├── normalizer.service.ts      ← data cleaning
│   ├── scorer.service.ts          ← similarity scoring
│   └── hasher.service.ts          ← perceptual hash comparison
├── listings/
│   ├── dto/
│   │   └── create-listing.dto.ts  ← validated input
│   ├── listings.controller.ts
│   ├── listings.module.ts
│   └── listings.service.ts
├── queue/
│   └── queue.service.ts           ← job queue with failure tracking
├── persistence/
│   ├── persistence.module.ts
│   └── persistence.service.ts     ← state save/restore
└── main.ts
data/
└── seed-listings.json             ← 5 UAE listings pre-loaded
```
Each module is independently scoped. The DedupModule knows nothing about HTTP or queuing — it only processes listings. The ListingsModule orchestrates the HTTP layer and queue registration. This mirrors how I'd structure this in a real microservices environment.
This is a prototype. Here's what production looks like.
| Concern | This Prototype | Production |
|---|---|---|
| Storage | In-memory array | PostgreSQL + Elasticsearch |
| Candidate retrieval | O(n) linear scan | Geo bounding box pre-filter (~2km radius) reduces 350k → ~200 candidates |
| Queue | Custom in-memory (async pattern implemented, not durable) | RabbitMQ / SQS — durable broker, DLQ for failed jobs, horizontally scalable workers |
| Image hashing | FNV hash simulation | Real pHash via sharp + DCT (64-bit hash) |
| Auth | None | Agent JWT + API key per integration |
| Scalability | Single process | Horizontally scalable workers consuming from queue |
| Dedup index | In-memory | Elasticsearch with geo_distance + text similarity queries |
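The candidate-retrieval row above hinges on a cheap geographic pre-filter before the expensive per-pair scoring. A sketch of the idea, assuming a ~2 km radius; in production this window would be an Elasticsearch `geo_distance` query rather than an array filter.

```typescript
// Approximate degree deltas for a radius in metres:
// 1° latitude ≈ 111 km; longitude degrees shrink by cos(latitude).
function boundingBox(lat: number, lng: number, radiusMeters: number) {
  const dLat = radiusMeters / 111_000;
  const dLng = radiusMeters / (111_000 * Math.cos((lat * Math.PI) / 180));
  return { minLat: lat - dLat, maxLat: lat + dLat, minLng: lng - dLng, maxLng: lng + dLng };
}

// Keep only listings inside the window; exact Haversine scoring runs on survivors.
function candidates<T extends { lat: number; lng: number }>(
  all: T[], lat: number, lng: number, radiusMeters = 2000,
): T[] {
  const b = boundingBox(lat, lng, radiusMeters);
  return all.filter(
    (l) => l.lat >= b.minLat && l.lat <= b.maxLat && l.lng >= b.minLng && l.lng <= b.maxLng,
  );
}
```

A bounding box overshoots a true circle at the corners, which is fine here: it only admits a few extra candidates that the exact Haversine check then scores honestly.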
Built by Yazan Ali
I built this case study in a few hours to apply for a software engineering role at PropertyFinder. I've never done a case study before; I chose to build one because I wanted to show how I actually think about engineering problems, not just what I've done in the past.
The problem I picked is real. The architecture is production-informed. The code is clean, tested against lint, and structured the way I'd structure real work.