AI Parfums Scraper

Production-grade web intelligence system with anti-bot evasion, proxy rotation, and relational data storage.

1,000+ Verified Records
100% Autonomous Sync
<0.1% Ingestion Errors

Objective

Build an automated indexing service that continuously extracts product specifications (brand, olfactory notes, concentrations), prices, and historical review scores from perfume catalogs.

Anti-bot evasion

Target platforms use Cloudflare JavaScript challenges and IP profiling to block automated traffic. I built a Playwright Stealth system that randomizes user agents and mouse movement paths, adjusts its request rate dynamically, and rotates traffic across a pool of premium proxy nodes.
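A minimal sketch of how the rotation and rate-limiting logic might be organized, using only the Python standard library. The `RequestPlanner` class, the user agent and proxy pools, the status codes treated as block signals, and the delay values are all assumptions made for illustration, not the production configuration.

```python
import itertools
import random

# Hypothetical pools; a real deployment would load these from configuration.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]
PROXIES = ["http://proxy-a.example:8000", "http://proxy-b.example:8000"]

# Assumed block signals: forbidden, too many requests, service unavailable.
BLOCK_STATUSES = (403, 429, 503)


class RequestPlanner:
    """Chooses a user agent, proxy, and delay for each outgoing request."""

    def __init__(self, user_agents, proxies, base_delay=2.0, max_delay=60.0, rng=None):
        self._user_agents = list(user_agents)
        self._proxies = itertools.cycle(proxies)  # round-robin proxy rotation
        self._base_delay = base_delay
        self._max_delay = max_delay
        self._delay = base_delay
        self._rng = rng or random.Random()

    def next_identity(self):
        """Return a random user agent and the next proxy in the rotation."""
        return {
            "user_agent": self._rng.choice(self._user_agents),
            "proxy": {"server": next(self._proxies)},
        }

    def record_response(self, status):
        """Double the delay on a block signal; halve it (down to base) otherwise."""
        if status in BLOCK_STATUSES:
            self._delay = min(self._delay * 2, self._max_delay)
        else:
            self._delay = max(self._delay / 2, self._base_delay)

    def next_delay(self):
        """Current delay plus up to 50% random jitter, in seconds."""
        return self._delay * (1 + self._rng.uniform(0, 0.5))
```

The dictionary returned by `next_identity()` uses the `user_agent` and `proxy` keys that Playwright's `browser.new_context()` accepts as keyword arguments, so it could be unpacked into that call. The jitter keeps requests from arriving at a fixed cadence.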

Data integrity

Scraped records pass through a validation layer that rejects malformed entries before they reach the database. A PostgreSQL unique constraint on brand, name, and concentration then prevents duplicate rows.

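CREATE TABLE perfumes (
  id SERIAL PRIMARY KEY,
  brand VARCHAR(100) NOT NULL,
  name VARCHAR(150) NOT NULL,
  concentration VARCHAR(50),  -- nullable
  top_notes TEXT[],
  heart_notes TEXT[],
  base_notes TEXT[],
  price NUMERIC(10, 2),
  scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  -- NULLS NOT DISTINCT (PostgreSQL 15+) treats two rows with the same brand and
  -- name and a NULL concentration as duplicates; by default NULLs never collide.
  CONSTRAINT unique_profile UNIQUE NULLS NOT DISTINCT (brand, name, concentration)
);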
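The validation step could look roughly like the sketch below. The function name `validate_record` and the specific rules are assumptions derived from the column definitions in the schema (required brand and name, length limits, two-decimal non-negative prices, notes as lists of strings); the actual layer may check more or different things.

```python
from decimal import Decimal, InvalidOperation

MAX_PRICE = Decimal("99999999.99")  # largest value that fits NUMERIC(10, 2)
NOTE_FIELDS = ("top_notes", "heart_notes", "base_notes")


def validate_record(raw):
    """Return a cleaned record dict, or None if the entry is malformed."""
    brand = str(raw.get("brand") or "").strip()
    name = str(raw.get("name") or "").strip()
    # brand and name are NOT NULL in the schema and have length limits.
    if not brand or not name or len(brand) > 100 or len(name) > 150:
        return None

    concentration = raw.get("concentration")
    if concentration is not None:
        concentration = str(concentration).strip() or None
        if concentration is not None and len(concentration) > 50:
            return None

    price = raw.get("price")
    if price is not None:
        try:
            price = Decimal(str(price))
        except InvalidOperation:
            return None
        # Reject NaN, infinities, negatives, and values too large for the column.
        if not price.is_finite() or price < 0 or price > MAX_PRICE:
            return None
        price = price.quantize(Decimal("0.01"))

    notes = {}
    for field in NOTE_FIELDS:
        value = raw.get(field) or []
        if not isinstance(value, list) or not all(isinstance(n, str) for n in value):
            return None
        # Trim whitespace and drop empty strings left over from scraping.
        notes[field] = [n.strip() for n in value if n.strip()]

    return {
        "brand": brand,
        "name": name,
        "concentration": concentration,
        "price": price,
        **notes,
    }
```

Returning `None` for a rejected entry lets the caller count and log rejections separately from database errors, so the unique constraint only has to deal with records that are already well formed.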

Result

1,000+ verified products indexed. The structured dataset is ready for vector embeddings to enable semantic product recommendations based on ingredient notes.
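To illustrate what note-based recommendation could look like, here is a small sketch that uses multi-hot note vectors and cosine similarity as a simple stand-in for learned embeddings. The catalog entries are made-up placeholders, not rows from the dataset, and the function names are hypothetical.

```python
import math

NOTE_FIELDS = ("top_notes", "heart_notes", "base_notes")

# Made-up placeholder records that mirror the table's note columns.
CATALOG = [
    {"name": "Sample A", "top_notes": ["bergamot"], "heart_notes": ["rose"], "base_notes": ["musk"]},
    {"name": "Sample B", "top_notes": ["bergamot"], "heart_notes": ["rose"], "base_notes": ["vanilla"]},
    {"name": "Sample C", "top_notes": ["oud"], "heart_notes": ["leather"], "base_notes": ["amber"]},
]


def all_notes(perfume):
    """Collect every note of a perfume, regardless of layer."""
    return {note for field in NOTE_FIELDS for note in perfume[field]}


def note_vector(perfume, vocabulary):
    """Multi-hot vector: 1.0 where the perfume lists the note, else 0.0."""
    notes = all_notes(perfume)
    return [1.0 if note in notes else 0.0 for note in vocabulary]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(target, catalog):
    """Return the name of the catalog entry closest to target, excluding itself."""
    vocabulary = sorted(set().union(*(all_notes(p) for p in catalog)))
    target_vec = note_vector(target, vocabulary)
    candidates = [p for p in catalog if p["name"] != target["name"]]
    best = max(
        candidates,
        key=lambda p: cosine_similarity(target_vec, note_vector(p, vocabulary)),
    )
    return best["name"]
```

A learned embedding model would replace `note_vector`, which treats every note as unrelated to every other; the ranking step would stay the same.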
