Case Study · 02 Solo build

Forex Lead Finder.

Replaces 10 hours of weekly manual lead research with a Python engine that harvests leads from five platforms, scores every profile, and delivers sales-ready CSVs on a 6-hour loop.

01 · The problem

Manual lead hunting burns a full workday per week.

Target buyer: anyone selling forex signals, courses, or brokerage services. Their bottleneck: finding qualified traders — people with MT4/MT5 bios, trading-pair posts, or telltale hashtags. That audience is scattered across Reddit, TradingView, BabyPips, ForexFactory, and LinkedIn. Existing tools cover one platform at a time, require constant maintenance, and return unscored data that still needs a human to filter.

02 · The approach

Five scrapers in parallel, scored at ingestion, sales-ready on export.

What ships: five platform-specific Python collectors running concurrently; a 15-point keyword classifier gating every candidate (bio keyword +10, post keyword +10, trading pair +10, MT4/MT5 +5, hashtag +5, threshold 15); SQLite warehouse keyed by profile URL (idempotent re-runs, no duplicates); Flask UI for operator review; REST endpoints for downstream CRM / email tools; APScheduler 6-hour loop; XLSX / CSV / JSON export.

What it replaces: a junior researcher spending ~10 hrs/week scraping names manually (typical market cost in Dhaka: $800–1,200/month; US: $3,500–5,000/month). This engine compresses that to zero human time after initial tuning.

03 · Architecture

How it fits together.

01 / Collectors

Multi-source Python

Five independent scrapers (Reddit, TradingView, BabyPips, ForexFactory, LinkedIn) run in parallel. Playwright handles the JS-heavy ones.
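
The fan-out can be sketched with a standard thread pool; the collector functions below are stubs standing in for the real scrapers, and their names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-platform collectors -- stand-ins for the real scrapers.
def collect_reddit():       return [{"source": "reddit", "url": "https://reddit.com/u/fx_trader"}]
def collect_tradingview():  return [{"source": "tradingview", "url": "https://tradingview.com/u/pips"}]
def collect_babypips():     return []
def collect_forexfactory(): return []
def collect_linkedin():     return []

COLLECTORS = [collect_reddit, collect_tradingview, collect_babypips,
              collect_forexfactory, collect_linkedin]

def run_collectors():
    """Run all five collectors concurrently; a failure on one
    source does not abort the others."""
    leads = []
    with ThreadPoolExecutor(max_workers=len(COLLECTORS)) as pool:
        futures = [pool.submit(c) for c in COLLECTORS]
        for future in futures:
            try:
                leads.extend(future.result())
            except Exception:
                continue  # isolate per-platform failures
    return leads
```

Each collector stays independent, so one banned or slow source never blocks the other four.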

02 / Classifier

15-point scoring

Bio keywords (+10), post keywords (+10), trading pairs (+10), MT4/MT5 (+5), hashtags (+5). A score of 15 or above keeps the lead; anything below is discarded.
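
The gate reduces to a small pure function. The keyword sets below are illustrative placeholders, not the production seed lists:

```python
# Weights and threshold as described above; keyword sets are illustrative.
BIO_KEYWORDS  = {"forex", "scalper", "day trader"}
POST_KEYWORDS = {"pips", "stop loss", "take profit"}
TRADING_PAIRS = {"eurusd", "gbpjpy", "xauusd"}
HASHTAGS      = {"#forex", "#forextrading"}
THRESHOLD = 15

def score_profile(bio: str, posts: list[str]) -> int:
    """Apply the 15-point rubric to one scraped profile."""
    bio = bio.lower()
    text = " ".join(posts).lower()
    score = 0
    if any(k in bio for k in BIO_KEYWORDS):
        score += 10
    if any(k in text for k in POST_KEYWORDS):
        score += 10
    if any(p in bio or p in text for p in TRADING_PAIRS):
        score += 10
    if "mt4" in bio or "mt5" in bio:
        score += 5
    if any(h in text for h in HASHTAGS):
        score += 5
    return score

def is_qualified(bio: str, posts: list[str]) -> bool:
    return score_profile(bio, posts) >= THRESHOLD
```

Scoring at ingestion means nothing unqualified ever reaches the warehouse.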

03 / Storage

SQLite warehouse

Profiles, posts, emails, websites — all normalised into a single store, keyed by profile URL. Idempotent re-runs, no duplicates.
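
Idempotency falls out of an upsert keyed on the URL. A minimal sketch with stdlib sqlite3 (the schema columns here are assumed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS leads (
        profile_url TEXT PRIMARY KEY,   -- the dedupe key
        source      TEXT,
        bio         TEXT,
        score       INTEGER
    )
""")

def upsert_lead(lead: dict) -> None:
    """Insert or refresh a lead; re-running a scrape never duplicates rows."""
    conn.execute(
        """INSERT INTO leads (profile_url, source, bio, score)
           VALUES (:profile_url, :source, :bio, :score)
           ON CONFLICT(profile_url) DO UPDATE SET
               source = excluded.source,
               bio    = excluded.bio,
               score  = excluded.score""",
        lead,
    )
    conn.commit()
```

Keying on the profile URL rather than a display name survives renames and cross-platform collisions.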

04 / Delivery

Flask UI + REST

Operators browse and review leads in the Flask UI. API consumers (CRMs, email tools) hit the /api/leads and /api/stats REST endpoints.

04 · The work beneath the surface

Non-obvious decisions.

  • Rate-limit isolation — each platform has its own throttle config. A spike on one source never triggers a ban on another.
  • SPA vs static detection — Playwright only boots where the platform requires JS render; the rest run through requests + BS4 for 10× speed.
  • Classifier calibration — the 15-point threshold was tuned against a labelled seed set; accepts roughly 26% of scraped profiles and rejects 74%.
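
The first decision above, per-platform throttle isolation, can be sketched as one delay bucket per source; the delay values are illustrative, not the production config:

```python
import time

# One throttle bucket per source, so a spike on one platform
# never slows (or bans) another. Delay values are illustrative.
THROTTLES = {
    "reddit":       {"min_delay": 2.0},
    "tradingview":  {"min_delay": 5.0},
    "babypips":     {"min_delay": 1.0},
    "forexfactory": {"min_delay": 1.0},
    "linkedin":     {"min_delay": 10.0},
}

_last_hit: dict[str, float] = {}

def throttle(platform: str) -> None:
    """Sleep just long enough to honour this platform's own minimum delay."""
    now = time.monotonic()
    wait = THROTTLES[platform]["min_delay"] - (now - _last_hit.get(platform, 0.0))
    if wait > 0:
        time.sleep(wait)
    _last_hit[platform] = time.monotonic()
```

Each scraper calls `throttle(...)` before every request against its own bucket only.
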

05 · Stack

What it's built with.

Runtime

Python 3.11 · Flask · APScheduler

Scraping

Playwright · Requests · BeautifulSoup4

Storage

SQLite · XLSX / CSV / JSON export pipelines

Ops

Tkinter GUI launcher for non-dev operators · 6-hour scheduler loop

06 · Takeaway

Why this one matters.

A single operator ships a Python system that replaces the work of two or three manual lead researchers. The product proves the hybrid-SEO-plus-engineering positioning: commercial keyword knowledge (what makes a trader "qualified") fused with the Python chops to automate the capture.

Want something like this built for your company?

I'm currently available for select engagements.