Jumia Product Scraper

Advanced Selenium-Based Web Scraping Solution with MongoDB Integration

Python | Selenium | MongoDB | Firefox | Data Analytics

Project Overview

This project implements an automated web scraper for Jumia, a leading e-commerce platform in Morocco. Built with Python and Selenium WebDriver, it extracts structured product data from multiple categories, parses prices, computes discounts, and updates existing records when prices change. It also dismisses promotional popups, follows paginated listings, and stores results in MongoDB for later analysis.

Extracted Data Fields

product_title

Complete product name extracted from listing pages

product_url

Direct link to the product detail page for reference

current_price

Current listed price with currency normalization

old_price

Original price before discount application (if available)

discount_percentage

Calculated percentage discount for promotional analysis

discount_quantity

Absolute discount value in local currency

inserted_at

Timestamp of initial data insertion for tracking

updated_at

Timestamp of most recent update for freshness tracking

published_at

Boolean flag (despite the timestamp-style name) for custom publication workflow logic
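
The two discount fields can be derived from current_price and old_price. The following is a minimal sketch of that calculation, not the project's actual code: the function name compute_discount, the two-decimal rounding, and the choice to return None when no usable old price exists are all assumptions.

```python
from typing import Optional, Tuple


def compute_discount(
    current_price: float, old_price: Optional[float]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (discount_percentage, discount_quantity).

    Returns (None, None) when there is no old price or when the old price
    is not higher than the current one (assumed behaviour).
    """
    if old_price is None or old_price <= 0 or current_price >= old_price:
        return None, None
    quantity = round(old_price - current_price, 2)      # absolute saving
    percentage = round(quantity / old_price * 100, 2)   # relative saving
    return percentage, quantity
```

For example, a product listed at 800 with an old price of 1000 would get a discount_quantity of 200 and a discount_percentage of 20.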

Core Features & Implementation

Smart Browser Automation

Selenium WebDriver with configurable Firefox profiles, headless mode support, and optimized performance settings
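
A headless Firefox session could be configured roughly as follows. This is a sketch under assumptions: BrowserConfig and build_driver are hypothetical names, and disabling image loading is only one possible performance setting. Selenium is imported inside build_driver so the configuration object can be used without a browser installed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BrowserConfig:
    """Hypothetical settings object for the scraper's Firefox session."""
    headless: bool = True
    load_images: bool = False
    page_load_timeout: int = 30  # seconds

    def firefox_preferences(self) -> Dict[str, int]:
        prefs: Dict[str, int] = {}
        if not self.load_images:
            # Firefox preference value 2 blocks image loading
            prefs["permissions.default.image"] = 2
        return prefs


def build_driver(config: BrowserConfig):
    """Create a Firefox WebDriver from the config (requires selenium and geckodriver)."""
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    if config.headless:
        options.add_argument("-headless")
    for name, value in config.firefox_preferences().items():
        options.set_preference(name, value)
    driver = webdriver.Firefox(options=options)
    driver.set_page_load_timeout(config.page_load_timeout)
    return driver
```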

Popup Management

Intelligent detection and handling of promotional popups and overlay advertisements
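
One way to handle popups is to look for known close buttons before each scraping step and click any that are visible. The sketch below assumes this approach; the CSS selectors are placeholders, since the real ones depend on Jumia's current markup, and dismiss_popups is a hypothetical helper name.

```python
from typing import Iterable

# Placeholder selectors (assumptions); replace with the site's actual close buttons.
POPUP_CLOSE_SELECTORS = ("button.popup-close", ".overlay .close")


def dismiss_popups(driver, selectors: Iterable[str] = POPUP_CLOSE_SELECTORS) -> int:
    """Click every visible element matching the selectors; return the click count."""
    clicked = 0
    for selector in selectors:
        # "css selector" is the string value of selenium's By.CSS_SELECTOR
        for element in driver.find_elements("css selector", selector):
            try:
                if element.is_displayed():
                    element.click()
                    clicked += 1
            except Exception:
                # The popup may disappear between lookup and click; skip it.
                # With selenium imported, WebDriverException would be a narrower catch.
                continue
    return clicked
```

Because find_elements returns an empty list when nothing matches, the function is safe to call on pages without popups.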

Efficient Navigation

Smart category traversal and paginated result processing with configurable depth limits
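
Pagination with a depth limit can be sketched as a generator of listing URLs. The base URL, the category paths, and the page query parameter are assumptions for illustration; the scraper may instead follow "next page" links in the DOM.

```python
from typing import Iterator
from urllib.parse import urlencode, urljoin

BASE_URL = "https://www.jumia.ma/"  # assumed storefront root
CATEGORY_PATHS = ("telephone-tablette/", "electronique/")  # assumed category paths


def page_urls(category_path: str, max_pages: int = 5) -> Iterator[str]:
    """Yield listing URLs for pages 1..max_pages of one category."""
    base = urljoin(BASE_URL, category_path)
    for page in range(1, max_pages + 1):
        yield f"{base}?{urlencode({'page': page})}"
```

Iterating over CATEGORY_PATHS and calling page_urls for each would yield at most 10 listing pages with the default limit of 5.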

Robust Price Parsing

Advanced regex-based price extraction handling multiple currency formats and localization
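
As an illustration of regex-based price parsing, the sketch below handles both "1,299.00 Dhs" and "1 299,00 Dhs" style strings. It is not the project's actual parser: parse_price is a hypothetical name, and the rule that a final separator followed by one or two digits is a decimal mark is an assumption.

```python
import re
from typing import Optional

# A digit followed by any run of digits, whitespace, dots, and commas.
# In Python 3, \s also matches non-breaking spaces.
_NUMBER = re.compile(r"\d[\d\s.,]*")


def parse_price(text: str) -> Optional[float]:
    """Extract the first price in text as a float, or None if there is none."""
    match = _NUMBER.search(text)
    if match is None:
        return None
    raw = re.sub(r"\s", "", match.group()).rstrip(".,")
    last_sep = max(raw.rfind("."), raw.rfind(","))
    if last_sep != -1 and len(raw) - last_sep - 1 in (1, 2):
        # Last separator followed by 1-2 digits: treat it as the decimal mark
        integer, fraction = raw[:last_sep], raw[last_sep + 1:]
    else:
        # Otherwise every separator is a thousands separator
        integer, fraction = raw, ""
    integer = re.sub(r"[.,]", "", integer)
    return float(f"{integer}.{fraction or '0'}")
```

Under this rule, "2,499 Dhs" is read as 2499 rather than 2.499, because three digits follow the comma.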

Data Validation

Comprehensive deduplication logic and data integrity checks with MongoDB queries
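
Deduplication needs a stable key per product. A minimal sketch, assuming product_url (with query string and fragment removed) is that key and that records without a title or URL are discarded; both helper names are hypothetical. On the database side, a unique index on product_url would be a natural complement.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop query string and fragment so tracking parameters do not create duplicates."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, "", ""))


def deduplicate(products: Iterable[Dict]) -> List[Dict]:
    """Keep one record per canonical URL; the last occurrence wins."""
    seen: Dict[str, Dict] = {}
    for product in products:
        # Integrity check: skip records missing required fields
        if not product.get("product_title") or not product.get("product_url"):
            continue
        key = canonical_url(product["product_url"])
        seen[key] = {**product, "product_url": key}
    return list(seen.values())
```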

Price Update Tracking

Automatic price change detection with historical preservation and timestamp management
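
Price change detection can be expressed as a pure function that compares the stored document with the freshly scraped one and returns the write to perform. This is a sketch under assumptions: build_write is a hypothetical helper, and the price_history array used to preserve earlier prices is an invented field name, not one listed among the extracted fields.

```python
from datetime import datetime, timezone
from typing import Dict, Optional

PRICE_FIELDS = ("current_price", "old_price", "discount_percentage", "discount_quantity")


def build_write(existing: Optional[Dict], scraped: Dict,
                now: Optional[datetime] = None) -> Optional[Dict]:
    """Return {"insert": doc}, {"update": mongo_update}, or None if nothing changed."""
    now = now or datetime.now(timezone.utc)
    if existing is None:
        # New product: both timestamps start at the insertion time
        return {"insert": {**scraped, "inserted_at": now, "updated_at": now}}
    if existing["current_price"] == scraped["current_price"]:
        return None  # price unchanged, no write needed
    changes = {field: scraped[field] for field in PRICE_FIELDS if field in scraped}
    changes["updated_at"] = now
    return {"update": {
        "$set": changes,
        # Preserve the previous price (price_history is an assumed field)
        "$push": {"price_history": {"price": existing["current_price"], "seen_until": now}},
    }}
```

With PyMongo, the "insert" document could be passed to insert_one and the "update" document to update_one with a filter on product_url.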

Modular Architecture

Clean separation of concerns: configuration, scraping logic, parsing utilities, and database operations

MongoDB Integration

Cloud-ready database storage with MongoDB Atlas, optimized queries, and efficient data operations

Scraping Results & Performance

The scraper targets two high-traffic categories, Phones & Tablets and Electronics, and processes up to 5 pages per category. Deduplication keeps redundant records to a minimum, and existing products are updated in place when their prices change.

Target Categories: 2
Pages per Category: 5
Products Extracted: 1000+
Data Fields per Product: 9

Technical Skills Demonstrated

Advanced Selenium Scripting
HTML Parsing & DOM Navigation
Data Normalization
Automated Browser Control
Clean, Modular Python Code
MongoDB Integration
Error-Tolerant Execution
Performance Optimization

MongoDB Atlas Integration

Connection String (username, password, and cluster host are placeholders):
mongodb+srv://username:password@cluster.mongodb.net/

Collection: jumia.products
Operations: Insert, Update, Query, Deduplication
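
Credentials are best kept out of the source code, and special characters in them must be percent-encoded before they go into the connection string. A minimal sketch, assuming the environment variable names MONGO_USER, MONGO_PASSWORD, and MONGO_HOST (all hypothetical):

```python
import os
from urllib.parse import quote_plus


def atlas_uri(username: str, password: str, host: str = "cluster.mongodb.net") -> str:
    """Build a mongodb+srv connection string with percent-encoded credentials."""
    return f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}/"


def uri_from_env() -> str:
    """Read credentials from the environment (variable names are assumptions)."""
    return atlas_uri(
        os.environ["MONGO_USER"],
        os.environ["MONGO_PASSWORD"],
        os.environ.get("MONGO_HOST", "cluster.mongodb.net"),
    )
```

With PyMongo installed, the collection above could then be reached as MongoClient(uri_from_env())["jumia"]["products"].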